PhD Thesis
Optimization of Densities in
Hartree-Fock and Density-functional Theory
Atomic Orbital Based Response Theory
and
Benchmarking for Radicals
Lea Thøgersen
Department of Chemistry
University of Aarhus
2005
"Experiments are the only means of knowledge at our disposal.
The rest is poetry, imagination."
Max Planck
Contents
Preface
List of Publications
Part 1 Improving Self-consistent Field Convergence
1.1 Introduction
1.2 The Self-consistent Field Method
1.3 A Survey of Methods for Improving SCF Convergence
1.3.1 Energy Minimization
1.3.2 Damping and Extrapolation
1.3.3 Level Shifting
1.4 Development of SCF Optimization Algorithms
1.4.1 Dynamically Level Shifted Roothaan-Hall
1.4.1.1 RH Step with Control of Density Change
1.4.1.2 The Trust Region RH Level Shift
1.4.1.3 DIIS and Dynamically Level Shifted RH
1.4.1.4 Line Search TRRH
1.4.1.5 Optimal Level Shift without MO Information
1.4.1.6 The Trace Purification Scheme
1.4.2 Density Subspace Minimization
1.4.2.1 The Trust Region DSM Parameterization
1.4.2.2 The Trust Region DSM Energy Function
1.4.2.3 The Trust Region DSM Minimization
1.4.2.4 Line Search TRDSM
1.4.2.5 The Missing Term
1.4.3 Energy Minimization Exploiting the Density Subspace
1.4.3.1 The Augmented RH Energy Model
1.4.3.2 The Augmented RH Optimization
1.4.3.3 Applications
1.5 The Quality of the Energy Models for HF and DFT
1.5.1 The Quality of the TRRH Energy Model
1.5.2 The Quality of the TRDSM Energy Model
1.6 Convergence for Problems with Several Stationary Points
1.6.1 Walking Away from Unstable Stationary Points
1.6.1.1 Theory
1.6.1.2 Examples
1.7 Scaling
1.7.1 Scaling of TRRH
1.7.2 Scaling of TRDSM
1.8 Applications
1.8.1 Calculations on Small Molecules
1.8.2 Calculations on Metal Complexes
1.9 Conclusion
Part 2 Atomic Orbital Based Response Theory
2.1 Introduction
2.2 AO Based Response Equations in Second Quantization
2.2.1 The Parameterization
2.2.2 The Linear Response Function
2.2.3 The Time Development of the Reference State
2.2.4 The First-order Equation
2.2.5 Pairing
2.3 Solving the Response Equations
2.3.1 Preconditioning
2.3.2 Projections
2.4 The Excited State Gradient
2.4.1 Construction of the Lagrangian
2.4.2 The Lagrange Multipliers
2.4.3 The Geometrical Gradient
2.4.4 The First-order Excited State Properties
2.5 Test Calculations
2.6 Conclusion
Part 3 Benchmarking for Radicals
3.1 Introduction
3.2 Computational Methods
3.3 Numerical Results
3.3.1 Convergence of CC and CI Hierarchies
3.3.2 The Potential Curve for CN
3.3.3 Spectroscopic Constants and Atomization Energy for CN
3.3.4 The Vertical Electron Affinity of CN
3.3.5 The Equilibrium Geometry of CCH
3.4 Conclusion
Summary
Dansk Resumé
Appendix A
Appendix B
Acknowledgements
References
Preface
The present PhD thesis is the outcome of four years of PhD studies at the Faculty of Science,
University of Aarhus, Denmark.
The thesis is divided into three distinct parts which can be read independently. Part 1 deals with the
optimization of the one-electron density in Hartree-Fock and density functional theory, and Part 2
deals with atomic orbital based response theory for Hartree-Fock and density functional theory. Part
2 thus naturally follows after Part 1. In Part 3 benchmark results from FCI calculations on the
radicals CN and CCH are given.
The work presented in Part 1 has resulted in papers I – III as listed in the following List of
Publications and the work presented in Part 3 has resulted in papers V – VI. The work presented in
Part 2 was initiated in the fall of 2004 and will result in paper IV. The development of improved
optimization algorithms for self-consistent field calculations is the subject on which I have spent
most of my time, and Part 1 therefore makes up the larger part of this thesis.
The work has been carried out under the supervision of and in collaboration with Dr. Jeppe Olsen<br />
and Professor Poul Jørgensen at the University of Aarhus. Some work was carried out during visits<br />
at The Royal Institute of Technology in Stockholm, Sweden, the University of Trieste, Italy and the<br />
University of Oslo, Norway. The following people have also contributed to the work presented in<br />
this thesis (see List of Publications): Paweł Sałek (The Royal Institute of Technology in<br />
Stockholm), Sonia Coriani (University of Trieste), Trygve Helgaker (University of Oslo), Stinne<br />
Høst (University of Aarhus), Danny Yeager (Texas A&M University), Andreas Köhn (University of<br />
Aarhus), Jürgen Gauss (University of Mainz), Péter Szalay (Eötvös Loránd University) and Mihály<br />
Kállay (University of Mainz).<br />
The outline of the thesis is as follows: Part 1 is based on the published papers I – II and the
unpublished paper III, but can be read independently of the papers. Certain discussions in papers
I – II are left out of the thesis and only referred to, as they might as well be read in the papers. Other
discussions not published in the papers are presented in this thesis, including the latest
developments of the algorithms. Part 2 is simply paper IV in preparation. Part 3 is based on the
published papers V – VI and is basically a short version of paper V combined with selected results
from paper VI. This part can also be read independently of the papers.
List of Publications<br />
This thesis includes the following papers. Papers I, II, V and VI have already been published and
are attached to this thesis, whereas III and IV are in preparation.
Part 1<br />
I. The Trust-region Self-consistent Field Method: Towards a Black Box optimization in Hartree-<br />
Fock and Kohn-Sham Theories,<br />
L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker,<br />
J. Chem. Phys. 121, 16 (2004)<br />
II. The Trust-region Self-consistent Field Method in Kohn-Sham Density-functional Theory,<br />
L. Thøgersen, J. Olsen, A. Köhn, P. Jørgensen, P. Sałek, and T. Helgaker,<br />
J. Chem. Phys. 123, 074103 (2005)<br />
III. Augmented Roothaan-Hall for converging Densities in Hartree-Fock and Density-functional<br />
Theory,<br />
S. Høst, L. Thøgersen, P. Jørgensen and J. Olsen<br />
Part 2<br />
IV. Atomic Orbital Based Response Theory,<br />
L. Thøgersen, P. Jørgensen, J. Olsen and S. Coriani<br />
Part 3<br />
V. A Coupled Cluster and Full Configuration Interaction Study of CN and CN⁻,
L. Thøgersen and J. Olsen,<br />
Chem. Phys. Lett. 393, 36 (2004)<br />
VI. Equilibrium Geometry of the Ethynyl (CCH) Radical,<br />
P. G. Szalay, L. Thøgersen, J. Olsen, M. Kállay and J. Gauss,<br />
J. Phys. Chem. A 108, 3030 (2004).<br />
Part 1<br />
Improving Self-consistent Field Convergence<br />
1.1 Introduction<br />
The Hartree-Fock (HF) self-consistent field (SCF) method has been around in an orbital formulation
since 1951, when it was introduced by Roothaan 1 and Hall 2 , and today it is as significant as ever.
Even though numerous higher correlated methods with superior accuracy have been developed
since then, most of them still use the Hartree-Fock wave function as the reference function, and are
thus still dependent on a functioning Hartree-Fock optimization. When Kohn and Sham 3 recognized
in 1965 that the Roothaan-Hall SCF scheme had a lot to offer the density optimization in density
functional theory (DFT), the DFT methods entered the chemical scene. It was now in theory also
possible to obtain results at the exact level from SCF calculations, if only the correct functional
could be found. The developments in computer hardware and linear scaling SCF algorithms over
the last decade have made it possible to carry out ab initio quantum chemical calculations on biomolecules<br />
with hundreds of amino acids and on large molecules relevant for nano-science.<br />
Quantum chemical calculations are thus evolving to become a widespread tool for use in several<br />
scientific branches. It is therefore important that the algorithms work as black-boxes, such that the<br />
user outside quantum chemistry does not have to be concerned with the details of the calculations.<br />
Since no scientific results, whether from the higher correlated calculations or from the large-scale
calculations, can be achieved if the SCF optimization does not converge, it is necessary to take an
interest in developing a sound, stable optimization scheme that can handle the complexity of the
problems of the future.
This part of my thesis is a contribution to the quest for a black-box SCF optimization algorithm with
optimal convergence properties. In Section 1.2, the basic Hartree-Fock/Kohn-Sham theory and
notation of this part of the thesis are stated, and in Section 1.3 the efforts through the years to
improve the Roothaan-Hall SCF scheme are reviewed. Our contributions to the development of
stable and physically sound SCF optimization schemes are presented in Section 1.4, and in Section
1.5 we study the quality of the schemes when applied to HF and DFT. Optimization of problems
with several stationary points is discussed in Section 1.6, in Section 1.7 the scaling of the algorithms
is accounted for, and Section 1.8 contains some convergence examples for HF and DFT calculations
using the algorithms presented in Section 1.4. Finally, Section 1.9 contains concluding remarks
reviewing the results of this part of the thesis.
1.2 The Self-consistent Field Method<br />
In the following we consider a closed-shell system with N/2 electron pairs. The basic theory of the<br />
Hartree-Fock (HF) and the Kohn-Sham (KS) density optimizations will be described<br />
simultaneously, and the differences will be noted as they appear. Since we are interested in<br />
extending the algorithms presented to large scale calculations, a formulation without reference to<br />
the delocalized molecular orbitals (MOs) is essential, and thus the focus will be on the density in the<br />
atomic orbital (AO) basis rather than the MOs themselves. All through the thesis, SCF will be used<br />
as a general term for HF and KS-DFT methods since they have the SCF optimization scheme in<br />
common. The orbital index convention used in this thesis is i, j, k, l for occupied MOs, a, b, c, d for<br />
virtual MOs, p, q for MOs in general, and Greek letters µ, ν, ρ, σ for AOs.<br />
For closed-shell restricted Hartree-Fock or DFT, the electronic energy is given by<br />
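$$E_{\mathrm{SCF}} = 2\,\mathrm{Tr}\,\mathbf{h}\mathbf{D} + \mathrm{Tr}\,\mathbf{D}\mathbf{G}(\mathbf{D}) + h_{\mathrm{nuc}} + E_{\mathrm{XC}}(\mathbf{D}), \tag{1.1}$$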
where h is the one-electron Hamiltonian matrix in the AO basis, $h_{\mathrm{nuc}}$ is the nuclear-nuclear repulsion
contribution, and D is the (scaled) one-electron density matrix in the AO basis, $\mathbf{D} = \tfrac{1}{2}\mathbf{D}_{\mathrm{AO}}$, which
satisfies the symmetry, trace, and idempotency conditions,
$$\mathbf{D}^{T} = \mathbf{D}, \qquad \mathrm{Tr}\,\mathbf{D}\mathbf{S} = \frac{N}{2}, \qquad \mathbf{D}\mathbf{S}\mathbf{D} = \mathbf{D}, \tag{1.2}$$
of a valid one-electron density matrix. S is the AO overlap matrix. The elements of G(D) are given<br />
by<br />
$$G_{\mu\nu}(\mathbf{D}) = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \gamma \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}, \tag{1.3}$$
where $g_{\mu\nu\rho\sigma}$ are the two-electron AO integrals. The first term in Eq. (1.3) represents the Coulomb
contribution, and the second term is the contribution from exact exchange, with γ = 1 in Hartree-
Fock theory, γ = 0 in pure DFT, and γ ≠ 0 in hybrid DFT. The exchange-correlation energy $E_{\mathrm{XC}}(\mathbf{D})$
in Eq. (1.1) is a nonlinear and non-quadratic functional of the electronic density. This term is only
present in the energy expression for the DFT level of theory; the Hartree-Fock energy is expressed
only by the first three terms of Eq. (1.1). The form of $E_{\mathrm{XC}}$ depends on the DFT functional chosen for
the calculation.
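As an illustration of Eqs. (1.1) and (1.3), the following is a minimal NumPy sketch, not the implementation used in this thesis. The function names, the random model integrals and the scalar `e_xc` standing in for $E_{\mathrm{XC}}(\mathbf{D})$ are assumptions made for the example; real integrals would come from an integral program.

```python
import numpy as np

def model_integrals(n, n_factors=3, seed=0):
    """Hypothetical two-electron integrals g[m, n, r, s] with 8-fold symmetry."""
    rng = np.random.default_rng(seed)
    L = rng.normal(size=(n_factors, n, n))
    L = 0.5 * (L + L.transpose(0, 2, 1))       # make every factor symmetric
    return np.einsum("kmn,krs->mnrs", L, L)    # g_mnrs = sum_k L^k_mn L^k_rs

def build_G(g, D, gamma=1.0):
    """Eq. (1.3): G_mn = 2 sum_rs g_mnrs D_rs - gamma sum_rs g_msrn D_rs."""
    coulomb = 2.0 * np.einsum("mnrs,rs->mn", g, D)
    exchange = gamma * np.einsum("msrn,rs->mn", g, D)
    return coulomb - exchange

def scf_energy(h, g, D, h_nuc=0.0, gamma=1.0, e_xc=0.0):
    """Eq. (1.1); e_xc is a caller-supplied number standing in for E_XC(D)."""
    G = build_G(g, D, gamma)
    return 2.0 * np.trace(h @ D) + np.trace(D @ G) + h_nuc + e_xc
```

Setting `gamma=1.0` and `e_xc=0.0` gives the Hartree-Fock expression, while `gamma=0.0` keeps only the Coulomb part of G(D), as in pure DFT.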
The first derivative of the electronic energy with respect to the density is found as
$$\mathbf{E}_{\mathrm{SCF}}^{(1)}(\mathbf{D}) = \frac{\partial E_{\mathrm{SCF}}(\mathbf{D})}{\partial \mathbf{D}} = 2\,\mathbf{F}(\mathbf{D}), \tag{1.4}$$
where
$$\mathbf{F}(\mathbf{D}) = \mathbf{h} + \mathbf{G}(\mathbf{D}) + \tfrac{1}{2}\,\mathbf{E}_{\mathrm{XC}}^{(1)}(\mathbf{D}) \tag{1.5}$$
is the Kohn-Sham matrix in DFT and, if the last term is excluded, the Fock matrix in Hartree-Fock
theory. From now on F(D) is simply referred to as the Fock matrix. $\mathbf{E}_{\mathrm{XC}}^{(1)}(\mathbf{D})$ is the first derivative
of the term $E_{\mathrm{XC}}$ expanded in the density.
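As a brief check, Eq. (1.4) can be obtained from Eq. (1.1) term by term. The sketch below assumes a symmetric D and two-electron integrals with the usual permutational symmetry, so that G(D) is linear in D and $\mathrm{Tr}\,\mathbf{D}\mathbf{G}(\boldsymbol{\Delta}) = \mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{G}(\mathbf{D})$ for any symmetric $\boldsymbol{\Delta}$:

```latex
\begin{align*}
\frac{\partial}{\partial \mathbf{D}}\bigl[2\,\mathrm{Tr}\,\mathbf{h}\mathbf{D}\bigr] &= 2\,\mathbf{h}, \\
\frac{\partial}{\partial \mathbf{D}}\bigl[\mathrm{Tr}\,\mathbf{D}\mathbf{G}(\mathbf{D})\bigr] &= 2\,\mathbf{G}(\mathbf{D}), \\
\frac{\partial h_{\mathrm{nuc}}}{\partial \mathbf{D}} &= \mathbf{0}, \\
\frac{\partial E_{\mathrm{XC}}(\mathbf{D})}{\partial \mathbf{D}} &= \mathbf{E}_{\mathrm{XC}}^{(1)}(\mathbf{D}), \\
\mathbf{E}_{\mathrm{SCF}}^{(1)}(\mathbf{D}) &= 2\Bigl[\mathbf{h} + \mathbf{G}(\mathbf{D}) + \tfrac{1}{2}\,\mathbf{E}_{\mathrm{XC}}^{(1)}(\mathbf{D})\Bigr] = 2\,\mathbf{F}(\mathbf{D}).
\end{align*}
```

The factor of two in the second line arises because D enters the term twice, once explicitly and once through G(D).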
The Fock matrix is by design an effective one-electron Hamiltonian which is itself dependent on the<br />
eigenfunctions. Optimizing the electronic energy is thus a nonlinear problem and an iterative<br />
scheme must be applied. In 1951 Roothaan and Hall suggested an iterative procedure 1,2 in which a<br />
set of molecular orbitals (MOs) are constructed in each step through a diagonalization of the current<br />
Fock matrix, which in the AO formulation is written as<br />
$$\mathbf{F}\mathbf{C} = \mathbf{S}\mathbf{C}\boldsymbol{\varepsilon}, \tag{1.6}$$
where S is the AO overlap matrix, ε is a diagonal matrix containing the orbital energies, and the<br />
eigenvectors C contain the MO coefficients. The MOs, $\varphi_p$, are linear combinations of a finite set of
one-electron basis functions, $\chi_\mu$, with $C_{\mu p}$ as expansion coefficients
$$\varphi_p = \sum_{\mu} \chi_\mu C_{\mu p}. \tag{1.7}$$
For the closed shell case the MOs can be divided into an occupied ($\varphi_{\mathrm{occ}}$) and a virtual ($\varphi_{\mathrm{virt}}$) part,
where the occupied MOs each contain two electrons and the virtual orbitals are empty. If the aufbau<br />
ordering rule is applied, the occupied MOs are chosen as those with the lowest eigenvalues.<br />
A new trial density D can then be constructed from the occupied orbitals as<br />
$$\mathbf{D} = \mathbf{C}_{\mathrm{occ}} \mathbf{C}_{\mathrm{occ}}^{T}. \tag{1.8}$$
From this density a new Fock matrix can be evaluated from Eq. (1.5) and diagonalizing it according<br />
to Eq. (1.6) establishes the iterative procedure. The iterative cycle stops when self-consistency is<br />
obtained, that is, when the new density, energy or molecular orbitals do not change within some<br />
convergence threshold compared to the previous ones.<br />
In an iterative scheme it is necessary to have a start guess. For the SCF case it should be a
one-electron density which fulfils Eq. (1.2), created directly or from a start guess of the molecular
orbitals as in Eq. (1.8). Different approaches are used; a simple and easily applicable possibility is
to obtain the starting orbitals by diagonalization of the one-electron Hamiltonian (H1-core). This is
the start guess most widely used in this thesis since it is always available. Another popular
possibility is to create a semi-empirical start guess where the orbitals resulting from a semi-empirical
calculation (e.g. Hückel) on the molecule are fitted to the current basis.
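The iterative procedure of Eqs. (1.5) to (1.8), started from the H1-core guess, can be sketched as follows for the Hartree-Fock case. This is only an illustrative NumPy sketch under stated assumptions: the function names are hypothetical, the generalized eigenvalue problem is solved through a symmetric orthogonalization with $\mathbf{S}^{-1/2}$ (one of several possible choices), and the density change is used as the convergence measure.

```python
import numpy as np

def build_fock(h, g, D, gamma=1.0):
    """Hartree-Fock form of Eq. (1.5): F = h + G(D), with G(D) from Eq. (1.3)."""
    J = np.einsum("mnrs,rs->mn", g, D)
    K = np.einsum("msrn,rs->mn", g, D)
    return h + 2.0 * J - gamma * K

def roothaan_hall_step(F, S, n_occ):
    """Solve FC = SC eps (Eq. 1.6) and return D = C_occ C_occ^T (Eq. 1.8)."""
    s, U = np.linalg.eigh(S)
    X = U @ np.diag(s ** -0.5) @ U.T            # symmetric orthogonalizer S^(-1/2)
    eps, C_prime = np.linalg.eigh(X @ F @ X)    # eigenvalues in ascending order
    C = X @ C_prime                             # back-transformed, C^T S C = 1
    C_occ = C[:, :n_occ]                        # aufbau: lowest n_occ orbitals
    return C_occ @ C_occ.T

def scf(h, S, g, n_occ, tol=1e-8, max_iter=50):
    """Plain fixed-point SCF iteration; returns (D, converged, iterations)."""
    D = roothaan_hall_step(h, S, n_occ)         # H1-core start guess
    for iteration in range(1, max_iter + 1):
        D_new = roothaan_hall_step(build_fock(h, g, D), S, n_occ)
        if np.linalg.norm(D_new - D) < tol:
            return D_new, True, iteration
        D = D_new
    return D, False, max_iter
```

Every density produced by `roothaan_hall_step` satisfies the conditions of Eq. (1.2) by construction, but nothing in this plain loop guarantees that the energy decreases or that the iteration converges.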
The steps of the self-consistent field (SCF) scheme are summarized from the density point of view
in Fig. 1.1: From a density matrix start guess a Fock matrix is constructed. From this Fock matrix a
new density matrix can be found, and so an iterative procedure is established which continues until
self-consistency. The step creating a new density from a Fock matrix will be referred to as the
Roothaan-Hall (RH) step throughout this thesis, regardless of whether it is a diagonalization of the
Fock matrix or some alternative scheme.
[Fig. 1.1 Flow diagram of the SCF scheme: D_0 → F(D_n) → D_n+1; if D_n+1 ≈ D_n the result is D_conv,
otherwise n = n+1 and the cycle is repeated.]
The purpose of an SCF optimization is typically to find the global minimum. Since the HF/KS
equations are nonlinear, several stationary points might exist, and depending on the start guess and
the optimization procedure, the converged result can represent a local minimum, the global
minimum, or even a saddle point. By evaluating the lowest Hessian eigenvalue it can be determined
whether the stationary point is a minimum or a saddle point, but no simple test can
reveal whether a minimum is global or not. The use of the term “convergence” in this thesis will
simply refer to the iterative development from the start guess to a self-consistent density with a
gradient below the convergence threshold. The issues connected with problems where several
stationary points can be found are discussed in Section 1.6.
Since Roothaan and Hall suggested the iterative diagonalization procedure as a means to solve the<br />
Hartree-Fock equations and Kohn and Sham suggested using the same scheme for optimizing the<br />
electron density for density functional theory 3 , the SCF methods have been used extensively in<br />
quantum chemistry. Unfortunately, it turned out that the simple fixed point scheme sketched in Fig.<br />
1.1 converges only in simple cases. Already around 1960 it was recognized that the method<br />
sometimes fails to converge and that divergent behavior in some cases is intrinsic 4,5 .<br />
1.3 A Survey of Methods for Improving SCF Convergence<br />
Numerous suggestions have been made to improve upon the convergence of Roothaan and Hall’s
original scheme or to replace it with an alternative scheme. The suggestions can be crudely divided
into three different categories: energy minimization, damping/extrapolation, and level shifting.
Furthermore the different suggestions in these categories have been combined in various ways. The
two latter categories are modifications to the Roothaan-Hall scheme, whereas energy minimization
is a means of avoiding the iterative diagonalization scheme and instead using some optimization
scheme on an energy function.
To my knowledge these categories embrace all convergence improvements suggested over the
years, except for the method of fractionally occupying orbitals around the Fermi level 6 which does
not fit in any of the categories. As mentioned, the start guess has a great impact on the optimization,
and a poor start guess with the wrong electron configuration can require many iterations to change to a
more optimal electron configuration; in some cases the proper electron configuration is never
found and the calculation diverges. In the methods using fractional occupations, a number of
orbitals around the Fermi level are allowed to have non-integral occupation. The non-integral
occupations are determined from the Fermi-Dirac distribution which is a function of the
temperature. The non-integral occupations are updated in each iteration, and corrected such that the
total number of electrons is constant. During the optimization either the temperature is decreased to
T = 0 K or the number of orbitals allowed to have non-integral occupation is decreased, so that only
integer occupations remain at the end of the optimization. It is thus possible to optimize the electron
configuration in an effective manner in the beginning of the SCF optimization, and when the proper
configuration has been found, the rest of the optimization has a better chance of convergence since
the start guess in a way has been improved.
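The occupation update described above can be sketched as follows. This is a generic illustration rather than the specific scheme of the cited work: the function name, the bisection search for the Fermi level and the convention of one occupation number between 0 and 1 per spatial orbital (so that the occupations sum to N/2) are assumptions made for the example.

```python
import numpy as np

def fermi_dirac_occupations(eps, n_occ, kT):
    """Return occupations f_i in [0, 1] with sum(f) = n_occ, and the Fermi level."""
    eps = np.asarray(eps, dtype=float)

    def occupations(mu):
        x = np.clip((eps - mu) / kT, -700.0, 700.0)    # avoid overflow in exp
        return 1.0 / (1.0 + np.exp(x))

    # The total occupation grows monotonically with mu, so bisection is safe.
    lo, hi = eps.min() - 50.0 * kT, eps.max() + 50.0 * kT
    for _ in range(200):
        mu = 0.5 * (lo + hi)
        if occupations(mu).sum() < n_occ:
            lo = mu
        else:
            hi = mu
    mu = 0.5 * (lo + hi)
    return occupations(mu), mu
```

Lowering `kT` from one SCF iteration to the next drives the occupations towards the integer aufbau values, which corresponds to the temperature being decreased towards T = 0 K.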
In the following, the focus will be on the efforts to improve the convergence behavior of the SCF
scheme through optimization algorithm development in the three categories listed above. Other
efforts bear as much significance and should also be acknowledged, in particular the
generalizations of many well-functioning schemes to the unrestricted level of theory,
which has its own challenges. The quest for an improved start guess is also
important. It is obvious that with an improved start guess, less is demanded from the optimization
method and thus some convergence problems inherent in the methods could be avoided. In the last
decade the effort in SCF scheme development has to a large extent been put into decreasing the scaling
of the methods to allow calculations on larger molecules. Scaling is a very important subject and it
should not be ignored. Section 1.7 will therefore discuss the scaling of the algorithms presented in
this thesis. Despite the importance of these three SCF related subjects, the rest of this section will
focus almost solely on efforts to improve convergence through optimization algorithm development.
1.3.1 Energy Minimization<br />
One of the problems in the simple Roothaan-Hall procedure is the lack of guarantees for energy
decrease in the iterative steps. This was pointed out by McWeeny, and he thus introduced a steepest
descent procedure 7,8 as an energy minimization alternative to Roothaan and Hall’s repeated
diagonalizations. Steepest descent optimizations have the benefit that a decrease in energy can be
guaranteed for each step. McWeeny’s scheme suffers, however, from a slow convergence rate 5 as
often seen for steepest descent methods. Fletcher and Reeves proposed the conjugate gradient
optimization method 9 instead, which often is more efficient than steepest descent and, for a
quadratic function, is guaranteed to converge in at most a number of steps equal to the dimension
of the problem.
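The finite-termination property of the conjugate gradient method can be illustrated on a small quadratic model function. The sketch below is a generic Fletcher-Reeves iteration with exact line searches, not an SCF energy minimization; the function name and the test matrix in the usage note are assumptions chosen for the example.

```python
import numpy as np

def fletcher_reeves_quadratic(A, b, x0):
    """Minimize f(x) = 1/2 x^T A x - b^T x for symmetric positive definite A."""
    x = np.array(x0, dtype=float)
    grad = A @ x - b
    d = -grad                                    # first direction: steepest descent
    history = [np.linalg.norm(grad)]
    for _ in range(len(b)):                      # at most n steps in n dimensions
        if history[-1] < 1e-14:
            break
        alpha = -(grad @ d) / (d @ A @ d)        # exact minimizer along d
        x = x + alpha * d
        new_grad = A @ x - b
        beta = (new_grad @ new_grad) / (grad @ grad)   # Fletcher-Reeves formula
        d = -new_grad + beta * d
        grad = new_grad
        history.append(np.linalg.norm(grad))
    return x, history
```

For example, with `A = np.diag([1.0, 2.0, 3.0, 4.0]) + 0.1 * np.ones((4, 4))` the gradient norm stored in `history` drops to rounding level after four steps. For a non-quadratic function such as the SCF energy no such finite termination holds.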
A decade later Hillier and Saunders suggested an improvement to the McWeeny scheme called
energy-weighted steepest descent 10 , in which the coordinates in the orbital space are energy-weighted.
In 1976 this work was generalized by Seeger and Pople. They realized that another
problem in the simple Roothaan procedure is the possibility for discontinuous changes in the<br />
orbitals which do not necessarily lower the energy. To ensure energy descent it is necessary to be<br />
able to follow such changes continuously, and methods like the steepest descent have the possibility<br />
to do so. Their procedure proceeds in small steps, where the new occupied trial orbitals are selected<br />
based on a criterion of overlap with the previous set. This technique ensures stability and avoids<br />
switching of orbital occupation. The step is found by a univariate search 11 in the energy, on a path<br />
that passes through the point corresponding to the next iteration step of the classical procedure.<br />
Their scheme can therefore also be seen as a polynomial interpolation along a path joining
successive SCF cycles. Half a decade later, Camp and King followed the same strategy of a
univariate cubic fit technique 12 , but with a different parameterization. Stanton also suggested a
similar approach 13 , but whereas the Seeger-Pople approach requires the evaluation of the Fock<br />
matrix at interior points on the interpolative path, Stanton’s scheme uses a cubic interpolation,<br />
where only the end point properties are needed, making it a less expensive method.<br />
Another way of improving the convergence properties is to evaluate the gradient and Hessian of the<br />
electronic energy analytically with respect to some variational parameter, and then optimize the<br />
energy through Newton-Raphson steps resulting in a quadratically convergent 14 scheme, at least in<br />
the region close to the optimized state where a second order approximation is reasonable. These<br />
methods are computationally very expensive since a four index transformation is required to obtain<br />
the Hessian information. In 1981 Bacskay proposed a quadratically convergent SCF (QC-SCF)<br />
method 15 which escapes the four index transformation while requiring four or five micro iterations<br />
per step (in non-problematic cases), each of which is about as expensive computationally as
building a Fock matrix. His method was inspired by single excitation configuration interaction
(SX-CI) and multi-configurational SCF (MC-SCF). A possible divergence of the scheme can be
overcome by moderating the orbital update step by the augmented Hessian method 16 or trust radius
techniques 17 . Even though it is still quite expensive, the method is still used today for cases with
convergence problems, since a decrease in energy can be ensured step by step and it has quadratic
convergence properties near the optimized state.
Around 1995, interest in linear scaling SCF methods took off, since the development in
computer hardware had made calculations on large molecules possible. With newly developed
algorithms the evaluation of the Fock matrix, with the formal scaling of $N^4$ arising from the four-index
integrals, could now routinely be decreased to a near-linear scaling. The diagonalization with
an $N^3$ scaling in standard Roothaan-Hall was now the bottleneck. Inspiration was found in tight
binding theory 18-20 , where a number of linear scaling approaches had been suggested earlier 21 . To<br />
obtain linear scaling of the RH step it is necessary to avoid the diagonalization and to ensure<br />
sparsity in the matrices. This is a problem since the convenient canonical MO basis is inherently<br />
delocalized. Some of the well known schemes were reformulated in localized MOs 22 , while others<br />
developed strict AO formulations 20,23-25 . Most of the suggested linear scaling methods did not arise<br />
so much to improve convergence as to improve the scaling, and will therefore not be discussed in<br />
further detail.<br />
Very recently Francisco, Martínez and Martínez introduced their globally convergent trust region<br />
methods for SCF 26 , where the standard fixed-point Roothaan-Hall step is replaced by a trust region<br />
optimization of a model energy function. This algorithm has very nice features since it can be<br />
proved to be globally convergent, and the step sizes are controlled dynamically through a trust<br />
region update scheme. The convergence rate seems rather erratic though, sometimes excellent and<br />
sometimes hopeless, but only small test examples have been published, so time will tell.<br />
1.3.2 Damping and Extrapolation<br />
In his SCF study of atoms, Hartree noted convergence difficulties and suggested a so-called<br />
damping scheme 27 as a modification to the iterative procedure. Instead of using the newly<br />
constructed density D n+1 , which corresponds to a full step, a linear combination of the new density<br />
matrix with the previous one is constructed<br />
$$\mathbf{D}_{n+1}^{\mathrm{damp}} = \mathbf{D}_n + \lambda\left(\mathbf{D}_{n+1} - \mathbf{D}_n\right) = \lambda\mathbf{D}_{n+1} + (1-\lambda)\mathbf{D}_n, \tag{1.9}$$<br />
where λ, the damping factor, is a scalar chosen between zero and one. The iterative sequence is<br />
then continued with D damp as the new density. Hartree found that this scheme could force<br />
convergence in problematic cases.<br />
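The damping step of Eq. (1.9) can be sketched in a few lines. This is a minimal illustration, assuming the densities are stored as NumPy arrays; the function name `damp_density` and the toy matrices are hypothetical and not taken from the thesis.<br />

```python
import numpy as np


def damp_density(d_old: np.ndarray, d_new: np.ndarray, lam: float) -> np.ndarray:
    """Return the damped density of Eq. (1.9): D_old + lam * (D_new - D_old)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("Hartree damping expects 0 <= lam <= 1")
    return d_old + lam * (d_new - d_old)


# Toy 2x2 densities (hypothetical numbers, not from any calculation).
d_old = np.array([[1.0, 0.0], [0.0, 0.0]])
d_new = np.array([[0.0, 0.0], [0.0, 1.0]])
d_half = damp_density(d_old, d_new, 0.5)
```

With λ = 1 the full step is taken and with λ = 0 the old density is kept. For intermediate values the result is in general not idempotent, as `d_half` shows.<br />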
To get an idea of the effect of the damping factor, we consider a block-diagonal Fock matrix in the<br />
MO basis<br />
$$\mathbf{F}^{\mathrm{MO}} = \begin{pmatrix} \boldsymbol{\varepsilon}_o & \mathbf{F}_{ov} \\ \mathbf{F}_{vo} & \boldsymbol{\varepsilon}_v \end{pmatrix}, \tag{1.10}$$<br />
where ‘o’ denotes occupied, ‘v’ virtual and [ε o ] ij = δ ij ε i and [ε v ] ab = δ ab ε a . The change in electronic<br />
energy from the first order variation of the occupied orbitals through first-order perturbation theory<br />
is then given as<br />
$$\Delta E_{\mathrm{SCF}}^{(1)} = 4 \sum_{a}^{\mathrm{virtual}} \sum_{i}^{\mathrm{occupied}} \frac{-F_{ai}^{2}}{\left(\varepsilon_a - \varepsilon_i\right)}. \tag{1.11}$$<br />
If this first order term is negative and sufficiently small such that the higher order contributions are<br />
insignificant, then a decrease in the electronic energy is seen. If the MOs obey the aufbau principle,<br />
then all ε i < ε a and it is clear that the term is negative as desired. The Hartree damping of Eq. (1.9)<br />
roughly corresponds to multiplying the numerator of Eq. (1.11) by the factor λ, which is positive<br />
and less than one<br />
$$\Delta E_{\mathrm{SCF}}^{(1)} = 4 \sum_{a}^{\mathrm{virtual}} \sum_{i}^{\mathrm{occupied}} \frac{-\lambda F_{ai}^{2}}{\left(\varepsilon_a - \varepsilon_i\right)}, \tag{1.12}$$<br />
thus giving the opportunity to obtain a negative first order change of arbitrarily small magnitude,<br />
making the higher order terms insignificant. Though this would seem promising, the aufbau<br />
principle is seldom obeyed all through the optimization.<br />
If λ could be freely chosen, the damping technique would lead to an extrapolation scheme in the<br />
densities. Since SCF generates an iterative sequence where each step only depends upon the<br />
preceding, it was natural to apply the mathematical extrapolation methods (e.g. the Aitken<br />
extrapolation 28 procedures) on SCF to improve in particular the convergence rate close to the<br />
minimum. When the individual MO expansion coefficients are chosen as the extrapolated<br />
parameters, as Winter and Dunning Jr. 29 suggested, unphysical results may be obtained, though they<br />
can be corrected at the end of the calculation. Nielsen used instead the density matrix as the<br />
extrapolated parameter 30 and an eigenvalue extrapolation instead of the Aitken method. This led to a<br />
scheme more similar to Hartree damping, but with λ found within the eigenvalue extrapolation<br />
scheme.<br />
Different approaches have been taken to dynamically find the damping factor λ. Zerner and<br />
Hehenberger 31 found it based on an extrapolation of the Mulliken gross population. Karlström 32<br />
expressed the electronic energy in the damped density E(D damp ) and used the first derivative with<br />
respect to λ, to choose in each iteration the λ that minimized the electronic energy.<br />
None of these schemes was very successful in solving the convergence problems. They all had some<br />
particular problematic cases they could handle better than the predecessors, but in general they did<br />
not catch on. Pulay then suggested in the early 1980s to use the norm of a linear combination of<br />
error vectors e i from the individual iterations, where the vanishing of the error vector is a necessary<br />
and sufficient condition for SCF convergence. The norm is then optimized with respect to the<br />
coefficients c i<br />
$$\mathbf{e}(\mathbf{c}) = \sum_{i=1}^{n} c_i \mathbf{e}_i, \tag{1.13}$$<br />
where n is the number of previous iterations, and the coefficients are restricted to add up to 1<br />
$$\sum_{i=1}^{n} c_i = 1. \tag{1.14}$$<br />
The resulting coefficients are used to construct a favorable linear combination of the previous Fock<br />
matrices<br />
$$\bar{\mathbf{F}} = \sum_{i=1}^{n} c_i \mathbf{F}_i, \tag{1.15}$$<br />
which is diagonalized to obtain a new density, and so the iterative procedure is reestablished. This<br />
was the first density subspace minimization scheme that deliberately exploited the information<br />
obtained in the previous iterations, and Pulay named the approach DIIS 33 for “Direct Inversion in the<br />
Iterative Subspace”. For the special case of two matrices, the DIIS density corresponds to the<br />
damped density of Eq. (1.9), but with no restrictions on λ. A decade later the DIIS algorithm was a<br />
standard option in most ab initio programs and had effectively solved a number of the convergence<br />
problems. The orbital rotation gradient was typically used as the error vector for wave function<br />
optimizations, and Sellers pointed out 34 that the DIIS algorithm exploits the second-order<br />
information contained in a set of gradients to obtain quadratic convergence behavior. Some<br />
numerical problems were seen though, where numerical instabilities appeared because of linear<br />
dependencies in the space of error vectors. Sellers introduced the C2-DIIS method 34 , which is<br />
similar to DIIS except the restriction is on the squares of the coefficients<br />
$$\sum_{i=1}^{n} c_i^{2} = 1, \tag{1.16}$$<br />
with a renormalization at the end. This gives an eigenvalue problem to be solved instead of the set<br />
of linear equations in normal DIIS, and thus singularities are more easily handled. However, one of<br />
the examples (Pd 2 in the Hyla-Kripsin basis set 35 ) given in ref. 34 , where DIIS supposedly diverges,<br />
converges for our plain DIIS implementation to 10^-7 in the energy in 14 iterations.<br />
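The two constrained minimizations can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the DIIS constraint of Eq. (1.14) is handled with a Lagrange multiplier in one set of linear equations, and the C2-DIIS variant is simplified to taking the eigenvector of the lowest eigenvalue before renormalizing, which is an assumption rather than a description of the published algorithm.<br />

```python
import numpy as np


def diis_coefficients(errors):
    """Minimize |sum_i c_i e_i|^2 subject to sum_i c_i = 1 (Eqs. 1.13-1.14)."""
    e = np.array([np.ravel(v) for v in errors], dtype=float)
    n = e.shape[0]
    a = np.zeros((n + 1, n + 1))
    a[:n, :n] = e @ e.T          # B_ij = <e_i|e_j>
    a[:n, n] = 1.0               # Lagrange-multiplier column
    a[n, :n] = 1.0               # constraint row: sum_i c_i = 1
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    return np.linalg.solve(a, rhs)[:n]


def c2diis_coefficients(errors):
    """Simplified C2-DIIS: sum_i c_i^2 = 1 (Eq. 1.16), renormalized to sum 1."""
    e = np.array([np.ravel(v) for v in errors], dtype=float)
    _, vecs = np.linalg.eigh(e @ e.T)
    c = vecs[:, 0]               # eigenvector of the lowest eigenvalue
    return c / c.sum()           # assumes the elements do not sum to zero


# Two hypothetical error vectors that cancel exactly at c = (1/2, 1/2).
toy_errors = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
c_diis = diis_coefficients(toy_errors)
c_c2 = c2diis_coefficients(toy_errors)
```

In this toy case the matrix of error-vector overlaps is singular, yet both routes return the same coefficients; the eigenvalue formulation never has to invert that matrix.<br />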
Even though DIIS is successful, examples of divergence with no relation to numerical instabilities<br />
have been encountered over the years. In the year 2000 Cancès and Le Bris presented a damping<br />
algorithm named the Optimal Damping Algorithm 36 (ODA) that ensures a decrease in energy at each<br />
iteration and converges toward a solution to the HF equations. In ODA the damping factor λ is<br />
found based on the minimum of the Hartree-Fock energy for the damped density in Eq. (1.9)<br />
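$$E_{\mathrm{HF}}\left(\mathbf{D}_{n+1}^{\mathrm{damp}}, \lambda\right) = E_{\mathrm{HF}}(\mathbf{D}_n) + 2\lambda\,\mathrm{Tr}\,\mathbf{F}(\mathbf{D}_n)\left(\mathbf{D}_{n+1} - \mathbf{D}_n\right) + \lambda^{2}\,\mathrm{Tr}\left(\mathbf{D}_{n+1} - \mathbf{D}_n\right)\mathbf{G}\left(\mathbf{D}_{n+1} - \mathbf{D}_n\right) + h_{\mathrm{nuc}}, \tag{1.17}$$<br />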
much like Karlström did it in 1979. The damping factor is thus optimized in each iteration, hence<br />
the name of the algorithm.<br />
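Since Eq. (1.17) is quadratic in λ, the optimal damping factor has a closed form. The sketch below is an illustration under stated assumptions: `slope` and `curvature` stand for the coefficients of λ and λ² in Eq. (1.17), the function name is hypothetical, and λ is restricted to the interval from zero to one used for the damping factor in Eq. (1.9).<br />

```python
def optimal_damping(slope: float, curvature: float) -> float:
    """Minimize E(lam) = E0 + slope*lam + curvature*lam**2 on 0 <= lam <= 1.

    With Eq. (1.17): slope = 2 Tr F(D_n)(D_{n+1} - D_n) and
    curvature = Tr (D_{n+1} - D_n) G(D_{n+1} - D_n).
    """
    candidates = [0.0, 1.0]
    if curvature > 0.0:
        stationary = -slope / (2.0 * curvature)
        if 0.0 < stationary < 1.0:
            candidates.append(stationary)  # interior minimum of the parabola
    return min(candidates, key=lambda lam: slope * lam + curvature * lam**2)
```

A negative slope with a large positive curvature gives a strongly damped step, while a small curvature lets the full step (λ = 1) through.<br />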
Recently Kudin, Scuseria, and Cancès proposed a method in which the gradient-norm minimization<br />
in DIIS is replaced by a minimization of an approximation to the true energy function and they<br />
named it the energy DIIS (EDIIS) method 37 . Where the ODA used the energy expression of Eq.<br />
(1.17) to find the optimal λ, EDIIS uses an approximation of the Hartree-Fock energy for the<br />
averaged density<br />
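$$\bar{\mathbf{D}} = \sum_{i=1}^{n} c_i \mathbf{D}_i, \tag{1.18}$$<br />
$$E^{\mathrm{EDIIS}}\left(\bar{\mathbf{D}}, \mathbf{c}\right) = \sum_{i=1}^{n} c_i E_{\mathrm{SCF}}(\mathbf{D}_i) - \frac{1}{2}\sum_{i,j=1}^{n} c_i c_j\,\mathrm{Tr}\left(\left(\mathbf{F}_i - \mathbf{F}_j\right)\cdot\left(\mathbf{D}_i - \mathbf{D}_j\right)\right), \tag{1.19}$$<br />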
where the sum of the coefficients c i is still restricted to 1. They combine the scheme with DIIS, such<br />
that the EDIIS optimized coefficients are used to construct the averaged Fock matrix if all<br />
coefficients fall between 0 and 1. If not, the coefficients from the DIIS scheme are used instead. The<br />
EDIIS scheme introduces some Hessian information not found in DIIS and thus improves<br />
convergence in cases where the start guess has a Hessian structure far from the optimized one. For<br />
non-problematic cases and near the optimized state EDIIS has a slower convergence rate than DIIS,<br />
but it has been demonstrated that EDIIS can converge cases where DIIS diverges.<br />
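For a given set of coefficients, the EDIIS model energy of Eq. (1.19) is straightforward to evaluate. The following is a minimal sketch, assuming the stored energies, Fock matrices, and densities are available as Python sequences of NumPy arrays; the function name and the toy data are hypothetical.<br />

```python
import numpy as np


def ediis_energy(c, energies, focks, densities):
    """Evaluate the EDIIS model energy of Eq. (1.19) for coefficients c."""
    c = np.asarray(c, dtype=float)
    value = float(np.dot(c, energies))           # sum_i c_i E_SCF(D_i)
    for i in range(len(c)):
        for j in range(len(c)):
            d_fock = focks[i] - focks[j]
            d_dens = densities[i] - densities[j]
            value -= 0.5 * c[i] * c[j] * np.trace(d_fock @ d_dens)
    return value


# Hypothetical two-point subspace.
toy_energies = [-1.0, -1.2]
toy_focks = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
toy_dens = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
e_mix = ediis_energy([0.5, 0.5], toy_energies, toy_focks, toy_dens)
```

When all weight is placed on a single density, the quadratic term vanishes and the model reproduces the stored SCF energy of that density.<br />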
Recently, we suggested another subspace minimization algorithm along the same line as EDIIS, but<br />
with a smaller idempotency error in the energy model and the same orbital rotation gradient in the<br />
subspace as the SCF energy (the EDIIS energy model actually has a different gradient). We named<br />
it TRDSM 38 for trust region density subspace minimization since a trust region optimization is<br />
carried out of the energy model in the subspace of previous densities. In the second paper on<br />
TRDSM 39 , a comparison with the EDIIS and DIIS models can be found stating explicitly that the<br />
EDIIS energy model does not have the correct gradient and is wrong for other reasons as well at the<br />
DFT level of theory.<br />
Many of the energy minimization techniques can be combined with a damping or extrapolation<br />
scheme to improve the convergence. Typically, DIIS has been the choice 24,40,41 , but TRDSM could<br />
be used just as well.<br />
1.3.3 Level Shifting<br />
In 1973 Saunders and Hillier introduced the level shift concept 42 . They suggested adding a positive<br />
scalar µ to the diagonal of the virtual-virtual block of the Fock matrix in the MO basis, Eq. (1.10),<br />
before diagonalizing<br />
$$\left(\mathbf{F}^{\mathrm{MO}} + \mu\left(\mathbf{I} - \mathbf{D}^{\mathrm{MO}}\right)\right)\mathbf{C} = \mathbf{C}\boldsymbol{\varepsilon}, \tag{1.20}$$<br />
where I is the identity matrix and D MO is the scaled one-electron density matrix in the MO basis<br />
with 1 in the diagonal of the occupied-occupied block and zeros for the rest.<br />
To compare level shifting with the damping scheme of Hartree 27 , consider the first order variation in<br />
the energy change as in Eq. (1.11); the level shift µ then corresponds to adding a positive constant to<br />
the denominator<br />
$$\Delta E_{\mathrm{SCF}}^{(1)} = 4 \sum_{a}^{\mathrm{virtual}} \sum_{i}^{\mathrm{occupied}} \frac{-F_{ai}^{2}}{\left(\varepsilon_a - \varepsilon_i + \mu\right)}. \tag{1.21}$$<br />
Like the damping factor, the level shift can thus decrease the magnitude of the term.<br />
The problems with respect to the aufbau principle mentioned in connection with the damping can be<br />
overcome with the level shift. The level shift can separate the occupied orbitals from the virtuals<br />
and thereby ensure a positive denominator and an overall decrease in energy. As the level shift is<br />
increased towards infinity, the obtained decrease in energy will correspond to that of the steepest<br />
descent method as explained in Section 1.4.1.4, and thus the convergence will be slow. This<br />
connection between a large gap between the occupied and the virtual orbitals (HOMO-LUMO gap)<br />
and slow convergence was exploited by Bhattacharyya in 1978 to accelerate convergence for cases<br />
with large HOMO-LUMO gaps. His “reverse level shift” technique 43 uses a negative level shift<br />
instead of a positive, thus decreasing the gap and accelerating the convergence.<br />
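The effect of the damping factor and the level shift on the first-order energy change can be compared numerically. The sketch below evaluates Eq. (1.11) with the factor λ of Eq. (1.12) in the numerator and the shift µ of Eq. (1.21) in the denominator; the function name and the sample numbers are hypothetical.<br />

```python
import numpy as np


def first_order_energy(f_vo, eps_occ, eps_virt, lam=1.0, mu=0.0):
    """First-order energy change with damping factor lam and level shift mu."""
    f_vo = np.asarray(f_vo, dtype=float)         # elements F_ai, shape (n_virt, n_occ)
    eps_virt = np.asarray(eps_virt, dtype=float)
    eps_occ = np.asarray(eps_occ, dtype=float)
    gaps = eps_virt[:, None] - eps_occ[None, :] + mu
    return 4.0 * float(np.sum(-lam * f_vo**2 / gaps))


# Aufbau obeyed: the term is negative, and lam or mu only reduce its size.
plain = first_order_energy([[0.1]], [-0.5], [0.5])
damped = first_order_energy([[0.1]], [-0.5], [0.5], lam=0.5)
shifted = first_order_energy([[0.1]], [-0.5], [0.5], mu=1.0)

# Aufbau violated: the term is positive unless the shift restores the gap.
bad = first_order_energy([[0.1]], [0.2], [-0.3])
fixed = first_order_energy([[0.1]], [0.2], [-0.3], mu=1.0)
```

The last two values show the point made above: damping cannot change the sign of a term with a negative denominator, whereas a sufficiently large level shift can.<br />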
In 1977, Carbó, Hernández and Sanz claimed unconditional convergence for an SCF process with a<br />
properly used level shift 44 , and two decades later, Cancès and Le Bris 45 made a formal proof that for<br />
any initial guess D 0 , there exists a level shift µ 0 > 0 such that for level shift parameters µ > µ 0 , the<br />
energy decreases at each step and converges towards a stationary value.<br />
The level shift technique is still routinely used for cases where the DIIS scheme has problems. The<br />
level shifts are typically found on a trial and error basis. Recently, we advocated the use of a level<br />
shift to control the changes introduced in the Roothaan-Hall step 38 , and we suggested a way of<br />
optimizing the level shift at each iteration based on physical arguments and without guesswork. The<br />
algorithm is based on the trust region philosophy in which a model energy function is optimized,<br />
but restricted with respect to the step length. We thus named the algorithm trust region Roothaan-<br />
Hall (TRRH), even though it is not a true trust region optimization scheme like e.g. the energy<br />
minimization of Francisco, Martínez, and Martínez 26 or our TRDSM scheme 38 .<br />
Level shifting can be combined with a damping or extrapolation scheme. When the TRRH approach<br />
is combined with the subspace minimization method TRDSM it seems to outperform DIIS in<br />
stability and to have a better or similar convergence rate, as will be illustrated in the following<br />
sections. Combining level shifting with DIIS can occasionally be a benefit, but typically DIIS and<br />
level shifting do not work well together, and in Section 1.4.1.3 we will try to justify this.<br />
1.4 Development of SCF Optimization Algorithms<br />
The SCF scheme as it typically looks today is sketched in Fig. 1.2. Compared to Fig. 1.1, a density<br />
subspace minimization step is inserted, where some function f is minimized with respect to the<br />
coefficients c i which expand the previous densities D i . The function f could be the gradient norm as<br />
in DIIS or some energy model approximating the SCF energy in the subspace of the previous<br />
densities as in EDIIS and TRDSM. In the Roothaan-Hall step, the averaged Fock matrix<br />
$\bar{\mathbf{F}} = \sum_{i=1}^{n} c_i \mathbf{F}(\mathbf{D}_i)$ found from the subspace minimization is then used instead of the most<br />
recent Fock matrix F(D n ) to find a new trial density D n+1 . In general, the averaged density matrix<br />
$\bar{\mathbf{D}} = \sum_{i=1}^{n} c_i \mathbf{D}_i$ is not idempotent and therefore does not represent a valid density matrix;<br />
moreover, since the Kohn-Sham matrix (unlike the Fock matrix) is nonlinear in the density matrix, the<br />
averaged Kohn-Sham matrix $\bar{\mathbf{F}}$ is different from $\mathbf{F}(\bar{\mathbf{D}})$. For these reasons, the averaged Fock<br />
matrix $\bar{\mathbf{F}}$ cannot be associated uniquely with a valid Fock matrix. Usually, this does not matter<br />
much since the subsequent diagonalization of the Fock matrix nevertheless produces a valid density<br />
matrix according to Eq. (1.8). The complications arising from the use of the averaged Fock matrix are<br />
disregarded in the following, noting that the errors introduced by this approach may easily be<br />
corrected for, if necessary.<br />
Fig. 1.2 Flow diagram of the SCF scheme including the density subspace minimization step.<br />
[Flow: D 0 ; F(D n ); subspace minimization of f(c) with $\bar{\mathbf{D}} = \sum_{i=1}^{n} c_i \mathbf{D}_i$; $\bar{\mathbf{F}} = \sum_{i=1}^{n} c_i \mathbf{F}(\mathbf{D}_i)$; D n+1 ; if D n+1 ≈ D n the result is D conv , otherwise n = n+1 and the cycle is repeated.]<br />
The rest of this part of the thesis will focus on the work we have done over the last couple of years<br />
to improve SCF convergence. We have made developments in all of the three categories of the<br />
previous section. The density subspace minimization scheme TRDSM and the level shift scheme in<br />
TRRH, both briefly described in the previous section, make up a total scheme we have named<br />
TRSCF, where each SCF iteration contains a TRDSM and a TRRH step. The first subsection will<br />
go into further detail on TRRH and will thus be concerned with our modifications to the Roothaan-Hall<br />
step in Fig. 1.2. The second subsection will likewise go into further detail on TRDSM and will describe<br />
the scheme we apply in the density subspace minimization step. In the third subsection, a recently<br />
developed energy minimization procedure will be presented. The procedure merges the two steps,<br />
integrating a subspace minimization in the optimization of a new trial density.<br />
This section will primarily take the Hartree-Fock point of view, acknowledging that with small<br />
adjustments and the word Fock replaced by Kohn-Sham, it would describe the DFT situation as<br />
well. In Section 1.5 the differences appearing when the algorithms are applied to the HF and DFT<br />
cases, respectively, will be discussed.<br />
1.4.1 Dynamically Level Shifted Roothaan-Hall<br />
The problems inherent to the RH diagonalization method are the discontinuous changes in the<br />
density and the lack of guarantees for energy decrease. To overcome these problems, we introduced<br />
in 2004 a means to restrict the RH step to the trust region of the RH energy model, with the purpose<br />
of both controlling the changes in the density and ensuring an energy decrease. Since then, the same<br />
ideas have been put forward by Francisco et al. 26 as well, suggesting a trust region optimization of<br />
an RH energy model.<br />
In this section, our trust region Roothaan-Hall scheme and related subjects are discussed. In<br />
particular, we present two different schemes for dynamic level shifting and an alternative to<br />
diagonalization.<br />
1.4.1.1 RH Step with Control of Density Change<br />
The solution of the traditional Roothaan–Hall eigenvalue problem Eq. (1.6) may be regarded as the<br />
minimization of the sum of the energies of the occupied MOs 8,46<br />
$$E^{\mathrm{RH}}(\mathbf{D}) = 2\sum_{i} \varepsilon_i = 2\,\mathrm{Tr}\,\mathbf{F}_0\mathbf{D} \tag{1.22}$$<br />
subject to MO orthonormality constraints<br />
$$\mathbf{C}_{\mathrm{occ}}^{\mathrm{T}}\mathbf{S}\mathbf{C}_{\mathrm{occ}} = \mathbf{I}_N, \tag{1.23}$$<br />
where F 0 is typically obtained as a weighted sum of the previous Fock matrices such as F in Eq.<br />
(1.15). Since Eq. (1.22) represents a crude model of the true Hartree-Fock energy (with the same<br />
first-order term, but different zero- and second-order terms), it has a rather small trust radius. A<br />
global minimization of E RH (D), as accomplished by the solution of the Roothaan–Hall eigenvalue<br />
problem Eq. (1.6), may therefore easily lead to steps that are longer than the trust radius and hence<br />
unreliable. To avoid such steps, we shall impose on the optimization of Eq. (1.22) the constraint that<br />
the new density matrix D does not differ much from the old D 0 , that is, the S-norm of the density<br />
difference should be equal to a small number ∆<br />
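$$\left\|\mathbf{D} - \mathbf{D}_0\right\|_S^{2} = \mathrm{Tr}\left(\mathbf{D} - \mathbf{D}_0\right)\mathbf{S}\left(\mathbf{D} - \mathbf{D}_0\right)\mathbf{S} = -2\,\mathrm{Tr}\,\mathbf{D}_0\mathbf{S}\mathbf{D}\mathbf{S} + N = \Delta, \tag{1.24}$$<br />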
where N is the number of electrons – see Eq. (1.2) – and the S-norm used throughout this thesis is<br />
defined as<br />
$$\left\|\mathbf{A}\right\|_S^{2} = \mathrm{Tr}\,\mathbf{A}\mathbf{S}\mathbf{A}\mathbf{S} \tag{1.25}$$<br />
for symmetric A. The optimization of Eq. (1.22) subject to the constraints Eq. (1.23) and Eq. (1.24)<br />
may be carried out by introducing the Lagrangian<br />
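$$L = 2\,\mathrm{Tr}\,\mathbf{F}_0\mathbf{D} - 2\mu\left(\mathrm{Tr}\,\mathbf{D}_0\mathbf{S}\mathbf{D}\mathbf{S} - \tfrac{1}{2}\left(N - \Delta\right)\right) - 2\,\mathrm{Tr}\,\boldsymbol{\eta}\left(\mathbf{C}_{\mathrm{occ}}^{\mathrm{T}}\mathbf{S}\mathbf{C}_{\mathrm{occ}} - \mathbf{I}_N\right), \tag{1.26}$$<br />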
where µ is the undetermined multiplier associated with the constraint Eq. (1.24), whereas the<br />
symmetric matrix η contains the multipliers associated with the MO orthonormality constraints.<br />
Differentiating this Lagrangian with respect to the MO coefficients and setting the result equal to<br />
zero, we arrive at the level-shifted Roothaan–Hall equations:<br />
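$$\left(\mathbf{F}_0 - \mu\mathbf{S}\mathbf{D}_0\mathbf{S}\right)\mathbf{C}_{\mathrm{occ}}(\mu) = \mathbf{S}\mathbf{C}_{\mathrm{occ}}(\mu)\boldsymbol{\lambda}(\mu). \tag{1.27}$$<br />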
Since the density matrix, Eq. (1.8), is invariant to unitary transformations among the occupied MOs<br />
in C occ ( µ ), we may transform this eigenvalue problem to the canonical basis:<br />
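$$\left(\mathbf{F}_0 - \mu\mathbf{S}\mathbf{D}_0\mathbf{S}\right)\mathbf{C}_{\mathrm{occ}}(\mu) = \mathbf{S}\mathbf{C}_{\mathrm{occ}}(\mu)\boldsymbol{\varepsilon}(\mu), \tag{1.28}$$<br />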
where the diagonal matrix ε(µ) contains the orbital energies. Note that, since D 0 S projects onto the<br />
part of C occ that is occupied in D 0 (see ref. 46 ), the level-shift parameter µ shifts only the energies of<br />
the occupied MOs. Therefore, the role of µ is to modify the difference between the energies of the<br />
occupied and virtual MOs - in particular, the HOMO–LUMO gap.<br />
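A single level-shifted Roothaan-Hall step, Eq. (1.28), can be sketched as below. This is an illustration rather than the implementation used in the thesis: the function names are hypothetical, the generalized eigenvalue problem is reduced to a standard one by symmetric (Löwdin) orthogonalization, and the density is built as the product of the occupied coefficients with their transpose, assuming the scaled closed-shell convention.<br />

```python
import numpy as np


def level_shifted_rh_step(f0, s, d0, n_occ, mu):
    """Solve (F0 - mu S D0 S) C = S C eps and return D = C_occ C_occ^T."""
    s_val, s_vec = np.linalg.eigh(s)
    s_inv_sqrt = s_vec @ np.diag(s_val**-0.5) @ s_vec.T
    shifted = f0 - mu * (s @ d0 @ s)
    # Standard symmetric eigenproblem in the orthogonalized basis.
    _, c_orth = np.linalg.eigh(s_inv_sqrt @ shifted @ s_inv_sqrt)
    c_occ = s_inv_sqrt @ c_orth[:, :n_occ]       # lowest n_occ orbitals
    return c_occ @ c_occ.T


def s_norm_squared(a, s):
    """Squared S-norm of Eq. (1.25): Tr A S A S."""
    return float(np.trace(a @ s @ a @ s))


# Toy case: F0 favours orbital 2, while D0 occupies orbital 1.
s_toy = np.eye(2)
f0_toy = np.diag([1.0, 0.0])
d0_toy = np.diag([1.0, 0.0])
d_free = level_shifted_rh_step(f0_toy, s_toy, d0_toy, n_occ=1, mu=0.0)
d_shift = level_shifted_rh_step(f0_toy, s_toy, d0_toy, n_occ=1, mu=5.0)
```

Without a shift the occupation jumps to the other orbital, giving the largest possible density change for one occupied orbital; with µ = 5 the step stays at D 0 .<br />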
Clearly, the success of the trust region Roothaan–Hall (TRRH) method will depend on our ability to<br />
make a judicious choice of the level-shift parameter µ in Eq. (1.28). In our standard TRRH<br />
implementation, we determine µ by requiring that D(µ) does not differ much from D 0 in the sense of<br />
Eq. (1.24), thereby ensuring a continuous and controlled development of the density matrix from the<br />
initial guess to the converged one.<br />
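The last equality in Eq. (1.24) can be checked in a few lines. The derivation below is a sketch that assumes both density matrices are idempotent in the S metric, DSD = D, and that Tr DS = N/2 for the scaled density matrix; it uses only the cyclic invariance of the trace.<br />

```latex
\begin{align*}
\left\|\mathbf{D}-\mathbf{D}_0\right\|_S^{2}
 &= \mathrm{Tr}\,\mathbf{D}\mathbf{S}\mathbf{D}\mathbf{S}
    - 2\,\mathrm{Tr}\,\mathbf{D}_0\mathbf{S}\mathbf{D}\mathbf{S}
    + \mathrm{Tr}\,\mathbf{D}_0\mathbf{S}\mathbf{D}_0\mathbf{S} \\
 &= \mathrm{Tr}\,\mathbf{D}\mathbf{S}
    - 2\,\mathrm{Tr}\,\mathbf{D}_0\mathbf{S}\mathbf{D}\mathbf{S}
    + \mathrm{Tr}\,\mathbf{D}_0\mathbf{S} \\
 &= N - 2\,\mathrm{Tr}\,\mathbf{D}_0\mathbf{S}\mathbf{D}\mathbf{S}.
\end{align*}
```

Setting this equal to ∆ gives Tr D 0 SDS = (N − ∆)/2, which is the form in which the constraint enters the Lagrangian of Eq. (1.26).<br />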
1.4.1.2 The Trust Region RH Level Shift<br />
The constraint on the change in the AO density Eq. (1.24) refers to a change which may arise not<br />
only from small changes in many MOs but also from large changes in a few MOs or even in a<br />
single MO. To obtain a high level of control, we shall require that the changes in the individual<br />
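MOs are all small. Expanding the MOs $\varphi_i^{\mathrm{new}}$, obtained by diagonalization of Eq. (1.28), in the old<br />
MOs, we obtain<br />
$$\varphi_i^{\mathrm{new}} = \sum_{j}^{\mathrm{occ}} \left\langle \varphi_j^{\mathrm{old}} \middle| \varphi_i^{\mathrm{new}} \right\rangle \varphi_j^{\mathrm{old}} + \sum_{a}^{\mathrm{virt}} \left\langle \varphi_a^{\mathrm{old}} \middle| \varphi_i^{\mathrm{new}} \right\rangle \varphi_a^{\mathrm{old}}, \tag{1.29}$$<br />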
where the first summation is over the occupied MOs and the second over the virtual MOs. The<br />
squared norm of the projection of $\varphi_i^{\mathrm{new}}$ onto the MO space associated with D 0 is therefore<br />
$$a_i^{\mathrm{orb}} = \sum_{j} \left\langle \varphi_j^{\mathrm{old}} \middle| \varphi_i^{\mathrm{new}} \right\rangle^{2}. \tag{1.30}$$<br />
To ensure small individual MO changes in each iteration (to within a unitary transformation of the<br />
occupied MOs), we shall therefore require<br />
$$a_{\min}^{\mathrm{orb}} = \min_{i} a_i^{\mathrm{orb}} \geq A_{\min}^{\mathrm{orb}}, \tag{1.31}$$<br />
where $A_{\min}^{\mathrm{orb}}$ is close to one (0.98 or 0.975 in practice). This way of controlling the changes in the<br />
density was also used by Seeger and Pople in their steepest descent method 11 .<br />
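The overlap measure of Eqs. (1.30) and (1.31) can be computed directly from the occupied MO coefficients. The sketch below is illustrative: the function name is hypothetical, the coefficient matrices are assumed to hold one occupied MO per column in the AO basis, and the sum over j is read as running over the MOs occupied in D 0 .<br />

```python
import numpy as np


def min_orbital_overlap(c_old_occ, c_new_occ, s):
    """Return a_min^orb of Eq. (1.31) from old and new occupied MOs."""
    overlap = c_old_occ.T @ s @ c_new_occ        # <phi_j^old | phi_i^new>
    a_orb = np.sum(overlap**2, axis=0)           # Eq. (1.30), one value per new MO
    return float(a_orb.min())


# One occupied MO rotated by a small angle into the virtual space.
theta = 0.1
s_toy = np.eye(2)
c_old = np.array([[1.0], [0.0]])
c_new = np.array([[np.cos(theta)], [np.sin(theta)]])
a_min = min_orbital_overlap(c_old, c_new, s_toy)
step_accepted = a_min >= 0.975
```

For this rotation the overlap is cos²θ, about 0.99, so the step would pass a threshold of 0.975; a larger rotation would call for a larger level shift.<br />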
To illustrate how this scheme is used in practice, detailed information from the TRRH step in<br />
iteration 7 of an HF/6-31G and an LDA/6-31G calculation on the zinc complex depicted in Fig. 1.3 is<br />
displayed in Fig. 1.4 and Fig. 1.5, respectively. The upper panels illustrate how a search for<br />
$a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}}$ determines the optimal level shift µ for the TRRH step. The TRRH energy model<br />
is more accurate for HF than for DFT (see Section 1.5.1), and consequently larger changes can be<br />
handled in the TRRH step for HF than for DFT. $A_{\min}^{\mathrm{orb}}$ is thus set to 0.975 for HF and 0.98 for<br />
DFT. The lower panels show that the chosen level shifts avoid an increase in the energy, which would<br />
have occurred if the Roothaan-Hall step had not been level shifted (µ = 0). Notice also that an even<br />
lower energy would have been obtained by reducing the level shift, but then the restrictions on the<br />
overlap would have to be loosened, and this would result in energy increases in other iterations. In<br />
short, the identification of µ from the overlap requirement $a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}}$ appears to be a good and<br />
secure way to control the step sizes in the optimization.<br />
Fig. 1.3 Zn 2+ in complex with ethylenediamine-N,N'-disuccinic acid (EDDS).<br />
Fig. 1.4 HF/6-31G, iteration 7. (A) The overlap $a_{\min}^{\mathrm{orb}}$ and (B) the changes in the HF energy $\Delta E_{\mathrm{HF}}^{\mathrm{RH}}$<br />
and in the RH energy model $\Delta E^{\mathrm{RH}}$ as a function of the level shift µ.<br />
Fig. 1.5 LDA/6-31G, iteration 7. (A) The overlap $a_{\min}^{\mathrm{orb}}$ and (B) the changes in the LDA energy<br />
$\Delta E_{\mathrm{LDA}}^{\mathrm{RH}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ as a function of the level shift µ.<br />
1.4.1.3 DIIS and Dynamically Level Shifted RH<br />
For accelerating the SCF convergence, DIIS is a simple and in general very successful scheme. We<br />
would expect to get an even better performance and improve the stability of the scheme if DIIS was<br />
combined with a dynamically level shifted RH step like TRRH instead of the standard RH with no<br />
control of the step. To investigate how a combination of DIIS and TRRH performs, we carried out a<br />
number of DIIS-TRRH optimizations. A typical example is seen in Fig. 1.7 and an extraordinary<br />
example is seen in Fig. 1.8.<br />
Fig. 1.6 Cd 2+ complexed with an<br />
imidazole ring.<br />
16
Development of SCF Optimization Algorithms<br />
Error in energy / E h<br />
1.E+02<br />
1.E+00<br />
1.E-02<br />
1.E-04<br />
1.E-06<br />
1.E-08<br />
DIIS<br />
DIIS-TRRH<br />
TRSCF<br />
0 5 10 15 20 25<br />
Iteration<br />
Fig. 1.7 LDA/STO-3G calculations with a H1-core<br />
start guess on the cadmium complex in Fig. 1.6.<br />
Error in energy / E h<br />
1.E+02<br />
1.E+00<br />
1.E-02<br />
1.E-04<br />
1.E-06<br />
TRSCF<br />
DIIS-TRRH<br />
DIIS<br />
0 5 10 15 20 25 30<br />
Iteration<br />
Fig. 1.8 LDA/STO-3G calculations with a Hückel<br />
start guess on the zinc complex in Fig. 1.3.<br />
Somewhat surprisingly the calculations rarely converge with the DIIS-TRRH method. To<br />
understand this behavior, we note that, in the global region, the TRRH method typically produces<br />
gradients that do not change much, even though large changes may occur in the energy. In such<br />
cases, the DIIS method may stall, not being able to identify a good combination of density matrices.<br />
This behavior is illustrated in Table 1-1, where the gradient norm and Kohn–Sham energy of the<br />
first six iterations of the cadmium complex calculations in Fig. 1.7 are listed.<br />
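Table 1-1. The Kohn-Sham energy E KS and the gradient norm ||g||=||4(SDF-FDS)|| in the first six<br />
iterations of the cadmium complex calculations of Fig. 1.7.<br />
<table>
<tr><th>It.</th><th>DIIS E KS</th><th>DIIS ||g||</th><th>DIIS-TRRH E KS</th><th>DIIS-TRRH ||g||</th><th>TRSCF E KS</th><th>TRSCF ||g||</th></tr>
<tr><td>1</td><td>-5597.0</td><td>7.8</td><td>-5597.0</td><td>7.8</td><td>-5597.0</td><td>7.8</td></tr>
<tr><td>2</td><td>-5502.3</td><td>14.9</td><td>-5598.4</td><td>7.2</td><td>-5598.3</td><td>7.1</td></tr>
<tr><td>3</td><td>-5602.1</td><td>9.7</td><td>-5600.3</td><td>8.5</td><td>-5603.7</td><td>9.3</td></tr>
<tr><td>4</td><td>-5628.5</td><td>2.1</td><td>-5599.9</td><td>7.7</td><td>-5611.1</td><td>9.1</td></tr>
<tr><td>5</td><td>-5627.4</td><td>3.5</td><td>-5599.9</td><td>7.8</td><td>-5616.8</td><td>7.7</td></tr>
<tr><td>6</td><td>-5628.8</td><td>0.8</td><td>-5600.2</td><td>8.1</td><td>-5622.7</td><td>7.5</td></tr>
<tr><td>Result</td><td colspan="2">conv</td><td colspan="2">no conv</td><td colspan="2">conv</td></tr>
</table>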
The TRSCF and DIIS-TRRH gradients stay almost the same during these iterations, stalling the<br />
DIIS-TRRH optimization but not the TRSCF optimization, whose energy decreases in each<br />
iteration. In the pure DIIS optimization, by contrast, the gradient changes significantly from<br />
iteration to iteration; at the same time, the energy decreases at each iteration except the second and<br />
fifth, where also the gradient norms increase. Eventually, DIIS enters the local region with its rapid<br />
rate of convergence although we note a sudden, large increase in the energy in iterations 10 and 11.<br />
However, these changes are accompanied by large increases in the gradient norm, allowing DIIS<br />
to recover safely.<br />
In the example of Fig. 1.8, standard DIIS diverges. TRSCF converges, but a minimum level shift of 0.1<br />
is used all through the calculation. When DIIS is combined with TRRH in this case, also using a<br />
minimum level shift of 0.1, it converges as well as TRSCF. Table 1-2 contains the gradient norm<br />
and Kohn-Sham energy of the first six iterations of the calculations in Fig. 1.8.<br />
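Table 1-2. The Kohn-Sham energy E KS and the gradient norm ||g||=||4(SDF-FDS)|| in the first six<br />
iterations of the zinc complex calculations of Fig. 1.8.<br />
<table>
<tr><th>It.</th><th>DIIS E KS</th><th>DIIS ||g||</th><th>DIIS-TRRH E KS</th><th>DIIS-TRRH ||g||</th><th>TRSCF E KS</th><th>TRSCF ||g||</th></tr>
<tr><td>1</td><td>-2826.95</td><td>11.6</td><td>-2826.95</td><td>11.6</td><td>-2826.95</td><td>11.6</td></tr>
<tr><td>2</td><td>-2745.49</td><td>24.0</td><td>-2830.11</td><td>3.3</td><td>-2830.06</td><td>3.4</td></tr>
<tr><td>3</td><td>-2809.38</td><td>13.6</td><td>-2831.04</td><td>1.6</td><td>-2831.11</td><td>1.5</td></tr>
<tr><td>4</td><td>-2819.16</td><td>9.7</td><td>-2831.44</td><td>0.8</td><td>-2831.42</td><td>1.1</td></tr>
<tr><td>5</td><td>-2776.74</td><td>15.4</td><td>-2831.34</td><td>1.5</td><td>-2831.40</td><td>1.5</td></tr>
<tr><td>6</td><td>-2826.55</td><td>7.0</td><td>-2831.41</td><td>1.5</td><td>-2831.47</td><td>0.9</td></tr>
<tr><td>Result</td><td colspan="2">no conv</td><td colspan="2">conv</td><td colspan="2">conv</td></tr>
</table>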
In this case the gradient norms for the TRSCF calculation change significantly and a decrease in<br />
gradient relates directly to a decrease in the energy, whereas in the first example there was no direct<br />
connection between the gradient norm and the energy. The DIIS-TRRH calculation follows the<br />
same gradient behavior as TRSCF, just as in the first example, and they both converge. The DIIS<br />
gradient norm changes, but does not decrease as in the first example. There is still the connection<br />
between small gradients and low energies though, so why DIIS cannot find the proper directions in<br />
this case is not evident.<br />
In our experience DIIS should not be used in connection with a dynamic level shift scheme like<br />
TRRH, since for all but the simplest cases DIIS-TRRH diverged if DIIS converged. We<br />
encountered, however, the example in Fig. 1.8 where DIIS does not converge and DIIS-TRRH does,<br />
but it was the exception.<br />
1.4.1.4 Line Search TRRH<br />
In view of the relative crudeness of the E RH (D) model, a more robust approach for choosing the<br />
level shift µ than the one presented in Section 1.4.1.2 consists of performing a line search along the<br />
path defined by µ to obtain the minimum of the energy $E_{\mathrm{SCF}}^{\mathrm{RH}}(\mathbf{D}(\mu))$. Strictly speaking, this<br />
optimization is not a line search but rather a univariate search. A univariate search has previously<br />
been used by Seeger and Pople 11 to stabilize convergence of the RH procedure.<br />
For µ → ∞ Eq. (1.28) becomes equivalent to solving the eigenvalue equation<br />
$$\mathbf{S}\mathbf{D}_0\mathbf{S}\mathbf{C}_{\mathrm{occ}}^{0} = \mathbf{S}\mathbf{C}_{\mathrm{occ}}^{0}\boldsymbol{\eta}, \tag{1.32}$$<br />
where η has eigenvalues 1 for the set of orbitals that are occupied in $\mathbf{D}_0$ and eigenvalues 0 for the<br />
set of virtual orbitals. Eq. (1.32) thus effectively divides the molecular orbitals into a set that is<br />
occupied and a set that is unoccupied. If $\mathbf{D}_0$ is idempotent, it can be reconstructed from the occupied<br />
set of eigenvectors $\mathbf{C}_{\mathrm{occ}}^{0}$. If $\mathbf{D}_0$ is not idempotent, a purification of $\mathbf{D}_0$ is obtained<br />
$$\mathbf{D}_0^{\mathrm{idem}} = \mathbf{C}_{\mathrm{occ}}^{0}\left(\mathbf{C}_{\mathrm{occ}}^{0}\right)^{\mathrm{T}} . \qquad (1.33)$$<br />
Since F 0 is the gradient of E(D 0 ), the step from Eq. (1.28) corresponding to a large µ is in the<br />
steepest descent direction, and will therefore give a decrease in the Hartree-Fock energy compared<br />
to the energy at D 0 . Thus a µ exists for which the energy decreases and a line search can then find<br />
the µ leading to the largest decrease in the energy. Using the same example as in Section 1.4.1.2,<br />
Fig. 1.9 and Fig. 1.10 illustrate how the optimal µ is chosen for the line search TRRH (TRRH-LS)<br />
algorithm. A simple search in the energy change for the RH step is carried out, where the energy<br />
change is found as<br />
$$\Delta E_{\mathrm{SCF}}^{\mathrm{RH}}(\mu) = E_{\mathrm{SCF}}\left(\mathbf{D}(\mu)\right) - E_{\mathrm{SCF}}\left(\mathbf{D}_0^{\mathrm{idem}}\right) , \qquad (1.34)$$<br />
and the µ leading to the largest decrease in energy is chosen as marked on the figures.<br />
Fig. 1.9 HF/6-31G, iteration 7. The changes in the HF energy $\Delta E_{\mathrm{HF}}^{\mathrm{RH}}$ and in the RH energy model<br />
$\Delta E^{\mathrm{RH}}$ (∆E / a.u.) as a function of the level shift µ.<br />
Fig. 1.10 LDA/6-31G, iteration 7. The changes in the LDA energy $\Delta E_{\mathrm{LDA}}^{\mathrm{RH}}$ and in the RH energy<br />
model $\Delta E^{\mathrm{RH}}$ (∆E / a.u.) as a function of the level shift µ.<br />
The TRRH-LS algorithm thus ensures an energy decrease in the RH step, but is of course much<br />
more expensive than the standard method, requiring the repeated construction of the Fock matrix for<br />
a single RH step. However, the first derivative $dE_{\mathrm{SCF}}/d\mu$ can be evaluated from the Fock matrix,<br />
and a cubic spline interpolation can thus be made from only two points on the $\Delta E_{\mathrm{SCF}}^{\mathrm{RH}}$ curve.<br />
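The two-point interpolation can be sketched as follows. This is a minimal illustration and not the thesis implementation: with only two points, the cubic interpolation is taken here to be the cubic Hermite polynomial through the two energies and their derivatives $dE_{\mathrm{SCF}}/d\mu$. The function name `cubic_min` and its arguments are hypothetical.<br />

```python
import math


def cubic_min(mu_a, e_a, d_a, mu_b, e_b, d_b):
    """Minimum of the cubic Hermite interpolant on [mu_a, mu_b].

    e_a, e_b are energies and d_a, d_b are derivatives dE/dmu at the two
    level shifts. Returns (mu_min, interpolated energy at mu_min).
    """
    h = mu_b - mu_a
    m_a, m_b = h * d_a, h * d_b              # derivatives w.r.t. t in [0, 1]
    # p(t) = a t^3 + b t^2 + c t + e_a with mu = mu_a + t h
    a = 2.0 * e_a + m_a - 2.0 * e_b + m_b
    b = -3.0 * e_a - 2.0 * m_a + 3.0 * e_b - m_b
    c = m_a

    def p(t):
        return ((a * t + b) * t + c) * t + e_a

    candidates = [0.0, 1.0]                  # end points are always allowed
    if abs(a) > 1e-14:
        disc = b * b - 3.0 * a * c           # discriminant of p'(t) = 0
        if disc >= 0.0:
            for sign in (1.0, -1.0):
                t = (-b + sign * math.sqrt(disc)) / (3.0 * a)
                # keep interior stationary points with positive curvature
                if 0.0 < t < 1.0 and 6.0 * a * t + 2.0 * b > 0.0:
                    candidates.append(t)
    elif abs(b) > 1e-14:                     # interpolant is a parabola
        t = -c / (2.0 * b)
        if 0.0 < t < 1.0 and b > 0.0:
            candidates.append(t)
    t_best = min(candidates, key=p)
    return mu_a + t_best * h, p(t_best)
```

In this sketch each sampled level shift costs one Fock matrix construction (energy and derivative), so the interpolated minimum replaces a scan over many values of µ; whether the interpolant is accurate enough over a wide µ interval is not addressed here.<br />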
1.4.1.5 Optimal Level Shift without MO Information<br />
As seen from Eq. (1.29) the individual MOs are used to find a suitable level shift in the TRRH<br />
scheme. We are very much aware that this is the most important point to improve on in our scheme. To<br />
obtain this MO information, the cubically scaling diagonalization of the Fock matrix is necessary,<br />
and furthermore the MO coefficient matrices C are inherently non-sparse. Several linear or near-linear<br />
scaling alternatives to diagonalization have been suggested in the literature 18-20 . These<br />
methods could be reformulated with a dynamical level shift scheme like ours if the scheme could do<br />
without the MO information, but it is not an easy task to find a good dynamic level shift scheme<br />
with a high level of control without the knowledge of the developments in the individual MOs. The<br />
search used to find the level shift in TRRH-LS is directly applicable since it is not dependent on the<br />
MO information; the problem is only the number of Fock evaluations. The Fock evaluation is still<br />
expensive even though algorithms which make the evaluation of the Fock matrix cheaper are<br />
continually developed.<br />
This section describes a very recently developed approach to find the optimal level shift in the<br />
TRRH step without the use of individual MOs or knowledge of the HOMO-LUMO gap. So far it<br />
has proven to be the most successful level shift scheme we have studied. The scheme is built on the<br />
assumption that the TRRH step is taken in connection with a TRDSM step (or some other density<br />
subspace minimization method). In this case it can be exploited that TRDSM is a very good energy<br />
model (see Section 1.4.2.2) and can be trusted with the responsibility to find the best direction as<br />
long as not too much new information is introduced to the density subspace in each step.<br />
A new density, found by diagonalization of a level shifted Fock matrix or by some alternative, can<br />
be split in a part $\mathbf{D}^{\parallel}$ that can be described in the previous densities and a part $\mathbf{D}^{\perp}$ with new<br />
information orthogonal to the existing subspace<br />
$$\mathbf{D}(\mu) = \mathbf{D}^{\parallel} + \mathbf{D}^{\perp} . \qquad (1.35)$$<br />
$\mathbf{D}^{\parallel}$ can be expanded in the previous densities as<br />
$$\mathbf{D}^{\parallel} = \sum_{i=1}^{n} \omega_i \mathbf{D}_i , \qquad (1.36)$$<br />
where n is the number of previously stored densities $\mathbf{D}_i$ and the expansion coefficients $\omega_i$ are<br />
dependent on µ and determined in a least-squares manner<br />
$$\omega_i(\mu) = \sum_{j=1}^{n} \left[\mathbf{M}^{-1}\right]_{ij} \mathrm{Tr}\,\mathbf{D}_j\mathbf{S}\mathbf{D}(\mu)\mathbf{S}, \quad M_{ij} = \mathrm{Tr}\,\mathbf{D}_i\mathbf{S}\mathbf{D}_j\mathbf{S} . \qquad (1.37)$$<br />
It is obvious that when µ → ∞ then $\mathbf{D}^{\perp} \to \mathbf{0}$ since the new density then approaches the initial<br />
density $\mathbf{D}_0$, see Eq. (1.32) and (1.33), which belongs to the set of previous densities. Thus, there is a<br />
connection between $\mathbf{D}^{\perp}$ and µ which we can exploit. If the ratio $d_{\mathrm{orth}}$ of the square norm $\|\mathbf{D}^{\perp}\|_S^2$<br />
relative to $\|\mathbf{D}\|_S^2$ is small, only small changes to the density subspace are introduced;<br />
$$d_{\mathrm{orth}} = \frac{\|\mathbf{D}^{\perp}\|_S^2}{\|\mathbf{D}\|_S^2} = \frac{\mathrm{Tr}\,\mathbf{D}^{\perp}\mathbf{S}\mathbf{D}^{\perp}\mathbf{S}}{\mathrm{Tr}\,\mathbf{D}\mathbf{S}\mathbf{D}\mathbf{S}} < \delta , \qquad (1.38)$$<br />
where δ is some small number and $\mathbf{D}^{\perp}$ can be found as $\mathbf{D}^{\perp} = \mathbf{D} - \mathbf{D}^{\parallel}$. To illustrate how this is used<br />
in a dynamic level shift scheme, the examples from the previous sections are again seen in Fig. 1.11<br />
and Fig. 1.12.<br />
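The evaluation of $d_{\mathrm{orth}}$ from Eqs. (1.35)-(1.38) can be sketched with NumPy as below. This is an illustrative sketch under stated assumptions: the function name `d_orth` is hypothetical, dense arrays are used, and the linear system for the coefficients is solved with a least-squares routine instead of an explicit inverse of M (a choice made here for robustness when stored densities are nearly linearly dependent, not taken from the thesis).<br />

```python
import numpy as np


def d_orth(D_new, D_prev, S):
    """Return (d_orth, omega) for a trial density D_new.

    D_prev is the list of previously stored densities D_i and S is the AO
    overlap matrix. omega holds the expansion coefficients of Eq. (1.37).
    """
    def s_dot(A, B):
        # inner product in the S metric, Tr(A S B S)
        return np.trace(A @ S @ B @ S)

    M = np.array([[s_dot(Di, Dj) for Dj in D_prev] for Di in D_prev])
    rhs = np.array([s_dot(Dj, D_new) for Dj in D_prev])
    omega = np.linalg.lstsq(M, rhs, rcond=None)[0]   # solves M omega = rhs

    D_par = sum(w * Di for w, Di in zip(omega, D_prev))   # Eq. (1.36)
    D_perp = D_new - D_par                                # Eq. (1.35)
    return s_dot(D_perp, D_perp) / s_dot(D_new, D_new), omega   # Eq. (1.38)
```

A level shift search based on this quantity would evaluate `d_orth` for trial densities D(µ) and accept a µ for which the returned ratio is below δ.<br />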
In the rest of the thesis the level shift scheme described in Section 1.4.1.2 will be referred to as the<br />
C-shift scheme since it involves the eigenvectors C from the diagonalization of the Fock matrix,<br />
and the level shift scheme described in this section will be referred to as the $d_{\mathrm{orth}}$-shift scheme. If<br />
nothing is mentioned about the level shift scheme, the C-shift is implied.<br />
Fig. 1.11 HF/6-31G iteration 7. (A) The ratio $d_{\mathrm{orth}}$ (threshold δ = 0.08 marked) and (B) the changes in the<br />
HF energy $\Delta E_{\mathrm{HF}}^{\mathrm{RH}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ (∆E / a.u.) as a function of the level shift µ.<br />
Fig. 1.12 LDA/6-31G iteration 7. (A) The ratio $d_{\mathrm{orth}}$ (threshold δ = 0.03 marked) and (B) the changes in the<br />
LDA energy $\Delta E_{\mathrm{LDA}}^{\mathrm{RH}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ (∆E / a.u.) as a function of the level shift µ.<br />
The upper panels now display the search made in $d_{\mathrm{orth}}$, and it is clearly seen that $d_{\mathrm{orth}}$ → 0 for µ → ∞<br />
as expected, and increases for µ → 0. As for the C-shift scheme we can allow larger changes in the<br />
HF method than in DFT, and thus δ is set to 0.08 for HF and 0.03 for DFT. In the lower panels it is<br />
seen that this level shift avoids an increase in the energy just as the C-shift scheme does, but the level<br />
shift chosen here is closer to the optimal line search level shift, and thus leads to a larger decrease in<br />
the energy than was the case for the C-shift scheme.<br />
In the C-shift scheme seen in Eq. (1.31) the changes introduced are controlled compared to the<br />
previous density, whereas in the $d_{\mathrm{orth}}$-shift scheme the changes are controlled compared to the<br />
subspace of all the previous densities. This scheme is thus less restrictive than the C-shift scheme,<br />
but it seems that the C-shift scheme is too restrictive, ignoring the stability gained from the<br />
subspace information. To compare the overall effect of the two level shift schemes on the SCF<br />
convergence, calculations are given in Fig. 1.13 and Fig. 1.14, for HF and LDA, respectively. The<br />
HF calculations are on CrC with bond distance 2.00 Å in the STO-3G basis and the LDA<br />
calculations are on the zinc complex seen in Fig. 1.3 in the 6-31G basis, both cases for which DIIS<br />
diverges. The starting orbitals have been obtained by diagonalization of the one-electron<br />
Hamiltonian (H1-core start guess).<br />
Fig. 1.13 SCF convergence (error in energy / $E_h$ versus iteration) for HF/STO-3G calculations<br />
on CrC. Curves: TRSCF/$d_{\mathrm{orth}}$-shift, TRSCF/C-shift and DIIS.<br />
Fig. 1.14 SCF convergence (error in energy / $E_h$ versus iteration) for LDA/6-31G calculations<br />
on the zinc complex in Fig. 1.3. Curves: TRSCF/$d_{\mathrm{orth}}$-shift, TRSCF/C-shift and DIIS.<br />
The only difference in the “TRSCF/$d_{\mathrm{orth}}$-shift” and the “TRSCF/C-shift” optimizations is the way<br />
the level shift is found in the TRRH step. Since DIIS diverges, the examples display the stability of<br />
the TRSCF algorithm, and the ability of the two level shifting schemes to handle problematic cases.<br />
In all examples studied so far, both problematic and simple, the $d_{\mathrm{orth}}$-shift has proven as good as or<br />
better than the C-shift. The cost of the level shift search process is similar in the two schemes; the<br />
matrix M in Eq. (1.37) is updated in each iteration as a part of TRDSM and is then reused for the<br />
$d_{\mathrm{orth}}$-shift scheme in TRRH.<br />
In Table 1-3 the SCF energy change in each iteration is divided into the part of the change obtained<br />
from the RH and DSM step, respectively, and it is seen how the RH step is now allowed to accept<br />
larger changes in the density, but still in a controlled manner, thus leading to larger decreases in the<br />
energy and improved convergence.<br />
Table 1-3. The SCF energy change for each RH and DSM step in the TRSCF calculations in Fig. 1.13.<br />
It. | C-shift $\Delta E_{\mathrm{HF}}^{\mathrm{RH}}$ | C-shift $\Delta E_{\mathrm{HF}}^{\mathrm{DSM}}$ | $d_{\mathrm{orth}}$-shift $\Delta E_{\mathrm{HF}}^{\mathrm{RH}}$ | $d_{\mathrm{orth}}$-shift $\Delta E_{\mathrm{HF}}^{\mathrm{DSM}}$<br />
2 | -1.1768 | 0.0000 | -1.3976 | 0.0000<br />
3 | -1.8964 | -3.8998 | -4.1319 | -4.5865<br />
4 | -1.6764 | -1.9603 | -1.8021 | -1.0448<br />
5 | -0.3655 | -1.7543 | -0.2103 | -0.1200<br />
6 | -0.1881 | -0.1624 | -0.0111 | -0.0463<br />
7 | -0.0932 | -0.1505 | -0.0036 | -0.0037<br />
8 | 0.0065 | -0.0212 | -0.0001 | -0.0008<br />
9 | -0.0039 | -0.0154 | |<br />
10 | 0.0002 | -0.0009 | |<br />
1.4.1.6 The Trace Purification Scheme<br />
The dynamic level shift scheme described in the previous section has no reference to the MO basis.<br />
This opens the possibility to replace the diagonalizations in the TRRH step with some alternative<br />
scheme without affecting the overall result.<br />
There have been many suggestions as to how the diagonalization can be replaced by a linear scaling<br />
algorithm 47 . The trace purification (TP) scheme 19,48 , however, is a simple and useful approach and it<br />
has thus been implemented in our SCF program in a local version of DALTON 38,49 . The trace<br />
purification scheme was originally formulated for tight binding theory by Palser and<br />
Manolopoulos 19 and later improved by Niklasson 48 , and is linear scaling when formulated in an<br />
orthogonal basis. The scheme uses the trace and idempotency properties of the density to iteratively<br />
find the new density from a suitable start guess constructed from the Fock matrix.<br />
Since the SCF optimization is formulated in the non-orthogonal AO basis to avoid the delocalized<br />
MO basis, it is necessary to transform the matrices to an orthogonal basis. This is done by a<br />
Cholesky decomposition 50 of the AO overlap matrix S<br />
$$\mathbf{S} = \mathbf{L}\mathbf{L}^{\mathrm{T}} , \qquad (1.39)$$<br />
where L then is used to transform the Fock matrix to an orthogonal basis<br />
$$\mathbf{F}^{\mathrm{orth}} = \mathbf{L}^{-1}\mathbf{F}\mathbf{L}^{-\mathrm{T}} . \qquad (1.40)$$<br />
The density resulting from the trace purification scheme will also be in the orthogonal basis and<br />
should be transformed back as<br />
$$\mathbf{D} = \mathbf{L}^{-\mathrm{T}}\mathbf{D}^{\mathrm{orth}}\mathbf{L}^{-1} . \qquad (1.41)$$<br />
Since the AO overlap matrix does not change during the optimization, the Cholesky decomposition<br />
and the inversion of L can be done once and for all in the beginning of the calculation.<br />
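The transformations of Eqs. (1.39)-(1.41) can be illustrated with a small dense NumPy sketch. The 2×2 overlap matrix and the helper names `to_orthogonal` and `to_ao` are made up for the illustration; a linear scaling code would use sparse matrix algebra and would not form the dense inverse as done here.<br />

```python
import numpy as np


def to_orthogonal(F, L_inv):
    """F_orth = L^-1 F L^-T, Eq. (1.40)."""
    return L_inv @ F @ L_inv.T


def to_ao(D_orth, L_inv):
    """D = L^-T D_orth L^-1, Eq. (1.41)."""
    return L_inv.T @ D_orth @ L_inv


# Hypothetical overlap matrix for two non-orthogonal basis functions.
S = np.array([[1.0, 0.4],
              [0.4, 1.0]])
L = np.linalg.cholesky(S)      # lower triangular factor, S = L L^T, Eq. (1.39)
L_inv = np.linalg.inv(L)       # computed once, reused in every SCF iteration
```

A density that is idempotent in the orthogonal basis, $\mathbf{D}^{\mathrm{orth}}\mathbf{D}^{\mathrm{orth}} = \mathbf{D}^{\mathrm{orth}}$, satisfies $\mathbf{D}\mathbf{S}\mathbf{D} = \mathbf{D}$ after the back transformation.<br />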
Fig. 1.15 Flow diagram for the trace purification (TP) scheme. N is the number of electrons.<br />
[Flow: estimate $\lambda_{\min}$ and $\lambda_{\max}$ for $\mathbf{F}^{\mathrm{orth}}$; set $\mathbf{R}_0 = (\lambda_{\max}\mathbf{I} - \mathbf{F}^{\mathrm{orth}})/(\lambda_{\max} - \lambda_{\min})$; in each iteration,<br />
if $\mathrm{Tr}\,\mathbf{R}_n > N$ then $\mathbf{R}_{n+1} = \mathbf{R}_n^2$, otherwise $\mathbf{R}_{n+1} = 2\mathbf{R}_n - \mathbf{R}_n^2$; set n = n + 1 and repeat until<br />
$|\mathrm{Tr}\,\mathbf{R}_{n+1} - N| < \varepsilon$; then $\mathbf{D}^{\mathrm{orth}} = \mathbf{R}_{n+1}$.]<br />
Fig. 1.16 The purifying polynomials used in the trace purification scheme, $x_{n+1} = x_n^2$ and<br />
$x_{n+1} = 2x_n - x_n^2$, plotted for $x_n$ between 0 and 1. The orange line is the McWeeny purification<br />
polynomial $x_{n+1} = 3x_n^2 - 2x_n^3$.<br />
The trace purification is carried out by the Niklasson model with second order purification<br />
polynomials, and is schematized in Fig. 1.15. The initial density guess $\mathbf{R}_0$ is obtained by<br />
normalizing the Fock matrix such that it only has eigenvalues between 0 and 1. To do this, the<br />
bounds for the Fock eigenvalues, $\lambda_{\min}$ and $\lambda_{\max}$, must be found. They can be estimated using<br />
Gerschgorin’s theorem or the Lanczos algorithm for eigenvalues 51 with only a small extra<br />
computational cost. R is then iteratively purified, and the purification function applied in each<br />
iteration is chosen based on the trace of the matrix R, always keeping the direction towards the<br />
correct trace condition. The purification functions are sketched in Fig. 1.16 including the McWeeny<br />
purification function 8 . One of the functions used in the scheme has a stationary point for x = 1 and<br />
the other has a stationary point for x = 0; depending on the function chosen we thus go towards a<br />
larger or smaller trace. When R fulfils the trace and/or idempotency conditions Eq. (1.2) of the<br />
one-electron density within some threshold ε, the new density $\mathbf{D}^{\mathrm{orth}} = \mathbf{R}$ has been found and the density<br />
to use in the next TRSCF iteration can be evaluated from Eq. (1.41).<br />
The number of purification iterations required to obtain a new density depends on the threshold ε.<br />
For the test calculations carried out so far, the threshold has been an error of 10 -7 in the trace, and<br />
the number of iterations ranges from 30 to 70 for a single RH step, with the typical number being<br />
closer to 30 than 70. Still, it is less expensive than the diagonalization as soon as more than a couple<br />
of thousand basis functions are needed. The scaling of the TRRH step in general and the trace<br />
purification scheme in particular is illustrated and discussed in Section 1.7.1.<br />
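The purification loop of Fig. 1.15 can be sketched in dense NumPy as follows. This is a minimal illustration, not the DALTON implementation: the names `gershgorin_bounds` and `trace_purification` are hypothetical, the target trace `n_target` plays the role of N in Fig. 1.15, both the trace and the idempotency error are checked against ε (an assumption, the text allows either or both), and dense matrix products are used where a linear scaling code would use sparse ones.<br />

```python
import numpy as np


def gershgorin_bounds(F):
    """Lower and upper bounds on the eigenvalues of a symmetric matrix."""
    radii = np.sum(np.abs(F), axis=1) - np.abs(np.diag(F))
    return np.min(np.diag(F) - radii), np.max(np.diag(F) + radii)


def trace_purification(F_orth, n_target, eps=1e-7, max_iter=200):
    """Return (D_orth, iterations) from the trace-correcting purification."""
    lam_min, lam_max = gershgorin_bounds(F_orth)
    dim = F_orth.shape[0]
    # Initial guess: eigenvalues mapped into [0, 1], lowest Fock eigenvalues
    # end up closest to 1.
    R = (lam_max * np.eye(dim) - F_orth) / (lam_max - lam_min)
    for it in range(1, max_iter + 1):
        R2 = R @ R
        if np.trace(R) > n_target:
            R = R2                   # x -> x^2 lowers the trace
        else:
            R = 2.0 * R - R2         # x -> 2x - x^2 raises the trace
        trace_ok = abs(np.trace(R) - n_target) < eps
        idem_ok = np.linalg.norm(R @ R - R) < eps
        if trace_ok and idem_ok:
            return R, it
    raise RuntimeError("trace purification did not converge")
```

The returned matrix is the density in the orthogonal basis and would be transformed back to the AO basis with Eq. (1.41).<br />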
1.4.2 Density Subspace Minimization<br />
The DIIS scheme seems to have been the overall most successful of all the suggestions on how to<br />
improve SCF convergence described in Section 1.3. DIIS was the first scheme to take advantage of<br />
the information contained in the densities and Fock matrices of the previous iterations, and this<br />
made the difference.<br />
This is also exploited in the EDIIS scheme by Kudin et al. 37 in which an energy model is optimized<br />
with respect to the linear combination of previous densities. The density subspace minimization<br />
presented in this section is an improvement to EDIIS with a smaller idempotency error in the<br />
density, the correct gradient compared to SCF, and thus better convergence properties in both the<br />
local and global region of the optimization.<br />
1.4.2.1 The Trust Region DSM Parameterization<br />
After a sequence of Roothaan-Hall iterations, we have determined a set of density matrices $\mathbf{D}_i$ and a<br />
corresponding set of Fock matrices $\mathbf{F}_i = \mathbf{F}(\mathbf{D}_i)$. An improved density $\bar{\mathbf{D}}$ and Fock matrix $\bar{\mathbf{F}}$ should<br />
now be found as a linear combination of the previous n + 1 stored matrices. Taking $\mathbf{D}_0$ as the<br />
reference density matrix, the improved density matrix can be written<br />
$$\bar{\mathbf{D}} = \mathbf{D}_0 + \sum_{i=0}^{n} c_i \mathbf{D}_i , \qquad (1.42)$$<br />
which, ideally, should satisfy the symmetry, trace and idempotency conditions Eq. (1.2) of a valid<br />
one-electron density matrix. Whereas the symmetry condition is trivially satisfied for any such<br />
linear combination, the trace condition holds only for combinations that satisfy the constraint<br />
$$\sum_{i=0}^{n} c_i = 0 , \qquad (1.43)$$<br />
leading to a set of n + 1 constrained parameters $c_i$ with 0 ≤ i ≤ n. Alternatively, an unconstrained set<br />
of n parameters $c_i$ with 1 ≤ i ≤ n can be used, with $c_0$ defined so that the trace condition is fulfilled:<br />
$$c_0 = -\sum_{i=1}^{n} c_i . \qquad (1.44)$$<br />
In terms of these independent parameters, the density matrix $\bar{\mathbf{D}}$ becomes<br />
$$\bar{\mathbf{D}} = \mathbf{D}_0 + \mathbf{D}_+ , \qquad (1.45)$$<br />
where we have introduced the notation<br />
$$\mathbf{D}_+ = \sum_{i=1}^{n} c_i \mathbf{D}_{i0} , \qquad \mathbf{D}_{i0} = \mathbf{D}_i - \mathbf{D}_0 . \qquad (1.46)$$<br />
Unlike the symmetry and trace conditions in Eq. (1.2), the idempotency condition is in general not<br />
fulfilled for linear combinations of $\mathbf{D}_i$. Still, for any averaged density matrix $\bar{\mathbf{D}}$ in Eq. (1.45) that<br />
does not fulfill the idempotency condition, we may generate a purified density matrix with a smaller<br />
idempotency error by the transformation 8<br />
$$\tilde{\mathbf{D}} = 3\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} . \qquad (1.47)$$<br />
Introducing the idempotency correction<br />
$$\mathbf{D}_\delta = \tilde{\mathbf{D}} - \bar{\mathbf{D}} , \qquad (1.48)$$<br />
we may then write the purified averaged density matrix in the form<br />
$$\tilde{\mathbf{D}} = \mathbf{D}_0 + \mathbf{D}_+ + \mathbf{D}_\delta . \qquad (1.49)$$<br />
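The parameterization of Eqs. (1.45)-(1.48) can be sketched with NumPy as below. The function names are hypothetical, `D_list[0]` is assumed to hold the reference density $\mathbf{D}_0$, and `c` is assumed to hold the n independent coefficients $c_1, \dots, c_n$.<br />

```python
import numpy as np


def averaged_density(c, D_list):
    """D_bar = D_0 + sum_i c_i (D_i - D_0), Eqs. (1.45)-(1.46)."""
    D0 = D_list[0]
    return D0 + sum(ci * (Di - D0) for ci, Di in zip(c, D_list[1:]))


def mcweeny(D, S):
    """One purification step 3 DSD - 2 DSDSD, Eq. (1.47)."""
    DS = D @ S
    return 3.0 * DS @ D - 2.0 * DS @ DS @ D


def idempotency_error(D, S):
    """Frobenius norm of DSD - D, zero for an idempotent density."""
    return np.linalg.norm(D @ S @ D - D)
```

The idempotency correction of Eq. (1.48) is then `mcweeny(D_bar, S) - D_bar`. Because the coefficient of $\mathbf{D}_0$ is fixed by Eq. (1.44), the trace of the averaged density is preserved for any choice of `c`.<br />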
1.4.2.2 The Trust Region DSM Energy Function<br />
Having established a useful parameterization of the averaged density matrix Eq. (1.45) and having<br />
considered its purification Eq. (1.47), let us now consider how to determine the best set of<br />
coefficients $c_i$. Expanding the energy in the purified averaged density matrix, Eq. (1.49), around the<br />
reference density matrix $\mathbf{D}_0$, we obtain to second order<br />
$$E_{\mathrm{SCF}}(\tilde{\mathbf{D}}) = E_{\mathrm{SCF}}(\mathbf{D}_0) + (\mathbf{D}_+ + \mathbf{D}_\delta)^{\mathrm{T}}\mathbf{E}_0^{(1)} + \tfrac{1}{2}(\mathbf{D}_+ + \mathbf{D}_\delta)^{\mathrm{T}}\mathbf{E}_0^{(2)}(\mathbf{D}_+ + \mathbf{D}_\delta) . \qquad (1.50)$$<br />
To evaluate the terms containing $\mathbf{E}_0^{(1)}$ and $\mathbf{E}_0^{(2)}$ we make the identifications<br />
$$\mathbf{E}_0^{(1)} = 2\mathbf{F}_0 \qquad (1.51)$$<br />
$$\mathbf{E}_0^{(2)}\mathbf{D}_+ = 2\mathbf{F}_+ + O(\mathbf{D}_+^2) , \qquad (1.52)$$<br />
which follow from Eq. (1.4) and from the second-order Taylor expansion of $\mathbf{E}_0^{(1)}$ about $\mathbf{D}_0$. The<br />
notation Eq. (1.46) has now been generalized to the Fock matrix $\mathbf{F}_+ = \sum_{i=1}^{n} c_i \mathbf{F}_{i0}$. Ignoring the<br />
terms quadratic in $\mathbf{D}_\delta$ in Eq. (1.50) and quadratic in $\mathbf{D}_+$ in Eq. (1.52), we then obtain the DSM<br />
energy<br />
$$E^{\mathrm{DSM}}(\mathbf{c}) = E_{\mathrm{SCF}}(\mathbf{D}_0) + 2\mathrm{Tr}\,\mathbf{D}_+\mathbf{F}_0 + \mathrm{Tr}\,\mathbf{D}_+\mathbf{F}_+ + 2\mathrm{Tr}\,\mathbf{D}_\delta\mathbf{F}_0 + 2\mathrm{Tr}\,\mathbf{D}_\delta\mathbf{F}_+ . \qquad (1.53)$$<br />
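Finally, for a more compact notation, we introduce the weighted Fock matrix<br />
$$\bar{\mathbf{F}} = \mathbf{F}_0 + \mathbf{F}_+ = \mathbf{F}_0 + \sum_{i=1}^{n} c_i \mathbf{F}_{i0} , \qquad (1.54)$$<br />
and find that the DSM energy may be written in the form<br />
$$E^{\mathrm{DSM}}(\mathbf{c}) = E(\bar{\mathbf{D}}) + 2\mathrm{Tr}\,\mathbf{D}_\delta\bar{\mathbf{F}} , \qquad (1.55)$$<br />
where the first term is quadratic in the expansion coefficients $c_i$<br />
$$E(\bar{\mathbf{D}}) = E_{\mathrm{SCF}}(\mathbf{D}_0) + 2\mathrm{Tr}\,\mathbf{D}_+\mathbf{F}_0 + \mathrm{Tr}\,\mathbf{D}_+\mathbf{F}_+ , \qquad (1.56)$$<br />
and the second, idempotency-correction term is quartic in these coefficients:<br />
$$2\mathrm{Tr}\,\mathbf{D}_\delta\bar{\mathbf{F}} = \mathrm{Tr}\left(6\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 4\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\right)\bar{\mathbf{F}} . \qquad (1.57)$$<br />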
The derivatives of $E^{\mathrm{DSM}}(\mathbf{c})$ are straightforwardly obtained by inserting the expansions of $\bar{\mathbf{F}}$ and $\bar{\mathbf{D}}$,<br />
using the independent parameter representation. The explicit expressions are given elsewhere in the thesis.<br />
The energy function $E^{\mathrm{DSM}}(\mathbf{c})$ in Eq. (1.55) provides an excellent approximation to the exact SCF<br />
energy $E_{\mathrm{SCF}}(\mathbf{c})$ about $\mathbf{D}_0$, with an error quadratic in $\mathbf{D}_\delta$ (see Section 1.5.2). The EDIIS energy model<br />
corresponds to the first term $E(\bar{\mathbf{D}})$ in Eq. (1.55) and has thus an error linear in $\mathbf{D}_\delta$.<br />
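The evaluation of the DSM energy, Eqs. (1.53)-(1.57), can be sketched with NumPy as below. The function name `dsm_energy` and the argument layout are hypothetical: `D_list[0]` and `F_list[0]` are assumed to be the reference matrices $\mathbf{D}_0$ and $\mathbf{F}_0$, `c` the independent coefficients $c_1, \dots, c_n$, and `E0` the SCF energy at $\mathbf{D}_0$.<br />

```python
import numpy as np


def dsm_energy(c, D_list, F_list, S, E0):
    """E_DSM(c) = E(D_bar) + 2 Tr(D_delta F_bar), Eq. (1.55)."""
    D0, F0 = D_list[0], F_list[0]
    D_plus = sum(ci * (Di - D0) for ci, Di in zip(c, D_list[1:]))
    F_plus = sum(ci * (Fi - F0) for ci, Fi in zip(c, F_list[1:]))
    D_bar = D0 + D_plus                                  # Eq. (1.45)
    F_bar = F0 + F_plus                                  # Eq. (1.54)
    DS = D_bar @ S
    D_tilde = 3.0 * DS @ D_bar - 2.0 * DS @ DS @ D_bar   # Eq. (1.47)
    D_delta = D_tilde - D_bar                            # Eq. (1.48)
    # First term, quadratic in c, Eq. (1.56)
    e_bar = E0 + 2.0 * np.trace(D_plus @ F0) + np.trace(D_plus @ F_plus)
    # Idempotency correction, quartic in c, Eq. (1.57)
    return e_bar + 2.0 * np.trace(D_delta @ F_bar)
```

Only stored density and Fock matrices enter, so no new Fock matrix has to be built while the coefficients are varied. For an energy of the form $2\mathrm{Tr}\,\mathbf{h}\mathbf{D} + \mathrm{Tr}\,\mathbf{D}\mathbf{G}(\mathbf{D})$ with linear G, the difference to the exact energy at the purified density is the single term $\mathrm{Tr}\,\mathbf{D}_\delta\mathbf{G}(\mathbf{D}_\delta)$.<br />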
1.4.2.3 The Trust Region DSM Minimization<br />
The DSM energy, Eq. (1.55), is minimized with respect to the independent parameters $c_i$ with 1 ≤ i<br />
≤ n. The vector containing the parameters is initialized to zero, $\mathbf{c}^{(0)} = \mathbf{0}$, such that $\bar{\mathbf{D}} = \mathbf{D}_0$, where $\mathbf{D}_0$<br />
is chosen as the density matrix with the lowest energy $E_{\mathrm{SCF}}(\mathbf{D}_i)$, usually the one from the latest<br />
TRRH step. The minimization is then carried out by the trust region method 52 , taking a number of<br />
steps from the initial parameters $\mathbf{c}^{(0)}$ to the final optimized parameters c* as illustrated in Fig. 1.17.<br />
Fig. 1.17 Steps in the trust region minimization of the DSM energy: $\mathbf{c}^{(0)} = \mathbf{0} \to \mathbf{c}^{(1)} \to \mathbf{c}^{(2)} \to \mathbf{c}^{(3)} \to \dots \to \mathbf{c}^{*}$.<br />
We thus consider in each step the second-order Taylor expansion of the DSM energy in Eq. (1.55).<br />
Introducing the step vector<br />
$$\Delta\mathbf{c} = \mathbf{c}^{(i+1)} - \mathbf{c}^{(i)} , \qquad (1.58)$$<br />
we obtain<br />
$$E_{(2)}^{\mathrm{DSM}}(\mathbf{c}^{(i)} + \Delta\mathbf{c}) = E_0 + \Delta\mathbf{c}^{\mathrm{T}}\mathbf{g} + \tfrac{1}{2}\Delta\mathbf{c}^{\mathrm{T}}\mathbf{H}\Delta\mathbf{c} , \qquad (1.59)$$<br />
where the energy, gradient, and Hessian at the expansion point are given by<br />
$$E_0 = E^{\mathrm{DSM}}(\mathbf{c}^{(i)}), \quad \mathbf{g} = \left.\frac{\partial E^{\mathrm{DSM}}(\mathbf{c})}{\partial \mathbf{c}}\right|_{\mathbf{c}=\mathbf{c}^{(i)}}, \quad \mathbf{H} = \left.\frac{\partial^2 E^{\mathrm{DSM}}(\mathbf{c})}{\partial \mathbf{c}^2}\right|_{\mathbf{c}=\mathbf{c}^{(i)}} . \qquad (1.60)$$<br />
We then introduce a trust region of radius h for $E_{(2)}^{\mathrm{DSM}}(\mathbf{c}^{(i)} + \Delta\mathbf{c})$ and require that steps are always<br />
taken inside or to the boundary of this region. To determine a step to the boundary, we restrict the<br />
step to have the length h in the S metric norm<br />
$$\|\Delta\mathbf{c}\|_S^2 = \sum_{i,j=1}^{n} \Delta c_i M_{ij} \Delta c_j = h^2 . \qquad (1.61)$$<br />
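In the unconstrained formulation defined by Eq. (1.44), the metric M of Eq. (1.37) is found as<br />
$$M_{ij} = \mathrm{Tr}\,\mathbf{D}_i\mathbf{S}\mathbf{D}_j\mathbf{S} - \mathrm{Tr}\,\mathbf{D}_i\mathbf{S}\mathbf{D}_0\mathbf{S} - \mathrm{Tr}\,\mathbf{D}_0\mathbf{S}\mathbf{D}_j\mathbf{S} + \mathrm{Tr}\,\mathbf{D}_0\mathbf{S}\mathbf{D}_0\mathbf{S}, \quad i, j \neq 0 . \qquad (1.62)$$<br />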
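Introducing the undetermined multiplier ν for the step-size constraint, we arrive at the following<br />
Lagrangian for minimization on the boundary of the trust region:<br />
$$L(\Delta\mathbf{c}, \nu) = E_0 + \Delta\mathbf{c}^{\mathrm{T}}\mathbf{g} + \tfrac{1}{2}\Delta\mathbf{c}^{\mathrm{T}}\mathbf{H}\Delta\mathbf{c} - \tfrac{1}{2}\nu\left(\Delta\mathbf{c}^{\mathrm{T}}\mathbf{M}\Delta\mathbf{c} - h^2\right) . \qquad (1.63)$$<br />
Differentiating this Lagrangian and setting the derivatives equal to zero, we obtain the equations<br />
$$\frac{\partial L}{\partial \Delta\mathbf{c}} = \mathbf{g} + \mathbf{H}\Delta\mathbf{c} - \nu\mathbf{M}\Delta\mathbf{c} = \mathbf{0} \qquad (1.64)$$<br />
$$\frac{\partial L}{\partial \nu} = -\tfrac{1}{2}\left(\Delta\mathbf{c}^{\mathrm{T}}\mathbf{M}\Delta\mathbf{c} - h^2\right) = 0 . \qquad (1.65)$$<br />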
The optimization of the Lagrangian thus corresponds to the solution of the following set of linear<br />
equations:<br />
$$(\mathbf{H} - \nu\mathbf{M})\Delta\mathbf{c} = -\mathbf{g} , \qquad (1.66)$$<br />
where the multiplier ν is iteratively adjusted until the step is to the boundary of the trust region, Eq.<br />
(1.65). The step length restriction may be lifted by setting ν = 0 as needed for steps inside the trust<br />
region.<br />
To illustrate how the level shift parameter ν in Eq. (1.66) is determined, we consider in Fig. 1.18<br />
and Fig. 1.19 the third and fourth DSM step respectively, in iteration five of the HF/STO-3G<br />
calculation on CrC seen in Fig. 1.13. The step length ||∆c|| S is plotted as a function of ν. The plots<br />
consist of branches between asymptotes where ν makes the matrix on the left hand side of Eq.<br />
(1.66) singular. This happens whenever ν equals one of the Hessian eigenvalues. The lowest<br />
eigenvalue $\omega_1$ of the Hessian H is found, and the level shift parameter is chosen in the interval −∞ <<br />
ν < min(0, $\omega_1$). The proper value is found where the step length function crosses the line<br />
representing the trust radius h, as marked in Fig. 1.18. If the step that minimizes $E_{(2)}^{\mathrm{DSM}}$ is inside the<br />
trust region, ν = 0 is chosen as is the case in Fig. 1.19. The trust region is updated during the<br />
iterative procedure and therefore h is different in the two steps.<br />
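The determination of ν can be sketched with NumPy as below. This is an illustration under stated assumptions, not the thesis implementation: the function name `trust_region_step` is hypothetical, M is assumed positive definite so that it can be Cholesky factorized, the asymptotes are taken as the eigenvalues of H in the M metric, and a simple bisection replaces whatever iterative adjustment of ν is used in practice. The case where the gradient has no component along the lowest eigenvector is not handled.<br />

```python
import numpy as np


def trust_region_step(g, H, M, h, tol=1e-10, max_iter=200):
    """Solve (H - nu M) dc = -g with ||dc||_S <= h; return (dc, nu)."""
    L = np.linalg.cholesky(M)                 # M = L L^T
    L_inv = np.linalg.inv(L)
    # Standard eigenproblem in the orthonormalized parameter space
    w, U = np.linalg.eigh(L_inv @ H @ L_inv.T)
    g_t = U.T @ (L_inv @ g)

    def step_norm(nu):
        # step length in the S metric as a function of the level shift
        return np.linalg.norm(g_t / (w - nu))

    def step(nu):
        return L_inv.T @ (U @ (-g_t / (w - nu)))

    # Newton step inside the trust region: no level shift needed
    if w[0] > 0.0 and step_norm(0.0) <= h:
        return step(0.0), 0.0

    # Otherwise search nu in (-inf, min(0, w_1)); the step length grows
    # monotonically with nu on this branch.
    hi = min(0.0, w[0])
    lo = hi - np.linalg.norm(g_t) / h         # step_norm(lo) <= h
    for _ in range(max_iter):
        nu = 0.5 * (lo + hi)
        if step_norm(nu) > h:
            hi = nu
        else:
            lo = nu
        if hi - lo < tol:
            break
    return step(lo), lo
```

The returned step never exceeds the trust radius, since the lower end of the bisection bracket is used; for a positive definite Hessian and a sufficiently large h the unshifted Newton step is returned with ν = 0.<br />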
Fig. 1.18 The step length as a function of the multiplier ν in the third DSM step (trust radius h = 0.34 marked).<br />
Fig. 1.19 The step length as a function of the multiplier ν in the fourth DSM step (trust radius h = 0.44 marked).<br />
Each of the trust region steps requires the construction of the gradient g and the Hessian H in the<br />
density subspace, and the solution of the level shifted Newton equations Eq. (1.66). Since $E^{\mathrm{DSM}}$ is a<br />
local model of the true energy function $E_{\mathrm{SCF}}$, it resembles $E_{\mathrm{SCF}}$ only in a small region about the<br />
initial point $\mathbf{c}^{(0)}$. The DSM iterations are therefore terminated if the total step length after p iterations<br />
$\|\mathbf{c}^{(p)} - \mathbf{c}^{(0)}\|_S$ exceeds some preset value k. If a minimum of $E^{\mathrm{DSM}}$ is found inside the trust region,<br />
$\|\mathbf{c}^{(p)} - \mathbf{c}^{(0)}\|_S < k$, then the step $\mathbf{c}^{*} - \mathbf{c}^{(0)}$ to the minimum is taken and the iterations are terminated. This<br />
is the typical situation.<br />
When the trust region minimization has terminated, an improved density matrix $\tilde{\mathbf{D}}$ can be<br />
constructed. However, to avoid the expensive calculation of the Fock matrix from $\tilde{\mathbf{D}}$ we use instead<br />
the averaged density matrix $\bar{\mathbf{D}}$ from Eq. (1.45) and exploit that the Fock matrix is linear in the density<br />
for Hartree-Fock such that $\mathbf{F}(\bar{\mathbf{D}})$ is simply the averaged Fock matrix $\bar{\mathbf{F}}$ of Eq. (1.54). For DFT this is<br />
an approximation, but typically insignificant improvements are obtained by evaluating the correct<br />
Kohn-Sham matrix. The improved Fock matrix and density matrix then enter the TRRH step as $\mathbf{F}_0$<br />
and $\mathbf{D}_0$, respectively.<br />
By construction E DSM (c) is lowered at each iteration of the trust region minimization. Since E DSM is<br />
a local model to the true energy E SCF , the lowering of E DSM will also lead to a lowering of E SCF<br />
provided the total step is sufficiently short and thus stays in the local region.<br />
1.4.2.4 Line Search TRDSM<br />
As in the TRRH step, the averaged density matrix $\bar{\mathbf{D}}$ may also be determined by a line search and<br />
we denote this line search algorithm TRDSM-LS. Here, the line search is made in the direction<br />
defined by the first step $\mathbf{c}^{(1)}$ of the TRDSM algorithm, that is, the step at the expansion point $\mathbf{D}_0$.<br />
As in the TRRH step, such a line search is guaranteed to reduce the energy. The first step is scaled<br />
by a parameter α,<br />
$$\Delta\mathbf{c}^{\mathrm{tot}} = \alpha \cdot \mathbf{c}^{(1)} \qquad (1.67)$$<br />
and a search is made in $\Delta E_{\mathrm{SCF}}^{\mathrm{DSM}}$ to find the step $\Delta\mathbf{c}^{\mathrm{tot}}$ that leads to the largest decrease in energy.<br />
$E_{\mathrm{SCF}}(\alpha)$ is found by evaluating the averaged density of Eq. (1.45) for the coefficients ($\mathbf{c}_0 + \Delta\mathbf{c}^{\mathrm{tot}}$),<br />
purifying it as in Eq. (1.32)–(1.33) and inserting it in the energy expression of Eq. (1.1). Then<br />
$\Delta E_{\mathrm{SCF}}^{\mathrm{DSM}}(\alpha)$ can be found as<br />
$$\Delta E_{\mathrm{SCF}}^{\mathrm{DSM}}(\alpha) = E_{\mathrm{SCF}}(\alpha) - E_{\mathrm{SCF}}(\mathbf{D}_0) . \qquad (1.68)$$<br />
Fig. 1.20 and Fig. 1.21 illustrate the search in α, again for iteration seven of the HF and LDA<br />
calculations on the zinc complex in Fig. 1.3. For α = 0, no step is taken and hence no energy<br />
decrease is seen. For the marked choice of α, the optimal step length is obtained.<br />
Fig. 1.20 Decrease in HF energy as a function of the step length α.<br />
Fig. 1.21 Decrease in LDA energy as a function of the step length α.<br />
1.4.2.5 The Missing Term<br />
In the construction of the TRDSM energy model Eq. (1.55), the term of second order in the<br />
idempotency correction D δ was neglected from Eq. (1.50), since this term required a new Fock<br />
evaluation F(D δ ), which would increase the expenses of the scheme considerably. This section will<br />
be concerned with this neglected term and how a part of it can be described without the evaluation<br />
of a new Fock matrix, leading to an improved energy model for TRDSM at no considerable extra<br />
cost. The actual effect of this improvement to the energy model will then be discussed through a<br />
case study. This section will only be concerned with Hartree-Fock theory and examples, but it might<br />
equally well be done for DFT even though the improvement should be less significant since for<br />
DFT, also terms of order $\|\mathbf{D}_+\|^3$ are neglected. These are of the same size as the neglected term<br />
quadratic in $\mathbf{D}_\delta$. In Section 1.5.2 these errors are discussed.<br />
Since the only neglect in the DSM energy model Eq. (1.55) for Hartree-Fock is the term quadratic<br />
in $\mathbf{D}_\delta$, and since the only term quadratic in the density is $\mathrm{Tr}\,\mathbf{D}\mathbf{G}(\mathbf{D})$, the HF energy for the density $\tilde{\mathbf{D}}$<br />
can be written as<br />
$$E_{\mathrm{HF}}(\tilde{\mathbf{D}}) = E(\bar{\mathbf{D}}) + 2\mathrm{Tr}\,\mathbf{D}_\delta\bar{\mathbf{F}} + \mathrm{Tr}\,\mathbf{D}_\delta\mathbf{G}(\mathbf{D}_\delta) , \qquad (1.69)$$<br />
where $E(\bar{\mathbf{D}})$ is seen in Eq. (1.56). Even though a new Fock matrix $\mathbf{h} + \mathbf{G}(\mathbf{D}_\delta)$ should be evaluated<br />
to describe the last term exactly, a part of the term can be described in the subspace of the previous<br />
densities.<br />
As exploited in the level-shift scheme Section 1.4.1.5, a density or density difference, in this case<br />
$\mathbf{D}_\delta$, can be divided in a part $\mathbf{D}_\delta^{\parallel}$ that can be described in the subspace of the previous densities and<br />
an unknown part $\mathbf{D}_\delta^{\perp}$ orthogonal to the space<br />
$$\mathbf{D}_\delta = \mathbf{D}_\delta^{\parallel} + \mathbf{D}_\delta^{\perp} . \qquad (1.70)$$<br />
$\mathbf{D}_\delta^{\parallel}$ is expanded in the previous densities $\mathbf{D}_i$ as<br />
$$\mathbf{D}_\delta^{\parallel} = \sum_{i=0}^{n} \omega_i \mathbf{D}_i , \qquad (1.71)$$<br />
where the expansion coefficients $\omega_i$ are determined in a least-squares manner<br />
$$\omega_i = \sum_{j=0}^{n} \left[\mathbf{M}^{-1}\right]_{ij} \mathrm{Tr}\,\mathbf{D}_j\mathbf{S}\mathbf{D}_\delta\mathbf{S}, \quad M_{ij} = \mathrm{Tr}\,\mathbf{D}_i\mathbf{S}\mathbf{D}_j\mathbf{S} . \qquad (1.72)$$<br />
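Inserting Eq. (1.70) for $\mathbf{D}_\delta$ in Eq. (1.69), an improved DSM energy model can be written<br />
$$E_{\mathrm{imp}}^{\mathrm{DSM}}(\mathbf{c}) = E(\bar{\mathbf{D}}) + 2\mathrm{Tr}\,\mathbf{D}_\delta\bar{\mathbf{F}} + \mathrm{Tr}\left(2\mathbf{D}_\delta - \mathbf{D}_\delta^{\parallel}\right)\mathbf{G}(\mathbf{D}_\delta^{\parallel}) , \qquad (1.73)$$<br />
where only previous density and Fock matrices enter. The relation<br />
$$\mathrm{Tr}\,\mathbf{A}\mathbf{G}(\mathbf{B}) = \mathrm{Tr}\,\mathbf{B}\mathbf{G}(\mathbf{A}) \qquad (1.74)$$<br />
for symmetric matrices A and B is used and the term $\mathrm{Tr}\,\mathbf{D}_\delta^{\perp}\mathbf{G}(\mathbf{D}_\delta^{\perp})$ is neglected. A second order<br />
Taylor expansion of the improved DSM energy can then be made as in Eq. (1.59) and a trust region<br />
minimization carried out.<br />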
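The step from Eq. (1.69) to Eq. (1.73) can be written out as follows. This is a sketch of the intermediate algebra under the assumption, implied by Hartree-Fock theory, that G is linear in its argument; it uses only Eqs. (1.70), (1.71) and (1.74).<br />

```latex
\begin{align}
\mathrm{Tr}\,\mathbf{D}_\delta\mathbf{G}(\mathbf{D}_\delta)
 &= \mathrm{Tr}\,\mathbf{D}_\delta^{\parallel}\mathbf{G}(\mathbf{D}_\delta^{\parallel})
  + \mathrm{Tr}\,\mathbf{D}_\delta^{\parallel}\mathbf{G}(\mathbf{D}_\delta^{\perp})
  + \mathrm{Tr}\,\mathbf{D}_\delta^{\perp}\mathbf{G}(\mathbf{D}_\delta^{\parallel})
  + \mathrm{Tr}\,\mathbf{D}_\delta^{\perp}\mathbf{G}(\mathbf{D}_\delta^{\perp}) \\
 &= \mathrm{Tr}\,\mathbf{D}_\delta^{\parallel}\mathbf{G}(\mathbf{D}_\delta^{\parallel})
  + 2\,\mathrm{Tr}\,\mathbf{D}_\delta^{\perp}\mathbf{G}(\mathbf{D}_\delta^{\parallel})
  + \mathrm{Tr}\,\mathbf{D}_\delta^{\perp}\mathbf{G}(\mathbf{D}_\delta^{\perp}) \\
 &\approx \mathrm{Tr}\,\mathbf{D}_\delta^{\parallel}\mathbf{G}(\mathbf{D}_\delta^{\parallel})
  + 2\,\mathrm{Tr}\left(\mathbf{D}_\delta - \mathbf{D}_\delta^{\parallel}\right)\mathbf{G}(\mathbf{D}_\delta^{\parallel}) \\
 &= \mathrm{Tr}\left(2\mathbf{D}_\delta - \mathbf{D}_\delta^{\parallel}\right)\mathbf{G}(\mathbf{D}_\delta^{\parallel}),
 \qquad
 \mathbf{G}(\mathbf{D}_\delta^{\parallel}) = \sum_{i=0}^{n}\omega_i\,\mathbf{G}(\mathbf{D}_i) .
\end{align}
```

The second line uses Eq. (1.74), the third line drops the term quadratic in $\mathbf{D}_\delta^{\perp}$, and the last expression shows why no new Fock matrix is needed: each $\mathbf{G}(\mathbf{D}_i)$ is available from the stored Fock matrix as $\mathbf{F}_i - \mathbf{h}$.<br />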
To study the improvement to the energy function, two TRSCF calculations are carried out on the<br />
cadmium complex seen in Fig. 1.6 in the STO-3G basis and with an H1-core start guess. The<br />
convergence profiles of the calculations are displayed in Fig. 1.22, the one denoted “Improved<br />
TRDSM” is a TRSCF calculation just as the one denoted “TRSCF” with the only difference that the<br />
improved energy model in Eq. (1.73) is used for TRDSM instead of the one in Eq. (1.55). To<br />
illustrate the impact of the improvement in a single TRDSM step, a line search like the one in Fig.<br />
1.20 is made in iteration 7 of the same TRSCF calculation as in Fig. 1.22. Apart from displaying the<br />
change in SCF energy as a function of the step length α, also the DSM energy of Eq. (1.55) and the<br />
improved DSM energy of Eq. (1.73) are evaluated for the different choices of α, and their energy<br />
changes found as well.<br />
Fig. 1.22 Convergence for the cadmium complex in Fig. 1.6, both for TRSCF with no improvements and for TRSCF where $E^{\mathrm{DSM}}_{\mathrm{imp}}$ is used in TRDSM. [Plot: error in energy / E h on a logarithmic scale versus iteration; series TRSCF and Improved TRDSM.]<br />
Fig. 1.23 TRDSM line search for iteration 7 in the TRSCF optimization of Fig. 1.22. For different α in Eq. (1.67), the changes in $E^{\mathrm{DSM}}_{\mathrm{HF}}$, $E^{\mathrm{DSM}}$ and $E^{\mathrm{DSM}}_{\mathrm{imp}}$ compared to $E_{\mathrm{HF}}(\mathbf{D}_0)$ are found. [Plot: ∆E / E h versus α.]<br />
It is seen in Fig. 1.23 that the improved DSM energy describes the HF energy better than the<br />
standard DSM energy does, just as expected. As the step moves away from the expansion point, the<br />
part of the energy which cannot be described in the old densities grows and both the DSM energy<br />
models become poor.<br />
The improvements presented in this section add complexity to the TRDSM algorithm, even though<br />
the computational cost is not significant. As seen in Fig. 1.22 and Fig. 1.23, the improvements to the<br />
TRSCF calculation are minor. The overall gain does not justify the extra complexity added to the<br />
TRDSM algorithm.<br />
1.4.3 Energy Minimization Exploiting the Density Subspace<br />
Section 1.3.1 describes how different approaches have been taken to avoid the diagonalization in<br />
the Roothaan-Hall step. Replacing the standard diagonalization of the Fock matrix can be done for<br />
the purpose of improving either the convergence properties or the scaling of the algorithm or for<br />
both reasons. With the purpose of improving both, a newly developed scheme is presented in this<br />
section, in which an energy minimization replaces the standard diagonalization in the SCF<br />
optimization.<br />
When the RH energy model is minimized, the density subspace information used with great success<br />
in TRDSM is ignored. The novel idea is thus to exploit the valuable information saved in the<br />
density subspace of the previous densities to construct an improved RH energy model and minimize<br />
this model instead of the RH model. This makes the TRDSM step redundant since a density<br />
subspace minimization now is included in the RH energy model minimization.<br />
The Hessian update methods 40,53 , in which an approximate Hessian is updated in each iteration and<br />
an approximate Newton step is taken, exploit some of the same ideas, but they are all based on<br />
approximate second order energy expansions in the orbital rotation parameters and therefore do not<br />
include the third and higher order terms included in the RH energy.<br />
In the following subsections the improved RH energy model and its minimization will be described.<br />
The SCF convergence of a test case is then displayed, in which the new energy minimization<br />
approach is compared to standard DIIS and the TRSCF schemes. As the scheme has not yet been<br />
extended to DFT, this section will only consider HF theory and calculations.<br />
1.4.3.1 The Augmented RH Energy model<br />
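If the Hartree-Fock energy, Eq. (1.1), is expanded through second order around some reference density $\mathbf{D}_0$<br />
$$E_{\mathrm{HF}}(\mathbf{D}) = E_{\mathrm{HF}}(\mathbf{D}_0) + 2\mathrm{Tr}\,\mathbf{F}(\mathbf{D}_0)(\mathbf{D}-\mathbf{D}_0) + \mathrm{Tr}\,(\mathbf{D}-\mathbf{D}_0)\,\mathbf{G}(\mathbf{D}-\mathbf{D}_0), \qquad (1.75)$$<br />
the first two terms are recognized as $E^{\mathrm{RH}}(\mathbf{D})$ from Eq. (1.22) plus the terms of zeroth order $E_{\mathrm{HF}}(\mathbf{D}_0)$ and $-E^{\mathrm{RH}}(\mathbf{D}_0)$<br />
$$E_{\mathrm{HF}}(\mathbf{D}) = E^{\mathrm{RH}}(\mathbf{D}) + E_{\mathrm{HF}}(\mathbf{D}_0) - E^{\mathrm{RH}}(\mathbf{D}_0) + \mathrm{Tr}\,(\mathbf{D}-\mathbf{D}_0)\,\mathbf{G}(\mathbf{D}-\mathbf{D}_0). \qquad (1.76)$$<br />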
In a standard RH step, the energy function to minimize is the RH energy, neglecting the last term<br />
which contains the Hessian information, because it is too expensive to evaluate. Since Hessian<br />
information is very valuable to an optimization, the scheme presented in this section will replace the<br />
diagonalization in the RH step by an energy minimization of an augmented RH (ARH) energy<br />
model, where as much Hessian information as possible is included without directly evaluating new<br />
Fock matrices. This is done by exploiting the information contained in the density and Fock<br />
matrices of the previous iterations.<br />
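That the expansion in Eq. (1.75) is exact for an energy quadratic in the density can be checked numerically. The following is a minimal sketch, assuming the closed-shell form E HF (D) = 2Tr hD + Tr DG(D) with the nuclear term omitted, the two-electron contraction of Eq. (1.87), and E RH (D) = 2Tr F(D 0 )D. The toy integrals, the matrices h, D 0 and D, and all function names are hypothetical illustration values, not data from the calculations reported here.<br />

```python
def build_g(vectors):
    """Toy integrals g[m][v][r][s] = sum_k L_k[m][v] * L_k[r][s].

    For symmetric L_k this has the permutational symmetry of real integrals.
    """
    rng = range(len(vectors[0]))
    return [[[[sum(l[m][v] * l[r][s] for l in vectors) for s in rng]
              for r in rng] for v in rng] for m in rng]


def g_hf(d, g):
    """[G(D)]_mv = sum_rs (2 g_mvrs - g_msrv) D_rs, cf. Eq. (1.87)."""
    rng = range(len(d))
    return [[sum((2.0 * g[m][v][r][s] - g[m][s][r][v]) * d[r][s]
                 for r in rng for s in rng) for v in rng] for m in rng]


def tr(a, b):
    """Tr(A B) for square matrices stored as nested lists."""
    rng = range(len(a))
    return sum(a[i][j] * b[j][i] for i in rng for j in rng)


def combine(a, b, beta):
    """Return A + beta * B."""
    return [[x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def e_hf(d, h, g):
    """Electronic HF energy 2 Tr hD + Tr D G(D) (nuclear term omitted)."""
    return 2.0 * tr(h, d) + tr(d, g_hf(d, g))


# Hypothetical 2x2 toy data.
g = build_g([[[1.0, 0.2], [0.2, 0.5]], [[0.3, -0.1], [-0.1, 0.4]]])
h = [[-1.0, 0.1], [0.1, -0.5]]
d0 = [[1.0, 0.0], [0.0, 0.0]]      # reference density D_0
d = [[0.8, 0.4], [0.4, 0.2]]       # some other symmetric density D

delta = combine(d, d0, -1.0)       # D - D_0
f0 = combine(h, g_hf(d0, g), 1.0)  # F(D_0) = h + G(D_0)
hessian_term = tr(delta, g_hf(delta, g))

lhs = e_hf(d, h, g)
rhs_175 = e_hf(d0, h, g) + 2.0 * tr(f0, delta) + hessian_term
rhs_176 = 2.0 * tr(f0, d) + e_hf(d0, h, g) - 2.0 * tr(f0, d0) + hessian_term
print(lhs, rhs_175, rhs_176)
```

In this sketch the three printed numbers agree to rounding error, so the only part of the HF energy that the RH model leaves out is the term Tr(D - D 0 )G(D - D 0 ).<br />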
As previously exploited, a density or density difference, in this case $\boldsymbol{\Delta} = \mathbf{D} - \mathbf{D}_0$, can be split into a part $\boldsymbol{\Delta}^{\parallel}$ that can be described in the subspace of the n + 1 previous densities and an unknown part $\boldsymbol{\Delta}^{\perp}$ orthogonal to that space<br />
$$\mathbf{D} - \mathbf{D}_0 = \boldsymbol{\Delta} = \boldsymbol{\Delta}^{\parallel} + \boldsymbol{\Delta}^{\perp}. \qquad (1.77)$$<br />
$\boldsymbol{\Delta}^{\parallel}$ is expanded in the previous densities $\mathbf{D}_i$ as<br />
$$\boldsymbol{\Delta}^{\parallel} = \sum_{i=0}^{n} \omega_i \mathbf{D}_i, \qquad (1.78)$$<br />
where n is the number of previously stored densities and the expansion coefficients $\omega_i$ are determined in a least-squares manner<br />
$$\omega_i = \sum_{j=0}^{n} \left[\mathbf{M}^{-1}\right]_{ij} \mathrm{Tr}\,\mathbf{D}_j\mathbf{S}\boldsymbol{\Delta}\mathbf{S}, \qquad M_{ij} = \mathrm{Tr}\,\mathbf{D}_i\mathbf{S}\mathbf{D}_j\mathbf{S}. \qquad (1.79)$$<br />
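The projection in Eqs. (1.77)-(1.79) amounts to solving the normal equations Mω = b with b j = Tr D j S∆S. The following is a minimal pure-Python sketch under that reading. The 2x2 overlap matrix, the two stored densities and the density difference are hypothetical toy values, and the helper names are illustrative only.<br />

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def s_overlap(a, b, s):
    """Metric element Tr(A S B S)."""
    prod = matmul(matmul(a, s), matmul(b, s))
    return sum(prod[i][i] for i in range(len(prod)))


def solve(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def project(delta, densities, s):
    """Split delta into a part in span{D_i} and an S-orthogonal remainder."""
    metric = [[s_overlap(di, dj, s) for dj in densities] for di in densities]
    rhs = [s_overlap(dj, delta, s) for dj in densities]
    omega = solve(metric, rhs)
    dim = range(len(s))
    parallel = [[sum(w * d[i][j] for w, d in zip(omega, densities))
                 for j in dim] for i in dim]
    perp = [[delta[i][j] - parallel[i][j] for j in dim] for i in dim]
    return omega, parallel, perp


# Hypothetical toy data: overlap, two stored densities, one density change.
s = [[1.0, 0.1], [0.1, 1.0]]
densities = [[[1.0, 0.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
delta = [[0.2, 0.3], [0.3, -0.1]]

omega, d_par, d_perp = project(delta, densities, s)
residual_overlaps = [s_overlap(dj, d_perp, s) for dj in densities]
print(omega, residual_overlaps)
```

By construction the remainder has vanishing overlap Tr D j S∆ ⊥ S with every stored density, which is the sense in which it is orthogonal to the subspace.<br />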
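Inserting Eq. (1.77) in the last term of Eq. (1.76) and neglecting the term $\mathrm{Tr}\,\boldsymbol{\Delta}^{\perp}\mathbf{G}(\boldsymbol{\Delta}^{\perp})$, the<br />
augmented Roothaan-Hall energy model can be written as<br />
$$E^{\mathrm{ARH}}(\mathbf{D}) = E^{\mathrm{RH}}(\mathbf{D}) + E_{\mathrm{HF}}(\mathbf{D}_0) - E^{\mathrm{RH}}(\mathbf{D}_0) + \mathrm{Tr}\,(2\boldsymbol{\Delta} - \boldsymbol{\Delta}^{\parallel})\,\mathbf{G}(\boldsymbol{\Delta}^{\parallel}), \qquad (1.80)$$<br />
where $\mathbf{G}(\boldsymbol{\Delta}^{\parallel})$ is evaluated as a linear combination of previous Fock matrices<br />
$$\mathbf{G}(\boldsymbol{\Delta}^{\parallel}) = \sum_{i=1}^{n} \omega_i \mathbf{G}(\mathbf{D}_i) = \sum_{i=1}^{n} \omega_i \left(\mathbf{F}(\mathbf{D}_i) - \mathbf{h}\right). \qquad (1.81)$$<br />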
The energy model E ARH has no intrinsic restrictions with respect to how different the densities<br />
spanning the subspace are allowed to be, and this is one of the benefits compared to the TRSCF<br />
scheme. For the TRDSM energy model, the purification implicit in the DSM energy makes no sense<br />
if the densities are too different, in particular if they have different electron configurations. In ARH,<br />
configuration shifts can be handled without problems, and whereas old, obsolete densities pollute<br />
the DSM energy model, they simply disappear from the ARH energy model, since their weights ω i<br />
diminish.<br />
We expect a faster convergence rate for ARH compared to TRSCF, mainly because the RH and<br />
DSM steps are merged to an energy model with correct gradient (not just in the subspace) and an<br />
approximate Hessian, which is improved in each iteration using the information from the previous<br />
density and Fock matrices.<br />
1.4.3.2 The Augmented RH Optimization<br />
The density for which the ARH energy model should be optimized can be expanded in the antisymmetric matrix X<br />
$$\mathbf{D}(\mathbf{X}) = \exp(-\mathbf{X}\mathbf{S})\,\mathbf{D}_0^{(i)} \exp(\mathbf{S}\mathbf{X}) = \mathbf{D}_0^{(i)} + \left[\mathbf{D}_0^{(i)}, \mathbf{X}\right]_S + \tfrac{1}{2}\left[\left[\mathbf{D}_0^{(i)}, \mathbf{X}\right]_S, \mathbf{X}\right]_S + \cdots, \qquad (1.82)$$<br />
where $\mathbf{D}_0^{(i)}$ is the reference density from which the step X is taken. Optimizing the ARH energy is<br />
thus a nonlinear problem and an iterative scheme should be applied.<br />
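The expansion in Eq. (1.82) can be checked on a small example. The following is a minimal sketch, assuming the S-commutator convention [A, B] S = ASB - BSA and a plain Taylor series for the matrix exponential. The 2x2 overlap matrix, reference density and rotation matrix are hypothetical toy values.<br />

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = range(len(a))
    return [[sum(a[i][k] * b[k][j] for k in n) for j in n] for i in n]


def combine(a, b, beta=1.0):
    """Return A + beta * B."""
    return [[x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, alpha):
    return [[alpha * x for x in row] for row in a]


def expm(a, terms=25):
    """Matrix exponential by its Taylor series (adequate for small norms)."""
    n = len(a)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in term]
    for k in range(1, terms):
        term = scale(matmul(term, a), 1.0 / k)
        result = combine(result, term)
    return result


def s_commutator(a, b, s):
    """[A, B]_S = A S B - B S A (assumed convention)."""
    return combine(matmul(matmul(a, s), b), matmul(matmul(b, s), a), -1.0)


def rotated_density(d0, x, s):
    """Exact D(X) = exp(-XS) D0 exp(SX)."""
    left = expm(scale(matmul(x, s), -1.0))
    right = expm(matmul(s, x))
    return matmul(matmul(left, d0), right)


def bch_second_order(d0, x, s):
    """D0 + [D0, X]_S + 1/2 [[D0, X]_S, X]_S."""
    first = s_commutator(d0, x, s)
    second = s_commutator(first, x, s)
    return combine(combine(d0, first), second, 0.5)


def max_abs(a):
    return max(abs(v) for row in a for v in row)


def truncation_error(t, d0, s):
    x = [[0.0, t], [-t, 0.0]]          # antisymmetric step of size t
    exact = rotated_density(d0, x, s)
    return max_abs(combine(exact, bch_second_order(d0, x, s), -1.0))


s = [[1.0, 0.2], [0.2, 1.0]]           # toy overlap matrix
d0 = [[1.0, 0.0], [0.0, 0.0]]          # satisfies D0 S D0 = D0

d_rot = rotated_density(d0, [[0.0, 0.3], [-0.3, 0.0]], s)
idempotency_error = max_abs(combine(matmul(matmul(d_rot, s), d_rot), d_rot, -1.0))
error_ratio = truncation_error(0.02, d0, s) / truncation_error(0.01, d0, s)
print(idempotency_error, error_ratio)
```

In this toy case the exponential parameterization keeps DSD = D to rounding error for a finite step, while the error of the second-order truncation grows roughly eightfold when the step is doubled, as expected for a leading neglected term of third order in X.<br />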
A Newton-Raphson (NR) optimization of the ARH energy is therefore carried out, and the steps are<br />
found minimizing a second order approximation of the ARH energy $E^{\mathrm{ARH}}_{(2)}$ by the preconditioned<br />
conjugate gradient (PCG) method. The second order approximation of the ARH energy, where the<br />
constant terms are excluded, can be written as<br />
$$
\begin{aligned}
E^{\mathrm{ARH}}_{(2)}(\mathbf{X}) = {} & 2\mathrm{Tr}\,\mathbf{F}_0\left[\mathbf{D}_0^{(i)}, \mathbf{X}\right]_S + \mathrm{Tr}\,\mathbf{F}_0\left[\left[\mathbf{D}_0^{(i)}, \mathbf{X}\right]_S, \mathbf{X}\right]_S \\
& + 2\mathrm{Tr}\,\left(\mathbf{D}_0^{(i)} - \mathbf{D}_0\right) \sum_{i=1}^{n}\left(\omega_i^{(1)} + \omega_i^{(2)}\right)\mathbf{G}(\mathbf{D}_i) \\
& + 2\mathrm{Tr}\,\left[\mathbf{D}_0^{(i)}, \mathbf{X}\right]_S \sum_{i=1}^{n}\left(\omega_i^{(0)} + \omega_i^{(1)}\right)\mathbf{G}(\mathbf{D}_i) + \mathrm{Tr}\,\left[\left[\mathbf{D}_0^{(i)}, \mathbf{X}\right]_S, \mathbf{X}\right]_S \sum_{i=1}^{n}\omega_i^{(0)}\mathbf{G}(\mathbf{D}_i) \\
& - \sum_{i,j=1}^{n} \mathrm{Tr}\left[2\omega_j^{(0)}\left(\omega_i^{(1)} + \omega_i^{(2)}\right) + \omega_i^{(1)}\omega_j^{(1)}\right]\mathbf{D}_i\mathbf{G}(\mathbf{D}_j),
\end{aligned}
\qquad (1.83)
$$<br />
where<br />
$$
\begin{aligned}
\omega_i^{(0)} &= \sum_{j=1}^{n}\left[\mathbf{M}^{-1}\right]_{ij} \mathrm{Tr}\left(\mathbf{D}_j\mathbf{S}\mathbf{D}_0^{(i)}\mathbf{S}\right), \\
\omega_i^{(1)} &= \sum_{j=1}^{n}\left[\mathbf{M}^{-1}\right]_{ij} \mathrm{Tr}\left(\mathbf{D}_j\mathbf{S}\left[\mathbf{D}_0^{(i)}, \mathbf{X}\right]_S\mathbf{S}\right), \\
\omega_i^{(2)} &= \tfrac{1}{2}\sum_{j=1}^{n}\left[\mathbf{M}^{-1}\right]_{ij} \mathrm{Tr}\left(\mathbf{D}_j\mathbf{S}\left[\left[\mathbf{D}_0^{(i)}, \mathbf{X}\right]_S, \mathbf{X}\right]_S\mathbf{S}\right).
\end{aligned}
\qquad (1.84)
$$<br />
If the summations are put in the most favorable way, the number of matrix multiplications is limited<br />
and independent of subspace size. Only the update of the metric M takes a number of matrix<br />
multiplications linearly in the subspace size.<br />
From the derivative $\partial E^{\mathrm{ARH}}_{(2)} / \partial\mathbf{X}$, the problem to be solved by PCG is set up for the current reference<br />
density $\mathbf{D}_0^{(i)}$, where i denotes the Newton-Raphson step number. Through the whole NR<br />
optimization $\mathbf{D}_0$ and $\mathbf{F}_0$ are the density and Fock matrices from the previous SCF iteration. The NR<br />
step X found by PCG is used to evaluate a new density from Eq. (1.82). If the new density is<br />
similar to the previous one, the Newton-Raphson optimization has converged; if not, the new density is<br />
used as reference density $\mathbf{D}_0^{(i)}$ in the next step.<br />
The final density matrix resulting from the NR optimization is then used to evaluate a new Fock<br />
matrix, and so the SCF iterative procedure is established. The SCF scheme for the described<br />
algorithm is illustrated in Fig. 1.24.<br />
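The two nested loops described above can be sketched as a generic driver. This is only an illustration of the control flow, not the implementation used for the calculations: the function name `arh_scf`, its callable arguments and the scalar toy model used to exercise it are all hypothetical, and in a real program `newton_step` would stand for the PCG solution of the second order ARH problem.<br />

```python
def arh_scf(d_start, build_fock, newton_step, rotate, is_close,
            max_scf=50, max_nr=50):
    """Outer SCF loop with an inner Newton-Raphson loop (control flow only).

    build_fock(d)                  -> Fock matrix for the SCF reference density
    newton_step(fock, d_scf, d_ref) -> step X from the current NR reference
    rotate(d_ref, x)               -> new density from the step, cf. Eq. (1.82)
    is_close(a, b)                 -> True when two densities are similar
    """
    d_scf = d_start                          # D_n^(0)
    for n in range(max_scf):
        fock = build_fock(d_scf)             # fixed during the NR optimization
        d_ref = d_scf                        # D_n^(i) with i = 0
        for _ in range(max_nr):
            x = newton_step(fock, d_scf, d_ref)
            d_new = rotate(d_ref, x)         # D_n^(i+1)(X)
            nr_converged = is_close(d_new, d_ref)
            d_ref = d_new
            if nr_converged:
                break
        if is_close(d_ref, d_scf):           # D_(n+1)^(0) close to D_n^(0)
            return d_ref, n + 1
        d_scf = d_ref                        # start the next SCF iteration
    raise RuntimeError("SCF did not converge within max_scf iterations")


# Scalar toy model (hypothetical): for a fixed "Fock" value f the inner model
# is minimized at 0.1 * f + 1.8, so the SCF fixed point is d = 2.
def toy_fock(d):
    return d


def toy_step(fock, d_scf, d_ref):
    return (0.1 * fock + 1.8) - d_ref


def toy_rotate(d_ref, x):
    return d_ref + x


def toy_close(a, b):
    return abs(a - b) < 1e-8


d_conv, n_scf = arh_scf(0.0, toy_fock, toy_step, toy_rotate, toy_close)
print(d_conv, n_scf)
```

Only the outer loop calls `build_fock`, which mirrors the point made above that no new Fock matrix is evaluated inside the Newton-Raphson optimization.<br />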
Fig. 1.24 Flow diagram of the SCF optimization with the diagonalization of the Fock matrix replaced by a minimization of the ARH energy. The light blue box embraces the Newton-Raphson optimization of $E^{\mathrm{ARH}}$. [Diagram: starting from $\mathbf{D}_0^{(0)}$, the Fock matrix $\mathbf{F}(\mathbf{D}_n^{(0)})$ is evaluated; $E^{\mathrm{ARH}}_{(2)}(\mathbf{X})$ is minimized at $\mathbf{D}_n^{(i)}$ by PCG to give $\mathbf{D}_n^{(i+1)}(\mathbf{X})$; if $\mathbf{D}_n^{(i+1)} \approx \mathbf{D}_n^{(i)}$ does not hold, i = i + 1 and the minimization is repeated; otherwise $\mathbf{D}_{n+1}^{(0)} = \mathbf{D}_n^{(i+1)}$; if $\mathbf{D}_{n+1}^{(0)} \approx \mathbf{D}_n^{(0)}$ does not hold, n = n + 1 and a new Fock matrix is evaluated; otherwise the converged density $\mathbf{D}_{\mathrm{conv}}$ is obtained.]<br />
1.4.3.3 Applications<br />
SCF calculations have been carried out using the ARH scheme. In Fig. 1.25 the convergence of<br />
HF/STO-3G calculations on CrC with a bond distance of 2.00 Å is displayed. Results are given for the<br />
augmented RH scheme, DIIS, and TRSCF with the C-shift and $d_{\mathrm{orth}}$-shift schemes, respectively. For<br />
the first iterations in the ARH optimization a limit is put on the $\|\mathbf{X}\|_S$ norm to avoid changes in the<br />
densities which go beyond the region that is well described by the energy model.<br />
The ARH scheme is clearly superior for this test case, even with the convergence improvements for<br />
TRSCF obtained with the $d_{\mathrm{orth}}$-shift scheme; ARH is almost an iteration ahead of 'TRSCF/$d_{\mathrm{orth}}$-shift'<br />
in the local region. The standard DIIS approach does not converge at all for this case.<br />
Fig. 1.25 HF/STO-3G calculations on CrC using different approaches. [Plot: error in energy / E h on a logarithmic scale versus iteration; series DIIS, TRSCF C-shift (std.), TRSCF $d^{\mathrm{new}}_{\mathrm{orth}}$-shift and ARH.]<br />
Fig. 1.26 Details from the ARH optimization in Fig. 1.25: The part of the density change which can be described in the subspace of the previous densities. [Plot: ratio between 0.0 and 0.8 versus iteration.]<br />
To illustrate how information gradually is obtained from the previous densities in ARH, the part $\Delta\mathbf{D}^{\parallel}$ of<br />
the density change $\Delta\mathbf{D} = \mathbf{D}_{n+1} - \mathbf{D}_n$ in each iteration that can be described in the previous densities<br />
is found as in Eqs. (1.78)-(1.79), and the ratio $\|\Delta\mathbf{D}^{\parallel}\|_S / \|\Delta\mathbf{D}\|_S$ is depicted in Fig. 1.26. It is<br />
seen how the description of ∆D improves during the first five iterations until a significant part of the<br />
Hessian is described, then a qualified step is taken to another region, and the new density is<br />
therefore not well described in the previous densities. This step is followed by a significant decrease<br />
in SCF energy of two orders of magnitude. The same pattern is repeated after two additional<br />
iterations.<br />
Even though only preliminary results are given in this section, the ARH energy minimization seems<br />
promising, taking the best of the RH and DSM energy models and improving the convergence<br />
compared to TRSCF, which already showed convergence rates as good as or better than those of DIIS. It could be<br />
expected that this scheme has the ability to converge in the fewest SCF iterations overall. The future<br />
success of ARH depends on the development of effective ways of solving the nonlinear<br />
equations in X, e.g. by setting up a good preconditioner.<br />
1.5 The Quality of the Energy Models for HF and DFT<br />
Having considered the theory behind the TRRH and TRDSM steps in Section 1.4.1 and 1.4.2<br />
without being concerned with the approximations introduced in the energy functions, this section<br />
takes a closer look at the errors in the energy models compared to the SCF energy. The SCF<br />
optimization of Hartree-Fock and Kohn-Sham-DFT energies is similar; the only difference lies in<br />
the energy expressions to be optimized. The approximations in the energy models will thus also<br />
differ in HF and DFT, and while Section 1.2 described the HF and DFT theory in a generic manner,<br />
this section will focus on the differences, ignoring the general elements already stated in Section<br />
1.2.<br />
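To make the differences in the HF and DFT energy expressions clear, we will now study them<br />
separately:<br />
$$E_{\mathrm{HF}} = 2\mathrm{Tr}\,\mathbf{h}\mathbf{D} + \mathrm{Tr}\,\mathbf{D}\mathbf{G}_{\mathrm{HF}}(\mathbf{D}) + h_{\mathrm{nuc}}, \qquad (1.85)$$<br />
$$E_{\mathrm{DFT}} = 2\mathrm{Tr}\,\mathbf{h}\mathbf{D} + \mathrm{Tr}\,\mathbf{D}\mathbf{G}_{\mathrm{DFT}}(\mathbf{D}) + h_{\mathrm{nuc}} + E_{\mathrm{XC}}(\mathbf{D}), \qquad (1.86)$$<br />
where<br />
$$\left[\mathbf{G}_{\mathrm{HF}}(\mathbf{D})\right]_{\mu\nu} = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}, \qquad (1.87)$$<br />
$$\left[\mathbf{G}_{\mathrm{DFT}}(\mathbf{D})\right]_{\mu\nu} = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \gamma\sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}. \qquad (1.88)$$<br />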
The second term in Eq. (1.87) and Eq. (1.88) is the contribution from exact exchange, with γ = 0 in<br />
pure DFT (LDA), and γ ≠ 0 in hybrid DFT. The exchange-correlation energy $E_{\mathrm{XC}}(\mathbf{D})$ in Eq. (1.86) is<br />
a functional of the electronic density. In the local-density approximation (LDA), the exchange-correlation<br />
energy is local in the density, whereas in the generalized gradient approximation (GGA),<br />
it is also local in the squared density gradient, and may thus be expressed as<br />
$$E_{\mathrm{XC}}(\mathbf{D}) = \int f\left(\rho(\mathbf{x}), \zeta(\mathbf{x})\right) d\mathbf{x}. \qquad (1.89)$$<br />
Here the electron density ρ(x) and its squared gradient norm ζ(x) are given by<br />
$$\rho(\mathbf{x}) = \boldsymbol{\chi}^{T}(\mathbf{x})\,\mathbf{D}\,\boldsymbol{\chi}(\mathbf{x}), \qquad \zeta(\mathbf{x}) = \nabla\rho(\mathbf{x}) \cdot \nabla\rho(\mathbf{x}), \qquad (1.90)$$<br />
where χ(x) is a column vector containing the AOs. Note that the exchange-correlation energy<br />
density f(ρ(x), ζ(x)) in Eq. (1.89) is a nonlinear (and non-quadratic) function of ρ(x) and ζ(x). The<br />
following relies on an expansion of $E_{\mathrm{XC}}(\mathbf{D})$ around some reference density matrix $\mathbf{D}_0$<br />
$$E_{\mathrm{XC}}(\mathbf{D}) = E_{\mathrm{XC}}(\mathbf{D}_0) + (\mathbf{D}-\mathbf{D}_0)^{T}\mathbf{E}^{(1)}_{\mathrm{XC}} + \tfrac{1}{2}(\mathbf{D}-\mathbf{D}_0)^{T}\mathbf{E}^{(2)}_{\mathrm{XC}}(\mathbf{D}-\mathbf{D}_0) + \cdots, \qquad (1.91)$$<br />
where the derivatives $\mathbf{E}^{(n)}_{\mathrm{XC}}$ have been evaluated at $\mathbf{D} = \mathbf{D}_0$ and where for convenience a vector-matrix<br />
notation for $\mathbf{D}$, $\mathbf{E}^{(1)}_{\mathrm{XC}}$, and $\mathbf{E}^{(2)}_{\mathrm{XC}}$ is used. The precise form of $E_{\mathrm{XC}}$ depends on the DFT<br />
functional chosen for the calculation.<br />
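The statement that the exchange-correlation energy is neither linear nor quadratic in the density can be illustrated with a toy quadrature of Eq. (1.89). The following is a minimal sketch, assuming an LDA-exchange-like energy density f(ρ) = -C x ρ^(4/3) with the Dirac constant C x = (3/4)(3/π)^(1/3). The one-dimensional grid, the two Gaussian basis functions and the density matrix are hypothetical, so the numbers carry no physical meaning.<br />

```python
import math

# Dirac exchange constant; the 1D grid below is only a toy stand-in for 3D.
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)


def basis(x):
    """Two hypothetical Gaussian basis functions evaluated at x."""
    return [math.exp(-x * x), math.exp(-(x - 1.0) ** 2)]


def density(d, x):
    """rho(x) = chi(x)^T D chi(x), cf. Eq. (1.90)."""
    chi = basis(x)
    return sum(chi[m] * d[m][v] * chi[v] for m in range(2) for v in range(2))


def e_xc(d, grid, weight):
    """Quadrature of f(rho) = -C_X * rho**(4/3) over the grid."""
    return sum(-C_X * density(d, x) ** (4.0 / 3.0) * weight for x in grid)


step = 0.05
grid = [-4.0 + step * k for k in range(201)]
d = [[0.6, 0.2], [0.2, 0.3]]               # toy density matrix
d_doubled = [[2.0 * v for v in row] for row in d]

scaling = e_xc(d_doubled, grid, step) / e_xc(d, grid, step)
print(scaling, 2.0 ** (4.0 / 3.0))
```

Doubling the density matrix scales this model energy by 2^(4/3), whereas a term linear in D would scale by 2 and the quadratic HF term Tr DG(D) by 4; this is why the expansion in Eq. (1.91) does not terminate at second order.<br />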
It is often more problematic to obtain convergence for DFT than HF, mainly for two reasons: The<br />
HOMO-LUMO gap ∆ε ai is smaller for DFT than for HF, and a determinant with a well separated<br />
occupied and virtual part has better convergence properties than one with many close-lying<br />
states 54,55 . Also, since the exchange-correlation energy is nonlinear and non-quadratic in the density, the<br />
higher order terms in the density not present in Hartree-Fock theory introduce some extra<br />
approximations to the SCF scheme for DFT. In this section these differences and their consequences<br />
for the convergence properties will be discussed for the TRSCF algorithm. It is here assumed that if<br />
the energy models employed in TRSCF were of the same quality for HF and DFT, that is, had errors<br />
of the same order compared to the true SCF energy, then the convergence properties would also be<br />
of the same quality.<br />
The study is mainly performed in the MO basis with a block diagonal Fock matrix as in Eq. (1.10)<br />
and the reference density matrix $\mathbf{D}_0^{\mathrm{MO}}$<br />
$$\mathbf{D}_0^{\mathrm{MO}} = \begin{pmatrix} 2\delta_{ij} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}. \qquad (1.92)$$<br />
It is also exploited that any valid density matrix D may be expressed in terms of a valid reference<br />
density matrix $\mathbf{D}_0$ as<br />
$$\mathbf{D}^{\mathrm{MO}}(\mathbf{K}) = \exp(-\mathbf{K})\,\mathbf{D}_0^{\mathrm{MO}} \exp(\mathbf{K}), \qquad (1.93)$$<br />
and can thus be expanded in orders of K through the BCH-expansion 46<br />
$$\mathbf{D}^{\mathrm{MO}}(\mathbf{K}) = \mathbf{D}_0^{\mathrm{MO}} + \left[\mathbf{D}_0^{\mathrm{MO}}, \mathbf{K}\right] + \tfrac{1}{2}\left[\left[\mathbf{D}_0^{\mathrm{MO}}, \mathbf{K}\right], \mathbf{K}\right] + O(\mathbf{K}^3). \qquad (1.94)$$<br />
The anti-symmetric rotation matrix may be written in the form<br />
$$\mathbf{K} = \begin{pmatrix} \mathbf{0} & -\boldsymbol{\kappa}^{T} \\ \boldsymbol{\kappa} & \mathbf{0} \end{pmatrix}, \qquad (1.95)$$<br />
where κ holds the orbital rotation parameters. The diagonal block matrices representing rotations<br />
among the occupied MOs and among the virtual MOs are zero since the density matrix in Eq. (1.8)<br />
is invariant to such rotations.<br />
In the following subsections the RH energy model Eq. (1.22) and the DSM energy model Eq. (1.55)<br />
are analyzed separately with respect to differences for HF and DFT.<br />
1.5.1 The Quality of the TRRH Energy Model<br />
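To compare the RH energy model to the SCF energy, both are expanded about a reference density<br />
matrix $\mathbf{D}_0$ (neglecting the possible difference between $\mathbf{F}_0$ and $\mathbf{F}(\mathbf{D}_0)$ noted in Section 1.4)<br />
$$E^{\mathrm{RH}}(\mathbf{D}) = E^{\mathrm{RH}}(\mathbf{D}_0) + 2\mathrm{Tr}\,\mathbf{F}(\mathbf{D}_0)(\mathbf{D}-\mathbf{D}_0), \qquad (1.96)$$<br />
$$
\begin{aligned}
E_{\mathrm{SCF}}(\mathbf{D}) = {} & E_{\mathrm{SCF}}(\mathbf{D}_0) + 2\mathrm{Tr}\,\mathbf{F}(\mathbf{D}_0)(\mathbf{D}-\mathbf{D}_0) + \mathrm{Tr}\,(\mathbf{D}-\mathbf{D}_0)\,\mathbf{G}(\mathbf{D}-\mathbf{D}_0) \\
& + E_{\mathrm{XC}}(\mathbf{D}) - E_{\mathrm{XC}}(\mathbf{D}_0) - \mathrm{Tr}\,(\mathbf{D}-\mathbf{D}_0)^{T}\mathbf{E}^{(1)}_{\mathrm{XC}}(\mathbf{D}_0),
\end{aligned}
\qquad (1.97)
$$<br />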
where the last three terms of Eq. (1.97) only are present in DFT theory. These expansions have the<br />
same first-order term 2TrF(D 0 )(D - D 0 ) and thus the same first derivative with respect to the orbital<br />
rotation parameters κ ai of Eq. (1.95)<br />
$$\left[\mathbf{E}^{(1)}_{\mathrm{RH}}\right]_{ai} = \left.\frac{\partial E^{\mathrm{RH}}(\boldsymbol{\kappa})}{\partial\kappa_{ai}}\right|_{\boldsymbol{\kappa}=0} = -4F_{ai}, \qquad (1.98)$$<br />
$$\left[\mathbf{E}^{(1)}_{\mathrm{SCF}}\right]_{ai} = \left.\frac{\partial E_{\mathrm{SCF}}(\boldsymbol{\kappa})}{\partial\kappa_{ai}}\right|_{\boldsymbol{\kappa}=0} = -4F_{ai}. \qquad (1.99)$$<br />
The expressions are found replacing D in Eqs. (1.96) and (1.97) with $\mathbf{D}^{\mathrm{MO}}(\mathbf{K})$ in Eq. (1.94) and<br />
differentiating with respect to κ ai .<br />
All higher order terms in κ arising from 2TrF(D 0 )(D - D 0 ) are consequently also shared for the SCF<br />
and RH energies whereas terms of second and higher order arising from the last term(s) in Eq. (1.97)<br />
are neglected in the RH energy model. To study the differences, the second order derivatives in κ<br />
are found in the same way as the first derivatives<br />
$$\left[\mathbf{E}^{(2)}_{\mathrm{RH}}\right]_{aibj} = \left.\frac{\partial^2 E^{\mathrm{RH}}(\boldsymbol{\kappa})}{\partial\kappa_{ai}\,\partial\kappa_{bj}}\right|_{\boldsymbol{\kappa}=0} = 4\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i), \qquad (1.100)$$<br />
$$\left[\mathbf{E}^{(2)}_{\mathrm{SCF}}\right]_{aibj} = \left.\frac{\partial^2 E_{\mathrm{SCF}}(\boldsymbol{\kappa})}{\partial\kappa_{ai}\,\partial\kappa_{bj}}\right|_{\boldsymbol{\kappa}=0} = 4\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i) + W_{aibj}, \qquad (1.101)$$<br />
where<br />
$$W^{\mathrm{HF}}_{aibj} = 16g_{aibj} - 4\left(g_{abij} + g_{ajib}\right), \qquad (1.102)$$<br />
$$W^{\mathrm{DFT}}_{aibj} = 16g_{aibj} - 4\gamma\left(g_{abij} + g_{ajib}\right) + \left[\mathbf{E}^{(2)}_{\mathrm{XC}}(\boldsymbol{\kappa})\right]_{aibj}. \qquad (1.103)$$<br />
$\mathbf{E}^{(2)}_{\mathrm{XC}}(\boldsymbol{\kappa})$ is the second derivative of the term $E_{\mathrm{XC}}$ expanded in the orbital rotation parameters κ. The<br />
error in the RH energy model can then be said to depend partly on the size of W and partly on the<br />
size of the third and higher order contributions from the nonlinear terms in Eq. (1.97) which are not<br />
included in Eq. (1.96). This general consideration goes for DFT as well as HF, but with different<br />
impact. As seen in Eq. (1.102) and (1.103), the definition of W differs in the two approaches and<br />
even differs depending on which DFT functional is chosen. Furthermore, since the size of the<br />
HOMO-LUMO gap ∆ε ai = ε a - ε i is typically smaller in DFT, the term 4δ ij δ ab (ε a – ε i ) will have<br />
different weights in Eq. (1.101) depending on the method. Also the size of the third and higher<br />
order contributions in Eq. (1.97) would be expected to differ for HF and DFT, since for DFT both<br />
the terms Tr(D - D 0 )G(D - D 0 ) and E XC (D) contribute whereas HF only contains the Tr(D - D 0 )G(D<br />
- D 0 ) term. In the beginning of the optimization, where large steps are taken, the size of the third<br />
and higher order contributions is the potential source of error. Near convergence this should be less<br />
of an issue, and in this region the size of the lowest Hessian eigenvalues should be the decisive error<br />
source.<br />
HF and LDA calculations have been carried out and the part of the SCF energy change arising from<br />
the RH step, $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$, has been found as well as the change in the RH energy model, $\Delta E^{\mathrm{RH}}$, in each<br />
iteration.<br />
Fig. 1.27 Calculations on the cadmium complex in Fig. 1.6 in the STO-3G basis set. [Plot: HF and LDA series versus iteration.]<br />
Fig. 1.28 Calculations on the zinc complex in Fig. 1.3 in the 6-31G basis set. [Plot: HF and LDA series versus iteration.]<br />
The change in the RH energy model is found as<br />
$$\Delta E^{\mathrm{RH}} = 2\mathrm{Tr}\,\mathbf{F}\left(\mathbf{D}_{n+1} - \mathbf{D}_0^{\mathrm{idem}}\right), \qquad (1.104)$$<br />
where $\mathbf{D}_0^{\mathrm{idem}}$ is the reference density matrix, typically a $\bar{\mathbf{D}}$ from the previous TRDSM step purified<br />
as in Eqs. (1.32)-(1.33), and $\mathbf{D}_{n+1}$ is the new density found from diagonalization of the Fock matrix.<br />
In the C-shift scheme the criterion Eq. (1.31) ensures that the occupied and virtual orbitals do not<br />
mix, and thus the Hessian, Eq. (1.100), is positive and the RH energy decreases. The SCF energy<br />
change is found as<br />
$$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}} = E_{\mathrm{SCF}}(\mathbf{D}_{n+1}) - E_{\mathrm{SCF}}(\mathbf{D}_0^{\mathrm{idem}}). \qquad (1.105)$$<br />
The ratio between Eq. (1.104) and Eq. (1.105) contains information of the quality of the RH energy<br />
model. If the errors are negligible, the ratio is close to 1. If the ratio is larger than one, the RH<br />
energy model exaggerates the energy decrease, and if it is between 0 and 1 it underestimates the<br />
energy decrease. If it is negative, the SCF energy increases even though the RH energy model<br />
predicts an energy decrease.<br />
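The interpretation of the ratio can be summarized in a small helper. The following is a minimal sketch; the function name, the returned labels and the tolerance `tol` used to decide when the ratio counts as close to 1 are hypothetical choices made for illustration.<br />

```python
def classify_rh_ratio(delta_e_rh, delta_e_scf, tol=0.05):
    """Classify the quality of the RH model from the two energy changes.

    delta_e_rh  : change in the RH energy model, Eq. (1.104)
    delta_e_scf : SCF energy change arising from the RH step, Eq. (1.105)
    tol         : hypothetical tolerance for "close to 1"
    """
    ratio = delta_e_rh / delta_e_scf
    if ratio < 0.0:
        return ratio, "SCF energy increases although the model predicts a decrease"
    if abs(ratio - 1.0) <= tol:
        return ratio, "model error negligible"
    if ratio > 1.0:
        return ratio, "model exaggerates the energy decrease"
    return ratio, "model underestimates the energy decrease"


# Hypothetical energy changes in hartree, for illustration only.
examples = [(-1.00, -1.02), (-2.0, -1.0), (-0.5, -1.0), (-1.0, 0.5)]
for de_rh, de_scf in examples:
    print(classify_rh_ratio(de_rh, de_scf))
```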
For two test cases the $\Delta E^{\mathrm{RH}} / \Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ ratio is displayed in Fig. 1.27 and Fig. 1.28, respectively. It is<br />
clearly seen that generally, the RH energy model is better for HF than for DFT, in particular,<br />
negative values are seen for the LDA ratios. The errors in the RH energy model for the LDA<br />
calculations get worse as convergence is approached, so it would be expected that the significant<br />
source of error is the neglected term W in the Hessian rather than the higher order terms. Since<br />
locally the lowest Hessian eigenvalue should be the one controlling the optimization, this theory is<br />
inspected evaluating the lowest Hessian eigenvalue for both the RH energy model and for SCF<br />
according to Eq. (1.100) and Eq. (1.101), respectively, at convergence of the two test cases. The<br />
results are compared in Table 1-4.<br />
Table 1-4 The lowest Hessian eigenvalues for the RH energy model and SCF energy at convergence of the calculations in Fig. 1.27 and Fig. 1.28. The deviation is found as $100\% \cdot \left(\left[\mathbf{E}^{(2)}_{\mathrm{RH}}\right]_{\min} - \left[\mathbf{E}^{(2)}_{\mathrm{SCF}}\right]_{\min}\right) / \left[\mathbf{E}^{(2)}_{\mathrm{SCF}}\right]_{\min}$.<br />
Quantity | cadmium complex, HF | cadmium complex, LDA | zinc complex, HF | zinc complex, LDA<br />
$\left[\mathbf{E}^{(2)}_{\mathrm{SCF}}\right]_{\min}$ | 0.557 | 0.017 | 1.000 | 0.290<br />
$\left[\mathbf{E}^{(2)}_{\mathrm{RH}}\right]_{\min}$ | 1.112 | 0.014 | 1.621 | 0.281<br />
Deviation | 100% | -21% | 62% | -2%<br />
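The deviation row of Table 1-4 can be recomputed from the tabulated eigenvalues. The following is a minimal sketch using only the rounded entries printed in the table, with the first row read as the SCF values and the second as the RH values; the dictionary keys and function name are illustrative.<br />

```python
# Rounded eigenvalues as printed in Table 1-4.
scf_min = {"cadmium HF": 0.557, "cadmium LDA": 0.017,
           "zinc HF": 1.000, "zinc LDA": 0.290}
rh_min = {"cadmium HF": 1.112, "cadmium LDA": 0.014,
          "zinc HF": 1.621, "zinc LDA": 0.281}


def deviation_percent(rh_value, scf_value):
    """100% * (RH - SCF) / SCF, the deviation defined in the table caption."""
    return 100.0 * (rh_value - scf_value) / scf_value


deviations = {case: deviation_percent(rh_min[case], scf_min[case])
              for case in scf_min}
for case, value in deviations.items():
    print(f"{case}: {value:.1f}%")
```

With the rounded entries the HF deviations come out at about 100% and 62%, matching the table, while the LDA deviations come out at about -18% and -3% rather than the tabulated -21% and -2%. The LDA eigenvalues are small, so the difference is presumably due to the tabulated deviations having been computed from unrounded eigenvalues.<br />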
As expected, the lowest Hessian eigenvalue for the RH energy model, that is the HOMO-LUMO<br />
gap, is much smaller for LDA than for HF, but surprisingly the Hessian prediction in<br />
the RH energy model is much better for LDA than for HF. Of course this is only the lowest<br />
eigenvalue, and we have not studied the corresponding eigenvector. We know for sure that the size<br />
of the orbital rotation parameters κ ai decreases during the optimization and should be very small at<br />
convergence, where only small adjustments to the density are made. It is thus difficult to imagine<br />
that terms of third and higher order in κ should be the reason for the larger errors in the RH<br />
energy model for LDA compared to HF.<br />
This is a matter we will investigate further in the future since it is not understood at the moment.<br />
The importance of the higher order terms should be examined directly to understand how they affect<br />
the errors, and the Hessian should be studied more carefully, including information about the<br />
eigenvectors belonging to the eigenvalues. However, it can still be concluded from Fig. 1.27 and Fig. 1.28 that the<br />
RH energy model is poorer for LDA than for HF optimizations.<br />
1.5.2 The Quality of the TRDSM Energy Model<br />
The TRDSM energy model of Section 1.4.2.2 is formulated in a general manner and is as applicable<br />
to DFT theory as to HF theory. Still, the model will be poorer for DFT than for HF because of the<br />
general exchange-correlation term appearing in the DFT energy.<br />
For the DSM energy model there are in general four possible sources of errors:<br />
1. The purified density $\tilde{\mathbf{D}}$ still has an idempotency error.<br />
2. The term $\tfrac{1}{2}\mathbf{D}_\delta^{T}\mathbf{E}_0^{[2]}\mathbf{D}_\delta$ in $E(\tilde{\mathbf{D}})$, Eq. (1.50), is neglected.<br />
3. $E(\tilde{\mathbf{D}})$, Eq. (1.50), is truncated after second order.<br />
4. $\mathbf{E}_0^{(2)}\mathbf{D}_+$ in Eq. (1.50) is approximated by $2\mathbf{F}_+$.<br />
Let us take a closer look at the errors one by one. In ref. 39 a general order analysis of the purified<br />
density D used in the parameterization of the DSM energy is given, and the results are summarized<br />
in Table 1-5.<br />
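Table 1-5. Comparison of the properties of the unpurified density $\bar{\mathbf{D}}$ and the purified density $\tilde{\mathbf{D}}$. c are the density expansion coefficients and κ are the orbital rotation parameters that change $\mathbf{D}_0$ to another density $\mathbf{D}_i$ in the subspace.<br />
Property | $\bar{\mathbf{D}}$ | $\tilde{\mathbf{D}}$<br />
Differences | $\mathbf{D}_+ = \bar{\mathbf{D}} - \mathbf{D}_0 = O(c\|\boldsymbol{\kappa}\|)$ | $\mathbf{D}_\delta = \tilde{\mathbf{D}} - \bar{\mathbf{D}} = O(c\|\boldsymbol{\kappa}\|^2)$<br />
Idempotency error | $\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - \bar{\mathbf{D}} = O(c\|\boldsymbol{\kappa}\|^2)$ | $\tilde{\mathbf{D}}\mathbf{S}\tilde{\mathbf{D}} - \tilde{\mathbf{D}} = O(c^2\|\boldsymbol{\kappa}\|^4)$<br />
Trace error | $\mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{S} - N/2 = 0$ | $\mathrm{Tr}\,\tilde{\mathbf{D}}\mathbf{S} - N/2 = O(c^2\|\boldsymbol{\kappa}\|^4)$<br />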
In the $\tilde{\mathbf{D}}$ column, the order of the idempotency correction $\mathbf{D}_\delta$ and the idempotency error for $\tilde{\mathbf{D}}$ are<br />
found. These are the same for DFT and HF; the idempotency error is of order $c^2\|\boldsymbol{\kappa}\|^4$, and since $\mathbf{D}_\delta$<br />
is of the order $c\|\boldsymbol{\kappa}\|^2$, the error connected to the neglect of the term second order in $\mathbf{D}_\delta$ will be of<br />
order $c^2\|\boldsymbol{\kappa}\|^4$ as well.<br />
The third possible source of errors is the truncation of the energy $E(\tilde{\mathbf{D}})$ after second order in the<br />
density. Since the Hartree-Fock energy is quadratic in the density, this truncation leads to no errors<br />
for HF, but for DFT there will be an error of order $\|\mathbf{D}_+\|^3$, and from the first column in Table 1-5 it is<br />
seen that it can be written as an error of order $c^3\|\boldsymbol{\kappa}\|^3$, since $\mathbf{D}_+$ is of the order $c\|\boldsymbol{\kappa}\|$. Also, since the<br />
HF energy is quadratic in the density, no third derivative $\mathbf{E}_0^{(3)}$ exists and thus the Taylor expansion<br />
used to find $\mathbf{E}_0^{(2)}\mathbf{D}_+ = 2\mathbf{F}_+$ is terminated for HF, but for DFT terms of order $\|\mathbf{D}_+\|^2$ are neglected.<br />
Since $\mathbf{E}_0^{(2)}\mathbf{D}_+$ is multiplied by $\mathbf{D}_+$ in the energy function Eq. (1.50), this gives an error for DFT of<br />
the order $\|\mathbf{D}_+\|^3$ or as before $c^3\|\boldsymbol{\kappa}\|^3$. The sizes of the introduced errors are summarized in Table 1-6.<br />
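The orders quoted for the idempotency error before and after purification can be illustrated numerically. The following is a minimal sketch that assumes a single McWeeny purification step, 3DSD - 2DSDSD, as a stand-in for the purification of Eqs. (1.32)-(1.33), which are not reproduced in this section. The orthonormal basis (S = 1), the two 2x2 densities, the mixing coefficient c and the rotation angle are hypothetical toy values.<br />

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = range(len(a))
    return [[sum(a[i][k] * b[k][j] for k in n) for j in n] for i in n]


def combine(a, b, alpha, beta):
    """Return alpha * A + beta * B."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def mcweeny(d, s):
    """One McWeeny step 3 DSD - 2 DSDSD (assumed form of the purification)."""
    dsd = matmul(matmul(d, s), d)
    dsdsd = matmul(matmul(dsd, s), d)
    return combine(dsd, dsdsd, 3.0, -2.0)


def idempotency_error(d, s):
    """Frobenius norm of DSD - D."""
    diff = combine(matmul(matmul(d, s), d), d, 1.0, -1.0)
    return math.sqrt(sum(v * v for row in diff for v in row))


theta, c = 0.2, 0.5                       # toy rotation angle and weight
s = [[1.0, 0.0], [0.0, 1.0]]              # orthonormal basis assumed
d_0 = [[1.0, 0.0], [0.0, 0.0]]            # reference density
cs, sn = math.cos(theta), math.sin(theta)
d_1 = [[cs * cs, cs * sn], [cs * sn, sn * sn]]   # d_0 rotated by theta

d_bar = combine(d_0, d_1, 1.0 - c, c)     # averaged (unpurified) density
d_tilde = mcweeny(d_bar, s)               # purified density

err_bar = idempotency_error(d_bar, s)
err_tilde = idempotency_error(d_tilde, s)
print(err_bar, err_tilde, err_tilde / err_bar ** 2)
```

In this toy case the idempotency error of the averaged density is of the order of the squared rotation angle, and the purified density has an error of the order of the square of that, in line with the two columns of Table 1-5.<br />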
Table 1-6. Comparison of the errors introduced in the DSM energy model for HF and DFT respectively.<br />
Source of error | error in HF | error in DFT<br />
1 Idempotency error $\tilde{\mathbf{D}}\mathbf{S}\tilde{\mathbf{D}} - \tilde{\mathbf{D}}$ | $O(c^2\|\boldsymbol{\kappa}\|^4)$ | $O(c^2\|\boldsymbol{\kappa}\|^4)$<br />
2 Neglected term $\tfrac{1}{2}\mathbf{D}_\delta^{T}\mathbf{E}_0^{[2]}\mathbf{D}_\delta$ | $O(c^2\|\boldsymbol{\kappa}\|^4)$ | $O(c^2\|\boldsymbol{\kappa}\|^4)$<br />
3 Truncation of $E(\tilde{\mathbf{D}})$ | 0 | $O(c^3\|\boldsymbol{\kappa}\|^3)$<br />
4 Approximation of $\mathbf{E}_0^{(2)}\mathbf{D}_+$ | 0 | $O(c^3\|\boldsymbol{\kappa}\|^3)$<br />
Depending on the sizes of c and $\|\boldsymbol{\kappa}\|$ respectively, the error for DFT will be of the same or lower order<br />
than the one for HF. To inspect whether or not the DSM energy is a poorer model for DFT than for<br />
HF, a number of calculations have been carried out, and the sizes of $\|\mathbf{D}_\delta\|$ and $\|\mathbf{D}_+\|$ for the DSM<br />
step in each iteration are examined. Since $\mathbf{D}_\delta$ is of the order $c\|\boldsymbol{\kappa}\|^2$ and $\mathbf{D}_+$ is of the order $c\|\boldsymbol{\kappa}\|$, the<br />
size of $\|\mathbf{D}_\delta\|^2$ and $\|\mathbf{D}_+\|^3$ will indicate whether the error in the energy model is controlled by the<br />
$O(c^2\|\boldsymbol{\kappa}\|^4)$ or the $O(c^3\|\boldsymbol{\kappa}\|^3)$ error. The test cases showed similar behavior, and results from HF<br />
and LDA calculations on the cadmium complex in Fig. 1.6 with a STO-3G basis and a H1-core start<br />
guess are displayed in Fig. 1.29 and Fig. 1.30.<br />
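Which of the two error orders dominates follows from their ratio. As a short worked step, using only the orders listed in Table 1-6:<br />

```latex
\frac{O\!\left(c^{3}\|\boldsymbol{\kappa}\|^{3}\right)}
     {O\!\left(c^{2}\|\boldsymbol{\kappa}\|^{4}\right)}
 = O\!\left(\frac{c}{\|\boldsymbol{\kappa}\|}\right),
\qquad
\|\mathbf{D}_+\|^{3} \sim c^{3}\|\boldsymbol{\kappa}\|^{3},
\qquad
\|\mathbf{D}_\delta\|^{2} \sim c^{2}\|\boldsymbol{\kappa}\|^{4}.
```

On this order-of-magnitude level the DFT-specific error is expected to dominate only when c is not small compared to ||κ||, which is why the sizes of ||D + || 3 and ||D δ || 2 are compared directly.<br />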
Fig. 1.29 HF/STO-3G calculation. The size of different density norms compared to the actual error in the DSM energy model. [Plot: logarithmic scale versus iteration; series $\|\mathbf{D}_+\|_S^4$, $\|\mathbf{D}_+\|_S^3$, $2\|\mathbf{D}_\delta\|_S^2$ and $E^{\mathrm{DSM}}_{\mathrm{HF}} - E^{\mathrm{DSM}}$.]<br />
Fig. 1.30 LDA/STO-3G calculation. The size of different density norms compared to the actual error in the DSM energy model. [Plot: logarithmic scale versus iteration; series $\|\mathbf{D}_+\|_S^4$, $\|\mathbf{D}_+\|_S^3$, $\|\mathbf{D}_\delta\|_S^2$ and $E^{\mathrm{DSM}}_{\mathrm{LDA}} - E^{\mathrm{DSM}}$.]<br />
The SCF energy at the end of a DSM step, $E_{SCF}^{DSM}$, is found by purifying the resulting D by Eq. (1.32)–(1.33)<br />
and evaluating the SCF energy, Eq. (1.1), for this density. The DSM energy, Eq. (1.55), is<br />
also evaluated and the error of the DSM energy model is then found as the size of $E_{SCF}^{DSM} - E^{DSM}$.<br />
For the HF calculation this error is expected to be of the size $\|D_\delta\|^2$, and it is seen in Fig. 1.29 that<br />
this is actually the case; if $\|D_\delta\|^2$ is multiplied by 2, there is a remarkable fit. It is also seen that if<br />
the error in the DSM energy for HF should be expressed in the density differences $D_+$, it would be<br />
the density differences to the third rather than the fourth order. For the DFT calculation the<br />
interesting point was to see whether or not $\|D_+\|^3$ is the controlling error. In Fig. 1.30 it is seen that<br />
even though there is not an obvious fit as for HF, $\|D_\delta\|^2$ seems to be the dominant error here as well.<br />
Still, if the error should be expressed in the density differences $D_+$, it would be the density<br />
differences to the third rather than the fourth order, as expected for DFT.<br />
In conclusion it seems that the dominating error in the DSM energy both for HF and DFT is $\|D_\delta\|^2$,<br />
that is, the idempotency correction squared. In comparison it should be mentioned that the EDIIS<br />
model 37 by Kudin, Scuseria, and Cancès corresponds to $E(D)$ in Eq. (1.55) and thus has an error of<br />
the order $\|D_\delta\|$ compared to the SCF energy.<br />
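The role of the idempotency correction can be illustrated with a small sketch. It assumes, purely for illustration, that the purification step is of McWeeny type, $3D^2 - 2D^3$, in an orthonormal basis (S = 1) with a diagonal density matrix; Eq. (1.32)–(1.33) are not reproduced here, and the occupation numbers are hypothetical.<br />

```python
import math

def mcweeny(n):
    """McWeeny purification polynomial applied to one occupation number.

    Assumption: stands in for the purification of Eq. (1.32)-(1.33).
    """
    return 3.0 * n**2 - 2.0 * n**3

def idempotency_error(occupations):
    """Frobenius norm of D*D - D for a diagonal D in an orthonormal basis."""
    return math.sqrt(sum((n * n - n) ** 2 for n in occupations))

# Hypothetical, slightly non-idempotent occupations of an averaged density.
averaged = [0.98, 0.03]
purified = [mcweeny(n) for n in averaged]

# Idempotency correction, the analogue of D_delta in the text.
correction = [p - n for p, n in zip(purified, averaged)]

error_before = idempotency_error(averaged)
error_after = idempotency_error(purified)
print(f"idempotency error before: {error_before:.3e}, after: {error_after:.3e}")
```

In this toy setting the idempotency error after one purification is second order in the error before it, which mirrors why the remaining errors in Table 1-6 appear at higher order in $\|\kappa\|$.<br />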
1.6 Convergence for Problems with Several Stationary Points<br />
The HF equation is nonlinear and therefore has, in principle, several solutions.<br />
Several minima might exist, and even though it is typically preferred to find the global minimum,<br />
no optimization method can guarantee that it is found. Furthermore, it cannot be tested whether the minimum<br />
found is a local or the global minimum without knowledge of the whole surface. Depending on the<br />
start guess and the optimization approach, an optimization can converge to different stationary<br />
points. Further, it is necessary to decide in which subspace of orbital rotations the desired solution<br />
should be found, since a solution representing a stable stationary point in one subspace is not<br />
necessarily stable in another.<br />
Orbital rotations can be divided into real and complex rotations, and each of those can be further<br />
divided into singlet and triplet rotations. Each of those can then again be divided into rotations within<br />
the different point group symmetries. Generally, we do not consider the complex rotations, and we<br />
only optimize in the real space. Further, when optimizing a closed shell wave function, only the<br />
total-symmetric part of the singlet rotations is considered. A stationary point in the subspace of real,<br />
total-symmetric, singlet rotations can be shown through elementary arguments to be a stationary<br />
point for all types of rotations. However, a stationary point can be a maximum, a saddle point,<br />
or a minimum. One way to determine whether the stationary point is also a minimum is to evaluate the Hessian<br />
eigenvalues. This is done within the subspace in which the solution should be stable. If a negative<br />
Hessian eigenvalue is found in the subspace of singlet rotations, the stationary point is said to have<br />
a singlet instability, and if a negative Hessian eigenvalue is found in the subspace of triplet rotations,<br />
it is said to have a triplet instability 54,56 . Triplet instabilities are connected to breaking the symmetry<br />
between α and β orbitals. If a triplet instability is found, a minimum with a lower energy than the<br />
current stationary point can be found if the α and β parts are allowed to differ, typically leading to<br />
a solution which is not an eigenfunction of $\hat{S}^2$. Hence, the lower minimum could be found by an<br />
unrestricted HF (UHF) optimization. A singlet instability found in the total-symmetric subspace<br />
indicates that the current stationary point is a saddle point and a minimum with lower energy exists<br />
within the subspace. If a singlet instability is found outside the total-symmetric subspace, orbitals of<br />
different symmetries should be mixed to decrease the energy further, changing the symmetry of the<br />
resulting wave function.<br />
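The classification of a stationary point from the signs of the Hessian eigenvalues can be sketched on a toy problem. The two-variable function below is a hypothetical stand-in for an energy surface and is not an SCF energy; the names and the tolerance are assumptions made for illustration only.<br />

```python
import math

def eigenvalues_2x2(h):
    """Return the eigenvalues (ascending) of a symmetric 2x2 matrix."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius

def classify(hessian, tol=1e-8):
    """Classify a stationary point from the signs of the Hessian eigenvalues."""
    lowest, highest = eigenvalues_2x2(hessian)
    if lowest > tol:
        return "minimum"
    if highest < -tol:
        return "maximum"
    if lowest < -tol and highest > tol:
        return "saddle point"
    return "degenerate"

def toy_hessian(x, y):
    """Hessian of the toy surface f(x, y) = x**4 - 2*x**2 + y**2."""
    return [[12.0 * x * x - 4.0, 0.0], [0.0, 2.0]]

# The toy surface has stationary points at (0, 0) and (+-1, 0).
for point in [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]:
    print(point, classify(toy_hessian(*point)))
```

Only the sign of the lowest eigenvalue is needed to detect an instability, which is why an iterative method that targets the lowest eigenvalue suffices in practice.<br />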
The aufbau ordering rule assumes that occupying the orbitals of lowest energy also leads to the<br />
lowest Hartree-Fock energy. This cannot be proven to always apply for restricted HF as it can for<br />
UHF 57 . Thus, when the aufbau ordering is forced upon an optimization, there is a risk that a lower<br />
energy with the aufbau ordering broken could exist. However, in a study by Dardenne et al. 58 , in<br />
which different ordering schemes were tested, the minimum was found in all cases to be an<br />
aufbau solution. The aufbau ordering was broken only for saddle points. In our schemes we always<br />
apply the aufbau ordering rule, but if the RH step is level shifted until the end of the optimization, it<br />
can force the convergence to a non-aufbau solution.<br />
1.6.1 Walking Away from Unstable Stationary Points<br />
As concluded in the previous section, the Hessian eigenvalues should be tested to make sure the<br />
optimized state is stable. This is expensive, so it is only done when it is expected that the problem<br />
has several stationary points. Depending on the desired solution, only the relevant part of the<br />
Hessian is checked. So far we have only considered singlet instabilities, but currently tests for triplet<br />
instabilities are implemented as well.<br />
The check for singlet instabilities is made on the converged wave function, finding the lowest<br />
eigenvalue of the Hessian in the real, singlet subspace. If the lowest Hessian eigenvalue<br />
turns out to be positive, we are sure to have a solution which is stable with respect to singlet<br />
rotations, but if it is negative we are at a saddle point, and a minimum with a lower energy exists<br />
within the subspace. We have in our SCF program implemented the possibility to test the singlet<br />
Hessian and, in case of a negative lowest Hessian eigenvalue, to follow the corresponding direction<br />
downhill and away from the saddle point. The scheme and some examples of its use will be<br />
described in the following.<br />
1.6.1.1 Theory<br />
When the SCF optimization has converged, the set of optimized orbitals described by their<br />
expansion coefficients $C_{opt}$ is used to evaluate the lowest Hessian eigenvalues and the<br />
corresponding eigenvectors by an iterative subspace method. If the lowest Hessian eigenvalue $\varepsilon_{min}$ is<br />
found positive, then it is clear that the optimization has converged to a minimum. If on the other<br />
hand the eigenvalue is negative, we know for sure that a lower stationary point exists.<br />
We would then like to take a step downhill in the direction x corresponding to the negative<br />
eigenvalue $\varepsilon_{min}$<br />
$E_{SCF}^{(2)} x = \varepsilon_{min} x$. (1.106)<br />
This can be accomplished by making a unitary transformation of the optimized expansion coefficients<br />
$C_{opt}$ with x as the orbital rotation parameters to define the direction $X_{dir}$ of the step<br />
$X_{dir} = \begin{bmatrix} 0 & -x_{ai}^T \\ x_{ai} & 0 \end{bmatrix}$. (1.107)<br />
The step length is controlled by a parameter α<br />
$U_\alpha = \exp(-\alpha X_{dir})$ (1.108)<br />
$C'_{opt}(\alpha) = C_{opt} U_\alpha$. (1.109)<br />
A line search is then carried out for α > 0 to find the lowest SCF energy in the direction $X_{dir}$. This is<br />
of course expensive since every point in the line search requires an evaluation of the Fock matrix<br />
with respect to the new coefficients $C'_{opt}$. When the SCF energy minimum in the direction $X_{dir}$ is<br />
found, the corresponding coefficients should be the initial orbitals for a new SCF optimization,<br />
hopefully now optimizing further downhill to a minimum. In problematic cases, e.g. with a very flat<br />
saddle point close to the minimum, we have found it convenient to continue the optimization with<br />
the line search scheme TRSCF-LS (the combination of TRRH-LS and TRDSM-LS described in<br />
Sections 1.4.1.4 and 1.4.2.4) to ensure a continued decrease in the energy.<br />
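The step of Eqs. (1.107)–(1.109) followed by a line search in α can be sketched for a two-orbital toy problem (one occupied, one virtual orbital). The energy function below is a hypothetical stand-in chosen so that the starting orbitals sit at an unstable stationary point; in a real calculation each trial α would instead require a Fock matrix evaluation. The grid search and all names are assumptions, not the scheme used in the SCF program.<br />

```python
import math

def matmul(a, b):
    """Plain dense matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm(m, terms=30):
    """Matrix exponential by a truncated Taylor series (small matrices only)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def rotate(c_opt, x, alpha):
    """C'(alpha) = C_opt exp(-alpha X_dir) with X_dir = [[0, -x], [x, 0]]."""
    x_dir = [[0.0, -x], [x, 0.0]]
    u_alpha = expm([[-alpha * v for v in row] for row in x_dir])
    return matmul(c_opt, u_alpha)

def toy_energy(c):
    """Hypothetical energy of the occupied orbital (first column of c)."""
    h = [[1.0, 0.0], [0.0, -1.0]]
    occ = [c[0][0], c[1][0]]
    return sum(occ[i] * h[i][j] * occ[j] for i in range(2) for j in range(2))

c_opt = [[1.0, 0.0], [0.0, 1.0]]           # unstable stationary point of toy_energy
alphas = [0.01 * k for k in range(1, 201)]  # crude grid line search, alpha > 0
best_alpha = min(alphas, key=lambda a: toy_energy(rotate(c_opt, 1.0, a)))
best_energy = toy_energy(rotate(c_opt, 1.0, best_alpha))
print(f"alpha = {best_alpha:.2f}, energy = {best_energy:.6f}")
```

Because $X_{dir}$ is antisymmetric, the exponential is orthogonal and the rotated orbitals stay orthonormal for every α, so the line search never leaves the space of valid coefficient matrices.<br />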
1.6.1.2 Examples<br />
In Fig. 1.31 and Fig. 1.32 two examples of problems with several stationary points are given.<br />
[Plot for Fig. 1.31: error in energy / $E_h$ (logarithmic, 1.E-08 to 1.E+02) against iteration (0 to 60); curves TRSCF d orth -shift, TRSCF C-shift and Line search.]<br />
Fig. 1.31 HF calculations on the rhodium complex.<br />
[Plot for Fig. 1.32: error in energy / $E_h$ (logarithmic, 1.E-08 to 1.E+02) against iteration (0 to 120); curves TRSCF and Line search, with plateaus marked (1), (2) and (3).]<br />
Fig. 1.32 HF/STO-3G calculation on CrC.<br />
The first example is a HF optimization on the rhodium complex seen in Fig. 1.33 in the<br />
AhlrichsVDZ basis 59 combined with STO-3G on rhodium. For this example DIIS diverges, but the<br />
TRSCF scheme with C-shift converges nicely in 38 iterations. However, when the Hessian is<br />
inspected it is found that the lowest eigenvalue is negative, and a search in α is carried out in the<br />
direction corresponding to the negative eigenvalue. This is illustrated with the orange line in Fig. 1.31. Since each<br />
evaluation of a step length α necessitates an evaluation of the Fock matrix, it is fair to display each<br />
line search step as an iteration on the SCF iteration scale. When a minimum is found in<br />
this direction, the corresponding orbitals are used as a start guess for a new TRSCF optimization,<br />
and it is seen that it now converges nicely to a new and lower stationary point which is<br />
found to be a minimum. When the d orth -shift scheme is applied in the TRRH steps instead of the<br />
C-shift scheme, it turns out that convergence to the minimum is obtained with no problems, as seen<br />
from Fig. 1.31, illustrating how the stationary point found from an SCF optimization depends not only<br />
on the start guess, but also on the optimization procedure.<br />
Fig. 1.33 Rhodium complex.<br />
The second example is a HF/STO-3G optimization of CrC with a bond distance of 2.00 Å. The<br />
example is also used in Fig. 1.13 and Fig. 1.25, but without discussing the stability of the converged<br />
state. Also in this case DIIS diverges whereas TRSCF converges nicely in 12-13 iterations to a<br />
stationary point which is found to have singlet instabilities. As for the first example, a line search is<br />
carried out in the downhill direction and a new TRSCF optimization is started from the resulting<br />
orbitals. This time the second optimization has more problems than was the case for the rhodium<br />
example, but finally it converges to a minimum. Whereas in the rhodium case only one plateau<br />
corresponding to the saddle point could be seen, in this case three plateaus can be found, marked by<br />
numbers in Fig. 1.32. The first is the saddle point that TRSCF converges to, at $E_{SCF}$ = -1068.77014939<br />
and with a lowest Hessian eigenvalue of -0.624. The second and third stationary<br />
points are recognized as saddle points by TRSCF itself and it manages to move away. If a DIIS<br />
optimization is carried out with a Hückel start guess, it converges to the second stationary point,<br />
which has $E_{SCF}$ = -1069.21761813 and a lowest Hessian eigenvalue of -0.038, again demonstrating<br />
that depending on the optimization procedure and start guess, different stationary points can be<br />
found. It is thus necessary to check the Hessian of the result to know for sure that a minimum is<br />
found, and in this case the final minimum has $E_{SCF}$ = -1069.30090709 and a lowest Hessian<br />
eigenvalue of 0.043. CrC is well known for being a molecule with a complicated electronic energy<br />
surface and has been the subject of several theoretical studies 60 .<br />
The scheme testing for singlet instabilities and walking away from unstable stationary points could<br />
be integrated more efficiently in the optimization than is done here. It can be seen from Fig. 1.31<br />
and Fig. 1.32 that the optimizations are completely converged before the Hessian check is made,<br />
spending many iterations improving the unwanted result. The check could be made at an earlier<br />
stage, saving a number of iterations. Also, the steps taken in the line search could be optimized such<br />
that fewer steps were necessary to find the minimum. Nevertheless, it is convenient to have the<br />
possibility to continue an optimization until a minimum is found.<br />
1.7 Scaling<br />
As mentioned in the introduction, it is now possible to apply ab initio quantum chemical methods,<br />
in particular HF and DFT, to large molecular systems of interest for biology and nano-science. This<br />
is due both to the developments in integral screening and algorithms for the Fock matrix builder and<br />
to approaches avoiding diagonalization and exploiting sparsity in the matrices. Since the TRSCF<br />
scheme has properties which would be of great advantage for SCF calculations on large and<br />
complex molecules, it is crucial that the scheme can be formulated in a linear or near-linear scaling<br />
manner. We have not been concerned with the construction of the Fock matrix, and any state-of-the-art,<br />
linear or near-linear scaling approach could be used as the Fock builder for our scheme. The steps to<br />
consider are thus the Roothaan-Hall step TRRH, which evaluates a new density matrix, and the<br />
density subspace minimization TRDSM, which improves convergence. In the following subsections<br />
the scaling of these steps will be discussed.<br />
1.7.1 Scaling of TRRH<br />
The TRRH scheme with C-shift described in Section 1.4.1.2 requires the diagonalization of a level<br />
shifted Fock matrix and the knowledge of the occupied molecular orbital coefficients. The<br />
diagonalization scales as $N^3$, just as a matrix multiplication does, where N is the dimension of the<br />
problem, in this case the number of basis functions. However, a diagonalization is inefficient and<br />
cannot be nearly as well optimized as a matrix multiplication, and thus the scaling factor is much<br />
larger for the diagonalization than for the matrix multiplication. Also, the matrix multiplication can<br />
exploit sparsity and obtain a scaling that is linear in the number of non-zero elements, whereas sparsity is<br />
not as easily exploited in diagonalizations. Furthermore, the molecular orbitals described by the<br />
eigenvectors from the diagonalization of the Fock matrix are inherently delocalized and thus there is<br />
no sparsity to exploit.<br />
To obtain a linear scaling TRRH step it is thus necessary to avoid the diagonalizations completely,<br />
along with any reference to the MO basis. This can be done in our SCF program, a local version of<br />
DALTON 38,49 , by combining the d orth -shift scheme described in Section 1.4.1.5 with the trace<br />
purification (TP) described in Section 1.4.1.6.<br />
The trace purification scheme replaces the diagonalization of the level shifted Fock matrix and<br />
makes it possible to exploit sparsity in the matrices. A sparse blocked matrix storage scheme has<br />
been implemented for this purpose. In this scheme the columns and rows in the matrices are<br />
permuted such that close lying atoms are collected in blocks, making it possible to exploit the<br />
locality in the basis functions. Based on some drop tolerance for the size of matrix elements, pure<br />
zero blocks can be found and neglected, both saving storage and computing time. A library has been<br />
developed for the purpose of handling the matrix operations for this type of matrices and controlling<br />
the truncation error arising from the neglect of elements 49 .<br />
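The idea of a blocked sparse storage with a drop tolerance can be sketched as follows. This is only an illustration under simplifying assumptions: blocks are stored in a dictionary keyed by block indices, a block is dropped when its largest element is below the tolerance, and no control of the accumulated truncation error is attempted. The function names and the small test matrix are hypothetical and do not describe the library mentioned above.<br />

```python
def to_blocks(matrix, block_size, drop_tol):
    """Store only the blocks whose largest absolute element exceeds drop_tol."""
    n = len(matrix)
    blocks = {}
    for bi in range(0, n, block_size):
        for bj in range(0, n, block_size):
            block = [row[bj:bj + block_size]
                     for row in matrix[bi:bi + block_size]]
            if max(abs(v) for row in block for v in row) > drop_tol:
                blocks[(bi // block_size, bj // block_size)] = block
    return blocks

def block_multiply(a_blocks, b_blocks):
    """Multiply two block-sparse matrices, visiting only the stored blocks."""
    b_by_row = {}
    for (k, j), block in b_blocks.items():
        b_by_row.setdefault(k, []).append((j, block))
    result = {}
    for (i, k), a_block in a_blocks.items():
        for j, b_block in b_by_row.get(k, []):
            rows, inner, cols = len(a_block), len(b_block), len(b_block[0])
            target = result.setdefault(
                (i, j), [[0.0] * cols for _ in range(rows)])
            for r in range(rows):
                for c in range(cols):
                    target[r][c] += sum(a_block[r][m] * b_block[m][c]
                                        for m in range(inner))
    return result

# Hypothetical 4x4 matrix whose off-diagonal blocks are negligible.
a = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 1e-12, 0.0],
     [0.0, 1e-12, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
a_blocks = to_blocks(a, block_size=2, drop_tol=1e-8)
product = block_multiply(a_blocks, a_blocks)
print(sorted(a_blocks), product[(0, 0)])
```

The work in `block_multiply` grows with the number of stored block pairs rather than with the full matrix dimension, which is the property that makes linear scaling possible once the number of non-negligible blocks grows linearly with system size.<br />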
Calculations have been carried out on glycine chains of different length in the 4-31G basis set on a<br />
3.4 GHz Xeon/Nocona machine with EM64T architecture and the MKL BLAS+LAPACK library.<br />
Timings have been made in the third iteration of the SCF optimization, measuring how much time<br />
(CPU) is spent in the TRRH step in the case of full matrices and diagonalizations of the level<br />
shifted Fock matrix (Diag./full) and in the case of sparse blocked matrices and the TP scheme<br />
(TP/sparse). The results are seen in Fig. 1.34. Both in the full and sparse case the d orth -shift scheme<br />
is applied.<br />
[Plot for Fig. 1.34: time / min. (0 to 60) against number of basis functions (400 to 3000); curves Diag./full and TP/sparse.]<br />
Fig. 1.34 Timings of a TRRH step in case of diagonalizations of full matrices (Diag./full) and in case of trace purification of sparse blocked matrices (TP/sparse).<br />
The crossover occurs already at around 1500 basis functions, and it is clear how the diagonalization<br />
scheme will quickly become too time consuming if the number of basis functions is increased<br />
further. Of course, this is a linear molecule as seen from Fig. 1.35, and the crossover will occur later<br />
for more three-dimensional molecules. The TP method does not have an exact linear scaling<br />
because of the transformation to the orthogonal basis which gives rise to a quadratic term, but the<br />
scaling factor on the quadratic term is very small. It should be noted that the dynamic level shift<br />
scheme typically takes 5-10 diagonalizations or trace purifications to find the optimal level shift in<br />
the first couple of iterations, and as the timings are from the third iteration, not just one but<br />
several diagonalizations or purifications are included in the timings in Fig. 1.34. Currently a full<br />
trace purification optimization (30-70 purification iterations) is carried out for each level shift tested<br />
to find the optimal level shift. It is straightforward to optimize this process such that the purification<br />
is not converged as tightly for the level shifts tested and rejected as for the final optimal level shift.<br />
Fig. 1.35 Glycine chain.<br />
To conclude, the scaling of the TRRH scheme with C-shift is dominated by the diagonalization, and<br />
sparsity cannot be exploited. Still with a good Fock builder it can run effectively up to a couple of<br />
thousand basis functions, but at some point the diagonalizations get too time consuming. For larger<br />
systems the purification scheme with the d orth -shift scheme can be used with blocked sparse matrices<br />
resulting in a near-linear scaling.<br />
1.7.2 Scaling of TRDSM<br />
For the density subspace minimization, a set of linear equations, Eq. (1.66), is solved in each DSM<br />
step, but only in the dimension of the subspace, which is much smaller than the number of basis<br />
functions. Its cost is therefore insignificant compared to the matrix additions and multiplications<br />
needed to set up the DSM gradient g and Hessian H for the linear equations. For TRDSM it will<br />
thus only be the number of matrix multiplications that determines the scaling. Nothing has to be<br />
changed to exploit sparsity in the matrices, and linear scaling is automatically obtained from the<br />
point where the number of non-zero elements in the matrices is linear scaling. For full matrices the<br />
scaling is formally $N^3$, where N is the number of basis functions, but as mentioned in the previous<br />
subsection this is not a problem as it is for the diagonalization, since matrix multiplications can be<br />
carried out with close to peak performance on computers. However, the number of matrix<br />
multiplications should be kept at a minimum as it affects the scaling factor.<br />
The number of matrix multiplications is dependent on the dimension of the subspace as the number<br />
of gradient and Hessian elements grows with the size of the subspace, but even though the Hessian<br />
is set up explicitly, the number of matrix multiplications only scales linearly with the dimension of<br />
the subspace. The expressions for the DSM gradient and Hessian are found in 0, and it is seen that if<br />
only the matrices $FD_i$, $SD_i$, $FD_iS$ and $DSD_i$ are evaluated, then all the terms for a Hessian<br />
element can be expressed as the trace of two known matrices or their transpose. As the operation<br />
$\mathrm{Tr}\,AB$ scales quadratically instead of cubically, the overall scaling of TRDSM will be $nN^3$ for full<br />
matrices, where n is the dimension of the subspace and N the dimension of the problem. For sparse<br />
matrices both the matrix multiplications and $\mathrm{Tr}\,AB$ scale linearly, but since $n^2$ traces $\mathrm{Tr}\,AB$ are evaluated,<br />
the overall scaling is $n^2N$. However, the trace operations have a very small prefactor.<br />
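The quadratic cost of the trace operation follows from the identity $\mathrm{Tr}\,AB = \sum_{ij} A_{ij} B_{ji}$, which never requires the product AB itself. A minimal sketch with small hypothetical matrices, comparing it against the cubic route through the full product:<br />

```python
def trace_of_product(a, b):
    """Tr(AB) = sum_ij A_ij * B_ji, evaluated in O(N^2) without forming AB."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def trace_via_full_product(a, b):
    """Reference route: form AB in O(N^3), then sum its diagonal."""
    n = len(a)
    product = [[sum(a[i][k] * b[k][j] for k in range(n))
                for j in range(n)] for i in range(n)]
    return sum(product[i][i] for i in range(n))

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(trace_of_product(a, b), trace_via_full_product(a, b))
```

Once the few intermediate matrix products per subspace vector are stored, every Hessian element costs only such a double sum, so the $n^2$ trace evaluations add little to the $n$ matrix multiplications.<br />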
In the TRSCF scheme with C-shift the diagonalizations are thus the dominating operations, but<br />
since both the TRRH and TRDSM step can be carried out without any reference to the MO basis<br />
and with matrix multiplications as the most expensive operations, the TRSCF scheme is near-linear<br />
scaling and has what it takes to be applied to really large molecular systems. It is still a work in<br />
progress to get all the parts working together, so unfortunately no large scale TRSCF calculations<br />
will appear in this thesis, and no benchmarks in which sparsity in the matrices is exploited for<br />
TRDSM can be presented, but the whole framework is in place.<br />
1.8 Applications<br />
In this section, numerical examples are given to illustrate the convergence characteristics of the<br />
TRSCF and ARH calculations. Comparisons are made with DIIS, the TRSCF-LS method, and the<br />
globally convergent trust-region minimization method (GTR) of Francisco et al. 26 .<br />
In Section 1.8.1 a set of small molecules used by Francisco et al. to illustrate the convergence<br />
characteristics of GTR is considered. Next, in Section 1.8.2, the convergence of calculations on three<br />
metal complexes is discussed for the DIIS, TRSCF and TRSCF-LS methods.<br />
1.8.1 Calculations on Small Molecules<br />
As an alternative to the RH diagonalization, Francisco et al. have developed an energy<br />
minimization method (GTR), where an energy model is minimized by a trust-region minimization.<br />
They have proven that it is a globally convergent algorithm, that is, no matter the starting point, the<br />
iterative steps will converge towards a stationary point. The best results are obtained when they<br />
combine GTR with DIIS and thereby let DIIS accelerate the convergence. To examine the<br />
convergence characteristics of TRSCF and ARH compared to GTR, calculations have been carried<br />
out in an attempt to reproduce the conditions given in the paper by Francisco et al. Thus HF<br />
calculations have been carried out with a maximum of 10 previous density matrices for the<br />
density subspace minimizations, and convergence is obtained when the difference between two<br />
consecutive energies is smaller than $10^{-9}$ $E_h$. The results are given in Table 1-7; the numbers found<br />
with our SCF program are on a white background (the first four result columns), whereas results copied<br />
from the GTR paper are on a grey background (the last two columns).<br />
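The two settings stated above, a subspace capped at 10 stored matrices and convergence when two consecutive energies differ by less than $10^{-9}$ $E_h$, can be sketched as a driver loop. The energy sequence is a hypothetical stand-in for real SCF iterations, and the stored items are plain numbers standing in for density matrices; the function and parameter names are assumptions.<br />

```python
from collections import deque

def run_until_converged(energy_of_iteration, threshold=1e-9,
                        max_subspace=10, max_iterations=100):
    """Iterate until two consecutive energies differ by less than threshold.

    The deque keeps at most max_subspace entries, discarding the oldest,
    as a stand-in for the stored previous density matrices.
    """
    subspace = deque(maxlen=max_subspace)
    previous = None
    for iteration in range(1, max_iterations + 1):
        energy = energy_of_iteration(iteration)
        subspace.append(energy)
        if previous is not None and abs(energy - previous) < threshold:
            return iteration, energy, len(subspace)
        previous = energy
    raise RuntimeError("no convergence within max_iterations")

def toy_energy(k):
    """Hypothetical monotonically decreasing energy sequence."""
    return -1.0 + 2.0 ** (-k)

iterations, final_energy, stored = run_until_converged(toy_energy)
print(iterations, final_energy, stored)
```

Note that an energy-difference criterion of this kind says nothing about whether the converged point is a minimum, which is why the Hessian check of Section 1.6 is applied separately.<br />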
Table 1-7 Number of iterations in HF calculations performed by each algorithm in some test problems. The<br />
geometry of the molecules and the results in grey (the last two columns) are taken from the paper by Francisco et al. 26 , and<br />
GTR+DIIS is their globally convergent trust-region algorithm with DIIS acceleration.<br />
Molecule | Basis | Start guess | DIIS | TRSCF C-shift | TRSCF d orth -shift | ARH | DIIS (grey) | GTR+DIIS (grey)<br />
H2O | STO-3G | H1-core | 7 | 7 | 7 | 6 | 5 | 5<br />
H2O | 6-31G | H1-core | 10 | 9 | 8 | 8 | 8 | 8<br />
NH3 | STO-3G | H1-core | 7 | 8 | 7 | 6 | 7 | 7<br />
NH3 | 6-31G | H1-core | 9 | 9 | 8 | 8 | 7 | 7<br />
CO | STO-3G | H1-core | 12 | 9 | 9 | 9 | 11 | 10<br />
CO | STO-3G | Hückel | 8 | 8 | 8 | - | 7 | 7<br />
CO(Dist) * | STO-3G | H1-core | 39(a) | 9 | 8 | 8 | 117(b) | 10<br />
CO(Dist) * | STO-3G | Hückel | 35 | 10 | 8 | - | 85 | 15<br />
CO(Dist) * | 6-31G | H1-core | 24(a) | 13 | 10 | 9 | 27(b) | 115<br />
CO(Dist) * | 6-31G | Hückel | 21(a) | 10 | 10 | - | 36(b) | 59<br />
Cr2 | STO-3G | H1-core | 34(a) | 14(a) | 10(a) | 12(a) | 13 | 38<br />
CrC | STO-3G | H1-core | 29(a) | 13(a) | 11(a) | 10(a) | (X) | 29<br />
* Distorted geometry – double bond length compared to CO<br />
(a) Negative Hessian eigenvalue.<br />
(b) Converged to a higher energy than some of the other algorithms<br />
(X) No convergence in 5001 iterations.<br />
Let us first consider the results obtained from our SCF program. Comparing the TRSCF results<br />
(both C-shift and d orth -shift) to the DIIS results, it is clear that the TRSCF method not only is an<br />
improvement when DIIS cannot converge, but also for small simple examples, the convergence of<br />
TRSCF is as good as or better than for DIIS. It is also observed that in five instances DIIS converges<br />
to a stationary point which is not a minimum, while that only happens in two instances for TRSCF.<br />
This suggests that the TRSCF algorithm has a lower tendency to converge to saddle points<br />
than DIIS. Comparing the results obtained for TRSCF with the C-shift and the d orth -shift<br />
schemes, only minor differences are seen for these small examples, but in all cases the d orth -shift<br />
scheme presents a faster or similar convergence rate compared to the C-shift scheme. With the<br />
ARH method the convergence is further improved compared to the TRSCF/d orth -shift scheme. It is<br />
only a matter of saving a single iteration in some of the examples, but the tendency is clear. As the<br />
algorithm is still in the implementation phase, no numbers can currently be obtained with the<br />
Hückel start guess.<br />
Comparing now the results from our SCF program with the results from the GTR paper, the obvious<br />
peculiarity is the discrepancy between the DIIS results obtained by Francisco et al. and by us. A<br />
plain DIIS optimization should be completely reproducible, but there is a difference of two out of<br />
seven iterations. These differences cannot be explained and make it more difficult to compare our<br />
results with theirs. Furthermore, it seems that they have not tested the Hessian eigenvalues at the<br />
end; their table only notes the cases where a lower energy was found for some other start guess or<br />
optimization method, and thus we cannot know for sure if the given number of iterations corresponds to<br />
convergence to a minimum. For Cr 2 and CrC it is very difficult to find the minimum, and several<br />
saddle points exist where convergence can be obtained (see Section 1.6). It is thus an open question<br />
whether the GTR+DIIS calculations for Cr 2 and CrC actually converge to a minimum or to a saddle<br />
point as for the TRSCF methods.<br />
In the examples where GTR+DIIS gives an improvement compared to their DIIS results, TRSCF<br />
and ARH also give significant improvements to our DIIS results. For the distorted CO example,<br />
TRSCF and ARH show better convergence than GTR+DIIS even if the results could be compared<br />
directly. For all examples TRSCF and ARH converge in 7-14 iterations, whereas GTR+DIIS uses<br />
between 5 and 115. However, as discussed in Section 1.4.1.3, DIIS does not perform well when<br />
the gradient and energy are not correlated, as is often the case in the global region when using<br />
TRRH, and this could very well be the case for GTR as well. TRRH should be combined with a density<br />
subspace minimization method in the energy (e.g. TRDSM), and the same probably applies for<br />
GTR. We would thus suggest an implementation of TRDSM in connection with GTR.<br />
In conclusion it has been illustrated that the TRSCF and ARH methods have very nice convergence<br />
properties with improvements compared to DIIS in general and to GTR+DIIS as well, in case of<br />
more problematic examples.<br />
1.8.2 Calculations on Metal Complexes<br />
In reference 39 and throughout this part of the thesis, three molecules including transition metals<br />
have been used for examples, namely the molecules in Fig. 1.3, Fig. 1.6 and Fig. 1.33. In this<br />
section HF and LDA calculations on these metal complexes are given both for DIIS, TRSCF and<br />
TRSCF-LS. For all calculations a H1-core start guess has been employed and a maximum of 10<br />
matrices are used to define the subspace in the density subspace minimization. This is different<br />
from the examples given in ref. 39, where the subspace dimension never was larger than eight.<br />
Furthermore for the TRSCF calculations in ref. 39 the C-shift scheme was applied whereas in the<br />
calculations reported here, the d orth -scheme has been applied.<br />
TRSCF-LS is the TRSCF line search method in which the TRRH-LS and TRDSM-LS steps<br />
described in Sections 1.4.1.4 and 1.4.2.4 are combined to set up an expensive, but highly robust<br />
method, in which the lowest SCF energy is identified by a line search at each step. The convergence<br />
results of the optimizations are seen in Fig. 1.36. For the cadmium complex the STO-3G basis set has<br />
been applied; for the rhodium complex the AhlrichsVDZ basis set 59 has been applied, except for<br />
rhodium itself, which is described in the STO-3G basis; and for the zinc complex the 6-31G basis set has<br />
been applied.<br />
The convergence of the TRSCF and TRSCF-LS methods is comparable for all cases in Fig. 1.36,<br />
and in general the TRSCF calculations converge in fewer iterations than the TRSCF-LS calculations<br />
do. As mentioned the line search method TRSCF-LS is much more expensive than TRSCF, and the<br />
only reason for applying it instead of TRSCF is for very difficult examples, where convergence<br />
cannot be obtained in any other way.<br />
The convergence behavior of the DIIS method is somewhat more erratic than that of the TRSCF<br />
methods since it makes no use of Hessian information and therefore cannot predict reliably what<br />
directions will reduce the total energy. The HF calculation on the rhodium complex and the LDA<br />
calculation on the zinc complex both diverge for the DIIS method. In general the erratic behavior is<br />
in particular seen in the global region whereas in the local region, it converges as well as the<br />
TRSCF method.<br />
[Plots for Fig. 1.36: six panels in two columns (HF, LDA) and three rows (A, B, C), each showing error in energy / $E_h$ (logarithmic, 1.E-08 to 1.E+02) against iteration (0 to 20 for A, 0 to 40 for B and C); curves DIIS, TRSCF and TRSCF-LS.]<br />
Fig. 1.36 Convergence of HF and LDA calculations on (A) the cadmium complex from Fig. 1.6,<br />
(B) the rhodium complex from Fig. 1.33, and (C) the zinc complex from Fig. 1.3.<br />
For the examples presented both in this and the previous subsection, the TRSCF convergence is as<br />
good as or better than DIIS, and for problems where DIIS diverges, convergence is obtained with<br />
the TRSCF methods. It thus seems that TRSCF has the properties of a good black-box optimization<br />
algorithm.<br />
1.9 Conclusion<br />
In this part of the thesis the trust region SCF (TRSCF) algorithm is presented as a means to improve<br />
SCF convergence compared to methods typically used today, e.g. DIIS. In the TRSCF method, both<br />
the Roothaan-Hall (RH) step and the density-subspace minimization (DSM) step are replaced by<br />
optimizations of local energy models of the Hartree-Fock/Kohn-Sham energy $E_{\mathrm{SCF}}$. These local<br />
models have the same gradient as the energy $E_{\mathrm{SCF}}$, but an approximate Hessian. By restricting the steps<br />
of the TRSCF algorithm to the trust region of these local models, that is, to the region where the<br />
local models approximate $E_{\mathrm{SCF}}$ well, smooth and fast convergence may be obtained.<br />
The developments through the years in SCF optimization algorithms are reviewed, and it is found<br />
that the fundamental schemes used in TRSCF to improve convergence have been around for several<br />
years; DIIS is actually a subspace minimization in the gradient norm, and level shifts have been<br />
used to improve or force convergence since 1973. However, the level shifts have previously been<br />
found on a trial and error basis as a constant parameter, whereas we advocate a dynamic level shift<br />
scheme in which the level shift is used to control the density change in the RH step. As such the<br />
level shift is optimized in each iteration to allow the density to change to the trust radius of the RH<br />
energy model, hence the name trust region Roothaan-Hall (TRRH) for our RH scheme. Also, the<br />
density subspace minimization has been improved compared to previous methods. An accurate<br />
energy model is constructed in the iterative subspace, where only minor approximations are made<br />
compared to the SCF energy. The trust region minimization of this energy model therefore corresponds<br />
well to a minimization of $E_{\mathrm{SCF}}$ in the iterative subspace, resulting in an energy decrease in each<br />
trust region DSM (TRDSM) step. The TRRH and TRDSM steps in combination make up a<br />
successful scheme with a high convergence rate without compromising the control of the density<br />
changes in each step.<br />
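The idea of a level shift that is adjusted until the density change fits inside a trust radius can be sketched as follows. This is only an illustration under stated assumptions, not the TRRH implementation of the thesis: an orthonormal basis is assumed, the shifted matrix is taken as F - mu*D, the density change is measured by the Frobenius norm, and the function name, the doubling rule for mu and all parameters are hypothetical.<br />

```python
import numpy as np

def level_shifted_rh_step(F, D, n_occ, trust_radius, mu=0.0, growth=2.0, max_tries=60):
    """Hypothetical sketch: increase the level shift mu until the new density
    stays within trust_radius of the current density D (orthonormal basis)."""
    for _ in range(max_tries):
        # Shifting the currently occupied space down by mu damps
        # occupied-virtual mixing, so the density changes less.
        _, C = np.linalg.eigh(F - mu * D)
        C_occ = C[:, :n_occ]               # lowest n_occ eigenvectors
        D_new = C_occ @ C_occ.T
        if np.linalg.norm(D_new - D) <= trust_radius:
            return D_new, mu
        mu = growth * mu if mu > 0.0 else 1.0
    raise RuntimeError("no acceptable level shift found")

# Small demonstration with random data (illustrative only).
rng = np.random.default_rng(2)
M = rng.normal(size=(6, 6))
F_demo = 0.5 * (M + M.T)
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))
D_demo = Q @ Q.T
D_next, mu_used = level_shifted_rh_step(F_demo, D_demo, n_occ=2, trust_radius=0.3)
```

A production scheme would search for the smallest acceptable shift rather than doubling it, since an unnecessarily large shift slows convergence.<br />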
Compared to refs. 38 and 39, an alternative level shift scheme (d orth -shift) for the TRRH step is<br />
presented which does not control the density change through the overlap of the individual orbitals,<br />
but instead controls the amount of new information added to the density subspace. Thus the d orth -<br />
shift scheme does not contain any reference to the MO basis and can be used in connection with<br />
alternatives to diagonalization. Also, it is found that the d orth -shift scheme leads to a faster<br />
convergence since the former level shift scheme is too restrictive, ignoring the well known changes<br />
contained in the density subspace.<br />
For TRDSM, an improvement of the energy model is developed, in which a part of the term<br />
neglected in the DSM energy model compared to the SCF energy is recovered. However, the effects<br />
of the improvement are found rather small compared to the extra complexity added to the algorithm.<br />
An energy minimization algorithm is presented as well, replacing the standard RH-diagonalization<br />
in the SCF optimization. The novel idea is to exploit the valuable information saved in the density<br />
subspace of the previous densities to construct an improved RH energy model (augmented<br />
Roothaan-Hall - ARH) and minimize this model instead of the RH model. This makes the TRDSM<br />
step redundant since a density subspace minimization now is included in the minimization of the<br />
RH energy model. We expect a faster convergence rate for ARH compared to TRSCF, mainly<br />
because the RH and DSM steps are merged into an energy model with the correct gradient (not just in the<br />
subspace) and an approximate Hessian, which is improved in each iteration using the information<br />
from the previous density and Fock matrices. The preliminary results from the ARH energy<br />
minimization seem promising, with convergence improvements compared to TRSCF, which<br />
already had convergence rates as good as or better than those of DIIS.<br />
The errors introduced in the TRRH and TRDSM energy models compared to the SCF energy are<br />
studied. Since the DFT and HF energy expressions differ, the errors in the energy models are<br />
potentially different for the two methods. It is found that the DSM energy model has the same error<br />
of the order $\|\mathbf{D}_\delta\|^2$ for both HF and DFT, where $\mathbf{D}_\delta$ is the idempotency correction we impose on the<br />
averaged density. For the RH energy model it is found by inspecting test cases that the errors are<br />
larger for LDA than for HF, especially when convergence is approached. The error can be divided<br />
into two sources, namely the error in the RH Hessian compared to the SCF Hessian, and the size of<br />
the third and higher order contributions from the nonlinear terms in the SCF energy, which are not<br />
included in the RH energy model. By further tests it seems that the Hessian is better described in<br />
LDA than in HF, and since the errors are larger for LDA in particular close to convergence, it seems<br />
unlikely that the third and higher order terms are causing the difference. The question why larger<br />
errors are seen for LDA than for HF is thus still unanswered and it will be further investigated.<br />
The stability of stationary points is discussed and a method to test and walk away from unstable<br />
stationary points is described, and examples are given, where it has been applied. It is<br />
acknowledged that such a method is very valuable since otherwise a minimum could not have been<br />
found for the examples given.<br />
The scaling of TRSCF is also considered. An alternative to diagonalization has been implemented<br />
in our SCF program, where instead of diagonalizing the Fock matrix, the trace purification scheme<br />
by Palser and Manolopoulos 19 and later Niklasson 48 is used. The purification scheme in combination<br />
with the d orth -shift scheme makes the TRRH step near-linearly scaling. The trace purification scheme<br />
is linear scaling in an orthogonal basis, but since the optimization scheme is formulated in the non-orthogonal<br />
AO basis, the transformation to an orthogonal basis has an $N^2$ scaling with a small<br />
prefactor. Timings for the TRRH step with diagonalizations and with purifications are given, and it<br />
is seen that the trace purification scheme is a major improvement compared to diagonalization when<br />
more than a couple of thousand basis functions are needed. The TRDSM step is based on matrix<br />
multiplications and additions, so by construction it will be linearly scaling when sparsity in the<br />
matrices is exploited.<br />
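To illustrate how a density matrix can be obtained from matrix multiplications alone, the sketch below purifies a Fock matrix in an orthogonal basis. It uses the second-order trace-correcting iteration (X squared or 2X - X squared, chosen from the trace), which is one common variant and is not claimed to be the exact scheme implemented in the thesis. Dense NumPy arrays are used for clarity, so no sparsity is exploited; the function name, the Gershgorin spectral bounds and the tolerances are assumptions.<br />

```python
import numpy as np

def tc2_purification(F, n_occ, max_iter=200, tol=1e-10):
    """Sketch of trace-correcting purification in an orthogonal basis.
    Returns an (approximately) idempotent density matrix with trace n_occ."""
    # Gershgorin circles give cheap bounds on the spectrum of F.
    radii = np.sum(np.abs(F), axis=1) - np.abs(np.diag(F))
    e_min = np.min(np.diag(F) - radii)
    e_max = np.max(np.diag(F) + radii)
    n = F.shape[0]
    # Map the spectrum into [0, 1], with the lowest energies close to 1.
    X = (e_max * np.eye(n) - F) / (e_max - e_min)
    for _ in range(max_iter):
        X2 = X @ X
        if np.linalg.norm(X2 - X) < tol:   # idempotent: converged
            break
        if np.trace(X) > n_occ:
            X = X2                          # lowers the trace
        else:
            X = 2.0 * X - X2                # raises the trace
    return X

# Demonstration on a small model matrix with a clear gap (illustrative only).
rng = np.random.default_rng(1)
P = rng.normal(size=(6, 6))
F_model = np.diag([-5.0, -4.0, -3.0, 1.0, 2.0, 3.0]) + 0.05 * (P + P.T)
D_pur = tc2_purification(F_model, n_occ=3)
```

The result can be compared with the density built from the lowest eigenvectors of the same matrix; the two agree when there is a gap between occupied and virtual levels.<br />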
As illustrated in the examples throughout this part of the thesis and in the applications section,<br />
significant improvements to SCF convergence have been obtained. For both the TRSCF and ARH<br />
examples presented, the convergence is as good as or better than DIIS, and for problems where<br />
DIIS diverges, convergence is obtained with the TRSCF and ARH methods. The globally<br />
convergent trust region method by Francisco et al. 26 is found to be better only for the simplest<br />
examples whereas for the rest, the TRSCF and ARH methods are found superior. The future success<br />
of the TRSCF method depends on a well optimized implementation of the diagonalization<br />
alternative combined with the dynamic level shift scheme, and sparsity being exploited in an<br />
efficient manner such that it can compete with the linear scaling SCF programs used today. The<br />
future success of the ARH method depends on finding efficient ways of solving the nonlinear<br />
equations corresponding to the minimization of the energy model. For this purpose different<br />
preconditioners will be tested.<br />
To conclude, there are still some adjustments that should be made to improve the algorithms, but the<br />
framework is in place. The SCF optimization algorithms presented in this thesis each make up a<br />
black-box optimization scheme for HF and DFT, as there is one scheme without any user adjustment<br />
that leads to fast and stable convergence for both the simple and the problematic systems studied so far. We<br />
are thus convinced that TRSCF and ARH are built to handle the optimization problems of the<br />
future.<br />
Part 2<br />
Atomic Orbital Based Response Theory<br />
2.1 Introduction<br />
The first part of this thesis was concerned with the optimization of the one electron density matrix<br />
for Hartree-Fock (HF) and density-functional theory (DFT). From such an optimized density,<br />
information about excited states and how the system reacts to a perturbation (e.g. an external<br />
electric field) may be obtained using response theory. Response theory and the derivation of<br />
molecular properties will be the subject of this part of the thesis.<br />
Response theory provides a rigorous approach for calculating molecular properties. As for the SCF<br />
optimization algorithms, the theory has usually been formulated in the molecular orbital (MO) basis<br />
which is inherently delocalized, making the matrices involved non-sparse. A reformulation in the local<br />
atomic orbital (AO) basis is thus necessary to obtain linear scaling algorithms and permit<br />
calculations of properties for large systems. Such a reformulation, in which an exponential<br />
parameterization of the density matrix is employed, is given in a paper by Larsen et al. 61 .<br />
The AO formulation of the response functions has a number of advantages compared to the MO<br />
formulation, besides locality. The response equations and molecular property expressions are<br />
simpler in the AO basis as the involved matrices (e.g. the Fock and property matrices) enter the<br />
equations in the basis they are evaluated in originally. No transformation between bases is necessary<br />
in the AO formulation as it is in the MO formulation. The AO formulation is particularly convenient<br />
for perturbation dependent basis sets. In the MO formulation a set of perturbation dependent<br />
orthonormal molecular orbitals must be introduced. These orbitals have no physical content and<br />
thus add artificial complexity to the problem. To exemplify the benefits of the AO formulation, the<br />
expression for the excited state geometrical gradient is derived in Section 2.4.<br />
In the conventional MO formulation, number operators are redundant and can be eliminated.<br />
However, in the AO basis the number operators are not redundant and must be included. Because of<br />
this, the proof of pairing in the solutions of the response equations cannot be directly taken from the<br />
MO basis to the AO basis. It is thus necessary to study the impact of the included number operators<br />
on the solver for the AO response equations. This has been done in Section 2.2, using the method of<br />
second quantization to formulate the AO based response equations. Implementation issues<br />
connected to solving the AO response equations are discussed in Section 2.3. In Section 2.5 a<br />
couple of simple examples are given, where the AO response solver is used to find ground and<br />
excited state properties. In Section 2.6 the results of this part of the thesis are summarized.<br />
2.2 AO Based Response Equations in Second Quantization<br />
In this section the linear response equations are derived for Hartree-Fock theory, but with minor<br />
technical changes they apply to DFT as well. The quadratic and higher response equations could<br />
equally well be derived in this formulation; however, this is not necessary to arrive at the basic<br />
conclusions.<br />
2.2.1 The Parameterization<br />
Consider a set of atomic orbitals ($\chi_\mu$) with the real and symmetric metric $\mathbf{S}$. The creation and<br />
annihilation operators for the atomic orbitals fulfil the anticommutation relation<br />
$$[a_\mu^\dagger, a_\nu]_+ = S_{\nu\mu}. \tag{2.1}$$<br />
We will consider the following exponential operator<br />
$$\hat{T} = \exp(i\hat{\kappa}), \tag{2.2}$$<br />
where $\hat{\kappa}$ is a Hermitian one-electron operator<br />
$$\hat{\kappa} = \sum_{\mu\nu} \kappa_{\mu\nu}\, a_\mu^\dagger a_\nu, \tag{2.3}$$<br />
$$\boldsymbol{\kappa}^\dagger = \boldsymbol{\kappa}. \tag{2.4}$$<br />
To examine the action of $\exp(i\hat{\kappa})$, we consider the transformed creation operators<br />
$$\tilde{a}_\mu^\dagger = \exp(i\hat{\kappa})\, a_\mu^\dagger \exp(-i\hat{\kappa}). \tag{2.5}$$<br />
It is seen that the transformed operators satisfy the same anticommutation relations as the<br />
untransformed operators<br />
$$[\tilde{a}_\mu^\dagger, \tilde{a}_\nu]_+ = [\exp(i\hat{\kappa})\, a_\mu^\dagger \exp(-i\hat{\kappa}),\, \exp(i\hat{\kappa})\, a_\nu \exp(-i\hat{\kappa})]_+ = \exp(i\hat{\kappa})\,[a_\mu^\dagger, a_\nu]_+ \exp(-i\hat{\kappa}) = S_{\nu\mu}. \tag{2.6}$$<br />
The exponential operators of Eq. (2.2) are therefore the manifold of operators that conserves the<br />
general metric S. In the special case where S = 1, the exponential operator reduces to the standard<br />
exponential operator occurring in the second quantization formalism of the molecular orbital based<br />
method. 46<br />
Using the Baker-Campbell-Hausdorff expansion 46 and the anticommutation relation of Eq. (2.1),<br />
we get<br />
$$\begin{aligned} \tilde{a}_\mu^\dagger &= a_\mu^\dagger + i[\hat{\kappa}, a_\mu^\dagger] - \tfrac{1}{2}[\hat{\kappa}, [\hat{\kappa}, a_\mu^\dagger]] + \cdots \\ &= a_\mu^\dagger + i\sum_\nu (\boldsymbol{\kappa}\mathbf{S})_{\nu\mu}\, a_\nu^\dagger - \tfrac{1}{2}\sum_\nu (\boldsymbol{\kappa}\mathbf{S})^2_{\nu\mu}\, a_\nu^\dagger + \cdots \\ &= \sum_\nu \exp(i\boldsymbol{\kappa}\mathbf{S})_{\nu\mu}\, a_\nu^\dagger. \end{aligned} \tag{2.7}$$<br />
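The step from the commutators to the matrix elements of κS in Eq. (2.7) may be spelled out as follows. This is a short worked step, assuming only Eqs. (2.1) and (2.3) and that two creation operators anticommute; the summation indices ρ and σ are introduced here for illustration.<br />

```latex
\begin{align*}
[\hat{\kappa}, a_\mu^\dagger]
  &= \sum_{\rho\sigma} \kappa_{\rho\sigma}\,[a_\rho^\dagger a_\sigma, a_\mu^\dagger]
   = \sum_{\rho\sigma} \kappa_{\rho\sigma}\, a_\rho^\dagger\,[a_\sigma, a_\mu^\dagger]_+ \\
  &= \sum_{\rho\sigma} \kappa_{\rho\sigma} S_{\sigma\mu}\, a_\rho^\dagger
   = \sum_{\rho} (\boldsymbol{\kappa}\mathbf{S})_{\rho\mu}\, a_\rho^\dagger .
\end{align*}
```

Applying the same step to the nested commutator gives the matrix elements of the square of κS, and so on for the higher terms, which is why the series sums to a matrix exponential.<br />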
To further investigate the properties of the above exponential transformation, we next consider the<br />
transformation of a single determinant state $|0\rangle$ with $\exp(i\hat{\kappa})$<br />
$$|\tilde{0}\rangle = \exp(i\hat{\kappa})\,|0\rangle. \tag{2.8}$$<br />
The properties of $|\tilde{0}\rangle$ may be obtained by comparing the expectation values of transformed<br />
creation-annihilation operators<br />
$$\tilde{\Delta}_{\mu\nu} = \langle\tilde{0}|\, a_\mu^\dagger a_\nu \,|\tilde{0}\rangle = \langle 0|\exp(-i\hat{\kappa})\, a_\mu^\dagger \exp(i\hat{\kappa}) \exp(-i\hat{\kappa})\, a_\nu \exp(i\hat{\kappa})|0\rangle \tag{2.9}$$<br />
with the expectation values of the untransformed operators<br />
$$\Delta_{\mu\nu} = \langle 0|\, a_\mu^\dagger a_\nu \,|0\rangle. \tag{2.10}$$<br />
To rewrite Eq. (2.9) in terms of Eq. (2.10) we use Eq. (2.7) to write the transformed creation- and<br />
annihilation-operators in terms of the untransformed operators<br />
$$\begin{aligned} \exp(-i\hat{\kappa})\, a_\mu^\dagger \exp(i\hat{\kappa}) &= \sum_\rho \exp(-i\boldsymbol{\kappa}\mathbf{S})_{\rho\mu}\, a_\rho^\dagger \\ \exp(-i\hat{\kappa})\, a_\nu \exp(i\hat{\kappa}) &= \sum_\rho \exp(i\mathbf{S}\boldsymbol{\kappa})_{\nu\rho}\, a_\rho. \end{aligned} \tag{2.11}$$<br />
Substituting these expressions into Eq. (2.9) gives<br />
$$\tilde{\boldsymbol{\Delta}} = \exp(-i\mathbf{S}\boldsymbol{\kappa}^T)\, \boldsymbol{\Delta} \exp(i\boldsymbol{\kappa}^T\mathbf{S}). \tag{2.12}$$<br />
In Appendix B, it is shown that if $|0\rangle$ is a single determinant wave function, then $\boldsymbol{\Delta}$ fulfils Eqs.<br />
(B-7), corresponding to the symmetry, trace, and idempotency condition for the one-electron<br />
density. We will now show that if $\boldsymbol{\Delta}$ fulfils these equations then so does $\tilde{\boldsymbol{\Delta}}$. The Hermiticity of $\tilde{\boldsymbol{\Delta}}$<br />
follows from the Hermiticity of $\mathbf{S}$ and $\boldsymbol{\kappa}$ and will not be shown explicitly here. The trace relation is<br />
shown as follows<br />
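$$\begin{aligned} \mathrm{Tr}\,\tilde{\boldsymbol{\Delta}}\mathbf{S}^{-1} &= \mathrm{Tr}\,\boldsymbol{\Delta}\exp(i\boldsymbol{\kappa}^T\mathbf{S})\,\mathbf{S}^{-1}\exp(-i\mathbf{S}\boldsymbol{\kappa}^T)\,\mathbf{S}\mathbf{S}^{-1} \\ &= \mathrm{Tr}\,\boldsymbol{\Delta}\exp(i\boldsymbol{\kappa}^T\mathbf{S})\exp(-i\boldsymbol{\kappa}^T\mathbf{S})\,\mathbf{S}^{-1} \\ &= \mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{S}^{-1}, \end{aligned} \tag{2.13}$$<br />
where we have used the relation<br />
$$\mathbf{B}^{-1}\exp(\mathbf{A})\,\mathbf{B} = \exp(\mathbf{B}^{-1}\mathbf{A}\mathbf{B}). \tag{2.14}$$<br />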
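The same relation may be used to show the idempotency relation<br />
$$\begin{aligned} \tilde{\boldsymbol{\Delta}}\mathbf{S}^{-1}\tilde{\boldsymbol{\Delta}} &= \exp(-i\mathbf{S}\boldsymbol{\kappa}^T)\,\boldsymbol{\Delta}\exp(i\boldsymbol{\kappa}^T\mathbf{S})\,\mathbf{S}^{-1}\exp(-i\mathbf{S}\boldsymbol{\kappa}^T)\,\boldsymbol{\Delta}\exp(i\boldsymbol{\kappa}^T\mathbf{S}) \\ &= \exp(-i\mathbf{S}\boldsymbol{\kappa}^T)\,\boldsymbol{\Delta}\exp(i\boldsymbol{\kappa}^T\mathbf{S})\exp(-i\boldsymbol{\kappa}^T\mathbf{S})\,\mathbf{S}^{-1}\boldsymbol{\Delta}\exp(i\boldsymbol{\kappa}^T\mathbf{S}) \\ &= \exp(-i\mathbf{S}\boldsymbol{\kappa}^T)\,\boldsymbol{\Delta}\mathbf{S}^{-1}\boldsymbol{\Delta}\exp(i\boldsymbol{\kappa}^T\mathbf{S}) \\ &= \exp(-i\mathbf{S}\boldsymbol{\kappa}^T)\,\boldsymbol{\Delta}\exp(i\boldsymbol{\kappa}^T\mathbf{S}) = \tilde{\boldsymbol{\Delta}}. \end{aligned} \tag{2.15}$$<br />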
We can therefore conclude that $\tilde{\boldsymbol{\Delta}}$ fulfils Eqs. (B-7) and $\exp(i\hat{\kappa})\,|0\rangle$ is therefore a legitimate<br />
normalized single-determinant wave function. It can be shown that all matrices fulfilling Eqs. (B-7)<br />
can be obtained from an appropriate choice of $\boldsymbol{\kappa}$, so the transformation of Eq. (2.8) is a complete<br />
parameterization.<br />
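The three conditions can also be checked numerically for a small random example. The sketch below is an illustration under stated assumptions: the conditions of Eqs. (B-7) are taken to be Hermiticity, Tr(ΔS⁻¹) equal to the number of electrons, and ΔS⁻¹Δ = Δ, as used in the derivation above; the reference Δ is built as SDS from S-orthonormal orbitals; and the matrix exponential is evaluated by a plain Taylor series, which is adequate only for the small matrices used here. All names are hypothetical.<br />

```python
import numpy as np

def expm_taylor(M, terms=60):
    """Matrix exponential by a truncated Taylor series (small matrices only)."""
    result = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

rng = np.random.default_rng(0)
n, n_occ = 5, 2

# Real, symmetric, positive definite metric S.
G = rng.normal(size=(n, n))
S = G @ G.T / n + np.eye(n)
S_inv = np.linalg.inv(S)

# S-orthonormal orbitals C = S^(-1/2), so that C^T S C = 1.
w, V = np.linalg.eigh(S)
C = V @ np.diag(w ** -0.5) @ V.T
D = C[:, :n_occ] @ C[:, :n_occ].T
Delta = S @ D @ S                      # assumed form of the reference matrix

# Random Hermitian kappa.
K = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
kappa = 0.05 * (K + K.conj().T)

# Transformed matrix as in Eq. (2.12).
Delta_t = expm_taylor(-1j * S @ kappa.T) @ Delta @ expm_taylor(1j * kappa.T @ S)

trace_value = np.trace(Delta_t @ S_inv)
idempotency_error = np.linalg.norm(Delta_t @ S_inv @ Delta_t - Delta_t)
hermiticity_error = np.linalg.norm(Delta_t - Delta_t.conj().T)
```

All three quantities should reproduce the properties of the untransformed matrix to numerical precision.<br />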
2.2.2 The Linear Response Function<br />
We will now use the parameterization of Eq. (2.8) for an arbitrary single-determinant wave function<br />
to describe a Hartree-Fock wave function in an external, time-dependent field. The parameters in κ<br />
will become time-dependent and we will in the following develop equations for obtaining these<br />
parameters. The time-dependent Hamiltonian can be written as<br />
$$H = H_0 + V_t, \tag{2.16}$$<br />
where $H_0$ is the Hamiltonian for the unperturbed system, and $V_t$ is a first-order perturbation. The<br />
perturbation will be turned on adiabatically, and $V_t$ can be expressed as<br />
$$V_t = \int_{-\infty}^{\infty} d\omega\, V_\omega \exp((-i\omega + \varepsilon)t), \tag{2.17}$$<br />
where $\varepsilon$ is a positive infinitesimal that ensures $V_t \to 0$ as $t \to -\infty$. The perturbation is required to be<br />
Hermitian, so we have the relation<br />
$$V_\omega^\dagger = V_{-\omega}. \tag{2.18}$$<br />
To determine the linear response function, we begin by considering the time dependence of the<br />
expectation value $\langle\tilde{0}|A|\tilde{0}\rangle$ of a one-electron operator $A$. We need only expand the wave function<br />
$|\tilde{0}\rangle$ of Eq. (2.8) to first order in the external perturbation to obtain the linear response:<br />
$$\hat{\kappa} = \hat{\kappa}_t^{(1)} + \hat{\kappa}_t^{(2)} + \cdots. \tag{2.19}$$<br />
The zero-order contribution, $\hat{\kappa}_t^{(0)}$, vanishes as the unperturbed wave function $|0\rangle$ is assumed to be<br />
optimized for the zero-order Hamiltonian, so the Brillouin conditions in the AO basis hold<br />
$$\frac{\partial}{\partial\kappa_{\mu\nu}} \langle\tilde{0}|H_0|\tilde{0}\rangle = i\,\langle 0|[H_0, a_\mu^\dagger a_\nu]|0\rangle = 0. \tag{2.20}$$<br />
Substitution of the expansion of $\hat{\kappa}$ into Eq. (2.8) gives to first order:<br />
$$\langle\tilde{0}|A|\tilde{0}\rangle = \langle 0|A|0\rangle - i\,\langle 0|[\hat{\kappa}_t^{(1)}, A]|0\rangle. \tag{2.21}$$<br />
Since the response functions are defined in the frequency rather than the time domain, we formulate<br />
the wave function corrections in the frequency space. By analogy with Eq. (2.17), we write<br />
$$\hat{\kappa}_t^{(1)} = \int_{-\infty}^{\infty} d\omega\, \hat{\kappa}_\omega^{(1)} \exp((-i\omega + \varepsilon)t). \tag{2.22}$$<br />
Inserting Eq. (2.22) into Eq. (2.21) we obtain<br />
$$\langle\tilde{0}|A|\tilde{0}\rangle = \langle 0|A|0\rangle - i\int_{-\infty}^{\infty} d\omega\, \langle 0|[\hat{\kappa}_\omega^{(1)}, A]|0\rangle \exp((-i\omega + \varepsilon)t). \tag{2.23}$$<br />
Comparing Eq. (2.23) with the formal expansion of an expectation value in terms of a response<br />
function<br />
$$\langle\tilde{0}|A|\tilde{0}\rangle = \langle 0|A|0\rangle + \int_{-\infty}^{\infty} d\omega\, \langle\langle A; V_\omega\rangle\rangle_\omega \exp((-i\omega + \varepsilon)t), \tag{2.24}$$<br />
we may identify the linear response function as<br />
$$\langle\langle A; V_\omega\rangle\rangle_\omega = -i\,\langle 0|[\hat{\kappa}_\omega^{(1)}, A]|0\rangle. \tag{2.25}$$<br />
2.2.3 The Time Development of the Reference State<br />
Before the explicit time-dependent equations are set up for determining the time-dependent<br />
parameters of $\boldsymbol{\kappa}$, it is convenient to rewrite $\hat{\kappa}$, Eq. (2.3), as<br />
$$\hat{\kappa} = \sum_{\mu>\nu}\left(\kappa_{\mu\nu}\, a_\mu^\dagger a_\nu + \kappa_{\mu\nu}^{*}\, a_\nu^\dagger a_\mu\right) + \sum_\mu \kappa_{\mu\mu}\, a_\mu^\dagger a_\mu, \tag{2.26}$$<br />
which follows from the Hermiticity of $\hat{\kappa}$. The operators of $\hat{\kappa}$ may be collected in a vector (here in<br />
row form):<br />
$$\boldsymbol{\Lambda} = \begin{pmatrix} \mathbf{Q}^\dagger & \mathbf{D}^\dagger & \mathbf{Q} \end{pmatrix}, \tag{2.27}$$<br />
where the three classes of operators are defined as<br />
$$\begin{aligned} Q_m^\dagger &= a_\mu^\dagger a_\nu, \quad \mu > \nu \\ D_m^\dagger &= a_\mu^\dagger a_\mu \\ Q_m &= a_\nu^\dagger a_\mu, \quad \mu > \nu. \end{aligned} \tag{2.28}$$<br />
The parameters of $\boldsymbol{\kappa}$ may similarly be arranged in a vector<br />
$$\boldsymbol{\alpha}^{(i)} = \begin{pmatrix} \kappa_{\mu\nu}^{(i)} \\ \kappa_{\mu\mu}^{(i)} \\ \kappa_{\mu\nu}^{(i)*} \end{pmatrix}, \quad \mu > \nu, \tag{2.29}$$<br />
such that<br />
$$\hat{\kappa}^{(i)} = \sum_m \alpha_m^{(i)} \Lambda_m. \tag{2.30}$$<br />
Here the index m on Λ runs over all three classes of operators listed in Eq. (2.28).<br />
The single excitation operators $a_\mu^\dagger a_\nu$ have by Eqs. (2.27)-(2.28) been divided into a set of atomic<br />
orbital excitations, corresponding to $\mu > \nu$, and a set of atomic orbital deexcitations, corresponding to<br />
$\mu < \nu$. As the atomic orbital excitations and deexcitations have the same formal properties, this<br />
division does not have any physical content. However, the division will prove important when the<br />
paired structure of the response equations is investigated in Section 2.2.5. Note that it is not possible<br />
to exclude the number operators $a_\mu^\dagger a_\mu$ in the atomic orbital representation, whereas they are<br />
redundant in the standard molecular orbital formulation.<br />
In the presence of the time-dependent perturbation, we introduce the time transformed operator<br />
basis<br />
$$\tilde{\boldsymbol{\Lambda}}^\dagger = \begin{pmatrix} \tilde{\mathbf{Q}} \\ \tilde{\mathbf{D}} \\ \tilde{\mathbf{Q}}^\dagger \end{pmatrix}, \tag{2.31}$$<br />
where<br />
$$\tilde{Q}_m = \exp(i\hat{\kappa})\, Q_m \exp(-i\hat{\kappa}) \tag{2.32}$$<br />
and similarly for $\tilde{Q}_m^\dagger$ and $\tilde{D}_m$.<br />
The time evolution of $|\tilde{0}\rangle$ may now be determined using Ehrenfest’s theorem for the transformed<br />
operators of $\tilde{\boldsymbol{\Lambda}}^\dagger$ in Eq. (2.31):<br />
$$\frac{d}{dt}\langle\tilde{0}|\tilde{\boldsymbol{\Lambda}}^\dagger|\tilde{0}\rangle - \langle\tilde{0}|\left(\frac{\partial\tilde{\boldsymbol{\Lambda}}^\dagger}{\partial t}\right)|\tilde{0}\rangle = -i\,\langle\tilde{0}|[\tilde{\boldsymbol{\Lambda}}^\dagger, H_0 + V_t]|\tilde{0}\rangle. \tag{2.33}$$<br />
2.2.4 The First-order Equation<br />
We now expand Eq. (2.33) in orders of the external perturbation, restricting ourselves to terms that<br />
are linear in the amplitudes. Inserting Eq. (2.19) into Eq. (2.33) and collecting the terms linear in the<br />
perturbation, we obtain the first-order time-dependent equation<br />
$$i\,\langle 0|[\boldsymbol{\Lambda}^\dagger, \dot{\hat{\kappa}}_t^{(1)}]|0\rangle = -i\,\langle 0|[\boldsymbol{\Lambda}^\dagger, V_t]|0\rangle + \langle 0|[\boldsymbol{\Lambda}^\dagger, [H_0, \hat{\kappa}_t^{(1)}]]|0\rangle. \tag{2.34}$$<br />
To solve the time-dependent equation Eq. (2.34), we insert the frequency expansion of the wave<br />
function correction of Eq. (2.22) and of the external perturbation Eq. (2.17)<br />
$$\int_{-\infty}^{\infty} d\omega \exp((-i\omega + \varepsilon)t)\left(\omega\,\langle 0|[\boldsymbol{\Lambda}^\dagger, \hat{\kappa}_\omega^{(1)}]|0\rangle - \langle 0|[\boldsymbol{\Lambda}^\dagger, [H_0, \hat{\kappa}_\omega^{(1)}]]|0\rangle\right) = \int_{-\infty}^{\infty} d\omega \exp((-i\omega + \varepsilon)t)\left(-i\,\langle 0|[\boldsymbol{\Lambda}^\dagger, V_\omega]|0\rangle\right). \tag{2.35}$$<br />
The first-order response equation is then found as<br />
$$\omega\,\langle 0|[\boldsymbol{\Lambda}^\dagger, \hat{\kappa}_\omega^{(1)}]|0\rangle - \langle 0|[\boldsymbol{\Lambda}^\dagger, [H_0, \hat{\kappa}_\omega^{(1)}]]|0\rangle = -i\,\langle 0|[\boldsymbol{\Lambda}^\dagger, V_\omega]|0\rangle. \tag{2.36}$$<br />
The equation may be written in terms of the matrices<br />
$$E_{mn}^{[2]} = \langle 0|[\Lambda_m^\dagger, [H_0, \Lambda_n]]|0\rangle, \tag{2.37}$$<br />
$$S_{mn}^{[2]} = \langle 0|[\Lambda_m^\dagger, \Lambda_n]|0\rangle, \tag{2.38}$$<br />
and the vector<br />
$$[\mathbf{V}_\omega^{[1]}]_m = \langle 0|[\Lambda_m^\dagger, V_\omega]|0\rangle. \tag{2.39}$$<br />
Using Eqs. (2.37)-(2.39) and (2.29)-(2.30), we now write the first-order response equations, Eq.<br />
(2.36), in the form<br />
$$\left(\mathbf{E}^{[2]} - \omega\mathbf{S}^{[2]}\right)\boldsymbol{\alpha}^{(1)} = i\,\mathbf{V}_\omega^{[1]}, \tag{2.40}$$<br />
where $\mathbf{E}^{[2]}$ and $\mathbf{S}^{[2]}$ may be viewed as generalized electronic Hessian and overlap matrices 61,62 . The<br />
matrix elements $E_{mn}^{[2]}$ and $S_{mn}^{[2]}$ (Eqs. (2.37) and (2.38)) can be expressed as matrix multiplications<br />
and additions of the density, Fock and overlap matrices. 61<br />
The linear response function is obtained by inserting the first-order correction as obtained in Eq.<br />
(2.40) in the expression for the linear response function Eq. (2.25). Renaming the perturbation<br />
operator $V_\omega$ to $B$ and introducing<br />
$$\begin{aligned} A_m^{[1]} &= -\langle 0|[\Lambda_m, A]|0\rangle \\ B_m^{[1]} &= \langle 0|[\Lambda_m^\dagger, B]|0\rangle \end{aligned} \tag{2.41}$$<br />
we obtain<br />
$$\langle\langle A; B\rangle\rangle_\omega = -\mathbf{A}^{[1]}\left(\mathbf{E}^{[2]} - \omega\mathbf{S}^{[2]}\right)^{-1}\mathbf{B}^{[1]}. \tag{2.42}$$<br />
The linear response function may thus be calculated by solving one set of linear equations at each<br />
frequency. To be more explicit, denoting the solution vector to the linear response equation<br />
$$\mathbf{N}^B(\omega) = \left(\mathbf{E}^{[2]} - \omega\mathbf{S}^{[2]}\right)^{-1}\mathbf{B}^{[1]}, \tag{2.43}$$<br />
the linear response function in Eq. (2.42) can be obtained as<br />
$$\langle\langle A; B\rangle\rangle_\omega = -\mathbf{A}^{[1]}\mathbf{N}^B(\omega). \tag{2.44}$$<br />
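For small dense matrices, Eqs. (2.43) and (2.44) amount to one linear solve followed by a dot product. The sketch below only illustrates this algebra; the function name and the two-dimensional numbers are made up and carry no physical meaning.<br />

```python
import numpy as np

def linear_response(E2, S2, A1, B1, omega):
    """Solve (E2 - omega*S2) N = B1 and return (-A1 . N, N),
    i.e. the linear response function and the solution vector."""
    N_B = np.linalg.solve(E2 - omega * S2, B1)
    return -A1 @ N_B, N_B

# Toy numbers (hypothetical): E2 - omega*S2 = diag(1, 5) at omega = 1.
E2_toy = np.diag([2.0, 4.0])
S2_toy = np.diag([1.0, -1.0])
A1_toy = np.array([2.0, 3.0])
B1_toy = np.array([1.0, 5.0])
value, N_toy = linear_response(E2_toy, S2_toy, A1_toy, B1_toy, omega=1.0)
```

Here the solution vector is (1, 1) and the response value is -5. One such solve is needed for each frequency, which is why iterative solvers become necessary for large systems.<br />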
2.2.5 Pairing<br />
The excitation energies are identified as the poles of the linear response function of Eq. (2.42) and<br />
are therefore solutions to the generalized eigenvalue problem<br />
$$\mathbf{E}^{[2]}\mathbf{X} = \omega\,\mathbf{S}^{[2]}\mathbf{X}. \tag{2.45}$$<br />
In the MO formulation of response theory, it has been shown that the excitation energies are<br />
paired 63 , so that if $\omega_i$ is an eigenvalue for Eq. (2.45) then so is $-\omega_i$. It is important to understand how<br />
pairing appears in the AO basis, in particular since this structural feature is exploited when the<br />
equations are solved iteratively as is necessary for large problems. This is further discussed in<br />
Section 2.3. Since the proof of the pairing given in the MO formulation cannot be directly<br />
transferred to the AO formulation due to the presence of the diagonal operators $D_m$, this section<br />
gives the proof in the AO formulation.<br />
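The structure of $\mathbf{E}^{[2]}$ and $\mathbf{S}^{[2]}$ in the AO formulation is analyzed for the purpose of examining the<br />
pairing structure. Dividing Λ into the three classes of Eq. (2.28), the matrix $\mathbf{E}^{[2]}$ may be written as<br />
$$\mathbf{E}^{[2]} = \begin{pmatrix} \langle 0|[\mathbf{Q}, [H_0, \mathbf{Q}^\dagger]]|0\rangle & \langle 0|[\mathbf{Q}, [H_0, \mathbf{D}]]|0\rangle & \langle 0|[\mathbf{Q}, [H_0, \mathbf{Q}]]|0\rangle \\ \langle 0|[\mathbf{D}, [H_0, \mathbf{Q}^\dagger]]|0\rangle & \langle 0|[\mathbf{D}, [H_0, \mathbf{D}]]|0\rangle & \langle 0|[\mathbf{D}, [H_0, \mathbf{Q}]]|0\rangle \\ \langle 0|[\mathbf{Q}^\dagger, [H_0, \mathbf{Q}^\dagger]]|0\rangle & \langle 0|[\mathbf{Q}^\dagger, [H_0, \mathbf{D}]]|0\rangle & \langle 0|[\mathbf{Q}^\dagger, [H_0, \mathbf{Q}]]|0\rangle \end{pmatrix}. \tag{2.46}$$<br />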
If we assume for simplicity that all orbitals and integrals for the unperturbed system are real, the<br />
elements of for example the block $\langle 0|[\mathbf{Q}^\dagger, [H_0, \mathbf{Q}]]|0\rangle$ are trivially rewritten as<br />
$$\langle 0|[Q_m^\dagger, [H_0, Q_n]]|0\rangle = \langle 0|[Q_m^\dagger, [H_0, Q_n]]|0\rangle^{*} = \langle 0|[Q_m, [H_0, Q_n^\dagger]]|0\rangle. \tag{2.47}$$<br />
The nine blocks in Eq. (2.46) can then all be written in terms of the following four matrices<br />
$$\begin{aligned} A_{mn} &= \langle 0|[Q_m, [H_0, Q_n^\dagger]]|0\rangle, \\ B_{mn} &= \langle 0|[Q_m, [H_0, Q_n]]|0\rangle, \\ F_{mn} &= \langle 0|[Q_m, [H_0, D_n]]|0\rangle, \\ G_{mn} &= \langle 0|[D_m, [H_0, D_n]]|0\rangle, \end{aligned} \tag{2.48}$$<br />
and we obtain<br />
$$\mathbf{E}^{[2]} = \begin{pmatrix} \mathbf{A} & \mathbf{F} & \mathbf{B} \\ \mathbf{F}^T & \mathbf{G} & \mathbf{F}^T \\ \mathbf{B} & \mathbf{F} & \mathbf{A} \end{pmatrix}. \tag{2.49}$$<br />
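The matrix $\mathbf{S}^{[2]}$ may in a similar way be written as<br />
$$\mathbf{S}^{[2]} = \begin{pmatrix} \boldsymbol{\Sigma} & \boldsymbol{\Omega} & \boldsymbol{\Delta} \\ \boldsymbol{\Omega}^T & \mathbf{0} & -\boldsymbol{\Omega}^T \\ -\boldsymbol{\Delta} & -\boldsymbol{\Omega} & -\boldsymbol{\Sigma} \end{pmatrix}, \tag{2.50}$$<br />
where<br />
$$\begin{aligned} \Sigma_{mn} &= \langle 0|[Q_m, Q_n^\dagger]|0\rangle, \\ \Delta_{mn} &= \langle 0|[Q_m, Q_n]|0\rangle, \\ \Omega_{mn} &= \langle 0|[Q_m, D_n]|0\rangle. \end{aligned} \tag{2.51}$$<br />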
Note that the block containing two diagonal operators vanishes as<br />
$$\langle 0|[D_m, D_n]|0\rangle = \langle 0|[a_\mu^\dagger a_\mu, a_\nu^\dagger a_\nu]|0\rangle = S_{\mu\nu}\langle 0|a_\mu^\dagger a_\nu|0\rangle - S_{\nu\mu}\langle 0|a_\nu^\dagger a_\mu|0\rangle = 0. \tag{2.52}$$<br />
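To illustrate how the pairing is obtained in the AO formulation, we assume that the vector<br />
$$\mathbf{X} = \begin{pmatrix} \mathbf{Z} \\ \mathbf{U} \\ \mathbf{Y} \end{pmatrix} \tag{2.53}$$<br />
is an eigenvector for Eq. (2.45) with eigenvalue $\omega$<br />
$$\begin{pmatrix} \mathbf{A} & \mathbf{F} & \mathbf{B} \\ \mathbf{F}^T & \mathbf{G} & \mathbf{F}^T \\ \mathbf{B} & \mathbf{F} & \mathbf{A} \end{pmatrix}\begin{pmatrix} \mathbf{Z} \\ \mathbf{U} \\ \mathbf{Y} \end{pmatrix} = \omega\begin{pmatrix} \boldsymbol{\Sigma} & \boldsymbol{\Omega} & \boldsymbol{\Delta} \\ \boldsymbol{\Omega}^T & \mathbf{0} & -\boldsymbol{\Omega}^T \\ -\boldsymbol{\Delta} & -\boldsymbol{\Omega} & -\boldsymbol{\Sigma} \end{pmatrix}\begin{pmatrix} \mathbf{Z} \\ \mathbf{U} \\ \mathbf{Y} \end{pmatrix}. \tag{2.54}$$<br />
Multiplying the blocks of Eq. (2.54) gives three sets of equations<br />
$$\begin{aligned} \mathbf{A}\mathbf{Z} + \mathbf{F}\mathbf{U} + \mathbf{B}\mathbf{Y} &= \omega\,(\boldsymbol{\Sigma}\mathbf{Z} + \boldsymbol{\Omega}\mathbf{U} + \boldsymbol{\Delta}\mathbf{Y}) \\ \mathbf{F}^T\mathbf{Z} + \mathbf{G}\mathbf{U} + \mathbf{F}^T\mathbf{Y} &= \omega\,(\boldsymbol{\Omega}^T\mathbf{Z} - \boldsymbol{\Omega}^T\mathbf{Y}) \\ \mathbf{B}\mathbf{Z} + \mathbf{F}\mathbf{U} + \mathbf{A}\mathbf{Y} &= \omega\,(-\boldsymbol{\Delta}\mathbf{Z} - \boldsymbol{\Omega}\mathbf{U} - \boldsymbol{\Sigma}\mathbf{Y}). \end{aligned} \tag{2.55}$$<br />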
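We will now prove that the paired vector<br />
$$\mathbf{X}^P = \begin{pmatrix} \mathbf{Y} \\ \mathbf{U} \\ \mathbf{Z} \end{pmatrix} \tag{2.56}$$<br />
is an eigenvector for Eq. (2.45) with eigenvalue $-\omega$<br />
$$\begin{pmatrix} \mathbf{A} & \mathbf{F} & \mathbf{B} \\ \mathbf{F}^T & \mathbf{G} & \mathbf{F}^T \\ \mathbf{B} & \mathbf{F} & \mathbf{A} \end{pmatrix}\begin{pmatrix} \mathbf{Y} \\ \mathbf{U} \\ \mathbf{Z} \end{pmatrix} = -\omega\begin{pmatrix} \boldsymbol{\Sigma} & \boldsymbol{\Omega} & \boldsymbol{\Delta} \\ \boldsymbol{\Omega}^T & \mathbf{0} & -\boldsymbol{\Omega}^T \\ -\boldsymbol{\Delta} & -\boldsymbol{\Omega} & -\boldsymbol{\Sigma} \end{pmatrix}\begin{pmatrix} \mathbf{Y} \\ \mathbf{U} \\ \mathbf{Z} \end{pmatrix}. \tag{2.57}$$<br />
Multiplying the blocks of Eq. (2.57) leads to the three sets of equations<br />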
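$$\begin{aligned} \mathbf{A}\mathbf{Y} + \mathbf{F}\mathbf{U} + \mathbf{B}\mathbf{Z} &= -\omega\,(\boldsymbol{\Sigma}\mathbf{Y} + \boldsymbol{\Omega}\mathbf{U} + \boldsymbol{\Delta}\mathbf{Z}) \\ \mathbf{F}^T\mathbf{Y} + \mathbf{G}\mathbf{U} + \mathbf{F}^T\mathbf{Z} &= -\omega\,(\boldsymbol{\Omega}^T\mathbf{Y} - \boldsymbol{\Omega}^T\mathbf{Z}) \\ \mathbf{B}\mathbf{Y} + \mathbf{F}\mathbf{U} + \mathbf{A}\mathbf{Z} &= -\omega\,(-\boldsymbol{\Delta}\mathbf{Y} - \boldsymbol{\Omega}\mathbf{U} - \boldsymbol{\Sigma}\mathbf{Z}), \end{aligned} \tag{2.58}$$<br />
which are identical to Eqs. (2.55). It is thus concluded that if $\mathbf{X}$ is an eigenvector of Eq. (2.45) with<br />
eigenvalue $\omega$, then $\mathbf{X}^P$ is also an eigenvector with eigenvalue $-\omega$.<br />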
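The pairing can be checked numerically on matrices that merely share the block structure of Eqs. (2.49) and (2.50). The sketch below fills the blocks with random numbers, so it tests the algebra only and has no physical content. Assumptions: A, B, G and Σ are taken symmetric and Δ antisymmetric so that both matrices are symmetric, and E is made positive definite so that the problem can be written as S X = μ E X with μ = 1/ω for nonzero μ. All names are hypothetical.<br />

```python
import numpy as np

def build_e2_s2(nq, nd, seed=0):
    """Random matrices with the block structure of Eqs. (2.49)-(2.50)."""
    rng = np.random.default_rng(seed)
    sym = lambda M: 0.5 * (M + M.T)
    A = 0.3 * sym(rng.normal(size=(nq, nq))) + 10.0 * np.eye(nq)
    B = 0.3 * sym(rng.normal(size=(nq, nq)))
    F = 0.3 * rng.normal(size=(nq, nd))
    G = 0.3 * sym(rng.normal(size=(nd, nd))) + 10.0 * np.eye(nd)
    Sigma = sym(rng.normal(size=(nq, nq)))
    Delta = rng.normal(size=(nq, nq))
    Delta = 0.5 * (Delta - Delta.T)
    Omega = rng.normal(size=(nq, nd))
    E2 = np.block([[A, F, B], [F.T, G, F.T], [B, F, A]])
    S2 = np.block([[Sigma, Omega, Delta],
                   [Omega.T, np.zeros((nd, nd)), -Omega.T],
                   [-Delta, -Omega, -Sigma]])
    return E2, S2

def pair(v, nq):
    """Map v = (Z, U, Y) to the paired vector (Y, U, Z)."""
    return np.concatenate([v[-nq:], v[nq:-nq], v[:nq]])

nq, nd = 3, 2
E2, S2 = build_e2_s2(nq, nd)

# Solve S2 X = mu E2 X through a Cholesky factor of E2 (mu = 1/omega).
L_inv = np.linalg.inv(np.linalg.cholesky(E2))
mu, Y = np.linalg.eigh(L_inv @ S2 @ L_inv.T)
X = L_inv.T @ Y

# The paired vector of the last eigenvector should belong to -mu.
X_p = pair(X[:, -1], nq)
paired_residual = np.linalg.norm(S2 @ X_p + mu[-1] * E2 @ X_p)
```

The eigenvalues come out as plus/minus pairs, and with an odd number of diagonal operators at least one μ is zero, corresponding to no finite excitation energy.<br />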
2.3 Solving the Response Equations<br />
For large systems, the response equations<br />
$$\left(\mathbf{E}^{[2]} - \omega\mathbf{S}^{[2]}\right)\mathbf{N}^B(\omega) = \mathbf{B}^{[1]} \tag{2.59}$$<br />
are best solved using iterative algorithms. These algorithms rely on the ability to set up the linear<br />
transformations<br />
$$\boldsymbol{\sigma} = \mathbf{E}^{[2]}\mathbf{b} \tag{2.60}$$<br />
$$\boldsymbol{\rho} = \mathbf{S}^{[2]}\mathbf{b}. \tag{2.61}$$<br />
Expressions for $\mathbf{E}^{[2]}\mathbf{b}$ and $\mathbf{S}^{[2]}\mathbf{b}$, where $\mathbf{b}$ is a trial vector, have previously been<br />
derived. 61<br />
In each iteration, the response equations are set up and solved in a reduced space. For a reduced<br />
space consisting of k trial vectors, the equations can be written as<br />
$$\left(\mathbf{E}_{\mathrm{RED}}^{[2]} - \omega\mathbf{S}_{\mathrm{RED}}^{[2]}\right)\mathbf{X}^{\mathrm{RED}} = \mathbf{B}_{\mathrm{RED}}^{[1]}, \tag{2.62}$$<br />
where the reduced matrices are found as<br />
$$\begin{aligned} [\mathbf{E}_{\mathrm{RED}}^{[2]}]_{ij} &= \mathbf{b}_i^T\mathbf{E}^{[2]}\mathbf{b}_j = \mathbf{b}_i^T\boldsymbol{\sigma}_j \\ [\mathbf{S}_{\mathrm{RED}}^{[2]}]_{ij} &= \mathbf{b}_i^T\mathbf{S}^{[2]}\mathbf{b}_j = \mathbf{b}_i^T\boldsymbol{\rho}_j \\ [\mathbf{B}_{\mathrm{RED}}^{[1]}]_i &= \mathbf{b}_i^T\mathbf{B}^{[1]}. \end{aligned} \tag{2.63}$$<br />
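Normally when this type of iterative procedure is used, the reduced space is extended with one new<br />
trial vector in each iteration. However, due to the pairing described in the previous section, the<br />
linear transformations of $\mathbf{E}^{[2]}$ and $\mathbf{S}^{[2]}$ on a trial vector, here exemplified by $\mathbf{E}^{[2]}\mathbf{b}$,<br />
$$\mathbf{E}^{[2]}\mathbf{b} = \begin{pmatrix} \mathbf{A} & \mathbf{F} & \mathbf{B} \\ \mathbf{F}^T & \mathbf{G} & \mathbf{F}^T \\ \mathbf{B} & \mathbf{F} & \mathbf{A} \end{pmatrix}\begin{pmatrix} \mathbf{Z} \\ \mathbf{U} \\ \mathbf{Y} \end{pmatrix} = \begin{pmatrix} \mathbf{A}\mathbf{Z} + \mathbf{F}\mathbf{U} + \mathbf{B}\mathbf{Y} \\ \mathbf{F}^T\mathbf{Z} + \mathbf{G}\mathbf{U} + \mathbf{F}^T\mathbf{Y} \\ \mathbf{B}\mathbf{Z} + \mathbf{F}\mathbf{U} + \mathbf{A}\mathbf{Y} \end{pmatrix} = \boldsymbol{\sigma}, \tag{2.64}$$<br />
may be obtained directly for the paired trial vector as well<br />
$$\mathbf{E}^{[2]}\mathbf{b}^P = \begin{pmatrix} \mathbf{A} & \mathbf{F} & \mathbf{B} \\ \mathbf{F}^T & \mathbf{G} & \mathbf{F}^T \\ \mathbf{B} & \mathbf{F} & \mathbf{A} \end{pmatrix}\begin{pmatrix} \mathbf{Y} \\ \mathbf{U} \\ \mathbf{Z} \end{pmatrix} = \begin{pmatrix} \mathbf{A}\mathbf{Y} + \mathbf{F}\mathbf{U} + \mathbf{B}\mathbf{Z} \\ \mathbf{F}^T\mathbf{Y} + \mathbf{G}\mathbf{U} + \mathbf{F}^T\mathbf{Z} \\ \mathbf{B}\mathbf{Y} + \mathbf{F}\mathbf{U} + \mathbf{A}\mathbf{Z} \end{pmatrix} = \boldsymbol{\sigma}^P. \tag{2.65}$$<br />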
The reduced space is therefore extended with both vectors without additional cost. Furthermore,<br />
when a trial vector and its paired counterpart are simultaneously added to the reduced space, the<br />
paired structure of the response equations is preserved. With this structure preserved, the<br />
eigenvalues in the reduced space will also be real and paired, and the lowest eigenvalue will<br />
monotonically decrease towards the converged value as the reduced space is increased. 64<br />
The solution vector in the reduced space, $X^{RED}$, can be expanded in the basis of trial vectors to<br />
express the solution vector in the full space<br />
$$N^B = \sum_{i=1}^{k} X_i^{RED}\, b_i. \tag{2.66}$$<br />
The residual can then be found as<br />
$$R = \left(E^{[2]} - \omega S^{[2]}\right) N^B - B^{[1]} = \sum_{i=1}^{k} X_i^{RED} \left(\sigma_i - \omega \rho_i\right) - B^{[1]}. \tag{2.67}$$<br />
If the norm of the residual is smaller than some specified tolerance, the iterative procedure is ended<br />
and the converged solution vector has been found<br />
$$N^B(\omega) = N^B. \tag{2.68}$$<br />
If the residual is too large, a new trial vector may be generated from the residual, preferably with a<br />
preconditioner $A$ to speed up the convergence<br />
$$b_{k+1} = A^{-1} R. \tag{2.69}$$<br />
The reduced space is then extended with $b_{k+1}$ and $b_{k+2} = b_{k+1}^P$, and Eq. (2.62) is set up and solved<br />
again, establishing the iterative procedure.<br />
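The iterative procedure of Eqs. (2.59)-(2.69) can be sketched in a few lines of Python. This is a minimal illustration, not the solver used in this work: it assumes small dense matrices, a toy block structure in which the paired vector of $(Z, Y)$ is $(Y, Z)$, and a simple diagonal stand-in for the preconditioner. The names `solve_response`, `pair` and `extend` are hypothetical.<br />

```python
import numpy as np

def pair(b):
    """Paired counterpart of b = (Z, Y): swap the two halves to get (Y, Z)."""
    n = b.size // 2
    return np.concatenate([b[n:], b[:n]])

def solve_response(E, S, rhs, omega, tol=1e-8, max_iter=50):
    """Solve (E - omega*S) N = rhs in a growing reduced space of paired trial vectors."""
    # Assumption: a diagonal approximation stands in for the preconditioner of Eq. (2.69).
    precond = np.diag(E) - omega * np.diag(S)
    basis = []

    def extend(v):
        # Gram-Schmidt against the current basis; skip (near) linearly dependent vectors.
        for b in basis:
            v = v - (b @ v) * b
        norm = np.linalg.norm(v)
        if norm > 1e-10:
            basis.append(v / norm)

    start = rhs / precond
    extend(start)
    extend(pair(start))
    for _ in range(max_iter):
        Bk = np.array(basis).T                       # columns are the trial vectors b_i
        sigma, rho = E @ Bk, S @ Bk                  # linear transformations, Eqs. (2.60)-(2.61)
        E_red, S_red = Bk.T @ sigma, Bk.T @ rho      # reduced matrices, Eq. (2.63)
        x_red = np.linalg.solve(E_red - omega * S_red, Bk.T @ rhs)   # Eq. (2.62)
        residual = (sigma - omega * rho) @ x_red - rhs               # Eq. (2.67)
        if np.linalg.norm(residual) < tol:
            return Bk @ x_red                        # full-space solution, Eq. (2.66)
        new = residual / precond                     # preconditioned residual, Eq. (2.69)
        extend(new)                                  # add the new trial vector ...
        extend(pair(new))                            # ... together with its paired partner
    raise RuntimeError("response solver did not converge")
```

Because the linear transformations `sigma` and `rho` are the only places where the full matrices enter, a production code would replace `E @ Bk` and `S @ Bk` with direct evaluations of Eqs. (2.60) and (2.61).<br />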
2.3.1 Preconditioning<br />
As mentioned above, the residual found in each iteration should be preconditioned to obtain an<br />
effective solver. As a consequence of the strict AO formulation, the electronic Hessian has no<br />
diagonal dominance as was the case in the MO basis. This makes preconditioning a challenge. So<br />
far, this problem has not been solved in our SCF response solver. Instead, a transformation is made<br />
to the MO basis, where the preconditioning is carried out in the usual way using the orbital<br />
eigenvalue differences,<br />
$$\left[b^{MO}_{k+1}\right]_{ai} = \frac{\left[C^T R_k C\right]_{ai}}{\varepsilon_a - \varepsilon_i}, \tag{2.70}$$<br />
where $C$ contains the MO expansion coefficients and $\varepsilon$ the orbital energies of the reference state. The<br />
index $a$ refers to virtual orbitals and $i$ refers to occupied orbitals. The resulting vector is then back<br />
transformed to the AO basis<br />
$$b_{k+1} = C\, b^{MO}_{k+1}\, C^T. \tag{2.71}$$<br />
An AO alternative to this preconditioner should of course be found, since the reference to the MO<br />
basis in this preconditioner introduces dense matrix intermediates. Moreover, at least one<br />
diagonalization should be carried out at the end of the optimization of the reference state to obtain<br />
the information on the MOs.<br />
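The MO-basis preconditioning step can be sketched as follows. This is only an illustration under simplifying assumptions that are not taken from the text: the AO basis is taken to be orthonormal (so that $C$ is an orthogonal matrix), both off-diagonal blocks are scaled by the orbital energy differences, and the occupied-occupied and virtual-virtual blocks are set to zero. The function name is hypothetical.<br />

```python
import numpy as np

def precondition_in_mo_basis(residual_ao, C, eps, n_occ):
    """Scale the occupied-virtual part of a residual by orbital energy differences.

    Assumes an orthonormal AO basis, so C is an orthogonal matrix
    (an assumption made for this sketch only).
    """
    r_mo = C.T @ residual_ao @ C                     # AO -> MO transformation
    gaps = eps[n_occ:, None] - eps[None, :n_occ]     # eps_a - eps_i, shape (n_virt, n_occ)
    b_mo = np.zeros_like(r_mo)
    b_mo[n_occ:, :n_occ] = r_mo[n_occ:, :n_occ] / gaps      # virtual-occupied block
    b_mo[:n_occ, n_occ:] = r_mo[:n_occ, n_occ:] / gaps.T    # occupied-virtual block
    return C @ b_mo @ C.T                            # MO -> AO back-transformation, cf. Eq. (2.71)
```

The two dense transformations with `C` are exactly the intermediates that an AO-only preconditioner would have to avoid.<br />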
2.3.2 Projections<br />
In the MO basis, the orbital rotations within the occupied and virtual spaces are redundant. The<br />
response equations in the MO formulation are thus simply set up in the non-redundant occupied-virtual<br />
space to avoid linear dependencies. In the AO basis no such separation exists and the<br />
equations are set up in the full space. To avoid redundancies in the AO formulation, projections<br />
onto the non-redundant space should be made. In the exponential parameterization of the density<br />
matrix used in our AO formulation of the response functions, the projector 23<br />
$$\mathcal{P} = P \otimes Q + Q \otimes P, \qquad \left[\mathcal{P}(X)\right]_{\mu\nu} = \sum_{\rho\sigma} \mathcal{P}_{\mu\nu,\rho\sigma} X_{\rho\sigma} = \left(P X Q^T + Q X P^T\right)_{\mu\nu}, \tag{2.72}$$<br />
where<br />
$$P = DS, \qquad Q = 1 - DS, \tag{2.73}$$<br />
projects onto the non-redundant parameter space. It can be shown that all new trial vectors $b$ and<br />
linear transformations $\sigma$ and $\rho$ should be projected onto the non-redundant space in the following<br />
manner<br />
$$b_{k+1} = \mathcal{P}\, b_{k+1}, \qquad \sigma_{k+1} = \mathcal{P}^T \sigma_{k+1}, \qquad \rho_{k+1} = \mathcal{P}^T \rho_{k+1}. \tag{2.74}$$<br />
When solving the response equations as described in the beginning of this section, the vectors<br />
projected as in Eq. (2.74) are used.<br />
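The action of the projector in Eqs. (2.72)-(2.73) on a matrix can be checked numerically. The sketch below is illustrative only; it builds a toy overlap matrix and an idempotent density (so that $DSD = D$) from random data, and the helper names `project` and `toy_density` are hypothetical.<br />

```python
import numpy as np

def project(X, D, S):
    """Apply P(X) = P X Q^T + Q X P^T with P = DS and Q = 1 - DS, Eqs. (2.72)-(2.73)."""
    P = D @ S
    Q = np.eye(len(S)) - P
    return P @ X @ Q.T + Q @ X @ P.T

def toy_density(n_basis, n_occ, seed=0):
    """Random overlap S and density D = C_occ C_occ^T with C^T S C = 1 (toy data)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n_basis, n_basis))
    S = M @ M.T + n_basis * np.eye(n_basis)          # symmetric positive definite overlap
    w, V = np.linalg.eigh(S)
    C = V @ np.diag(w ** -0.5) @ V.T                 # symmetric orthogonalisation: C^T S C = 1
    C_occ = C[:, :n_occ]
    return C_occ @ C_occ.T, S
```

Since $PQ = QP = 0$ for an idempotent density, applying `project` twice gives the same result as applying it once, and a purely occupied-occupied component $P X P^T$ is projected to zero.<br />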
2.4 The Excited State Gradient<br />
In this section the expression for the geometrical gradient of the singlet excited state is derived, to<br />
illustrate how expressions for properties can straightforwardly be derived in the AO response<br />
framework.<br />
As for the derivations in Section 2.2 we assume that the wave function of the ground state is<br />
optimized at the point of the potential surface, $x_0$, where the excited state gradient is evaluated. The<br />
variational condition is thus fulfilled at that point<br />
$$FDS - SDF = 0, \tag{2.75}$$<br />
and the ground-state energy at $x_0$ is further obtained as<br />
$$E^0 = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,DG(D) + h_{nuc}, \tag{2.76}$$<br />
where $h$ is the one-electron Hamiltonian matrix in the AO basis, $h_{nuc}$ is the nuclear-nuclear<br />
repulsion, $G$ holds the two-electron AO integrals and the Fock matrix $F$ is given by $h + G(D)$.<br />
As mentioned previously, the excitation energy corresponding to the excitation from the ground<br />
state 0 to the excited state f can be found from the poles of the linear response function for the<br />
optimized ground state, 62 i.e. as the eigenvalue of the linear response generalized eigenvalue<br />
equation, Eq. (2.45),<br />
$$\left(E^{[2]} - \omega_f S^{[2]}\right) b^f = 0, \tag{2.77}$$<br />
where $\omega_f$ is the electronic excitation energy<br />
$$\omega_f = E^f - E^0 \tag{2.78}$$<br />
and $b^f$ is the normalized eigenvector. 61,62<br />
The excitation energy can then be obtained from Eq. (2.77) as<br />
$$\omega_f = b^{f\dagger} E^{[2]} b^f, \tag{2.79}$$<br />
assuming that the eigenvectors $b^f$ satisfy the normalization condition<br />
$$b^{f\dagger} S^{[2]} b^f = 1. \tag{2.80}$$<br />
Since we are interested in the molecular gradient for the excited state, f , the energy of the excited<br />
state should be defined at arbitrary points on the potential surface.<br />
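Eqs. (2.77)-(2.80), together with the pairing of eigenvalues discussed in Section 2.2, can be illustrated on a small model problem. The sketch below is not the AO response code: it assumes real matrices (so that $\dagger$ reduces to a transpose), a positive definite $E^{[2]}$, and a toy two-block structure in which the paired vector of $(Z, Y)$ is $(Y, Z)$. The function names are hypothetical.<br />

```python
import numpy as np

def lowest_excitation(E, S):
    """Lowest positive eigenvalue omega of E b = omega S b, with b^T S b = 1.

    Assumes E is symmetric positive definite and S is symmetric.
    """
    L = np.linalg.cholesky(E)                          # E = L L^T
    M = np.linalg.solve(L, np.linalg.solve(L, S).T)    # L^{-1} S L^{-T}, a symmetric matrix
    mu, Y = np.linalg.eigh(M)                          # mu = 1/omega, in ascending order
    b = np.linalg.solve(L.T, Y[:, -1])                 # back-transform the eigenvector
    b = b / np.sqrt(b @ S @ b)                         # normalization condition, Eq. (2.80)
    return 1.0 / mu[-1], b

def paired(b):
    """Paired vector for the toy two-block structure: (Z, Y) -> (Y, Z)."""
    n = b.size // 2
    return np.concatenate([b[n:], b[:n]])
```

With the normalization of Eq. (2.80) the returned `omega` equals `b @ E @ b`, as in Eq. (2.79), and `paired(b)` solves the same equation with eigenvalue `-omega`.<br />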
2.4.1 Construction of the Lagrangian<br />
The analytic expression for the excited state gradient is found using the Lagrangian technique. 65 We<br />
construct the Lagrangian for the excited state energy $E^f = E^0 + \omega_f$, using a matrix-vector notation,<br />
$$L^f = E^0 + b^{f\dagger} E^{[2]} b^f - \omega\left(b^{f\dagger} S^{[2]} b^f - 1\right) - X^{\dagger}\left(FDS - SDF\right). \tag{2.81}$$<br />
The variational condition on the ground state, Eq. (2.75), and the orthonormality constraint<br />
condition on the eigenvectors, Eq. (2.80), are included, and they are multiplied by the Lagrange<br />
multipliers $X$ and $\omega$, respectively.<br />
We then require the Lagrangian to be variational in all parameters<br />
$$\frac{\partial L^f}{\partial X} = SDF - FDS = 0 \tag{2.82}$$<br />
$$\frac{\partial L^f}{\partial \omega} = b^{f\dagger} S^{[2]} b^f - 1 = 0 \tag{2.83}$$<br />
$$\frac{\partial L^f}{\partial b^{f\dagger}} = E^{[2]} b^f - \omega S^{[2]} b^f = 0 \tag{2.84}$$<br />
$$\frac{\partial L^f}{\partial b^{f}} = b^{f\dagger} E^{[2]} - \omega\, b^{f\dagger} S^{[2]} = 0 \tag{2.85}$$<br />
$$\frac{\partial L^f}{\partial X_m} = \frac{\partial E^0}{\partial X_m} + \frac{\partial \left(b^{f\dagger} E^{[2]} b^f\right)}{\partial X_m} - \omega \frac{\partial \left(b^{f\dagger} S^{[2]} b^f\right)}{\partial X_m} - \sum_n X_n \frac{\partial \left(FDS - SDF\right)_n}{\partial X_m} = 0, \tag{2.86}$$<br />
where $X_m$ are the orbital rotation parameters. Due to the 2n + 1 rule, and since the gradient is a first-order<br />
property, we only need to solve the above equations through zero order. Eqs. (2.82)-(2.85) are<br />
thus already taken care of, and it is seen that the multiplier ω is determined as the eigenvalue of the<br />
linear response equations, i.e. it corresponds to the excitation energy. It is then only necessary to<br />
determine the Lagrange multipliers X such that Eq. (2.86) is also fulfilled.<br />
2.4.2 The Lagrange Multipliers<br />
To evaluate the terms in Eq. (2.86), the asymmetric Baker-Campbell-Hausdorff (BCH) expansion 46<br />
of the exponentially parameterized density is applied<br />
$$D(X) = \exp(-XS)\, D \exp(SX) = D + [D, X]_S + \cdots, \tag{2.87}$$<br />
where<br />
$$[A, B]_S = ASB - BSA. \tag{2.88}$$<br />
Since the derivatives are evaluated at the expansion point, only terms of first order in X are nonzero.<br />
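The first-order truncation of Eq. (2.87) can be verified numerically on random matrices. The sketch below is purely illustrative; the matrix exponential is evaluated with a short Taylor series, which is adequate only for small arguments, and the function names are hypothetical.<br />

```python
import numpy as np

def s_commutator(A, B, S):
    """[A, B]_S = A S B - B S A, Eq. (2.88)."""
    return A @ S @ B - B @ S @ A

def expm_taylor(M, terms=30):
    """Matrix exponential by a truncated Taylor series (adequate for small ||M||)."""
    result = np.eye(len(M))
    term = np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

def rotated_density(D, X, S):
    """D(X) = exp(-X S) D exp(S X), Eq. (2.87)."""
    return expm_taylor(-X @ S) @ D @ expm_taylor(S @ X)
```

Scaling $X$ by a small factor $t$, the difference between `rotated_density(D, t * X, S)` and `D + t * s_commutator(D, X, S)` should shrink as $t^2$, which confirms that $[D, X]_S$ is the complete first-order term.<br />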
The last term in Eq. (2.86) is found to be equal to 61<br />
$$E^{[2]} X = F [X, D]_S S - S [X, D]_S F + G\!\left([X, D]_S\right) DS - SD\, G\!\left([X, D]_S\right). \tag{2.89}$$<br />
We can thus find $X$ by solving the set of linear equations<br />
$$E^{[2]} X = \frac{\partial E^0}{\partial X} + \frac{\partial \left(b^{f\dagger} E^{[2]} b^f\right)}{\partial X} - \omega \frac{\partial \left(b^{f\dagger} S^{[2]} b^f\right)}{\partial X}. \tag{2.90}$$<br />
From the matrix expressions for $b^{f\dagger} E^{[2]} b^f$ and $b^{f\dagger} S^{[2]} b^f$ 61<br />
$$b^{f\dagger} E^{[2]} b^f = -\mathrm{Tr}\, F \left[[b^f, D]_S, b^{f\dagger}\right]_S - \mathrm{Tr}\, G\!\left([b^f, D]_S\right) [D, b^{f\dagger}]_S \tag{2.91}$$<br />
$$b^{f\dagger} S^{[2]} b^f = \mathrm{Tr}\, b^{f\dagger} S\, [D, b^f]_S\, S \tag{2.92}$$<br />
and the relations for the two-electron integrals<br />
$$G(A^T) = G^T(A) \tag{2.93}$$<br />
$$\mathrm{Tr}\, A\, G(B) = \mathrm{Tr}\, B\, G(A), \tag{2.94}$$<br />
the terms on the right hand side of Eq. (2.90) are found as<br />
$$\frac{\partial E^0}{\partial X} = 0, \tag{2.95}$$<br />
$$-\omega \frac{\partial \left(b^{f\dagger} S^{[2]} b^f\right)}{\partial X} = -2\omega \left[ SDS\, [b^f, b^{f\dagger}]_S\, S \right]^A, \tag{2.96}$$<br />
$$\frac{\partial \left(b^{f\dagger} E^{[2]} b^f\right)}{\partial X} = ADS - SDA, \tag{2.97}$$<br />
where<br />
$$\begin{aligned} A = {} & S b^{f\dagger} \left(F b^f S - S b^f F\right) - \left(F b^f S - S b^f F\right) b^{f\dagger} S + G\!\left(\left[b^f, [D, b^{f\dagger}]_S\right]_S\right) \\ & + 2 \left[ S b^{f\dagger} G\!\left([b^f, D]_S\right) - G\!\left([b^f, D]_S\right) b^{f\dagger} S \right]^S \end{aligned} \tag{2.98}$$<br />
and<br />
$$[M]^A = \tfrac{1}{2} M - \tfrac{1}{2} M^{\dagger} \tag{2.99}$$<br />
$$[M]^S = \tfrac{1}{2} M + \tfrac{1}{2} M^{\dagger}. \tag{2.100}$$<br />
Eq. (2.95) follows directly, since the variational condition Eq. (2.75) is fulfilled at the expansion<br />
point.<br />
2.4.3 The Geometrical Gradient<br />
The excited state geometrical gradient should be expressed in terms of the first derivatives of the<br />
one and two electron integral matrices $h^x$, $G^x$, $S^x$ and the density, Fock and overlap matrices at the<br />
expansion point $x_0$. The notation $A^x$ denotes the geometrical first derivative of $A$. In ref. 66 it was<br />
found that the first derivative of the density $D^x(X)$ is given by the first derivative of the reference<br />
density matrix $D^x$ which, from the idempotency condition for $D$, is found to be<br />
$$D^x = -D S^x D. \tag{2.101}$$<br />
The first-order geometrical derivative is given by<br />
$$\frac{dE^f}{dx} = \frac{dL^f}{dx} = \frac{dE^0}{dx} + \frac{\partial \left(b^{f\dagger} E^{[2]} b^f\right)}{\partial x} - \omega \frac{\partial \left(b^{f\dagger} S^{[2]} b^f\right)}{\partial x} - X \frac{\partial \left(FDS - SDF\right)}{\partial x}. \tag{2.102}$$<br />
The first term is simply the geometrical gradient of the ground state. In ref. 66 this was shown to be<br />
$$E^{0x} = 2\,\mathrm{Tr}\, D h^x + \mathrm{Tr}\, D G^x(D) + \mathrm{Tr}\, D^x F + h^x_{nuc}. \tag{2.103}$$<br />
The other terms are found as the derivative of the matrix expressions in Eq. (2.91) and (2.92)<br />
$$\begin{aligned} \frac{\partial \left(b^{f\dagger} E^{[2]} b^f\right)}{\partial x} = {} & -\mathrm{Tr} \left(F^x + G(D^x)\right) \left[[b^f, D]_S, b^{f\dagger}\right]_S - \mathrm{Tr}\, F \left[[b^f, D^x]_S, b^{f\dagger}\right]_S \\ & - \mathrm{Tr}\, F \left[[b^f, D]_{S^x}, b^{f\dagger}\right]_S - \mathrm{Tr}\, F \left[[b^f, D]_S, b^{f\dagger}\right]_{S^x} \\ & - \mathrm{Tr}\, G^x\!\left([b^f, D]_S\right) [D, b^{f\dagger}]_S - 2\,\mathrm{Tr}\, G\!\left([b^f, D]_S\right) \left([D^x, b^{f\dagger}]_S + [D, b^{f\dagger}]_{S^x}\right) \end{aligned} \tag{2.104}$$<br />
$$\begin{aligned} -\omega \frac{\partial \left(b^{f\dagger} S^{[2]} b^f\right)}{\partial x} = {} & -\omega\, \mathrm{Tr}\, b^{f\dagger} S^x [D, b^f]_S S \\ & - \omega\, \mathrm{Tr}\, b^{f\dagger} S \left([D^x, b^f]_S S + [D, b^f]_{S^x} S + [D, b^f]_S S^x\right) \end{aligned} \tag{2.105}$$<br />
$$-X \frac{\partial \left(FDS - SDF\right)}{\partial x} = -2X \left[F^x DS + G(D^x) DS + F D^x S + F D S^x\right]^A, \tag{2.106}$$<br />
where $F^x = h^x + G^x(D)$. Collecting the various terms we obtain<br />
$$\begin{aligned} \frac{\partial E^f}{\partial x} = {} & \mathrm{Tr} \left(2D - \left[[b^f, D]_S, b^{f\dagger}\right]_S - [D, X]_S\right) h^x - \mathrm{Tr}\, [D, b^{f\dagger}]_S\, G^x\!\left([b^f, D]_S\right) \\ & + \mathrm{Tr} \left(D - \left[[b^f, D]_S, b^{f\dagger}\right]_S - [D, X]_S\right) G^x(D) + h^x_{nuc} \\ & - \mathrm{Tr}\, D^x G\!\left(\left[[b^f, D]_S, b^{f\dagger}\right]_S\right) - 2\,\mathrm{Tr} \left([D^x, b^{f\dagger}]_S + [D, b^{f\dagger}]_{S^x}\right) G\!\left([b^f, D]_S\right) \\ & - \mathrm{Tr}\, D^x G\!\left([D, X]_S\right) - \mathrm{Tr} \left([D^x, X]_S + [D, X]_{S^x}\right) F \\ & - \mathrm{Tr} \left(\left[[b^f, D^x]_S, b^{f\dagger}\right]_S + \left[[b^f, D]_{S^x}, b^{f\dagger}\right]_S + \left[[b^f, D]_S, b^{f\dagger}\right]_{S^x}\right) F \\ & + \omega_f\, \mathrm{Tr}\, b^{f\dagger} S \left([b^f, D^x]_S S + [b^f, D]_{S^x} S + [b^f, D]_S S^x\right) \\ & + \omega_f\, \mathrm{Tr}\, b^{f\dagger} S^x [b^f, D]_S S, \end{aligned} \tag{2.107}$$<br />
where $G\!\left(\left[[b^f, D]_S, b^{f\dagger}\right]_S\right)$, $G\!\left([D, X]_S\right)$, $G\!\left([b^f, D]_S\right)$ and $F$ can be evaluated once, whereas<br />
$G^x(D)$, $G^x\!\left([b^f, D]_S\right)$, $h^x$ and $h^x_{nuc}$ have to be evaluated for each geometrical perturbation.<br />
S<br />
Note that no two-electron integrals appear explicitly: to obtain the best performance, e.g. in<br />
linear scaling codes, no reference should be made to four-index integrals.<br />
2.4.4 The First-order Excited State Properties<br />
The expression for the first-order one-electron excited state properties for perturbation independent<br />
basis sets is obtained from the expression for the excited state gradient by omitting all two-electron<br />
derivative terms, as well as all terms involving the derivative of the overlap matrix<br />
( ⎡<br />
†<br />
⎡⎣<br />
⎤⎦<br />
⎤ [ ] )<br />
x 2Tr x Tr f , , f , x x<br />
= −<br />
⎣<br />
S ⎦<br />
− +<br />
S<br />
S nuc<br />
f h f Dh b D b D X h h . (2.108)<br />
The first and last terms in Eq. (2.108) correspond to the ground state first order property as seen<br />
from Eq. (2.103).<br />
2.5 Test Calculations<br />
To illustrate the possibilities of an AO response solver in connection with our SCF optimization<br />
program, test calculations have been carried out on problematic cases from the first part of the<br />
thesis. The lowest excitation energy and the average polarizability, both static and in a field with ω<br />
= 0.03 a.u., have been found for the zinc complex in Fig. 1.3 and the rhodium complex in Fig. 1.33.<br />
The levels of theory chosen are those where DIIS could not optimize the reference state, namely<br />
LDA/6-31G for the zinc complex and HF/AhlrichsVDZ with STO-3G on the rhodium for the<br />
rhodium complex.<br />
Table 2-1 Ground state properties obtained with our AO response solver. All numbers are in a.u.<br />
System | Level of theory | Average polarizability (static) | Average polarizability (ω = 0.03) | Excitation energy<br />
Rhodium complex | HF/AhlrichsVDZ | 170.598 | 173.349 | 0.0938<br />
Zinc complex | LDA/6-31G | 161.406 | 162.517 | 0.0713<br />
The basis sets applied in the test calculations are not satisfactory for serious polarizability<br />
calculations, and the numbers only demonstrate the perspectives of the AO response solver in<br />
combination with the SCF optimization algorithms described in Part 1. When the solver is fully<br />
implemented in the AO basis, we will be able to obtain molecular properties for large complex<br />
molecules in a routine manner.<br />
The implementation of the excited state gradient is a work in progress. So far we have implemented<br />
calculation of first-order one-electron properties of the excited state for perturbation independent<br />
basis sets as described in Section 2.4.4. The excited state dipole moment of the rhodium complex<br />
from above has been found as<br />
µ = 5.960 a.u.<br />
Again it should be noted that the basis set is insufficient for this type of calculation. This is only to<br />
demonstrate that it can be done.<br />
2.6 Conclusion<br />
The atomic orbital (AO) based response equations have been derived using the second quantization<br />
framework. In particular, the proof of pairing is considered. Since the diagonal elements in κ are not<br />
redundant in the AO basis, the proof given in the MO basis cannot be directly applied. However, it<br />
is shown that there is also pairing in the AO basis.<br />
An AO response solver has been implemented similar to the solver in the MO basis with a few<br />
exceptions. The lack of diagonal dominance in the electronic Hessian in the AO basis makes<br />
preconditioning a difficult task. Optimally, the AO solver should be implemented in a linear scaling<br />
manner with only matrix multiplications and additions, and without reference to the MO basis.<br />
However, currently a transformation is made to the MO basis where the preconditioning is carried<br />
out followed by a transformation back to the AO basis. The redundant orbital rotations, which are<br />
simply left out of the MO equations, are removed in the AO formulation using projection operators.<br />
The response equations and molecular property expressions are simpler in the AO formulation than<br />
in the MO formulation. To demonstrate how expressions for properties can easily be derived in the<br />
AO response framework, the expression for the geometrical gradient of the singlet excited state has<br />
been derived.<br />
To illustrate the possibilities of the AO optimization methods presented in Part 1, joined with the<br />
AO response solver presented in this part of the thesis, test calculations are given for cases where<br />
DIIS diverged when optimizing the reference state. The averaged polarizability and the lowest<br />
excitation energy are given as well as the excited state dipole for one of the examples.<br />
The derivation and implementation of the various molecular properties is straightforward in the AO<br />
formulation compared to the MO formulation as exemplified by the excited state geometrical<br />
gradient. Especially the derivation of higher derivatives of molecular properties is simplified, and it<br />
will thus be natural to expand our response program in this direction. However, before calculations<br />
of molecular properties of large and complex molecules can be carried out in a truly linear scaling<br />
framework, the problems related to preconditioning of the AO solver must be solved.<br />
Part 3<br />
Benchmarking for Radicals<br />
3.1 Introduction<br />
To corroborate the reliability of ab initio quantum chemical predictions of molecular properties, it is<br />
important to investigate and describe strengths and weaknesses of the many-electron models<br />
through systematic benchmark studies on different kinds of molecules.<br />
Regarding open-shell molecules, benchmarks have been reported comparing open- and closed-shell<br />
molecules examining the accuracy of molecular properties computed by various many-electron<br />
models. In a study of the atomization energies of 11 small molecules 67 no significant difference in<br />
the performance for closed- and open-shell molecules was found for the CCSDT model. However,<br />
in another study 68 it was found that even though the CCSD(T) model performs convincingly for<br />
closed-shell molecules, the performance for open-shell molecules is less impressive.<br />
In this part of the thesis full configuration interaction (FCI) benchmarks of molecular properties for<br />
the small open-shell molecules CN and CCH are presented. In the FCI model, all Slater<br />
determinants arising from distributing the electrons in the given one-electron basis with correct<br />
symmetry and spin-projection are included. Errors due to truncation of the many-electron basis are<br />
thus eliminated in an FCI calculation and it provides important benchmarks for other many-electron<br />
models. For open-shell molecules, the number of FCI benchmarks is limited and the work presented<br />
in this part of the thesis is an attempt to improve on this situation. We thus hope our results will<br />
serve as valuable benchmarks for further analysis of open-shell methods.<br />
3.2 Computational Methods<br />
All calculations have been carried out with the quantum chemical program package LUCIA 69 , using<br />
integrals and Hartree-Fock (HF) orbitals obtained from the DALTON 70 program. The calculations<br />
are based on a ROHF reference wave function, but no spin-adaption is imposed in the CI and CC<br />
calculations.<br />
All FCI calculations have been carried out in Dunning's cc-pVDZ 71 basis set. Since the number<br />
of determinants in the FCI model increases exponentially with the number of basis functions and<br />
electrons, it is currently not feasible to do the FCI calculations on CN and CCH in the cc-pVTZ<br />
basis. As the cc-pVDZ basis does not provide accurate geometries and energetics, 46 we will also<br />
obtain the equilibrium geometry, harmonic frequency, and dissociation energy for CN using the cc-pVTZ 71<br />
basis set in coupled cluster calculations, including up to quadruple excitations. In addition,<br />
FCI and CC calculations up to the quadruples level have been carried out on CN and CN⁻ in the<br />
aug-cc-pVDZ basis set without the diffuse d-functions (aug´-cc-pVDZ) to obtain the vertical electron<br />
affinity of CN.<br />
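The growth of the FCI determinant space with basis set size can be made concrete with a simple count. The sketch below is an upper-bound estimate only: it assumes that all 13 electrons of CN are correlated (7 α and 6 β), ignores spatial symmetry, and uses the standard contracted sizes of 14 (cc-pVDZ) and 30 (cc-pVTZ) functions per first-row atom. The function name is hypothetical, and the count is not meant to reproduce the determinant numbers of the calculations reported here.<br />

```python
from math import comb

def n_determinants(n_orbitals, n_alpha, n_beta):
    """Number of Slater determinants with fixed spin projection, ignoring spatial symmetry."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

# CN: 13 electrons (7 alpha, 6 beta); 28 and 60 basis functions for the diatomic
# in cc-pVDZ and cc-pVTZ, respectively (all electrons correlated: an assumption).
for basis, n_orb in [("cc-pVDZ", 28), ("cc-pVTZ", 60)]:
    print(f"{basis}: {n_determinants(n_orb, 7, 6):.2e} determinants")
```

Under these assumptions the count rises from the order of $10^{11}$ to the order of $10^{16}$ determinants when going from cc-pVDZ to cc-pVTZ.<br />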
We investigate two ways of defining the excitation-level in CC. The typical approach is to let the<br />
excitation level identify the allowed number of orbital excitations, denoted CC(orb). If instead the<br />
excitation level is taken to identify the spin-orbital excitation level, selected excitations, which<br />
involve spin-flipping and other internal excitations, are excluded from the calculation for open-shell<br />
molecules. This scheme will be referred to as CC(spin-orb). The difference between the two<br />
definitions of the excitation level is illustrated in Fig. 3.1. The CI calculations will all be carried out<br />
with orbital excitations.<br />
[Figure 3.1: diagram of an excitation labelled as a double orbital excitation and a triple spin-orbital excitation.]<br />
Fig. 3.1 An excitation which would be included in a CCSD(orb) calculation, but not in a CCSD(spin-orb) calculation.<br />
In the following SD, SDT, SDTQ, SDTQ5, SDTQ56 and SDTQ567 denote excitation-spaces which<br />
include up to 2, 3, 4, 5, 6 and 7 excitations from the occupied spin-orbitals respectively.<br />
3.3 Numerical Results<br />
First, the convergence of the CC and CI hierarchies for the open shell molecule CN is studied. Next,<br />
the potential curve for CN is obtained from CCSD, CCSDT, CCSDTQ, and FCI calculations at<br />
various inter-nuclear distances. In Section 3.3.3, the equilibrium geometries, harmonic frequencies,<br />
and dissociation energies obtained for CN are presented and in Section 3.3.4 the vertical electron<br />
affinity for CN is found. Finally, in Section 3.3.5 a minor benchmark study is presented where the<br />
equilibrium geometry of the intergalactic radical CCH is determined at the FCI level.<br />
3.3.1 Convergence of CC and CI Hierarchies<br />
The convergence of the CC and CI hierarchies is studied. For CN, calculations have been carried<br />
out at the experimental equilibrium distance 72 $r_{exp}$ = 1.1718 Å at the levels CCSD through<br />
CCSDTQ56. Both the orbital excitation and spin-orbital excitation approaches are considered. In<br />
addition, calculations have been carried out at the levels CISD through CISDTQ567 and in FCI. In<br />
all calculations the cc-pVDZ basis set is used. The results are seen in Fig. 3.2.<br />
[Figure 3.2: logarithmic plot of $E_{dev}$ / $E_h$ (1.E-01 down to 1.E-06) against excitation level (SD through SDTQ567) for CI, CC(spin-orb) and CC(orb).]<br />
Fig. 3.2 $E_{dev}$ for CC with spin-orbital and orbital excitation levels and for CI with orbital excitation<br />
levels. $E_{dev} = E - E_{FCI}$.<br />
The first thing to note is the similarity of the two CC curves. Clearly the spin-orbital excitation<br />
restriction does not affect the accuracy in a significant way; the deviation energies are in all cases<br />
smaller for CC(orb), but the difference is negligible.<br />
Comparing the CI curve with the CC curves, two trends are obvious; the smooth convergence of the<br />
CC hierarchy compared to the CI hierarchy and the faster convergence of the CC hierarchy. The CC<br />
energy obtained using up to n-fold excitations is roughly as accurate as the CI energy using up to<br />
n+1-fold excitations. Both phenomena are explained by the inclusion of disconnected clusters in the<br />
CC wave function. At a given level of CC theory, the CC wave function includes all the CI<br />
configurations at the same level of CI theory plus some higher excitations arising from disconnected<br />
clusters. Consequently, it covers the dynamical correlation better than CI and is thus at the given<br />
level closer to the FCI solution. Describing the convergence pattern of the CI and CC hierarchies<br />
through orders of Møller-Plesset perturbation theory (MPPT), 73 the form of the curves can be<br />
predicted. Because disconnected products of excitations are also included in the CC ansatz, the<br />
order in MPPT of its error increases at every excitation level. Going from odd to even excitation<br />
levels, the order of the error in the energy increases by two orders of MPPT for both methods; thus,<br />
the graphs are parallel. Going from even to odd excitation levels, the order of the CC error increases by<br />
one, whereas that of the CI error remains unchanged, giving a greater slope for the CC curve. This<br />
explains the parallel behavior going from odd to even excitation levels and the smoother<br />
convergence of the CC hierarchy compared to the CI hierarchy. The stepwise convergence<br />
predicted by MPPT, which should be pronounced for CI and noticeable for CC, is not apparent,<br />
though. The reason could be that CN is not strictly mono-configurational.<br />
The convergence patterns for CI and CC are very similar to the convergence patterns previously<br />
reported for N 2 . 74 Therefore, it does not seem that the open-shell nature of CN leads to slow<br />
convergence of the CI and CC hierarchies compared to closed shell cases.<br />
3.3.2 The Potential Curve for CN<br />
The potential curve for CN was determined from single-point calculations at the FCI level with<br />
basis set cc-pVDZ. Close to equilibrium the energies were converged to $10^{-9}$ $E_h$ making the<br />
determination of accurate spectroscopic constants possible. The result is displayed in Fig. 3.3.<br />
[Figure 3.3: $E_{FCI}$ / $E_h$ (−92.50 to −92.15) plotted against R / Å (0.5 to 3.5).]<br />
Fig. 3.3 The potential curve for CN found from FCI<br />
cc-pVDZ calculations.<br />
[Figure 3.4: $E_{dev}$ / $E_h$ (0.00 to 0.03) plotted against R / Å (0.9 to 1.8) for CCSD, CCSDT and CCSDTQ.]<br />
Fig. 3.4 $E_{dev}$ for the CC potential curves. $E_{dev}(R) = E(R) - E_{FCI}(R)$.<br />
The potential curve was also created with the methods CCSD(orb), CCSDT(orb) and CCSDTQ(orb)<br />
in the basis set cc-pVDZ. Since the weight of the reference HF determinant decreases as the inter-nuclear<br />
distance increases, we examine the HF coefficients from the FCI calculations and find<br />
that single-reference CC calculations are not meaningful beyond R = 1.8 Å, since the weight of<br />
the reference has already dropped to 0.57 at that point. Fig. 3.4 displays the differences of the CC<br />
potential curves compared to the FCI curve. At a given inter-nuclear distance, the FCI energy has<br />
been subtracted from the CC energy.<br />
The decreasing weight of the reference ground state with increasing atomic distance is reflected in<br />
the quality of the CC wave functions. The correlation in the wave function compensates partially for<br />
the lack of a single dominant configuration; the higher the correlation level, the better the<br />
compensation. This is illustrated by the slopes of the curves in Fig. 3.4. Furthermore, it should be<br />
noticed how the deviation energy is nearly linear in R, with a slightly positive curvature around the<br />
equilibrium geometry.<br />
3.3.3 Spectroscopic Constants and Atomization Energy for CN<br />
The equilibrium geometry and harmonic frequency for CN were found from single-point<br />
calculations using quartic interpolation. The atomization energy was found at the experimental<br />
equilibrium distance. The results are displayed in Table 3-1.<br />
Table 3-1 Equilibrium geometry, harmonic frequency, and atomization energy for CN.<br />
Method | Basis set | $R_{eq}$ / Å | $\omega_e$ / cm$^{-1}$ | $D_e$ / kJ/mol<br />
CCSD(spin-orb) | cc-pVDZ | 1.1855 | 2114 | 629.2<br />
CCSD(orb) | cc-pVDZ | 1.1860 | 2111 | 631.6<br />
CCSDT(spin-orb) | cc-pVDZ | 1.1944 | 2046 | 662.9<br />
CCSDT(orb) | cc-pVDZ | 1.1946 | 2043 | 663.0<br />
CCSDTQ(spin-orb) | cc-pVDZ | 1.1964 | 2026 | 666.4<br />
CCSDTQ(orb) | cc-pVDZ | 1.1964 | 2025 | 666.5<br />
FCI | cc-pVDZ | 1.1969 | 2020 | 667.0<br />
CCSD(spin-orb) | cc-pVTZ | 1.1688 | 2136 | 674.2<br />
CCSDT(spin-orb) | cc-pVTZ | 1.1783 | 2067 | 714.4<br />
CCSDTQ(spin-orb) | cc-pVTZ | 1.1804 | 2045 | 718.5<br />
Experimental 72 | | 1.1718 | 2069 | ---<br />
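The quartic interpolation used to extract the equilibrium geometry and harmonic frequency from single-point energies can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually used for Table 3-1: energies are in $E_h$, distances in bohr, the conversion constants are standard values, and the function name and the choice of fitting around the lowest grid point are hypothetical.<br />

```python
import numpy as np

HARTREE_TO_WAVENUMBER = 219474.63     # cm^-1 per E_h
AMU_TO_ELECTRON_MASS = 1822.888486    # electron masses per atomic mass unit

def quartic_constants(r, energy, reduced_mass_amu):
    """Fit E(r) to a quartic; return (r_eq in bohr, harmonic frequency in cm^-1)."""
    r0 = r[np.argmin(energy)]                        # expand around the lowest grid point
    poly = np.polynomial.Polynomial.fit(r - r0, energy, deg=4).convert()
    d1, d2 = poly.deriv(1), poly.deriv(2)
    roots = d1.roots()
    real = roots[np.abs(roots.imag) < 1e-8].real     # real stationary points of the quartic
    minima = real[d2(real) > 0]
    x_eq = minima[np.argmin(np.abs(minima))]         # minimum closest to the lowest grid point
    k = d2(x_eq)                                     # force constant in E_h / bohr^2
    omega_au = np.sqrt(k / (reduced_mass_amu * AMU_TO_ELECTRON_MASS))
    return r0 + x_eq, omega_au * HARTREE_TO_WAVENUMBER
```

With five energies the quartic passes through every point, so the accuracy of the constants is governed by the grid spacing and by how well the energies themselves are converged.<br />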
As mentioned in Section 3.2, it is not feasible to carry out FCI calculations at the cc-pVTZ level.<br />
Still, the convergence of the CC hierarchy can be estimated by examining the changes in the<br />
constants. Since the difference in accuracy between the models CC(orb) and CC(spin-orb) is<br />
negligible compared to the deviation from FCI, only the CC(spin-orb) results are discussed from<br />
now on and only the CC(spin-orb) numbers are found at the cc-pVTZ level.<br />
The deviation curves for the coupled cluster energies (see Fig. 3.4) are increasing functions, and<br />
thus the coupled cluster equilibrium bond lengths are shorter than the one found from FCI.<br />
Furthermore, the positive curvature of the deviation-curves around the equilibrium leads to coupled<br />
cluster frequencies that are higher than the FCI frequency.<br />
As expected, the cc-pVDZ basis set does not provide accurate geometries and frequencies, and the<br />
cc-pVTZ numbers are clearly more in the range of the experimental data than the cc-pVDZ<br />
numbers.<br />
CCSD displays its insufficiency for prediction of equilibrium properties by differing from the FCI<br />
values by 0.01 Å in the geometry, 90 cm⁻¹ in the frequency, and 35 kJ/mol in the atomization energy.<br />
The errors in R_eq and ω_e are reduced by a factor of four going to the CCSDT level and a factor of<br />
five going from the CCSDT to the CCSDTQ level. The error in the atomization energy is reduced<br />
by a factor of nine going to the CCSDT level and a factor of eight going from the CCSDT to the<br />
CCSDTQ level, but while the equilibrium geometry on the CCSDTQ level is only 0.0005 Å from<br />
the FCI value, the harmonic frequency is still about 5 cm⁻¹ too high.<br />
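The error reductions quoted above can be checked directly against the cc-pVDZ rows of Table 3-1. The following is a minimal sketch; the dictionary and function names are illustrative and not part of the original work. Because the tabulated values are rounded, the recomputed factors only agree approximately with the factors stated in the text.

```python
# Spectroscopic constants for CN in the cc-pVDZ basis, copied from Table 3-1
# (spin-orbital CC models): (R_eq / Angstrom, omega_e / cm^-1, D_e / kJ/mol).
TABLE_3_1_CCPVDZ = {
    "CCSD": (1.1855, 2114.0, 629.2),
    "CCSDT": (1.1944, 2046.0, 662.9),
    "CCSDTQ": (1.1964, 2026.0, 666.4),
    "FCI": (1.1969, 2020.0, 667.0),
}


def errors_vs_fci(table):
    """Return {model: (|dR|, |d omega|, |dD|)} relative to the FCI row."""
    ref = table["FCI"]
    return {
        model: tuple(abs(value - r) for value, r in zip(values, ref))
        for model, values in table.items()
        if model != "FCI"
    }


def reduction_factors(errors, lower, higher):
    """Factor by which each error shrinks going from model `lower` to `higher`."""
    return tuple(lo / hi for lo, hi in zip(errors[lower], errors[higher]))


if __name__ == "__main__":
    errs = errors_vs_fci(TABLE_3_1_CCPVDZ)
    for pair in (("CCSD", "CCSDT"), ("CCSDT", "CCSDTQ")):
        factors = reduction_factors(errs, *pair)
        print(pair, [round(f, 1) for f in factors])
```

The same two helpers can be applied to any other pair of rows with a common reference, provided all rows use the same basis set.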
Compared with experiment, both the equilibrium geometry and the harmonic frequency are apparently better<br />
approximated by CCSDT than by CCSDTQ in the cc-pVTZ basis. This is due to a favorable cancellation of errors for CCSDT<br />
calculations in small basis sets. By extrapolation to the larger aug-cc-pVQZ basis (refs. 67, 75), we get an<br />
equilibrium distance of 1.1759 Å and a harmonic frequency of 2060 cm⁻¹ at the CCSDTQ level.<br />
3.3.4 The Vertical Electron Affinity of CN<br />
Calculations on CN⁻ and CN were carried out in the aug′-cc-pVDZ basis at the experimental<br />
equilibrium geometry for CN. The FCI calculation on CN⁻ is one of the largest FCI calculations<br />
carried out so far, containing about 20 billion Slater determinants. The vertical electron affinity (EA)<br />
was found and is displayed in Table 3-2. Again only CC(spin-orb) calculations have been carried<br />
out because of the rather small difference in performance of CC(spin-orb) and CC(orb).<br />
Table 3-2 The vertical electron affinity of CN.<br />
Method | Basis | EA / E_h | (EA − EA_FCI) / E_h<br />
CCSD(spin-orb) | aug′-cc-pVDZ | 0.13025 | 0.00063<br />
CCSDT(spin-orb) | aug′-cc-pVDZ | 0.12977 | 0.00014<br />
CCSDTQ(spin-orb) | aug′-cc-pVDZ | 0.12966 | 0.00003<br />
FCI | aug′-cc-pVDZ | 0.12962 | ---<br />
The convergence is remarkable; already at the CCSD level we are down to an error of 0.5% of the<br />
FCI value, on the CCSDT level it is 0.1% and on the CCSDTQ level 0.02%. The reason for the<br />
excellent convergence is found in a cancellation of errors that influence the result. The deviations of<br />
the individual energies are always roughly an order of magnitude larger than the deviation of the<br />
affinity (ref. 75), but the errors cancel when the CN and CN⁻ energies are subtracted. That the convergence<br />
is from above is also noteworthy. This is because the CC hierarchy converges faster for CN⁻ than for<br />
CN. This seems surprising since CN⁻ contains one more electron than CN, but it could be explained<br />
by CN⁻ being more one-configurational than CN.<br />
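The percentages quoted above follow from the deviation column of Table 3-2. A minimal sketch of the arithmetic is given here; the variable and function names are illustrative only, and the numbers are copied from the table.

```python
# Vertical electron affinity of CN at the FCI level and the deviations
# EA - EA_FCI of the CC models, both in hartree, copied from Table 3-2.
EA_FCI = 0.12962
EA_DEVIATION = {
    "CCSD": 0.00063,
    "CCSDT": 0.00014,
    "CCSDTQ": 0.00003,
}


def relative_errors_percent(deviations, reference):
    """Return the deviation of each model as a percentage of the reference."""
    return {model: 100.0 * dev / reference for model, dev in deviations.items()}


if __name__ == "__main__":
    for model, pct in relative_errors_percent(EA_DEVIATION, EA_FCI).items():
        print(f"{model:7s} {pct:.3f} %")
```

All deviations are positive, which is the convergence from above noted in the text.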
3.3.5 The Equilibrium Geometry of CCH<br />
The equilibrium geometry of CCH found from FCI/cc-pVDZ calculations is used in ref. 76 to<br />
calibrate coupled cluster calculations in larger basis sets. The FCI correction is assumed to be<br />
independent of basis set.<br />
To optimize for the two variables R(CC) and R(CH), the CCH radical is assumed linear and the CC<br />
and CH bonds are then distorted in step-lengths of δ = 0.01 Å from an initial geometry, making a grid<br />
of single-point calculations around the equilibrium geometry with R(CC) on one axis and R(CH)<br />
on the other. The initial geometry is taken from a CCSDT cc-pVDZ study (ref. 76), the geometry being<br />
R_CCSDT(CC) = 1.23448 Å and R_CCSDT(CH) = 1.07924 Å. The resulting potential energy surface is seen<br />
in Fig. 3.5.<br />
Fig. 3.5 The potential energy surface of CCH: E_FCI / E_h plotted against R(C-C)/Å (1.21448 to 1.25448) and R(C-H)/Å (1.05924 to 1.09924).<br />
From finite-difference expressions with an error of the order δ⁴, the gradient and Hessian are<br />
found for the initial geometry and a Newton step is taken giving an improved guess for the<br />
equilibrium geometry. The FCI equilibrium geometry is thus found as<br />
$$\mathbf{R}^{\mathrm{FCI}} = \mathbf{R}^{\mathrm{CCSDT}} - \mathbf{H}^{-1}\mathbf{G}, \tag{3.1}$$<br />
where G is the gradient, H the Hessian, and R^CCSDT the CCSDT geometry.<br />
The equilibrium geometry at the FCI level is found to be<br />
R_FCI(CC) = 1.2367 Å and R_FCI(CH) = 1.0802 Å.<br />
The error in the resulting geometry is the sum of the error from the finite-difference approximations<br />
and the error from the Newton step. The gradient and Hessian carry an error of O(δ⁴); with δ =<br />
0.01 Å, this is an error of the order of 10⁻⁸ Å. The Newton step has an error of O((H⁻¹G)²); in this<br />
case H⁻¹G is of the size 10⁻³ Å, so the error is of the order of 10⁻⁶ Å. The total error is thus of<br />
the order of 10⁻⁶ Å.<br />
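The procedure of Eq. (3.1) can be sketched as follows. This is an illustration under stated assumptions, not the code used in this work: the five-point central-difference stencils are one standard choice with O(δ⁴) error, and `model_energy` is an invented anharmonic surface (its force constants are hypothetical and its minimum is simply placed at the FCI geometry quoted above) standing in for the FCI single-point energies.

```python
def model_energy(r_cc, r_ch):
    """Hypothetical anharmonic model surface (NOT the FCI surface)."""
    u = r_cc - 1.2367  # displacement from the assumed minimum, Angstrom
    v = r_ch - 1.0802
    return 0.5 * 1.0 * u * u + 0.5 * 0.4 * v * v + 0.05 * u * v - 1.0 * u ** 3


# Five-point central-difference weights for grid offsets -2, -1, 0, +1, +2.
W1 = (1.0, -8.0, 0.0, 8.0, -1.0)       # first derivative, divide by 12*h
W2 = (-1.0, 16.0, -30.0, 16.0, -1.0)   # second derivative, divide by 12*h**2


def newton_step(energy, r0, h=0.01):
    """One Newton step R0 - H^-1 G from finite differences on a 5x5 grid."""
    x0, y0 = r0
    grid = [[energy(x0 + i * h, y0 + j * h) for j in range(-2, 3)]
            for i in range(-2, 3)]
    # Gradient and diagonal Hessian elements along the two central grid lines.
    gx = sum(W1[i] * grid[i][2] for i in range(5)) / (12.0 * h)
    gy = sum(W1[j] * grid[2][j] for j in range(5)) / (12.0 * h)
    hxx = sum(W2[i] * grid[i][2] for i in range(5)) / (12.0 * h * h)
    hyy = sum(W2[j] * grid[2][j] for j in range(5)) / (12.0 * h * h)
    # Mixed derivative: product of the two first-derivative stencils.
    hxy = sum(W1[i] * W1[j] * grid[i][j]
              for i in range(5) for j in range(5)) / (144.0 * h * h)
    # Explicit inverse of the symmetric 2x2 Hessian applied to the gradient.
    det = hxx * hyy - hxy * hxy
    dx = (hyy * gx - hxy * gy) / det
    dy = (hxx * gy - hxy * gx) / det
    return (x0 - dx, y0 - dy), (gx, gy)


if __name__ == "__main__":
    start = (1.23448, 1.07924)  # CCSDT geometry used as the initial guess
    new_geometry, gradient = newton_step(model_energy, start)
    print("Newton geometry:", new_geometry)
    print("gradient at start:", gradient)
```

On this model surface the step of about 2·10⁻³ Å leaves a residual error of about 10⁻⁵ Å, in line with the O((H⁻¹G)²) estimate; the exact prefactor depends on the anharmonicity, which here is an assumed value.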
The gradient for the FCI equilibrium geometry has been found as above, making single-point<br />
calculations at the FCI geometry and at geometries distorted in steps of 0.01Å from the FCI<br />
geometry. The same finite-difference expressions as before are used. The gradient is found to be<br />
$$\mathbf{G}^{\mathrm{FCI}} = \left[\, -1.8593\cdot 10^{-5}\ E_h/\mathrm{\AA}\,;\ 3.0661\cdot 10^{-5}\ E_h/\mathrm{\AA} \,\right], \tag{3.2}$$<br />
thus verifying the correctness of the FCI geometry.<br />
Since the geometry was determined at the CCSDT level to be R_CCSDT(CC) = 1.23448 Å and<br />
R_CCSDT(CH) = 1.07924 Å, the error due to truncation of the many-electron basis in CCSDT is of the<br />
order of 10⁻³ Å. This is similar to the results obtained for CN. This also suggests that the quadruples<br />
correction to the equilibrium geometry is of the order of 0.001-0.002 Å.<br />
3.4 Conclusion<br />
Full configuration interaction (FCI) and coupled cluster (CC) calculations have been carried out on<br />
CN using the cc-pVDZ and cc-pVTZ basis sets. The equilibrium bond distance, harmonic<br />
frequency, atomization energy, and vertical electron affinity have been evaluated on the various<br />
levels of theory.<br />
As expected, the cc-pVDZ basis set does not provide accurate geometries and frequencies and<br />
CCSD is insufficient for prediction of equilibrium properties. Apparently, the CCSDT method is a<br />
better approximation than CCSDTQ for obtaining the equilibrium geometry and the harmonic<br />
frequency. This is due to a favorable cancellation of errors for CCSDT calculations in small basis<br />
sets. The vertical electron affinities are also affected by cancellation of errors, and already at the<br />
CCSD level, the error is less than 1 mE_h compared to the FCI value.<br />
The convergence patterns for the CI and CC hierarchies are studied for CN and are found to be similar to<br />
the convergence patterns previously reported for N₂ (ref. 74). Thus, it does not seem that the open-shell<br />
nature of CN leads to slow convergence of the CI and CC hierarchies compared to closed-shell<br />
cases.<br />
For a number of the CC calculations, the excitation levels have been defined by spin-orbital<br />
excitations instead of orbital excitations. Certain internal excitations are thereby omitted, but it is<br />
seen that this does not affect the accuracy in any significant way. For a given excitation level, the<br />
energies obtained in the orbital formalism are in all cases closer to the FCI energy than the ones<br />
obtained in the spin-orbital formalism. However, the difference is negligible.<br />
The equilibrium geometry of CCH has been found at the FCI level in the cc-pVDZ basis set to be<br />
R_FCI(CC) = 1.2367 Å and R_FCI(CH) = 1.0802 Å. The correction found to the initial CCSDT geometry<br />
is of the order of 10⁻³ Å. The FCI correction to the CCSDT equilibrium geometry of CN was of the<br />
same order.<br />
Summary<br />
The developments in computer hardware and linear scaling algorithms over the last decade have<br />
made it possible to carry out ab-initio quantum chemical calculations on bio-molecules with<br />
hundreds of amino acids and on large molecules relevant for nano-science. Quantum chemical<br />
calculations are thus evolving to become a widespread tool for use in several scientific branches. It<br />
is therefore important that the algorithms work as black-boxes, such that the user outside quantum<br />
chemistry does not have to be concerned with the details of the calculations. In particular, the<br />
Hartree-Fock (HF) and density functional theory (DFT) methods are employed for calculations on large<br />
systems as they represent good compromises between relatively low computational costs and<br />
reasonable accuracy of the results. The HF and DFT methods have been a fundamental part of<br />
quantum chemistry for many years, and calculations on molecules of ever increasing size and<br />
complexity are made possible due to increasing computer resources. The conventional algorithms<br />
used for optimization of the one-electron density in HF and DFT are therefore continually tested for<br />
stability and general performance, and occasionally they break down. In these cases the<br />
calculation takes longer to complete than is acceptable, or no result can be obtained at all.<br />
We have improved on this situation. In the first part of this thesis, algorithms are presented which<br />
improve the optimization in HF and DFT significantly. The optimization has become more effective<br />
and where the optimization broke down using conventional algorithms, it now converges without<br />
problems. Furthermore, the presented algorithms have no problem-specific parameters and can thus<br />
be used as black-boxes.<br />
When the one-electron density has been optimized, molecular properties such as polarizabilities and<br />
excitation energies can be calculated. Response theory is often used for this purpose. In the second<br />
part of this thesis an atomic orbital (AO) based formulation of response theory is presented which<br />
allows linear scaling calculations of molecular properties. Furthermore, the derivation of<br />
expressions for molecular properties is simpler in the AO formulation than in the molecular orbital<br />
formulation typically used. To illustrate the benefits, the expression for the geometrical derivative<br />
of the excited state is derived in the AO formulation.<br />
To confirm the reliability of quantum chemical predictions of molecular properties, it is important<br />
to investigate and describe strengths and weaknesses of the quantum chemical models employed.<br />
The full configuration interaction (FCI) model is exact within a certain basis set of atomic orbitals.<br />
It is thus of great value to be able to compare results from approximate models with FCI results. In<br />
the third part of this thesis FCI results are presented for two open-shell molecules, namely CN and<br />
CCH. The FCI results are compared with results from approximate models used today for<br />
calculations where an accuracy comparable to the experimental is needed.<br />
Danish Summary (Dansk Resumé)<br />
Developments over the last decade in computer hardware and linear-scaling algorithms have made<br />
it possible to carry out ab-initio quantum chemical calculations on bio-molecules with hundreds of<br />
amino acids and on large molecules relevant for nanotechnology. Quantum chemical calculations are<br />
therefore developing into a widely used tool in several branches of natural science. It<br />
is therefore important that the algorithms work as so-called black-boxes, such that users outside<br />
quantum chemistry need not concern themselves with the details of the calculation. In particular, the Hartree-Fock (HF) and<br />
density functional theory (DFT) methods are used for calculations on large systems, as they<br />
represent a good compromise between reasonable accuracy of the results and relatively short<br />
computation time. HF and DFT are methods that have been used in quantum chemistry for many years,<br />
and as ever larger computer resources become available, they are used to carry out calculations on<br />
ever larger and more complex molecules. The algorithms used today for optimization of the<br />
one-electron density in HF and DFT are therefore continually tested for stability and<br />
efficiency, and at times they break down. In these cases the calculation either takes an unacceptably long<br />
time or fails to deliver a result.<br />
We have improved on this situation. In the first part of the thesis, algorithms are presented which<br />
significantly improve the optimization in HF and DFT. The optimization has become more effective, and cases<br />
where the optimization previously broke down can now be carried out without problems. Furthermore, the presented algorithms have<br />
no problem-specific parameters and can therefore be regarded as black-boxes.<br />
When the one-electron density has been optimized, molecular properties such as polarizabilities<br />
and excitation energies can be calculated. Response theory is often used for this purpose. In the second part of<br />
the thesis an atomic orbital formulation of response theory is presented which enables linear<br />
scaling of the property calculations. Furthermore, the derivation of expressions for molecular properties<br />
has become simpler in the atomic orbital formulation compared with the molecular orbital formulation<br />
otherwise typically used. To illustrate the benefits, the expression for the geometrical gradient of the excited<br />
state is derived in the atomic orbital formulation.<br />
To confirm the reliability of quantum chemical predictions of molecular properties, it is<br />
important to investigate and describe strengths and weaknesses of the quantum chemical models<br />
employed. Full configuration interaction (FCI) is an exact model within a given set of<br />
atomic orbital basis functions. It is therefore valuable to be able to compare results from<br />
approximate models with FCI results. In the third part of the thesis FCI results are<br />
presented for two open-shell molecules, CN and CCH. These results are compared with<br />
results from approximate models which are used today to deliver quantum chemical calculations<br />
with an accuracy that in certain cases surpasses the experimental.<br />
Appendix A<br />
The Derivatives of the DSM Energy<br />
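The first and second derivatives of the DSM energy model with respect to c are found by recalling that<br />
$$E^{\mathrm{DSM}}(\mathbf{c}) = E(\bar{\mathbf{D}}) + 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\mathbf{D}_{\delta}, \tag{A-1}$$<br />
$$E(\bar{\mathbf{D}}) = E(\mathbf{D}_0) + 2\,\mathrm{Tr}\,\mathbf{D}_{+}\mathbf{F}_0 + \mathrm{Tr}\,\mathbf{D}_{+}\mathbf{F}_{+}, \tag{A-2}$$<br />
$$\mathbf{D}_{+} = \sum_{i=1}^{n} c_i\,(\mathbf{D}_i - \mathbf{D}_0), \tag{A-3}$$<br />
and<br />
$$\mathbf{D}_{\delta} = 3\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - \bar{\mathbf{D}}. \tag{A-4}$$<br />
The two terms in Eq. (A-1) are evaluated one by one:<br />
$$\frac{\partial E(\bar{\mathbf{D}})}{\partial c_x} = 2\,\mathrm{Tr}\,\mathbf{D}_x\mathbf{F}_0 - 2\,\mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_0 + \mathrm{Tr}\,\mathbf{D}_x\mathbf{F}_{+} + \mathrm{Tr}\,\mathbf{D}_{+}\mathbf{F}_x - \mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_{+} - \mathrm{Tr}\,\mathbf{D}_{+}\mathbf{F}_0 \tag{A-5}$$<br />
and<br />
$$\frac{\partial}{\partial c_x}\,2\,\mathrm{Tr}\,\bar{\mathbf{F}}\mathbf{D}_{\delta} = 2\,\mathrm{Tr}\,\frac{\partial\bar{\mathbf{F}}}{\partial c_x}\mathbf{D}_{\delta} + 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\frac{\partial\mathbf{D}_{\delta}}{\partial c_x} = 2\,\mathrm{Tr}\,\mathbf{F}_x\mathbf{D}_{\delta} + 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\frac{\partial\mathbf{D}_{\delta}}{\partial c_x}, \tag{A-6}$$<br />
where<br />
$$\frac{\partial\mathbf{D}_{\delta}}{\partial c_x} = 3\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x + 3\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x - 2\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}} - 2\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - \mathbf{D}_x. \tag{A-7}$$<br />
The second derivative is found in the same manner:<br />
$$\frac{\partial^2 E(\bar{\mathbf{D}})}{\partial c_x\,\partial c_y} = 2\,\mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_0 + \mathrm{Tr}\,\mathbf{D}_x\mathbf{F}_y + \mathrm{Tr}\,\mathbf{D}_y\mathbf{F}_x - \mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_x - \mathrm{Tr}\,\mathbf{D}_x\mathbf{F}_0 - \mathrm{Tr}\,\mathbf{D}_y\mathbf{F}_0 - \mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_y, \tag{A-8}$$<br />
$$\frac{\partial^2}{\partial c_x\,\partial c_y}\,2\,\mathrm{Tr}\,\bar{\mathbf{F}}\mathbf{D}_{\delta} = 2\,\mathrm{Tr}\,\mathbf{F}_x\frac{\partial\mathbf{D}_{\delta}}{\partial c_y} + 2\,\mathrm{Tr}\,\mathbf{F}_y\frac{\partial\mathbf{D}_{\delta}}{\partial c_x} + 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\frac{\partial^2\mathbf{D}_{\delta}}{\partial c_x\,\partial c_y}, \tag{A-9}$$<br />
where<br />
$$\frac{\partial^2\mathbf{D}_{\delta}}{\partial c_x\,\partial c_y} = 3\mathbf{D}_y\mathbf{S}\mathbf{D}_x + 3\mathbf{D}_x\mathbf{S}\mathbf{D}_y - 2\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_y\mathbf{S}\mathbf{D}_x - 2\mathbf{D}_y\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x - 2\mathbf{D}_y\mathbf{S}\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x\mathbf{S}\mathbf{D}_y - 2\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_y - 2\mathbf{D}_x\mathbf{S}\mathbf{D}_y\mathbf{S}\bar{\mathbf{D}}. \tag{A-10}$$<br />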
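The matrix derivatives in Eqs. (A-7) and (A-10) can be checked numerically against central finite differences. The sketch below is only an illustration: it assumes the parametrization in which the derivative of the averaged density with respect to c_x is D_x (as the terms of Eq. (A-7) imply), uses random symmetric matrices in place of real densities and overlap, and all function names are hypothetical.

```python
import numpy as np


def d_delta(d_bar, s):
    """Eq. (A-4): D_delta = 3 DSD - 2 DSDSD - D for the averaged density."""
    dsd = d_bar @ s @ d_bar
    return 3.0 * dsd - 2.0 * dsd @ s @ d_bar - d_bar


def d_delta_dx(d_bar, d_x, s):
    """Eq. (A-7): derivative of D_delta with respect to c_x."""
    return (3.0 * (d_bar @ s @ d_x + d_x @ s @ d_bar)
            - 2.0 * (d_bar @ s @ d_bar @ s @ d_x
                     + d_bar @ s @ d_x @ s @ d_bar
                     + d_x @ s @ d_bar @ s @ d_bar)
            - d_x)


def d_delta_dxdy(d_bar, d_x, d_y, s):
    """Eq. (A-10): second derivative of D_delta with respect to c_x and c_y."""
    return (3.0 * (d_y @ s @ d_x + d_x @ s @ d_y)
            - 2.0 * (d_bar @ s @ d_y @ s @ d_x + d_y @ s @ d_bar @ s @ d_x
                     + d_y @ s @ d_x @ s @ d_bar + d_bar @ s @ d_x @ s @ d_y
                     + d_x @ s @ d_bar @ s @ d_y + d_x @ s @ d_y @ s @ d_bar))


def random_symmetric(rng, n, scale):
    a = rng.standard_normal((n, n))
    return scale * (a + a.T) / 2.0


def check_derivatives(seed=0, n=4, n_dens=3, step=1e-5):
    """Return the max abs deviations of (A-7) and (A-10) from finite differences."""
    rng = np.random.default_rng(seed)
    s = np.eye(n) + random_symmetric(rng, n, 0.1)   # stand-in for the AO overlap
    dens = [random_symmetric(rng, n, 0.5) for _ in range(n_dens)]
    c = rng.uniform(0.1, 0.5, size=n_dens)

    def d_bar_of(coeffs):
        # Assumed parametrization: averaged density linear in c.
        return sum(ci * di for ci, di in zip(coeffs, dens))

    x, y = 0, 1
    ex = np.zeros(n_dens)
    ex[x] = step
    ey = np.zeros(n_dens)
    ey[y] = step
    fd1 = (d_delta(d_bar_of(c + ex), s) - d_delta(d_bar_of(c - ex), s)) / (2 * step)
    fd2 = (d_delta_dx(d_bar_of(c + ey), dens[x], s)
           - d_delta_dx(d_bar_of(c - ey), dens[x], s)) / (2 * step)
    err1 = np.max(np.abs(fd1 - d_delta_dx(d_bar_of(c), dens[x], s)))
    err2 = np.max(np.abs(fd2 - d_delta_dxdy(d_bar_of(c), dens[x], dens[y], s)))
    return err1, err2


if __name__ == "__main__":
    print(check_derivatives())
```

Both deviations should be at the level of the finite-difference noise; a term with a wrong sign or ordering in either formula would show up as a deviation of order one.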
Appendix B<br />
The Density Matrix in the Atomic Orbital Basis<br />
In this appendix we will briefly review the density matrix in the atomic orbital basis and derive the<br />
most important relations. For convenience consider a single-determinant wave function with n<br />
molecular orbitals occupied. The expectation value of a one-electron operator may then be written<br />
as a sum over occupied spin-orbitals<br />
$$\langle 0|\hat{h}|0\rangle = \sum_{i=1}^{n} h_{ii}. \tag{B-1}$$<br />
Explicitly introducing the MO-AO transformation matrix C allows us to write the expectation value<br />
as<br />
$$\langle 0|\hat{h}|0\rangle = \sum_{i=1}^{n} h_{ii} = \sum_{\mu,\nu=1}^{N} h_{\mu\nu}\left(\sum_{i=1}^{n} C^{*}_{\mu i} C_{\nu i}\right) = \sum_{\mu,\nu=1}^{N} h_{\mu\nu} D_{\mu\nu}, \tag{B-2}$$<br />
where N is the number of AO basis functions and we have introduced D as<br />
$$D_{\mu\nu} = \sum_{i=1}^{n} C^{*}_{\mu i} C_{\nu i}. \tag{B-3}$$<br />
It is of interest to study the relation between D and the expectation values ∆ of Eq. (2.10). To<br />
accomplish this we consider the second quantization expression for ⟨0|ĥ|0⟩ in the nonorthogonal<br />
atomic orbital basis. According to ref. 46 one obtains<br />
$$\langle 0|\hat{h}|0\rangle = \sum_{\mu,\nu=1}^{N} \left(\mathbf{S}^{-1}\mathbf{h}\mathbf{S}^{-1}\right)_{\mu\nu} \langle 0|a^{\dagger}_{\mu}a_{\nu}|0\rangle = \sum_{\mu,\nu=1}^{N} \left(\mathbf{S}^{-1}\mathbf{h}\mathbf{S}^{-1}\right)_{\mu\nu} \Delta_{\mu\nu} = \sum_{\mu,\nu=1}^{N} h_{\mu\nu}\left(\mathbf{S}^{-1}\boldsymbol{\Delta}\mathbf{S}^{-1}\right)_{\mu\nu}. \tag{B-4}$$<br />
By comparing Eqs. (B-4) and (B-2) we have the identification<br />
$$\mathbf{D} = \mathbf{S}^{-1}\boldsymbol{\Delta}\mathbf{S}^{-1}. \tag{B-5}$$<br />
Thus, the density element D µν is only identical to the matrix element ∆ µν in an orthonormal basis.<br />
Although it could be argued that it would be appropriate to call ∆ the one-electron density matrix in<br />
the AO-basis, we will be consistent with the standard literature and call D the density matrix in the<br />
AO basis, and ∆ the matrix of expectation values of creation-annihilation operators. From the<br />
properties of the one-electron density matrix<br />
$$\mathbf{D}^{\dagger} = \mathbf{D}, \qquad \mathrm{Tr}\,\mathbf{D}\mathbf{S} = N_{\mathrm{elec.}}, \qquad \mathbf{D}\mathbf{S}\mathbf{D} = \mathbf{D}, \tag{B-6}$$<br />
one straightforwardly obtains the following relations for ∆<br />
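$$\boldsymbol{\Delta}^{\dagger} = \boldsymbol{\Delta}, \qquad \mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{S}^{-1} = N_{\mathrm{elec.}}, \qquad \boldsymbol{\Delta}\mathbf{S}^{-1}\boldsymbol{\Delta} = \boldsymbol{\Delta}. \tag{B-7}$$<br />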
Although Eqs. (B-6) and (B-7) are formally equivalent, the equations for the standard AO<br />
density matrix D are somewhat simpler to use as they contain the metric S whereas the equations for<br />
∆ involve the inverse metric S⁻¹. It should be noted that Eqs. (B-7) are necessary and sufficient<br />
conditions, so all three equations are fulfilled if and only if |0⟩ is a normalized single-determinant<br />
wave function.<br />
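The relations (B-5) to (B-7) can be illustrated numerically for real orbitals. The sketch below is an assumption-laden example rather than code from this work: the overlap matrix is a random symmetric positive definite matrix, the occupied orbitals are generated by Löwdin orthonormalization of a random orthogonal matrix, and all function names are hypothetical.

```python
import numpy as np


def model_overlap(n, seed=1):
    """Random symmetric positive definite stand-in for the AO overlap S."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    return np.eye(n) + 0.1 * (a @ a.T) / n


def build_density(s, n_occ, seed=0):
    """Eq. (B-3) for real orbitals, with C chosen such that C^T S C = 1."""
    rng = np.random.default_rng(seed)
    n = s.shape[0]
    evals, evecs = np.linalg.eigh(s)
    s_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    u, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix
    c_occ = (s_inv_half @ u)[:, :n_occ]
    return c_occ @ c_occ.T


def density_relations(n=6, n_occ=2):
    """Return D, Delta = S D S (inverse of Eq. (B-5)), S and S^-1."""
    s = model_overlap(n)
    d = build_density(s, n_occ)
    delta = s @ d @ s
    return d, delta, s, np.linalg.inv(s)


if __name__ == "__main__":
    d, delta, s, s_inv = density_relations()
    print("Tr DS        =", np.trace(d @ s))
    print("Tr Delta S^-1 =", np.trace(delta @ s_inv))
    print("DSD = D:", np.allclose(d @ s @ d, d))
    print("Delta S^-1 Delta = Delta:", np.allclose(delta @ s_inv @ delta, delta))
```

The traces equal the number of occupied orbitals chosen in the example, and the two idempotency conditions differ only in whether S or its inverse appears, which is the practical point made above.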
Acknowledgements<br />
A number of people have made my four years of PhD study a pleasant and interesting experience,<br />
and I could not have done it without them. First of all I would like to thank Jeppe Olsen and Poul<br />
Jørgensen for guidance and support through the years; they are a fantastic team. I am grateful to the<br />
whole theoretical chemistry group for nice lunch breaks and cake-meetings, and I would like to<br />
thank in particular Ove Christiansen for his career advice and Andreas Hesselman for sharing some<br />
of his latest work with me. And Stinne, how I managed to get through the days before Stinne joined<br />
the group is a mystery. It quickly turned out that we have much the same attitude towards life and<br />
we have shared many a wholehearted opinion of the life as such and our work situation in<br />
particular.<br />
I would like to thank Pawel Salek for being good company during development and debugging of<br />
Fortran90 code of the finest quality and for being willing to help with any problems that I might<br />
have. A special thanks goes to Sonia Coriani and her husband Asger Halkier who took very good<br />
care of me during my visits in Trieste (even though I still haven’t tasted her mum’s lasagna).<br />
For a number of conferences, winter schools and summer schools a group of mainly Scandinavian<br />
people made my trips an extra pleasant experience. They were always ready for some boozing and<br />
all sorts of crazy ideas. In particular should be mentioned Patzke-guy; a gentleman disguised as a<br />
theoretician, Pekka; the lizard king, Ulf; the sweet Swede, crazy Mikael, Ola, Tommy and all the<br />
others. It has been some really fine hours spent with you guys, and I hope to see you all again,<br />
maybe for a salmari or two – no miksi ei.<br />
I also had the pleasure to spend a summer school with some of the students from the Copenhagen<br />
group: Marianne, Anders, Jacob and Thorsten. Anders and Jacob got connected to the Aarhus group<br />
at some point and have always been up for a nice chat and disgusting body noises to cheer up a grey<br />
day at work.<br />
I would like to thank Birgit Schiøtt for nice colleagueship in connection with teaching and for<br />
coffee and talks in her office. I look forward to our collaboration on my next project.<br />
I am grateful to the girl-gang; Louise, Trine, Cindie, and Rikke for keeping the connection to Århus<br />
and for gossip, lunch dates and girl nights.<br />
I would also like to thank my parents for raising me as a good girl who always did her homework,<br />
otherwise I would never have gotten this far, and last but not least a great thanks goes to Kristoffer<br />
for putting up with me and being considerate and caring when needed.<br />
References<br />
1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).<br />
2 G. G. Hall, Proc. R. Soc. London, Ser. A 205, 541 (1951).<br />
3 W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).<br />
4 J. Koutecky and V. Bonacic, J. Chem. Phys. 55, 2408 (1971); T. Claxton and W. Smith, Theor.<br />
Chim. Acta 22, 399 (1971); W. A. Lathan, L. A. Curtiss, W. J. Hehre et al., Progress in<br />
Physical Organic Chemistry. (Wiley, New York, 1974).<br />
5 D. H. Sleeman, Theor. Chim. Acta 11, 135 (1968).<br />
6 J. C. Slater, J. B. Mann, T. M. Wilson et al., Phys. Rev. 184, 672 (1969); A. D. Rabuck and<br />
G. E. Scuseria, J. Chem. Phys. 110, 695 (1999); B. I. Dunlap, Phys. Rev. A 29, 2902 (1984).<br />
7 R. McWeeny, Proc. R. Soc. London Ser. A 235, 496 (1956).<br />
8 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).<br />
9 R. Fletcher and C. M. Reeves, Comput. J. 7, 149 (1964).<br />
10 I. H. Hillier and V. R. Saunders, Proc. R. Soc. London Ser. A 320, 161 (1970).<br />
11 R. Seeger and J. A. Pople, J. Chem. Phys. 65, 265 (1976).<br />
12 R. N. Camp and H. F. King, J. Chem. Phys. 75, 268 (1981).<br />
13 R. E. Stanton, J. Chem. Phys. 75, 3426 (1981).<br />
14 W. R. Wessel, J. Chem. Phys. 47, 3253 (1967); Douady, Ellinger, Subra et al., J. Chem. Phys.<br />
72, 1452 (1980).<br />
15 G. B. Bacskay, Chem. Phys. 61, 385 (1981).<br />
16 R. Shepard, I. Shavitt, and J. Simons, J. Chem. Phys. 76, 543 (1982).<br />
17 H. J. Aa. Jensen and P. Jørgensen, J. Chem. Phys. 80, 1204 (1984); H. J. Aa. Jensen and H.<br />
Ågren, Chem. Phys. Lett. 110, 140 (1984).<br />
18 X. Li, J. M. Millam, G. E. Scuseria et al., J. Chem. Phys. 119, 7651 (2003); E. Hernández, M.<br />
J. Gillan, and C. M. Goringe, Phys. Rev. B 53, 7147 (1996); J. M. Millam and G. E. Scuseria, J.<br />
Chem. Phys. 106, 5569 (1997); M. Challacombe, J. Chem. Phys. 110, 2332 (1999).<br />
19 A. H. R. Palser and D. E. Manolopoulos, Phys. Rev. B 58, 12704 (1998).<br />
20 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 399 (1997).<br />
21 R. W. Nunes and D. Vanderbilt, Phys. Rev. B 50, 17611 (1994); M. S. Daw, Phys. Rev. B 47,<br />
10895 (1993); X. P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).<br />
22 G. Galli and M. Parrinello, Phys. Rev. Lett. 69, 3547 (1992); F. Mauri, G. Galli, and R. Car,<br />
Phys. Rev. B 47, 9973 (1993); W. Kohn, Chem. Phys. Lett. 208, 167 (1993); P. Ordejon, D.<br />
Drabold, M. Grunbach et al., Phys. Rev. B 48, 14646 (1993).<br />
23 T. Helgaker, H. Larsen, J. Olsen et al., Chem. Phys. Lett. 327, 397 (2000).<br />
24 A. D. Daniels and G. E. Scuseria, Phys. Chem. Chem. Phys. 2, 2173 (2000).<br />
25 J. VandeVondele and J. Hutter, J. Chem. Phys. 118, 4365 (2003).<br />
26 J. B. Francisco, J. M. Martínez, and L. Martínez, J. Chem. Phys. 121, 10863 (2004).<br />
27 D. R. Hartree, The calculation of atomic structures. (John Wiley and Sons, Inc., New York,<br />
1957).<br />
28 E. Isaacson and H. B. Keller, Analysis of numerical methods. (Wiley, New York, 1966); C. C. J.<br />
Roothaan and P. S. Bagus, Methods in Computational Physics. (Academic, New York, 1963).<br />
29 N. W. Winter and T. H. Dunning Jr., Chem. Phys. Lett. 8, 169 (1971).<br />
30 W. B. Neilsen, Chem. Phys. Lett. 18, 225 (1973).<br />
31 M. C. Zerner and M. Hehenberger, Chem. Phys. Lett. 62, 550 (1979).<br />
32 G. Karlström, Chem. Phys. Lett. 67, 348 (1979).<br />
33 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); P. Pulay, J. Comput. Chem. 3, 556 (1982).<br />
34 H. Sellers, Int. J. Quant. Chem. 45, 31 (1993).<br />
35 I. Hyla-Krispin, J. Demuynck, A. Strich et al., J. Chem. Phys. 75, 3954 (1981).<br />
36 E. Cancès and C. Le Bris, Int. J. Quant. Chem. 79, 82 (2000).<br />
37 K. N. Kudin, G. E. Scuseria, and E. Cancès, J. Chem. Phys. 116, 8255 (2002).<br />
38 L. Thøgersen, J. Olsen, D. Yeager et al., J. Chem. Phys. 121, 16 (2004).<br />
39 L. Thøgersen, J. Olsen, A. Köhn et al., J. Chem. Phys. 123, 074103 (2005).<br />
40 A. P. Rendell, Chem. Phys. Lett. 229, 204 (1994).<br />
41 H. Sellers, Chem. Phys. Lett. 180, 461 (1991); C. Kollmar, Int. J. Quant. Chem. 62, 617 (1997).<br />
42 V. R. Saunders and I. H. Hillier, Int. J. Quant. Chem. 7, 699 (1973).<br />
43 S. P. Bhattacharyya, Chem. Phys. Lett. 56, 395 (1978).<br />
44 R. Carbó, J. A. Hernández, and F. Sanz, Chem. Phys. Lett. 47, 581 (1977).<br />
45 E. Cancès and C. Le Bris, Math. Model. Num. Anal. 34, 749 (2000).<br />
46 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory. (Wiley,<br />
Chichester, 2000).<br />
47 S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999).<br />
48 A. M. N. Niklasson, Phys. Rev. B 66, 155115 (2002).<br />
49 E. Rubensson, Masters Thesis, Royal Institute of Technology (KTH), Stockholm, 2005.<br />
50 G. W. Stewart, Introduction to Matrix Computations. (Academic Press, inc., New York, 1973).<br />
51 J. W. Demmel, Applied Numerical Linear Algebra. (SIAM, 1997).<br />
52 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987).<br />
53 G. Chaban, M. W. Schmidt, and M. S. Gordon, Theor. Chem. Acc. 97, 88 (1997); T. H. Fischer<br />
and J. E. Almlöf, J. Phys. Chem. 96, 9768 (1992).<br />
54 R. E. Stanton, J. Chem. Phys. 75, 5416 (1981).<br />
55 M. A. Natiello and G. E. Scuseria, Int. J. Quant. Chem. 26, 1039 (1984).<br />
56 J. Cizek and J. Paldus, J. Chem. Phys. 47, 3976 (1967); H. Fukutome, Int. J. Quant. Chem. 20,<br />
955 (1981); D. J. Thouless, Nucl. Phys. 21, 225 (1960).<br />
57 V. Bach, E. H. Lieb, M. Loss et al., Phys. Rev. Lett. 72, 2981 (1994); P.-L. Lions, Comm. Math.<br />
Phys. 109, 33 (1987).<br />
58 L. E. Dardenne, N. Makiuchi, L. A. C. Malbouisson et al., Int. J. Quant. Chem. 76, 600 (2000).<br />
59 A. Schafer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992).<br />
60 A. Kalemos, T. H. Dunning Jr., and A. Mavridis, J. Chem. Phys. 123, 014302 (2005); R. G. A.<br />
R. Maclagan and G. E. Scuseria, J. Chem. Phys. 106, 1491 (1997); I. Shim and K. A. Gingerich,<br />
Int. J. Quant. Chem. S23, 409 (1989).<br />
61 H. Larsen, P. Jørgensen, J. Olsen et al., J. Chem. Phys. 113, 8908 (2000).<br />
62 J. Olsen and P. Jørgensen, in Modern Electronic Structure Theory, Part II, edited by D. R.<br />
Yarkony (World Scientific, Singapore, 1995).<br />
63 J. Olsen and P. Jørgensen, J. Chem. Phys. 82, 3235 (1985).<br />
64 J. Olsen, H. J. Aa. Jensen, and P. Jørgensen, J. Comp. Phys. 74, 265 (1988).<br />
65 T. Helgaker and P. Jørgensen, Theor. Chim. Acta 75, 111 (1989); T. Helgaker and P. Jørgensen,<br />
in Advances in Quantum Chemistry (Academic Press, 1988), Vol. 19; T. Helgaker and P.<br />
Jørgensen, in Methods in Computational Molecular Physics, edited by S. Wilson and G. H. F.<br />
Diercksen (Plenum Press, New York, 1992).<br />
66 H. Larsen, T. Helgaker, P. Jørgensen et al., J. Chem. Phys. 115, 10344 (2001).<br />
67 D. Feller and J. A. Sordo, J. Chem. Phys. 113, 485 (2000).<br />
68 E. F. C. Byrd, C. D. Sherrill, and M. Head-Gordon, J. Phys. Chem. A 105, 9736 (2001).<br />
69 J. Olsen, LUCIA, a quantum chemical program package.<br />
70 T. Helgaker, H. J. Aa. Jensen, P. Joergensen et al., DALTON, an electronic structure program<br />
(1997).<br />
71 T. H. Dunning Jr., J. Chem. Phys. 90, 1007 (1989).<br />
72 K. P. Huber and G. Herzberg, Molecular Spectra and Molecular Structure IV. Constants of<br />
Diatomic Molecules. (Van Nostrand, New York, 1979).<br />
73 W. Kutzelnigg, Theor. Chim. Acta 80, 349 (1991).<br />
74 J. W. Krogh and J. Olsen, Chem. Phys. Lett. 344, 578 (2001).<br />
75 L. Thøgersen and J. Olsen, Chem. Phys. Lett. 393, 36 (2004).<br />
76 P. G. Szalay, L. Thøgersen, J. Olsen et al., J. Phys. Chem. A 108, 3030 (2004).<br />
Part 1<br />
The Trust-region Self-consistent Field Method:<br />
Towards a Black Box optimization in Hartree-Fock and Kohn-Sham Theories,<br />
L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker,<br />
J. Chem. Phys. 121, 16 (2004)
JOURNAL OF CHEMICAL PHYSICS VOLUME 121, NUMBER 1 1 JULY 2004<br />
The trust-region self-consistent field method: Towards a black-box<br />
optimization in Hartree–Fock and Kohn–Sham theories<br />
Lea Thøgersen, Jeppe Olsen, Danny Yeager, a) and Poul Jørgensen<br />
Department of Chemistry, University of Århus, DK-8000 Århus C, Denmark<br />
Paweł Sałek<br />
Laboratory of Theoretical Chemistry, The Royal Institute of Technology,<br />
Teknikringen 30, Stockholm SE-10044, Sweden<br />
Trygve Helgaker<br />
Department of Chemistry, University of Oslo, P.O. Box 1033 Blindern, N-0315 Norway<br />
(Received 17 February 2004; accepted 5 April 2004)<br />
The trust-region self-consistent field (TRSCF) method is presented for optimizing the total energy<br />
E_SCF of Hartree–Fock theory and Kohn–Sham density-functional theory. In the TRSCF method,<br />
both the Fock/Kohn–Sham matrix diagonalization step to obtain a new density matrix and the step<br />
to determine the optimal density matrix in the subspace of the density matrices of the preceding<br />
diagonalization steps have been improved. The improvements follow from the recognition that local<br />
models to E_SCF may be introduced by carrying out a Taylor expansion of the energy about the<br />
current density matrix. At the point of expansion, the local models have the same gradient as E_SCF<br />
but only an approximate Hessian. The local models are therefore valid only in a restricted region,<br />
the trust region, and steps can only be taken with confidence within this region. By restricting the<br />
steps of the TRSCF model to be inside the trust region, a monotonic and significant reduction of the<br />
total energy is ensured in each iteration of the TRSCF method. Examples are given where the<br />
TRSCF method converges monotonically and smoothly, but where the standard DIIS method<br />
diverges. © 2004 American Institute of Physics. [DOI: 10.1063/1.1755673]<br />
I. INTRODUCTION<br />
The steady progress in computer technology and<br />
quantum-chemical methodology has widened the range of<br />
users of quantum-chemical software packages to include a<br />
vast number of practicing, experimental chemists. Routinely,<br />
such users perform Hartree–Fock (HF) calculations and<br />
Kohn–Sham (KS) density-functional theory (DFT) calculations<br />
for molecules of a size and complexity that, a decade<br />
ago, were beyond reach even for the most advanced research<br />
codes. This development calls for further advances in the<br />
automation of the self-consistent field (SCF) procedure<br />
used to optimize the HF and DFT energies, so as to ensure<br />
that convergence may be reached in a routine manner even<br />
for very complex molecules.<br />
In the original formulation, the SCF procedure consists<br />
of a sequence of Roothaan–Hall (RH) iterations. 1,2 At each<br />
iteration, a Fock/KS matrix is first constructed from the current<br />
approximation to the one-electron density matrix and<br />
then diagonalized to yield an improved set of orbitals and<br />
orbital energies and thus an improved density matrix. In the<br />
subsequent iteration, this improved density matrix is then<br />
used to construct a new Fock/KS matrix, thereby establishing<br />
the iteration procedure. However, such a sequence of RH<br />
a On leave. Permanent address: Department of Chemistry, Texas A&M University,<br />
P.O. Box 30012, College Station, Texas 77842-3012.<br />
iterations converges only in simple cases. To improve upon<br />
the convergence, each RH iteration may be extended to include,<br />
in addition to the diagonalization step, also a step<br />
where the best density matrix is generated in the subspace of<br />
the density matrices of the current and preceding RH iterations.<br />
In the next RH iteration, this averaged density matrix<br />
rather than the pure density matrix obtained in the last diagonalization<br />
is used to construct the new Fock/KS matrix.<br />
In this paper, we make improvements both to the RH<br />
diagonalization step and to the density-subspace optimization<br />
step of the SCF scheme. Our approach follows from the<br />
recognition that, in both steps, we may construct local models<br />
to the SCF energy function $E_{\mathrm{SCF}}$ by a Taylor expansion of<br />
the energy about the current density matrix. However, since,<br />
at the point of expansion, these models have an exact gradient<br />
but only an approximate Hessian, they are valid only in a<br />
restricted region about the current approximation to the density<br />
matrix: the trust region. Therefore, when these local<br />
models are used in the course of the SCF optimization, it is<br />
essential they are used only to generate steps within their<br />
trust region. Only in this manner can it be ensured that the<br />
SCF energy is systematically and sufficiently lowered at each<br />
iteration.<br />
In the RH diagonalization part of the SCF optimization,<br />
the improvements are obtained by introducing an energy<br />
function $E^{\mathrm{RH}}$ that corresponds to the sum of the occupied<br />
0021-9606/2004/121(1)/16/12/$22.00 16<br />
© 2004 American Institute of Physics<br />
Downloaded 25 Jun 2004 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp
J. Chem. Phys., Vol. 121, No. 1, 1 July 2004 Self-consistent field method<br />
orbital energies. 3 An unconstrained minimization of $E^{\mathrm{RH}}$ results<br />
in the same solution (i.e., density matrix) as obtained by<br />
a diagonalization of the Fock/KS matrix. However, since, at<br />
the point of expansion, the RH energy function $E^{\mathrm{RH}}$ has only<br />
the gradient in common with the true SCF energy $E_{\mathrm{SCF}}$, a<br />
global minimization of $E^{\mathrm{RH}}$ may lead to steps that are too<br />
long to be trusted. We therefore introduce a trust region<br />
where $E^{\mathrm{RH}}$ is a good approximation to $E_{\mathrm{SCF}}$. If a global<br />
minimization of $E^{\mathrm{RH}}$ leads to a step outside the trust region,<br />
then the step to the minimum on the boundary of the trust<br />
region for $E^{\mathrm{RH}}$ is taken instead. This step is found by a<br />
level-shifting technique, where the occupied molecular orbital<br />
energies effectively are shifted by some constant to increase<br />
the gap between the occupied and virtual molecular<br />
orbitals. Level shifting has previously been used to improve<br />
the convergence of the simple RH sequence of iterations. An<br />
essential feature of our implementation is to adjust the level<br />
shift in such a manner that the step is to the boundary of the<br />
trust region, recognizing that only in this manner does a lowering<br />
of $E^{\mathrm{RH}}$ result in a lowering of $E_{\mathrm{SCF}}$. For this reason,<br />
the resulting method is called the trust-region RH (TRRH)<br />
method.<br />
The optimization of the density matrix in the subspace of<br />
the density matrices of the preceding RH iterations has a<br />
long history. Early on, it was recognized that a simple averaging<br />
of the density matrices of the last few RH iterations<br />
significantly improves the convergence of the RH scheme.<br />
This simple density-matrix averaging technique was later rationalized<br />
and systematized in the direct inversion in the iterative<br />
subspace (DIIS) method of Pulay. 4 In the DIIS method,<br />
an improved density matrix is obtained as a linear combination<br />
of the previous density matrices by minimizing the norm<br />
of the corresponding linear combination of gradients. The<br />
DIIS method significantly speeds up the local convergence,<br />
and convergence can often be obtained to ground states of<br />
rather complex molecules with a small gap between energies<br />
of the highest occupied molecular orbital (HOMO) and the<br />
lowest unoccupied molecular orbital (LUMO) and with a<br />
large number of close-lying electronic states.<br />
Several attempts have been made to modify the DIIS<br />
algorithm so as to improve upon its global convergence behavior.<br />
Recently, Kudin, Scuseria, and Cancès proposed the<br />
energy DIIS (EDIIS) method, where the DIIS gradient-norm<br />
minimization is replaced by a minimization of an approximate<br />
energy function. 5 In EDIIS, the variational parameters,<br />
which are the linear expansion coefficients of the density<br />
matrices from the previous RH iterations, may only take on<br />
values that give densities in the convex set, that is, densities<br />
with occupation numbers between 0 and 1. As the EDIIS<br />
method is based on the minimization of an approximate energy<br />
function, it may have some advantages in the global<br />
region. However, it is worrying that a convex solution often<br />
cannot be obtained and that the observed local convergence<br />
of the EDIIS method is slower than in the standard DIIS<br />
method.<br />
In the DIIS and EDIIS methods, an improved density<br />
matrix is obtained as a sum of the density matrices from the<br />
preceding RH diagonalization steps. Consequently, the averaged<br />
density matrix is not idempotent as required in HF and<br />
KS theories. The deviation from idempotency may be reduced<br />
using a purified density matrix as the one suggested by<br />
McWeeny. 6 This has been done for the SCF energy minimization<br />
by several workers including Nunes and Vanderbilt 7<br />
and Daniels and Scuseria 8 and for the calculation of geometrical<br />
derivatives by Ochsenfeld and co-workers. 9 It may<br />
also be done for the EDIIS energy function. The energy function<br />
then has the same gradient as $E_{\mathrm{SCF}}$, but also contains<br />
terms which cannot be obtained from the densities and<br />
Fock/KS matrices of the previous RH iterations. Neglecting<br />
these terms, we arrive at the density-subspace minimization<br />
(DSM) algorithm proposed in this paper. At the point of expansion,<br />
the DSM energy function $E^{\mathrm{DSM}}$ thus has the same<br />
gradient as the true energy function $E_{\mathrm{SCF}}$ but only an approximate<br />
Hessian. Again, a trust region may be introduced<br />
and only steps within this region are taken, ensuring that any<br />
lowering of $E^{\mathrm{DSM}}$ also corresponds to a lowering of $E_{\mathrm{SCF}}$.<br />
The resulting method is called the trust-region DSM<br />
(TRDSM) method.<br />
In the next section, we first describe the standard optimization<br />
of the SCF energy function in a density-matrix formulation.<br />
The TRRH method is then discussed in Sec. II A<br />
and the TRDSM method in Sec. II B. In Sec. III, we give<br />
some numerical examples to demonstrate the performance of<br />
the resulting trust-region SCF (TRSCF) method. The last<br />
section contains some concluding remarks.<br />
II. THEORY<br />
For a closed-shell system with $N/2$ electron pairs, the<br />
Hartree–Fock (HF) energy (excluding the nuclear–nuclear repulsion<br />
energy) is given by 3<br />
$$E_{\mathrm{SCF}}(\mathbf{D}) = 2\,\mathrm{Tr}\,\mathbf{h}\mathbf{D} + \mathrm{Tr}\,\mathbf{D}\mathbf{G}(\mathbf{D}),$$ (1)<br />
where $\mathbf{D}$ is the one-electron density matrix in the atomic-orbital<br />
(AO) basis, $\mathbf{h}$ is the one-electron Hamiltonian matrix,<br />
and $\mathbf{G}(\mathbf{D})$ is defined as<br />
$$G_{\mu\nu}(\mathbf{D}) = \sum_{\rho\sigma}\left(2g_{\mu\nu\rho\sigma} - g_{\mu\sigma\rho\nu}\right)D_{\rho\sigma},$$ (2)<br />
where $g_{\mu\nu\rho\sigma}$ is a two-electron integral in the AO basis. For<br />
the energy in Eq. (1) to be a valid approximation to the true<br />
HF energy, the density matrix $\mathbf{D}$ must satisfy the symmetry,<br />
trace, and idempotency conditions:<br />
$$\mathbf{D}^{T} = \mathbf{D},$$ (3)<br />
$$\mathrm{Tr}\,\mathbf{D}\mathbf{S} = \frac{N}{2},$$ (4)<br />
$$\mathbf{D}\mathbf{S}\mathbf{D} = \mathbf{D}.$$ (5)<br />
Similar conditions apply in Kohn–Sham (KS) theory, but<br />
the energy function of Eq. (1) must then be modified by<br />
including the exchange-correlation term and by scaling or<br />
completely removing the exchange term from Eq. (2).<br />
The traditional approach to the optimization of the HF<br />
energy is an iterative one. From the current approximation to<br />
the density matrix $\mathbf{D}_n$ in iteration $n$, a Fock matrix is built,<br />
$$\mathbf{F}(\mathbf{D}_n) = \mathbf{h} + \mathbf{G}(\mathbf{D}_n),$$ (6)<br />
and, following the Roothaan–Hall (RH) procedure, the Fock<br />
matrix is diagonalized,<br />
$$\mathbf{F}(\mathbf{D}_n)\mathbf{C}_{\mathrm{occ}} = \mathbf{S}\mathbf{C}_{\mathrm{occ}}\boldsymbol{\varepsilon},$$ (7)<br />
where $\mathbf{S}$ is the overlap matrix in the AO basis, to give a set of<br />
occupied molecular orbitals (MOs), from which a new approximation<br />
to the density matrix is obtained as<br />
$$\mathbf{D}_{n+1} = \mathbf{C}_{\mathrm{occ}}\mathbf{C}_{\mathrm{occ}}^{T}.$$ (8)<br />
The iteration procedure is established using $\mathbf{D}_{n+1}$ as the current<br />
density in Eq. (6). The final solution to the minimization<br />
problem is obtained when $\mathbf{D}_n$ and $\mathbf{D}_{n+1}$ are the same.<br />
This self-consistent field (SCF) procedure may also be used<br />
in KS theory, the only difference being the addition of the<br />
exchange-correlation potential and the scaling of the exchange<br />
contribution in the Fock matrix to yield the KS matrix.<br />
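The conditions of Eqs. (3)–(5) and the construction of Eq. (8) can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a toy two-function orthonormal basis (overlap matrix equal to the identity) and one doubly occupied orbital, and the helper names `matmul`, `density_from_orbitals`, and `satisfies_conditions` are hypothetical.<br />

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def density_from_orbitals(C_occ):
    """Build D = C_occ C_occ^T as in Eq. (8); one column per occupied MO."""
    C_T = [list(col) for col in zip(*C_occ)]
    return matmul(C_occ, C_T)

def satisfies_conditions(D, S, n_pairs, tol=1e-10):
    """Check symmetry (3), trace (4), and idempotency (5) in the S metric."""
    n = len(D)
    symmetric = all(abs(D[i][j] - D[j][i]) < tol for i in range(n) for j in range(n))
    DS = matmul(D, S)
    trace_ok = abs(sum(DS[i][i] for i in range(n)) - n_pairs) < tol
    DSD = matmul(DS, D)
    idempotent = all(abs(DSD[i][j] - D[i][j]) < tol for i in range(n) for j in range(n))
    return symmetric and trace_ok and idempotent

# Assumed toy data: orthonormal basis (S = I), one electron pair.
S = [[1.0, 0.0], [0.0, 1.0]]
C_occ = [[1.0 / math.sqrt(2.0)], [1.0 / math.sqrt(2.0)]]
D = density_from_orbitals(C_occ)

# A 50/50 average of two valid densities keeps symmetry and trace
# but is no longer idempotent.
D_other = [[1.0, 0.0], [0.0, 0.0]]
D_avg = [[0.5 * (x + y) for x, y in zip(r1, r2)] for r1, r2 in zip(D, D_other)]
```

In this toy case `D` passes all three checks, whereas `D_avg` fails only the idempotency check, which is the situation the averaged density matrices discussed later in the text are in.<br />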
The pure RH iterations presented above often do not<br />
converge. A powerful method for handling this divergence is<br />
not to construct the Fock matrix from the density matrix $\mathbf{D}_n$<br />
but rather from an average of all previous density matrices:<br />
$$\bar{\mathbf{D}}_n = \sum_{i=1}^{n} c_i\mathbf{D}_i.$$ (9)<br />
The averaged density matrix $\bar{\mathbf{D}}_n$ is then used in place of the<br />
pure density matrix $\mathbf{D}_n$ in Eq. (6) to obtain the Fock matrix<br />
$\mathbf{F}(\bar{\mathbf{D}}_n)$ as<br />
$$\mathbf{F}(\bar{\mathbf{D}}_n) = \sum_{i=1}^{n} c_i\mathbf{F}(\mathbf{D}_i),$$ (10)<br />
and the iteration procedure is established. In the course of the<br />
TRSCF iterations, the following matrices are set up in the<br />
order indicated: $\mathbf{D}_1$, $\mathbf{F}(\mathbf{D}_1)$, $\mathbf{D}_2$, $\mathbf{F}(\mathbf{D}_2)$, $\bar{\mathbf{D}}_2$, $\mathbf{F}(\bar{\mathbf{D}}_2)$, $\mathbf{D}_3$,<br />
$\mathbf{F}(\mathbf{D}_3)$, $\bar{\mathbf{D}}_3$, $\mathbf{F}(\bar{\mathbf{D}}_3)$, .... Among these, $\mathbf{D}_1$, $\mathbf{F}(\mathbf{D}_1)$, $\mathbf{D}_2$,<br />
$\mathbf{F}(\mathbf{D}_2)$, $\mathbf{D}_3$, $\mathbf{F}(\mathbf{D}_3)$, ... are saved during the iteration procedure.<br />
In the following, we describe improvements to the SCF<br />
diagonalization and density-subspace optimization steps. In<br />
Sec. II A, we describe how the trust-region RH (TRRH)<br />
method is used to generate new density matrices by a modification<br />
of the traditional RH method, Eqs. (7) and (8). Next,<br />
in Sec. II B, we introduce the trust-region density-subspace<br />
minimization (TRDSM) method for calculating the averaged<br />
density matrix of Eq. (9). In the following, we use the indices<br />
$i, j, k, l$ for occupied MOs and the indices $a, b, c, d$ for the<br />
virtual MOs.<br />
A. The trust-region Roothaan–Hall method<br />
As discussed in Ref. 3, the traditional RH method may<br />
be viewed as a minimization of the sum of the orbital energies<br />
of the occupied MOs,<br />
$$E^{\mathrm{RH}} = 2\sum_{i}\varepsilon_i = 2\,\mathrm{Tr}\,\mathbf{F}(\bar{\mathbf{D}})\mathbf{D},$$ (11)<br />
subject to orthonormality constraints on the occupied MOs<br />
$\phi_i$:<br />
$$\langle\phi_i|\phi_j\rangle = \delta_{ij}.$$ (12)<br />
Whereas $\bar{\mathbf{D}}$ is the current approximation to the HF/KS density<br />
matrix, usually obtained as a linear combination of the<br />
previous densities according to Eq. (9), the density matrix $\mathbf{D}$<br />
to be optimized in Eq. (11) is related to the occupied MOs<br />
resulting from the diagonalization of $\mathbf{F}(\bar{\mathbf{D}})$ as<br />
$$\mathbf{D} = \mathbf{C}_{\mathrm{occ}}\mathbf{C}_{\mathrm{occ}}^{T}.$$ (13)<br />
To see this, consider the constrained minimization of $E^{\mathrm{RH}}$ in<br />
Eq. (11) expressed in terms of the Lagrangian<br />
$$L = 2\,\mathrm{Tr}\,\mathbf{F}(\bar{\mathbf{D}})\mathbf{D} - 2\,\mathrm{Tr}\,\boldsymbol{\eta}\left(\mathbf{C}_{\mathrm{occ}}^{T}\mathbf{S}\mathbf{C}_{\mathrm{occ}} - \mathbf{I}_{N/2}\right),$$ (14)<br />
where the multipliers $\eta_{ij}$ ensure orthonormality among the<br />
occupied MOs. Minimization of this Lagrangian leads to the<br />
standard RH equations:<br />
$$\mathbf{F}(\bar{\mathbf{D}})\mathbf{C}_{\mathrm{occ}} = \mathbf{S}\mathbf{C}_{\mathrm{occ}}\boldsymbol{\eta}.$$ (15)<br />
However, since $E^{\mathrm{RH}}$ of Eq. (11) is only a crude model of the<br />
true energy $E_{\mathrm{SCF}}$ (the gradient is correct at $\bar{\mathbf{D}}$, assuming $\bar{\mathbf{D}}$ is<br />
idempotent), a global minimization of $E^{\mathrm{RH}}$ according to Eq.<br />
(15) may easily lead to steps that are too long to be trusted, as<br />
they are outside the region where $E^{\mathrm{RH}}$ is a good approximation<br />
to $E_{\mathrm{SCF}}$. Steps outside the trust region may often not<br />
lead to a reduction of the total energy $E_{\mathrm{SCF}}$.<br />
1. The level-shifted Roothaan–Hall equations<br />
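To avoid too long steps, an additional constraint is imposed<br />
on the optimization of Eq. (11), namely, that the new<br />
density matrix $\mathbf{D}$ in Eq. (13) does not differ too much from<br />
the old matrix $\bar{\mathbf{D}}$. This condition is conveniently expressed in<br />
terms of the overlap between the density matrices in the $\mathbf{S}$<br />
metric norm,<br />
$$\langle\mathbf{D}|\bar{\mathbf{D}}\rangle_{S} = \mathrm{Tr}\,\mathbf{D}\mathbf{S}\bar{\mathbf{D}}\mathbf{S} = a\sqrt{\frac{N}{2}\,\mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}},$$ (16)<br />
where $\mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}$ need not equal $N/2$ since $\bar{\mathbf{D}}$ is not necessarily idempotent.<br />
Note that, for $\mathbf{D}$ equal to an idempotent $\bar{\mathbf{D}}$, $a$ is equal to<br />
one. For $a$ sufficiently close to one, a step will therefore be<br />
taken in the local region. In practice, we define sufficiently<br />
close to one by the parameter $a_{\min} = 0.975$.<br />
Introducing an undetermined multiplier $\mu$ associated<br />
with this new constraint, we obtain the following Lagrangian:<br />
$$L = 2\,\mathrm{Tr}\,\mathbf{F}(\bar{\mathbf{D}})\mathbf{D} - 2\mu\left(\mathrm{Tr}\,\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\mathbf{D} - a\sqrt{\frac{N}{2}\,\mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}}\right) - 2\,\mathrm{Tr}\,\boldsymbol{\eta}\left(\mathbf{C}_{\mathrm{occ}}^{T}\mathbf{S}\mathbf{C}_{\mathrm{occ}} - \mathbf{I}_{N/2}\right).$$ (17)<br />
Differentiating this Lagrangian with respect to the MO coefficients<br />
and setting the result equal to zero, we arrive at the<br />
level-shifted RH equations<br />
$$\left[\mathbf{F}(\bar{\mathbf{D}}) - \mu\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\right]\mathbf{C}_{\mathrm{occ}}(\mu) = \mathbf{S}\mathbf{C}_{\mathrm{occ}}(\mu)\boldsymbol{\eta}(\mu).$$ (18)<br />
To interpret the level-shift term, we note that $\bar{\mathbf{D}}\mathbf{S}$ projects out<br />
the component of $\mathbf{C}_{\mathrm{occ}}$ that is occupied in $\bar{\mathbf{D}}$ (assuming idempotent<br />
$\bar{\mathbf{D}}$), see Ref. 3. The level shift therefore works only on<br />
the occupied part of $\mathbf{F}(\bar{\mathbf{D}})$, shifting all the occupied orbital<br />
energies and increasing the gap between the occupied and<br />
virtual MOs, in particular the HOMO-LUMO gap.<br />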
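FIG. 1. For the fourth iteration of the rhodium calculation described in Sec.<br />
III, we have displayed as a function of the level-shift parameter $\mu$: (a) the<br />
HOMO-LUMO gap $\Delta\varepsilon_{ai}$, where $\mu_{\min}$ is the smallest accepted level shift;<br />
(b) the overlap $a$ between the old and new density matrices, where $\mu_{\mathrm{opt}}$ is<br />
the optimal level shift; and (c) the change in the model energy $\Delta E^{\mathrm{RH}}$ and the<br />
actual energy $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$.<br />
Since the SCF energy $E_{\mathrm{SCF}}$ is invariant with respect to<br />
an orthogonal transformation between the MOs, Eq. (18)<br />
may be transformed to the canonical basis:<br />
$$\left[\mathbf{F}(\bar{\mathbf{D}}) - \mu\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\right]\mathbf{C}_{\mathrm{occ}}(\mu) = \mathbf{S}\mathbf{C}_{\mathrm{occ}}(\mu)\boldsymbol{\varepsilon}(\mu),$$ (19)<br />
where the diagonal matrix $\boldsymbol{\varepsilon}(\mu)$ contains the orbital energies.<br />
2. Choice of the RH level-shift parameter $\mu$<br />
The density matrix generated from the restricted RH solution<br />
Eq. (19) depends on the level-shift parameter $\mu$:<br />
$$\mathbf{D}(\mu) = \mathbf{C}_{\mathrm{occ}}(\mu)\mathbf{C}_{\mathrm{occ}}^{T}(\mu).$$ (20)<br />
To see how $\mu$ is determined, we consider the determination<br />
of $\mu$ in the fourth iteration of the rhodium-complex calculation<br />
described in Sec. III. In Fig. 1(a), we have plotted the<br />
HOMO-LUMO gap as a function of $\mu$,<br />
$$\Delta\varepsilon_{ai}(\mu) = \varepsilon_{a}^{\mathrm{LUMO}}(\mu) - \varepsilon_{i}^{\mathrm{HOMO}}(\mu),$$ (21)<br />
where $\varepsilon_{i}^{\mathrm{HOMO}}(\mu)$ and $\varepsilon_{a}^{\mathrm{LUMO}}(\mu)$ are the HOMO and LUMO<br />
orbital energies, respectively; in Fig. 1(b), we have plotted<br />
the overlap between the old and new density matrices as<br />
given by<br />
$$a(\mu) = \frac{\langle\mathbf{D}(\mu)|\bar{\mathbf{D}}\rangle_{S}}{\sqrt{\langle\mathbf{D}(\mu)|\mathbf{D}(\mu)\rangle_{S}\,\langle\bar{\mathbf{D}}|\bar{\mathbf{D}}\rangle_{S}}},$$ (22)<br />
where $\langle\mathbf{D}(\mu)|\mathbf{D}(\mu)\rangle_{S}$ is equal to $N/2$. For sufficiently large<br />
$\mu$, the HOMO-LUMO gap, Eq. (21), is linear in $\mu$. This linearity<br />
of $\Delta\varepsilon_{ai}(\mu)$ for large $\mu$ arises from the dependence of<br />
the orbital energies on $\mu$ in Eq. (19), where $\mu$ is effectively<br />
subtracted from the occupied orbital energies. The MOs $\bar{\mathbf{C}}_{\mathrm{occ}}$<br />
occupied in $\bar{\mathbf{D}}$ satisfy the generalized eigenvalue equations<br />
$$\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{C}}_{\mathrm{occ}} = \mathbf{S}\bar{\mathbf{C}}_{\mathrm{occ}}\boldsymbol{\omega},$$ (23)<br />
and become identical to the MOs $\mathbf{C}_{\mathrm{occ}}(\mu)$ obtained from Eq.<br />
(19) when $\mu$ tends to infinity. The corresponding density is<br />
denoted<br />
$$\mathbf{D} = \bar{\mathbf{C}}_{\mathrm{occ}}\bar{\mathbf{C}}_{\mathrm{occ}}^{T}$$ (24)<br />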
and represents a purified $\bar{\mathbf{D}}$. In the linear regime of $\Delta\varepsilon_{ai}(\mu)$,<br />
there is a continuous development of the occupied MOs from<br />
those occupied in $\bar{\mathbf{D}}$. As $\mu$ decreases and we enter the nonlinear<br />
regime at $\mu_{\min}$, the MOs in Eq. (20) no longer correspond<br />
to those in Eq. (23). Comparing plots (a) and (b) in Fig.<br />
1, we note that the region $a(\mu) < a_{\min}$ in Fig. 1(b) corresponds<br />
roughly to the region $\mu < \mu_{\min}$ in Fig. 1(a).<br />
As we insist on a controlled, continuous development of<br />
the MOs from those occupied in $\bar{\mathbf{D}}$, the level-shift parameter<br />
should be restricted to the linear regime $\mu \geq \mu_{\min}$. To determine<br />
the optimal level-shift parameter $\mu_{\mathrm{opt}}$, we therefore<br />
begin by establishing the onset of linearity $\mu_{\min}$ by linear<br />
extrapolation by means of two Fock/KS matrix diagonalizations,<br />
giving the two $\Delta\varepsilon_{ai}$ values marked by crosses and the<br />
linearly interpolated $\mu_{\min}$ value marked with an arrow. Next,<br />
since, in the linear interval, a small $\mu$ corresponds to a large<br />
step, we investigate whether $\mu_{\min}$ is acceptable by checking<br />
if $a(\mu_{\min}) \geq a_{\min}$. If this step is too long, we backtrack by<br />
increasing $\mu$ using inexact line search until an acceptable<br />
value $\mu_{\mathrm{opt}}$ is found such that $a(\mu_{\mathrm{opt}}) \geq a_{\min}$, requiring a few<br />
additional Fock/KS matrix diagonalizations. In Fig. 1(b), the<br />
accepted $\mu_{\mathrm{opt}}$ is marked with an arrow.<br />
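The backtracking described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' line search: the function `choose_level_shift`, its geometric step growth, and the model overlap `toy_overlap` are hypothetical, and each call to `overlap` stands in for one level-shifted Fock/KS diagonalization followed by the evaluation of the overlap between the old and new density matrices.<br />

```python
def choose_level_shift(overlap, mu_min, a_min=0.975, growth=1.5, max_steps=50):
    """Increase mu from mu_min until overlap(mu) >= a_min.

    overlap : callable returning the density overlap a(mu); in a real code
              every call would cost one matrix diagonalization.
    mu_min  : onset of the linear regime, the smallest level shift allowed.
    """
    mu = mu_min
    step = max(abs(mu_min), 1.0) * (growth - 1.0)
    for _ in range(max_steps):
        if overlap(mu) >= a_min:
            return mu          # step is short enough to be trusted
        mu += step             # larger level shift means a shorter step
        step *= growth
    raise RuntimeError("no acceptable level shift found")

def toy_overlap(mu):
    """Assumed model: overlap rises monotonically toward one with mu."""
    return mu / (mu + 0.1)

mu_opt = choose_level_shift(toy_overlap, mu_min=1.0)
```

With the assumed model overlap, the search accepts the first trial value on its grid that satisfies the overlap criterion, so `mu_opt` is an acceptable rather than a minimal level shift.<br />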
For a better understanding of this step, consider the Hessian<br />
of the $E^{\mathrm{RH}}$ energy function:<br />
$$A^{\mathrm{RH}}_{ai,bj} = \delta_{ij}\delta_{ab}\left(\varepsilon_a - \varepsilon_i\right).$$ (25)<br />
By restricting the level-shift parameter to $\mu \geq \mu_{\min}$, where<br />
$\varepsilon_{a}^{\mathrm{LUMO}}(\mu) - \varepsilon_{i}^{\mathrm{HOMO}}(\mu) > 0$, we ensure that the effective Hessian<br />
is positive definite and that the model energy function<br />
$E^{\mathrm{RH}}$ is reduced. We note that the Hessian of the true energy<br />
function $E_{\mathrm{SCF}}$ is given by the more complicated expression<br />
$$A^{\mathrm{SCF}}_{ai,bj} = \delta_{ij}\delta_{ab}\left(\varepsilon_a - \varepsilon_i\right) + 4g_{aibj} - g_{abij} - g_{ajib}.$$ (26)<br />
Often, the orbital energy difference dominates the Hessian.<br />
In such cases, we expect the above step to reduce the SCF<br />
energy $E_{\mathrm{SCF}}$ as well as the model function $E^{\mathrm{RH}}$. In any case,<br />
when a sufficiently large level shift is added in Eq. (19), the<br />
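Hessian structure of Eq. (25) becomes similar to that of the<br />
true energy function $E_{\mathrm{SCF}}$ in Eq. (26). The steps generated<br />
from $E^{\mathrm{RH}}$ with such level shifts will therefore have essentially<br />
the same direction as the ones generated from $E_{\mathrm{SCF}}$.<br />
By construction, the $E^{\mathrm{RH}}$ energy function is lowered<br />
when $\mu$ is chosen according to the above prescription:<br />
$$\Delta E^{\mathrm{RH}} = 2\,\mathrm{Tr}\,\mathbf{F}(\bar{\mathbf{D}})\left[\mathbf{D}(\mu) - \bar{\mathbf{D}}\right] \leq 0.$$ (27)<br />
Since $E^{\mathrm{RH}}$ is only a local model of the true energy function<br />
$E_{\mathrm{SCF}}$, the associated change in the true energy,<br />
$$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}} = E_{\mathrm{SCF}}(\mathbf{D}(\mu)) - E_{\mathrm{SCF}}(\bar{\mathbf{D}}),$$ (28)<br />
may be either negative or positive, depending on how well<br />
$E^{\mathrm{RH}}$ represents $E_{\mathrm{SCF}}$ for the chosen step. However, for sufficiently<br />
small steps, $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}} < 0$, since the model function then<br />
represents the true energy well.<br />
Let us consider the relationship between the true lowering<br />
$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ and the lowering predicted by the model function,<br />
$\Delta E^{\mathrm{RH}}$. Introducing the presumably small differential density<br />
matrix<br />
$$\boldsymbol{\Delta} = \mathbf{D}(\mu) - \bar{\mathbf{D}}$$ (29)<br />
and using the identity $\mathrm{Tr}\,\mathbf{A}\mathbf{G}(\mathbf{B}) = \mathrm{Tr}\,\mathbf{B}\mathbf{G}(\mathbf{A})$, valid for symmetric<br />
matrices $\mathbf{A}$ and $\mathbf{B}$, we find that the change in the true<br />
energy, Eq. (28), may be written in the form<br />
$$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}} = 2\,\mathrm{Tr}\,\mathbf{h}\left[\mathbf{D}(\mu) - \bar{\mathbf{D}}\right] + \mathrm{Tr}\,(\bar{\mathbf{D}} + \boldsymbol{\Delta})\mathbf{G}(\bar{\mathbf{D}} + \boldsymbol{\Delta}) - \mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{G}(\bar{\mathbf{D}})$$<br />
$$\qquad = 2\,\mathrm{Tr}\,\mathbf{h}\boldsymbol{\Delta} + 2\,\mathrm{Tr}\,\mathbf{G}(\bar{\mathbf{D}})\boldsymbol{\Delta} + \mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{G}(\boldsymbol{\Delta}),$$ (30)<br />
which shows that the changes in the true energy and in the<br />
model energy are related as<br />
$$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}} = \Delta E^{\mathrm{RH}} + \mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{G}(\boldsymbol{\Delta}).$$ (31)<br />
If the last term, which is second order in $\boldsymbol{\Delta}$, is negligible, the<br />
energy lowering predicted by the local model $\Delta E^{\mathrm{RH}}$ becomes<br />
equal to $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$. However, since the correction term is positive<br />
(strictly positive in the absence of exchange), its presence<br />
in Eq. (31) shows that, for sufficiently large steps, a<br />
lowering of the model function may not lead to a lowering of<br />
the total energy. To avoid such steps, it would be useful to<br />
provide an alternative prediction of $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ that is less expensive<br />
than the calculation of $\mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{G}(\boldsymbol{\Delta})$ itself. Section II A 3 is<br />
concerned with this problem.<br />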
To demonstrate the efficiency of the chosen level shift<br />
$\mu_{\mathrm{opt}}$ in the global region of an SCF optimization, we have for<br />
the fourth iteration of the rhodium-complex calculation plotted,<br />
in Fig. 1(c), $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ and $\Delta E^{\mathrm{RH}}$ as functions of $\mu$. The<br />
energy gain $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ is about optimal for the level shift $\mu_{\mathrm{opt}}$.<br />
Increasing $\mu$ gives a smaller energy gain, while decreasing $\mu$<br />
gives a slight increase in the energy gain and, for $\mu$ below 4.5,<br />
$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ is actually positive. Note also that for $\mu < \mu_{\mathrm{opt}}$, $\Delta E^{\mathrm{RH}}$<br />
and $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ start to differ, indicating that the importance of<br />
$\mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{G}(\boldsymbol{\Delta})$ increases. The step representing an RH iteration<br />
where $\mu = 0$ is far too long to be trusted and results in a<br />
significant increase of the total energy.<br />
3. Prediction of the energy close to the minimum<br />
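To develop a better prediction of $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ than $\Delta E^{\mathrm{RH}}$, we<br />
note that the only part that cannot easily be evaluated from<br />
known Fock matrices is the second-order contribution to Eq.<br />
(31) from that part of $\boldsymbol{\Delta}$ that does not belong to the linear<br />
space spanned by the previous density matrices $\mathbf{D}_i$. To see<br />
this, we decompose the current density matrix $\mathbf{D}(\mu)$ into two<br />
parts,<br />
$$\mathbf{D}(\mu) = \mathbf{D}_{\parallel} + \mathbf{D}_{\perp},$$ (32)<br />
where $\mathbf{D}_{\parallel}$ belongs to the linear space spanned by the previous<br />
density matrices and $\mathbf{D}_{\perp}$ belongs to its orthogonal complement.<br />
We then expand $\mathbf{D}_{\parallel}$ in the following manner:<br />
$$\mathbf{D}_{\parallel} = \sum_{i=1}^{n} c_i(\mu)\mathbf{D}_i,$$ (33)<br />
where the expansion coefficients $c_i(\mu)$ are determined in a<br />
least-squares manner,<br />
$$c_i(\mu) = \sum_{j=1}^{n}\left[\mathbf{M}^{-1}\right]_{ij}\mathrm{Tr}\,\mathbf{D}_j\mathbf{S}\mathbf{D}(\mu)\mathbf{S}, \qquad M_{ij} = \mathrm{Tr}\,\mathbf{D}_i\mathbf{S}\mathbf{D}_j\mathbf{S}.$$ (34)<br />
The change in the SCF energy associated with the change of<br />
density matrix from $\bar{\mathbf{D}}$ to $\mathbf{D}(\mu)$ may be expressed as<br />
$$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}} = E_{\mathrm{SCF}}(\mathbf{D}_{\parallel}) - E_{\mathrm{SCF}}(\bar{\mathbf{D}}) + 2\,\mathrm{Tr}\,\mathbf{D}_{\perp}\mathbf{F}(\mathbf{D}_{\parallel}) + \mathrm{Tr}\,\mathbf{D}_{\perp}\mathbf{G}(\mathbf{D}_{\perp}).$$ (35)<br />
Ignoring the small term quadratic in $\mathbf{D}_{\perp}$, we may now predict<br />
the change in the SCF energy at little cost from the<br />
expression<br />
$$\Delta E^{P}_{\mathrm{SCF}} = E_{\mathrm{SCF}}(\mathbf{D}_{\parallel}) - E_{\mathrm{SCF}}(\bar{\mathbf{D}}) + 2\,\mathrm{Tr}\,\mathbf{D}_{\perp}\mathbf{F}(\mathbf{D}_{\parallel}),$$ (36)<br />
using only the density matrices and Fock/KS matrices of the<br />
previous iterations. In particular, in the later parts of the iteration<br />
sequence, where the space spanned by the densities<br />
of the preceding RH iterations is large, an accurate estimate<br />
of $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ may be obtained from this formula. In the following,<br />
we shall see how we may use this prediction to determine<br />
the level shift when $\mu_{\min} = 0$ and $a(0) \geq a_{\min}$.<br />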
To illustrate how $\Delta E^{P}_{\mathrm{SCF}}$ is used to find the level-shift<br />
parameter, consider as an example the determination of the<br />
level-shift parameter in the ninth iteration of the rhodium-complex<br />
calculation of Sec. III. The plot of the HOMO-LUMO<br />
gap in Fig. 2(a) shows that the allowed level-shift<br />
interval is $\mu \geq 0$. In Fig. 2(b), we have plotted the overlap<br />
$a(\mu)$ as a function of $\mu$. Since $a(0) \geq a_{\min}$, we should,<br />
according to the discussion in Sec. II A 2, use $\mu_{\mathrm{opt}} = 0$ to<br />
determine the step. In short, considerations based on the<br />
HOMO-LUMO gap and on the overlap with the averaged<br />
density matrix indicate that the next density matrix should be<br />
determined from the standard, unshifted RH equations.<br />
However, from the nine density matrices of the previous<br />
RH iterations, we can use $\Delta E^{P}_{\mathrm{SCF}}(\mu)$ to predict the change<br />
$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}(\mu)$ more accurately than with $\Delta E^{\mathrm{RH}}(\mu)$. Indeed, from<br />
Fig. 2(c), we see that $\Delta E^{P}_{\mathrm{SCF}}(\mu)$ provides a good global representation<br />
of $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}(\mu)$, with a minimum close to the minimum<br />
of $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}(\mu)$. By contrast, the local model $\Delta E^{\mathrm{RH}}(\mu)$<br />
gives a minimum at $\mu = 0$. Clearly, $\mu = 0$ should be avoided<br />
in the calculation since it would lead to an increase in the<br />
SCF energy. Instead, the value of the level-shift parameter<br />
that corresponds to the minimum of $\Delta E^{P}_{\mathrm{SCF}}$ (denoted by $\mu_{\mathrm{opt}}$)<br />
is chosen for the calculation of the next density matrix.<br />
This procedure may be summarized as follows. If $\mu_{\min} = 0$<br />
and $a(0) \geq a_{\min}$, then we calculate the predicted energies<br />
$\Delta E^{P}_{\mathrm{SCF}}(0)$ and $\Delta E^{P}_{\mathrm{SCF}}(\mu)$ with $\mu > 0$. If $\Delta E^{P}_{\mathrm{SCF}}(0) < \Delta E^{P}_{\mathrm{SCF}}(\mu)$,<br />
then we use $\mathbf{D}(0)$. Otherwise, we estimate the<br />
minimum $\mu_{\mathrm{opt}}$ of $\Delta E^{P}_{\mathrm{SCF}}(\mu)$ by an inexact line search and<br />
use the density matrix $\mathbf{D}(\mu_{\mathrm{opt}})$ at this minimum.<br />
FIG. 2. For the ninth iteration of the rhodium calculation described in Sec.<br />
III, we have displayed as a function of the level-shift parameter $\mu$: (a) the<br />
HOMO-LUMO gap $\Delta\varepsilon_{ai}$, where $\mu_{\min} = 0$; (b) the overlap $a$ between the old<br />
and new density matrices, where $a_{\min}$ is the smallest accepted overlap; and<br />
(c) the change in the model energy $\Delta E^{\mathrm{RH}}$, the actual energy $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$, and the<br />
predicted energy $\Delta E^{P}_{\mathrm{SCF}}$. $\mu_{\mathrm{opt}}$ is found at the minimum of $\Delta E^{P}_{\mathrm{SCF}}(\mu)$.<br />
B. Density-subspace minimization<br />
1. The DSM energy function<br />
Let us assume that we have carried out $n$ RH iterations<br />
and that we have kept all previous density matrices $\mathbf{D}_i$ and<br />
the corresponding Fock matrices $\mathbf{F}_i$. We would now like to<br />
construct an optimal density as a linear combination of the<br />
densities from these iterations according to Eq. (9),<br />
$$\bar{\mathbf{D}} = \sum_{i=1}^{n} c_i\mathbf{D}_i.$$ (37)<br />
Ideally, this averaged density should also fulfill the conditions<br />
Eqs. (3)–(5). The symmetry condition, Eq. (3), is trivially<br />
satisfied since the averaged density, Eq. (37), is a linear<br />
combination of symmetric density matrices. The trace condition,<br />
Eq. (4), is also easily taken care of by imposing the<br />
restriction<br />
$$\sum_{i=1}^{n} c_i = 1$$ (38)<br />
on the expansion coefficients:<br />
$$\mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{S} = \sum_{i=1}^{n} c_i\,\mathrm{Tr}\,\mathbf{D}_i\mathbf{S} = \frac{N}{2}.$$ (39)<br />
By contrast, the idempotency condition, Eq. (5), cannot be<br />
imposed on the averaged density matrix. However, the idempotency<br />
may be significantly improved if, instead of working<br />
with $\bar{\mathbf{D}}$, we work with the purified density matrix 6<br />
$$\tilde{\mathbf{D}} = 3\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}},$$ (40)<br />
as proposed by Nunes and Vanderbilt. 7 The electronic energy<br />
may be expressed in terms of the purified average density<br />
matrix as<br />
$$E(\tilde{\mathbf{D}}) = 2\,\mathrm{Tr}\,\mathbf{h}\tilde{\mathbf{D}} + \mathrm{Tr}\,\tilde{\mathbf{D}}\mathbf{G}(\tilde{\mathbf{D}}).$$ (41)<br />
We note that the purified density is correct to first order in<br />
the expansion coefficients $c_i$ and that $E(\tilde{\mathbf{D}})$ thus contains<br />
errors through second order in $c_i$. To determine the best<br />
average density matrix, Eq. (37), we shall minimize Eq. (41)<br />
with respect to the expansion coefficients $c_i$ subject to the<br />
condition Eq. (38).<br />
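The effect of the purification formula, $3\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}$, on an averaged density matrix can be checked numerically. The sketch below is an illustration only: it assumes a toy two-function orthonormal basis (overlap matrix equal to the identity) and an equal-weight average of two idempotent densities, and the helper names `matmul`, `purify`, and `idempotency_error` are hypothetical.<br />

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def purify(D_bar, S):
    """Return 3 D S D - 2 D S D S D for the averaged density D_bar."""
    DS = matmul(D_bar, S)
    DSD = matmul(DS, D_bar)
    DSDSD = matmul(DS, DSD)
    n = len(D_bar)
    return [[3.0 * DSD[i][j] - 2.0 * DSDSD[i][j] for j in range(n)] for i in range(n)]

def idempotency_error(D, S):
    """Largest element of D S D - D in absolute value."""
    DSD = matmul(matmul(D, S), D)
    n = len(D)
    return max(abs(DSD[i][j] - D[i][j]) for i in range(n) for j in range(n))

# Assumed toy data: S = I and the equal-weight average of the idempotent
# densities [[0.5, 0.5], [0.5, 0.5]] and [[1, 0], [0, 0]].
S = [[1.0, 0.0], [0.0, 1.0]]
D_bar = [[0.75, 0.25], [0.25, 0.25]]
D_tilde = purify(D_bar, S)

error_before = idempotency_error(D_bar, S)
error_after = idempotency_error(D_tilde, S)
```

For this input the idempotency error drops from 0.125 to roughly 0.055 after a single purification, so the purified matrix is closer to, but still not exactly, idempotent.<br />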
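One problem we encounter when minimizing Eq. (41) is<br />
that new Fock matrices $\mathbf{F}(\tilde{\mathbf{D}})$ need to be evaluated. To avoid<br />
this problem, we shall use an approximate form of Eq. (41).<br />
Since the purified density matrix $\tilde{\mathbf{D}}$ is close to the original<br />
density matrix $\bar{\mathbf{D}}$, we can write it as<br />
$$\tilde{\mathbf{D}} = \bar{\mathbf{D}} + \boldsymbol{\delta},$$ (42)<br />
where $\boldsymbol{\delta}$ is the correction term. Inserting Eq. (42) into Eq.<br />
(41), we obtain<br />
$$E(\tilde{\mathbf{D}}) = 2\,\mathrm{Tr}\,\mathbf{h}\bar{\mathbf{D}} + \mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{G}(\bar{\mathbf{D}}) + 2\,\mathrm{Tr}\,\mathbf{h}\boldsymbol{\delta} + 2\,\mathrm{Tr}\,\mathbf{G}(\bar{\mathbf{D}})\boldsymbol{\delta} + \mathrm{Tr}\,\boldsymbol{\delta}\mathbf{G}(\boldsymbol{\delta}).$$ (43)<br />
Since $\boldsymbol{\delta}$ is small, we may ignore the term quadratic in $\boldsymbol{\delta}$ and<br />
arrive at the density-subspace minimization (DSM) energy<br />
function<br />
$$E^{\mathrm{DSM}}(\mathbf{c}) = 2\,\mathrm{Tr}\,\mathbf{h}\bar{\mathbf{D}} + \mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{G}(\bar{\mathbf{D}}) + 2\,\mathrm{Tr}\,\mathbf{h}\boldsymbol{\delta} + 2\,\mathrm{Tr}\,\mathbf{G}(\bar{\mathbf{D}})\boldsymbol{\delta}$$<br />
$$\qquad = E(\bar{\mathbf{D}}) + 2\,\mathrm{Tr}\,\mathbf{F}(\bar{\mathbf{D}})(\tilde{\mathbf{D}} - \bar{\mathbf{D}}).$$ (44)<br />
Since $\boldsymbol{\delta}$ is first order in the expansion coefficients $c_i$, the<br />
DSM energy differs from the true energy to second and<br />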
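higher orders in $c_i$. The first contribution to the DSM energy<br />
function may, for example, be evaluated using the energy expression<br />
of the EDIIS algorithm, 5<br />
$$E(\bar{\mathbf{D}}) = \sum_{i} c_i E_{\mathrm{SCF}}(\mathbf{D}_i) - \frac{1}{2}\sum_{ij} c_i c_j\,\mathrm{Tr}\,(\mathbf{F}_i - \mathbf{F}_j)(\mathbf{D}_i - \mathbf{D}_j).$$ (45)<br />
Using Eq. (40), we find that the second contribution may be<br />
evaluated as<br />
$$2\,\mathrm{Tr}\,\mathbf{F}(\bar{\mathbf{D}})(\tilde{\mathbf{D}} - \bar{\mathbf{D}}) = -2\sum_{ij} c_i c_j\,\mathrm{Tr}\,\mathbf{F}_i\mathbf{D}_j + 6\sum_{ijk} c_i c_j c_k\,\mathrm{Tr}\,\mathbf{F}_i\mathbf{D}_j\mathbf{S}\mathbf{D}_k - 4\sum_{ijkl} c_i c_j c_k c_l\,\mathrm{Tr}\,\mathbf{F}_i\mathbf{D}_j\mathbf{S}\mathbf{D}_k\mathbf{S}\mathbf{D}_l.$$ (46)<br />
All contributions to the DSM energy function are therefore<br />
easily calculated from the previous density and Fock/KS<br />
matrices.<br />
2. The trust-region DSM minimization<br />
We minimize the DSM energy functional by the trust-region<br />
method. 12 We thus consider the second-order Taylor<br />
expansion of the DSM energy in Eq. (44) about $\mathbf{c}_0$. Introducing<br />
the step vector<br />
$$\Delta\mathbf{c} = \mathbf{c} - \mathbf{c}_0,$$ (47)<br />
we obtain<br />
$$E^{\mathrm{DSM}}_{(2)}(\mathbf{c}) = E_0 + \Delta\mathbf{c}^{T}\mathbf{g} + \frac{1}{2}\Delta\mathbf{c}^{T}\mathbf{H}\Delta\mathbf{c},$$ (48)<br />
where the energy, gradient, and Hessian at the expansion<br />
point are given by<br />
$$E_0 = E(\mathbf{c}_0), \qquad \mathbf{g} = \left.\frac{\partial E(\mathbf{c})}{\partial\mathbf{c}}\right|_{\mathbf{c}=\mathbf{c}_0}, \qquad \mathbf{H} = \left.\frac{\partial^{2} E(\mathbf{c})}{\partial\mathbf{c}^{2}}\right|_{\mathbf{c}=\mathbf{c}_0}.$$ (49)<br />
As starting point $\mathbf{c}_0$, we choose the density matrix with the<br />
lowest energy $E_{\mathrm{SCF}}(\mathbf{D}_i)$, usually from the last RH iteration.<br />
The trace condition, Eq. (38), implies<br />
$$\sum_{i=1}^{n}\Delta c_i = 0.$$ (50)<br />
We also introduce a trust region of radius $h$ for $E^{\mathrm{DSM}}_{(2)}(\mathbf{c})$<br />
and require that steps are always taken inside or to the<br />
boundary of this region. To determine a step to the boundary,<br />
we restrict the step to have the length $h$ in the $\mathbf{S}$ metric norm<br />
of Eq. (34),<br />
$$\|\Delta\mathbf{c}\|_{S}^{2} = \sum_{ij}\Delta c_i M_{ij}\Delta c_j = h^{2}.$$ (51)<br />
Introducing the undetermined multipliers $\lambda$ and $\mu$ for the<br />
trace and step-size constraints, we arrive at the following<br />
Lagrangian for minimization on the boundary of the trust<br />
region:<br />
$$L(\Delta\mathbf{c},\lambda,\mu) = E_0 + \Delta\mathbf{c}^{T}\mathbf{g} + \frac{1}{2}\Delta\mathbf{c}^{T}\mathbf{H}\Delta\mathbf{c} + \lambda\,\Delta\mathbf{c}^{T}\mathbf{1} - \frac{1}{2}\mu\left(\Delta\mathbf{c}^{T}\mathbf{M}\Delta\mathbf{c} - h^{2}\right),$$ (52)<br />
where $\mathbf{1}$ is a column vector with elements equal to 1. Differentiating<br />
this Lagrangian and setting the derivatives equal to<br />
zero, we obtain the equations<br />
$$\frac{\partial L}{\partial\Delta\mathbf{c}} = \mathbf{g} + \mathbf{H}\Delta\mathbf{c} - \mu\mathbf{M}\Delta\mathbf{c} + \lambda\mathbf{1} = \mathbf{0},$$ (53)<br />
$$\frac{\partial L}{\partial\lambda} = \Delta\mathbf{c}^{T}\mathbf{1} = 0,$$ (54)<br />
$$\frac{\partial L}{\partial\mu} = -\frac{1}{2}\left(\Delta\mathbf{c}^{T}\mathbf{M}\Delta\mathbf{c} - h^{2}\right) = 0.$$ (55)<br />
The optimization of the Lagrangian thus corresponds to the<br />
solution of the following set of linear equations:<br />
$$\begin{pmatrix}\mathbf{H} - \mu\mathbf{M} & \mathbf{1}\\ \mathbf{1}^{T} & 0\end{pmatrix}\begin{pmatrix}\Delta\mathbf{c}\\ \lambda\end{pmatrix} = -\begin{pmatrix}\mathbf{g}\\ 0\end{pmatrix},$$ (56)<br />
where the multiplier $\mu$ is iteratively adjusted until the step is<br />
to the boundary of the trust region, Eq. (55). The step-length<br />
restriction may be lifted by setting $\mu = 0$, as needed for steps<br />
inside the trust region.<br />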
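The iterative adjustment of the multiplier can be sketched in a simplified setting. The code below is not the authors' implementation: it assumes that the Hessian and the metric are already diagonal, it ignores the trace constraint, and it finds the multiplier by bisection; the names `level_shifted_step`, `metric_length`, and `trust_region_step` are hypothetical. It also assumes that the gradient has a nonzero component along the lowest Hessian mode.<br />

```python
import math

def level_shifted_step(mu, h, m, g):
    """Solve (h_i - mu * m_i) x_i = -g_i for diagonal Hessian h and metric m."""
    return [-gi / (hi - mu * mi) for hi, mi, gi in zip(h, m, g)]

def metric_length(x, m):
    """Step length in the metric norm, sqrt(sum_i m_i x_i^2)."""
    return math.sqrt(sum(mi * xi * xi for mi, xi in zip(m, x)))

def trust_region_step(h, m, g, radius, tol=1e-12):
    """Return (mu, step): mu = 0 if the Newton step fits, else a boundary step."""
    lowest = min(hi / mi for hi, mi in zip(h, m))
    if lowest > 0.0:
        x = level_shifted_step(0.0, h, m, g)
        if metric_length(x, m) <= radius:
            return 0.0, x                      # unrestricted step inside the region
    # Below the lowest eigenvalue the step length grows monotonically with mu,
    # so the boundary step can be bracketed and located by bisection.
    upper = min(lowest, 0.0)
    lower = upper - 1.0
    while metric_length(level_shifted_step(lower, h, m, g), m) > radius:
        lower = upper - 2.0 * (upper - lower)  # widen the bracket downward
    while upper - lower > tol * max(1.0, abs(lower)):
        mid = 0.5 * (lower + upper)
        if metric_length(level_shifted_step(mid, h, m, g), m) > radius:
            upper = mid
        else:
            lower = mid
    return lower, level_shifted_step(lower, h, m, g)

# Assumed toy data: two modes, unit metric, Newton step longer than the radius.
mu, step = trust_region_step([1.0, 2.0], [1.0, 1.0], [1.0, 1.0], radius=0.5)
```

In this toy case the unrestricted step has length of about 1.12, so a negative multiplier is selected and the returned step lies on the boundary of the trust region; with a radius of 2.0 the same call would return the unshifted step.<br />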
To understand the behavior of the step-length function,<br />
we consider first the generalized eigenvalue problem<br />
$$\begin{pmatrix}\mathbf{H} & \mathbf{1}\\ \mathbf{1}^{T} & 0\end{pmatrix}\begin{pmatrix}\mathbf{v}\\ \kappa\end{pmatrix} = \nu\begin{pmatrix}\mathbf{M} & \mathbf{0}\\ \mathbf{0}^{T} & \alpha\end{pmatrix}\begin{pmatrix}\mathbf{v}\\ \kappa\end{pmatrix},$$ (57)<br />
where $\mathbf{0}$ is a column vector with zero elements, $\alpha$ is a small<br />
positive constant, and the eigenvector is normalized such that<br />
$$\mathbf{v}^{T}\mathbf{v} + \kappa^{2} = 1.$$ (58)<br />
We first note that, for a finite $\alpha$, $\mathbf{v} \neq \mathbf{0}$. Next, carrying out<br />
block multiplications in Eq. (57), we obtain<br />
$$\mathbf{H}\mathbf{v} + \kappa\mathbf{1} = \nu\mathbf{M}\mathbf{v},$$ (59)<br />
$$\mathbf{1}^{T}\mathbf{v} = \nu\alpha\kappa,$$ (60)<br />
which upon elimination of $\kappa$ from the first equation yields<br />
the relation<br />
$$\alpha\nu\mathbf{H}\mathbf{v} + (\mathbf{1}^{T}\mathbf{v})\mathbf{1} = \alpha\nu^{2}\mathbf{M}\mathbf{v}.$$ (61)<br />
Since $(\mathbf{1}^{T}\mathbf{v})\mathbf{1}$ is finite, we conclude that, as $\alpha$ tends to zero,<br />
the eigenvalue $\nu$ tends to either plus or minus infinity as<br />
$\pm\alpha^{-1/2}$. Next, substituting these values of $\nu$ into Eq. (60),<br />
we find that $\mathbf{v}$ tends to the zero vector with elements proportional<br />
to $\alpha^{1/2}$ and that $\kappa$, because of the normalization Eq.<br />
(58), tends to 1. In short, the eigenvalue problem Eq. (57)<br />
with $\alpha = 0$ has two eigenvalues $\nu = \pm\infty$, whose eigenvectors<br />
have zero elements except for the last element, which is<br />
equal to 1. Finally, invoking the Hylleraas–Undheim interlace<br />
theorem, 10,11 we conclude that the remaining $n-1$ finite<br />
eigenvalues of Eq. (57) bisect the $n$ eigenvalues of the reduced<br />
eigenvalue problem<br />
$$\mathbf{H}\mathbf{v} = \nu\mathbf{M}\mathbf{v}.$$ (62)<br />
Downloaded 25 Jun 2004 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp
J. Chem. Phys., Vol. 121, No. 1, 1 July 2004 Self-consistent field method<br />
Let us now consider the step length $\|\Delta\mathbf{c}(\mu)\|_S$ as a function<br />
of $\mu$. In the diagonal representation of the augmented<br />
matrix in the linear equations Eq. (57), we may write these<br />
equations in the following uncoupled form:<br />
$$(h_i - \mu m_i)\,x_i = -\phi_i, \quad i = 1, 2, 3, \ldots, n+1. \tag{63}$$<br />
Here, the $h_i$ and $m_i$ are the diagonal elements of the Hessian<br />
and metric matrices, respectively, of the generalized eigenvalue<br />
problem Eq. (57), whereas the $x_i$ and $\phi_i$, respectively,<br />
are the corresponding elements of the solution and gradient<br />
vectors of Eq. (56). Since the last element of the gradient<br />
vector in Eq. (56) is zero, the gradient vector has no contributions<br />
from the eigenvectors with infinite eigenvalues,<br />
$$\phi_1 = \phi_{n+1} = 0, \quad \nu_1 = -\nu_{n+1} = -\infty, \tag{64}$$<br />
assuming that the eigenvalues are sorted in increasing order<br />
$\nu_1 \le \nu_2 \le \cdots \le \nu_{n+1}$. In the diagonal representation, therefore,<br />
we may write the step norm in the form<br />
$$\|\Delta\mathbf{c}\|_S = \sqrt{\sum_{i=2}^{n} \frac{m_i \phi_i^{2}}{(h_i - \mu m_i)^{2}}}. \tag{65}$$<br />
From this expression, we note that the step function consists<br />
of $n$ branches separated by $n-1$ asymptotes at the finite<br />
eigenvalues $\nu_i$. Moreover, it increases monotonically from<br />
zero to infinity as $\mu$ increases from minus infinity and approaches<br />
the lowest finite eigenvalue $\nu_2$. Therefore, there is<br />
always one and only one $\mu \in (-\infty, \nu_2)$ that gives rise to a<br />
step of length $h$. As shown by Fletcher,<sup>12</sup> this value of $\mu$<br />
corresponds to the global minimum on the boundary of the<br />
trust region.<br />
In practice, we cannot easily determine the eigenvalues<br />
$\nu_i$ of the augmented eigenvalue problem Eq. (57). Instead,<br />
we determine the eigenvalues $\omega_i$ of the reduced problem Eq.<br />
(62) and restrict our search for $\mu$ to the smaller monotonic<br />
interval $\mu \in (-\infty, \omega_1)$. Since $\omega_1 \le \nu_2$, it is possible that no<br />
solution exists in this reduced interval. Mostly, however, this<br />
restriction is mild since the two eigenvalues are usually<br />
close. If no solution is found, we choose instead the slightly<br />
shorter step obtained with $\mu = \omega_1$.<br />
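The search for the level shift can be illustrated with a minimal Python sketch that works directly with the diagonal form of the step norm, Eq. (65).<br />
It assumes that the two infinite-eigenvalue modes have already been removed, that all metric elements are positive, and that the search is restricted to the branch below the lowest ratio of Hessian to metric element.<br />
The function names and the bisection strategy are illustrative choices, not the implementation used for the calculations reported here.<br />

```python
import math


def step_norm(mu, h_diag, m_diag, phi):
    """Step length of Eq. (65) for level shift mu (finite modes only)."""
    return math.sqrt(sum(m * p * p / (h - mu * m) ** 2
                         for h, m, p in zip(h_diag, m_diag, phi)))


def find_level_shift(h_diag, m_diag, phi, radius, upper=None):
    """Return a level shift whose step has length `radius`, or 0.0 if the
    unshifted step already lies inside the trust region.

    `upper` is the top of the search interval; by default it is
    min(lowest h_i/m_i, 0), mirroring the interval used in the text.
    """
    lowest = min(h / m for h, m in zip(h_diag, m_diag))
    if lowest > 0.0 and step_norm(0.0, h_diag, m_diag, phi) <= radius:
        return 0.0                      # minimum lies inside the trust region
    if upper is None:
        upper = min(lowest, 0.0)
    hi = upper - 1e-9 * max(1.0, abs(upper))   # stay just below the bound
    if step_norm(hi, h_diag, m_diag, phi) < radius:
        return hi                       # no boundary solution: shorter step
    width = 1.0
    lo = hi - width
    while step_norm(lo, h_diag, m_diag, phi) > radius:
        width *= 2.0                    # step length tends to zero as mu -> -inf
        lo = hi - width
    for _ in range(200):                # bisection on the monotonic branch
        mid = 0.5 * (lo + hi)
        if step_norm(mid, h_diag, m_diag, phi) > radius:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Bisection is used here only because the step length is monotonic on the chosen branch; any one-dimensional root finder would serve equally well.<br />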
To illustrate how the level-shift parameter $\mu$ in Eq. (56)<br />
is determined, we consider the first [Fig. 3(a)] and third [Fig.<br />
3(b)] DSM steps in the eighth iteration of the rhodium-complex<br />
calculation in Sec. III. We have plotted the step-length<br />
function $\|\Delta\mathbf{c}(\mu)\|_S$ as a function of $\mu$. The plots consist<br />
of a series of branches between asymptotes where $\mu$<br />
makes the matrix on the left-hand side of Eq. (56) singular.<br />
The lowest eigenvalue $\omega_1$ is marked with a vertical dashed<br />
line in Figs. 3(a) and 3(b). For minimization, the level-shift<br />
parameter is chosen in the interval $\mu \in (-\infty, \min(\omega_1, 0))$,<br />
where $\omega_1$ is the lowest eigenvalue of Eq. (62). The proper<br />
value is found where the step-length function crosses the line<br />
representing the trust radius $h$, as marked with a cross in Fig.<br />
3(a). If the step that minimizes $E^{\mathrm{DSM}}_{(2)}$ is inside the trust region,<br />
$\mu = 0$ is chosen, as marked with a cross in Fig. 3(b).<br />
The trust region is updated during the iterative procedure.<br />
FIG. 3. The step-length function $\|\Delta\mathbf{c}(\mu)\|_S$ is plotted as a function of $\mu$ for<br />
the first (a) and third (b) DSM steps in the eighth iteration of the rhodium<br />
calculation described in Sec. III. The trust radius $h$ is represented by a<br />
horizontal line. The proper $\mu$ value is marked with a cross.<br />
3. Global optimization of the DSM function<br />
The optimization of the $E^{\mathrm{DSM}}$ energy is carried out in the<br />
usual manner, requiring several trust-region steps, each of<br />
which involves the construction of the gradient $\mathbf{g}$ and the<br />
Hessian $\mathbf{H}$, and the solution of the modified level-shifted<br />
Newton equations Eq. (56). After $p$ iterations, the density is<br />
calculated from the coefficients<br />
$$\mathbf{c}_p = \mathbf{c}^{(0)} + \sum_{i=1}^{p} \Delta\mathbf{c}_i. \tag{66}$$<br />
However, since $E^{\mathrm{DSM}}$ itself is a rather crude model of the<br />
true energy function $E_{\mathrm{SCF}}$, it resembles $E_{\mathrm{SCF}}$ only in a small<br />
region about the initial point $\mathbf{c}^{(0)}$. The DSM iterations are<br />
therefore terminated when the total step length $\|\mathbf{c}_p - \mathbf{c}^{(0)}\|$<br />
exceeds some preset value $k$. If a minimum of $E^{\mathrm{DSM}}$ is found<br />
inside the trust region $\|\mathbf{c}_p - \mathbf{c}^{(0)}\| < k$, then the step to the<br />
minimum is taken and the iterations are terminated. This is<br />
often the case.<br />
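The termination logic described above can be summarized in a short Python sketch.<br />
The callback `step_fn`, the Euclidean norm used for the total step length, the choice to keep the step that crosses the limit, and the exit labels "M" and "L" (borrowed from Table II) are illustrative assumptions; in an actual calculation each step would come from the solution of Eq. (56).<br />

```python
import math


def dsm_iterations(step_fn, c0, k, max_steps=50):
    """Accumulate trust-region steps from c0 until a minimum is found ("M")
    or the total step length exceeds k ("L")."""
    c = list(c0)
    for _ in range(max_steps):
        dc, at_minimum = step_fn(c)
        c = [ci + di for ci, di in zip(c, dc)]
        total = math.sqrt(sum((ci - c0i) ** 2 for ci, c0i in zip(c, c0)))
        if total > k:
            return c, "L"               # maximum total step length reached
        if at_minimum:
            return c, "M"               # minimum of the model found
    return c, "max_steps"


def toy_step(target, fraction=0.5, tol=1e-10):
    """Hypothetical step function: move a fixed fraction toward `target`."""
    def step_fn(c):
        gap = [t - ci for t, ci in zip(target, c)]
        if math.sqrt(sum(g * g for g in gap)) < tol:
            return gap, True            # final step lands on the minimum
        return [fraction * g for g in gap], False
    return step_fn
```

With a generous limit the toy iterations end at the model minimum, whereas a tight limit stops them after the first step.<br />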
Occasionally, the iterations start where the lowest eigenvalue<br />
of the Hessian in Eq. (62) is negative. In the course of<br />
the iterations, the Hessian can become positive definite and a<br />
minimum is reached. In a few cases, however, a negative<br />
Hessian eigenvalue may persist, changing little from iteration<br />
to iteration. In our experience, a step along the eigenvector<br />
corresponding to the negative eigenvalue cannot be<br />
trusted. This direction is therefore projected out from the step<br />
and the DSM function is minimized in the orthogonal subspace.<br />
As an illustration, consider the first DSM step of the<br />
tenth SCF iteration of the rhodium-complex calculation in<br />
Sec. III. In Fig. 4, we have, for comparison, plotted the step-length<br />
functions with the negative component kept and projected<br />
out. The level shifts resulting from the two situations<br />
are marked with crosses in Fig. 4. The level shift used in the<br />
DSM optimization is, in this particular case, $\mu = 0$.<br />
FIG. 4. The step-length function $\|\Delta\mathbf{c}(\mu)\|_S$ is plotted as a function of $\mu$ with<br />
the direction corresponding to the negative Hessian eigenvalue kept (solid line) and<br />
projected out (dashed line), respectively. The $\mu$ values resulting from the two<br />
situations are marked with crosses.<br />
When the trust-region minimization is terminated, a new<br />
RH iteration is initiated by constructing a new density and<br />
associated Fock matrix<br />
$$\bar{\mathbf{D}} = \sum_{i=1}^{n} c_i \mathbf{D}_i, \quad \bar{\mathbf{F}} = \sum_{i=1}^{n} c_i \mathbf{F}(\mathbf{D}_i), \tag{67}$$<br />
where we have used the fact that the Fock matrix is linear in<br />
the density. By construction, $E^{\mathrm{DSM}}(\mathbf{c})$ is lowered at each iteration<br />
of the trust-region minimization. The total energy<br />
lowering at the $p$th iteration is given by<br />
$$\Delta E^{\mathrm{DSM}} = E^{\mathrm{DSM}}(\mathbf{c}_p) - E^{\mathrm{DSM}}(\mathbf{c}^{(0)}). \tag{68}$$<br />
Since $E^{\mathrm{DSM}}$ is a local model of the true energy $E_{\mathrm{SCF}}$, the<br />
lowering of $E^{\mathrm{DSM}}$ will also lead to a lowering of $E_{\mathrm{SCF}}$ provided<br />
the total step is sufficiently short to be in the local<br />
region.<br />
4. Relationship to the DIIS method<br />
The optimal density has previously been determined using<br />
the DIIS scheme of Pulay.<sup>4</sup> In the DIIS method, the improved<br />
density matrix is obtained as a linear combination of<br />
the previous density matrices where the expansion coefficients<br />
are determined by minimizing the norm of the error<br />
vector, using the gradients of the previous iterations as error<br />
vectors. To highlight the difference between TRDSM and<br />
DIIS, we give below an alternative derivation of the DIIS<br />
algorithm.<br />
In an SCF calculation, the electronic gradient with the<br />
averaged density matrix $\bar{\mathbf{D}}$ in Eq. (37) may be expressed in<br />
the form<sup>3</sup><br />
$$\mathbf{g}(\bar{\mathbf{D}}) = 4[\bar{\mathbf{D}}\mathbf{S}\mathbf{F}(\bar{\mathbf{D}}) - \mathbf{F}(\bar{\mathbf{D}})\mathbf{S}\bar{\mathbf{D}}]. \tag{69}$$<br />
To determine the best linear combination of densities $\mathbf{D}_i$, we<br />
minimize the squared norm of the gradient<br />
$$\|\mathbf{g}(\bar{\mathbf{D}})\|^{2} = 16\,\mathrm{Tr}[\bar{\mathbf{D}}\mathbf{S}\mathbf{F}(\bar{\mathbf{D}}) - \mathbf{F}(\bar{\mathbf{D}})\mathbf{S}\bar{\mathbf{D}}]^{2}. \tag{70}$$<br />
FIG. 5. The convergence of calculations on the rhodium complex using the<br />
AhlrichsVDZ basis (Ref. 16) combined with STO-3G for Rh. The error in<br />
the total energy is given for the TRSCF, the standard DIIS, and the QRHF<br />
methods as a function of the iteration number. Furthermore, results are given<br />
where DIIS is applied after nine TRSCF iterations.<br />
Inserting the expansion Eq. (37), we obtain a quartic polynomial<br />
in $c_i$,<br />
$$\|\mathbf{g}(\bar{\mathbf{D}})\|^{2} = 16\,\mathrm{Tr}\Big[\tfrac{1}{4}\sum_{i} c_i\,\mathbf{g}(\mathbf{D}_i) + \sum_{i,j} c_i c_j \big(\mathbf{D}_i\mathbf{S}\mathbf{F}(\mathbf{D}_j - \mathbf{D}_i) - \mathbf{F}(\mathbf{D}_j - \mathbf{D}_i)\mathbf{S}\mathbf{D}_i\big)\Big]^{2}. \tag{71}$$<br />
To simplify this expression, we neglect all cubic and quartic<br />
terms<br />
$$\|\mathbf{g}(\bar{\mathbf{D}})\|^{2}_{\mathrm{app}} = \sum_{i,j} c_i c_j\,\mathrm{Tr}[\mathbf{g}(\mathbf{D}_i)\,\mathbf{g}(\mathbf{D}_j)]. \tag{72}$$<br />
Optimization of Eq. (72) subject to the constraint Eq. (38)<br />
gives the DIIS expression of the expansion coefficients in<br />
Eq. (37).<br />
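The constrained minimization of the quadratic form leads to a small bordered linear system, sketched below in plain Python.<br />
The error vectors are taken as flattened gradients, the inner product is an ordinary dot product, and the helper names `solve_linear` and `diis_coefficients` are hypothetical; this is a sketch under those assumptions rather than the code used for the calculations reported here.<br />

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("singular DIIS matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):      # back substitution
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def diis_coefficients(errors):
    """Minimize sum_ij c_i c_j <e_i, e_j> subject to sum_i c_i = 1."""
    n = len(errors)
    b_mat = [[sum(x * y for x, y in zip(ei, ej)) for ej in errors]
             for ei in errors]
    # Bordered system: [[B, 1], [1^T, 0]] [c, lam] = [0, 1]
    a = [row + [1.0] for row in b_mat] + [[1.0] * n + [0.0]]
    rhs = [0.0] * n + [1.0]
    return solve_linear(a, rhs)[:n]
```

For two orthogonal error vectors of lengths 1 and 2, for example, the coefficients weight the density with the smaller gradient four times more heavily.<br />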
III. APPLICATIONS<br />
In this section, we examine the convergence characteristics<br />
of the TRSCF algorithm. First, we consider a rhodium-complex<br />
optimization as an example of a difficult case; next,<br />
as a simpler case, we consider a calculation on H<sub>2</sub>O with the<br />
OH bond lengths stretched to double length. For comparison,<br />
we also give the convergence characteristics of the DIIS<br />
algorithm<sup>4</sup> and the quadratically convergent restricted-step<br />
Hartree–Fock (QRHF) method.<sup>13,14</sup> All calculations are carried<br />
out using a local version of the DALTON program<br />
package.<sup>17</sup><br />
A. The rhodium complex calculation<br />
In Fig. 5, we have plotted the error in the energy at each<br />
iteration of TRSCF, DIIS, and QRHF optimizations of the<br />
rhodium complex with the geometry specified in Table I using<br />
the AhlrichsVDZ basis<sup>16</sup> combined with STO-3G on Rh.<br />
The starting orbitals have been obtained by diagonalizing<br />
the one-electron Hamiltonian.<br />
Clearly, the QRHF and DIIS methods do not work in this<br />
case. In particular, the DIIS method is unable to handle the<br />
global part of the optimization, where the initially indefinite<br />
Hessian changes its structure and becomes positive definite.<br />
Since the DIIS method relies solely on gradient information,<br />
it does not see the negative eigenvalues and produces steps<br />
that may or may not be in the right direction, leading to<br />
divergence. Moreover, in this DIIS calculation, no level<br />
shifts have been applied in the RH part of the optimization,<br />
again leading to steps in the wrong direction. In short, the<br />
DIIS method cannot be used for optimizations as complex as<br />
the rhodium calculation. However, if the DIIS method is<br />
started after the SCF local region has been reached by the<br />
TRSCF algorithm, then the DIIS algorithm converges nicely<br />
since the Hessian has the correct structure. In Fig. 5, we have<br />
also plotted the errors in a calculation where the DIIS<br />
method is started after nine TRSCF iterations. It then converges<br />
in roughly the same manner as the pure TRSCF<br />
method.<br />
TABLE I. Geometry of the rhodium complex.<br />
x y z<br />
Cl 2.783200 0.000000 0.000000<br />
C 0.000000 1.750000 0.000000<br />
C 0.000000 1.750000 0.000000<br />
C 2.510000 1.247077 0.000000<br />
C 2.510000 1.247077 0.000000<br />
C 3.960000 1.247077 0.000000<br />
C 3.960005 1.247074 0.000000<br />
C 4.685005 0.008663 0.000000<br />
C 6.585566 1.381712 0.000000<br />
C 7.224161 0.912908 0.000000<br />
H 1.802335 2.074803 0.000000<br />
H 1.965500 2.190178 0.000000<br />
H 4.323007 2.273792 0.000000<br />
H 4.504500 2.190178 0.000000<br />
H 6.215281 1.889842 0.889165<br />
H 6.215281 1.889842 0.889165<br />
H 7.169607 1.539271 0.889165<br />
H 7.169607 1.539271 0.889165<br />
H 7.674455 1.397244 0.000000<br />
H 8.164527 0.363696 0.000000<br />
N 1.790000 0.000000 0.000000<br />
N 6.124978 0.017359 0.000000<br />
O 0.122018 3.144673 0.000000<br />
O 0.122018 3.144673 0.000000<br />
Rh 0.0000000 0.000000 0.000000<br />
In the QRHF calculation, the total energy decreases slowly<br />
and monotonically during the iteration procedure. However,<br />
the resulting energy lowering is much too slow to be of any<br />
practical value. Thus, after 14 iterations, the energy has decreased<br />
by only 37 $E_\mathrm{h}$, which is insignificant compared with<br />
the 237 $E_\mathrm{h}$ needed for convergence.<br />
To understand the difference between the QRHF and<br />
TRSCF optimizations, let us recall the main features of the<br />
two methods. Since the QRHF method is based on a local<br />
quadratic model of $E_{\mathrm{SCF}}$, the QRHF orbital rotations are<br />
correct to first order. However, no global information about<br />
$E_{\mathrm{SCF}}$ is available and only small steps can be trusted in the<br />
optimization. When QRHF steps are taken to the boundary of<br />
the trust region, level-shifted Newton equations are solved<br />
with the Hessian of Eq. (26). By contrast, in the TRSCF<br />
method, the RH optimization is based on the local energy<br />
function $E^{\mathrm{RH}}$, which has the same gradient as $E_{\mathrm{SCF}}$ but a<br />
slightly different Hessian [compare Eqs. (25) and (26)].<br />
More important, $E^{\mathrm{RH}}$ shares some global features with $E_{\mathrm{SCF}}$.<br />
In the RH diagonalization step, a global optimization is carried<br />
out for $E^{\mathrm{RH}}$. When an RH step is taken to the boundary<br />
of the trust region of $E^{\mathrm{RH}}$, a level-shifted Fock eigenvalue<br />
equation is solved where the level-shift parameter effectively<br />
introduces a shift in the Hessian of $E^{\mathrm{RH}}$ [Eq. (25)]. The similarity<br />
of the Hessians of $E_{\mathrm{SCF}}$ and $E^{\mathrm{RH}}$ makes the directions<br />
of the steps taken by the QRHF and RH methods very similar<br />
for sufficiently large level shifts, the essential difference<br />
being the global character of the RH steps and the local<br />
character of the QRHF steps. It is this local character of the<br />
QRHF steps that prevents the QRHF method from being<br />
efficient for systems as difficult as the rhodium complex.<br />
Let us now consider the individual TRSCF iterations as<br />
listed in Table II. The optimization begins with orbitals that<br />
diagonalize the one-electron Hamiltonian, giving a start energy<br />
of $-5\,466.530\,208\,964\,75\ E_\mathrm{h}$. In Table II, the SCF energy<br />
lowering $\Delta E_{\mathrm{SCF}}$ is divided into two contributions, one<br />
from the RH step and one from the DSM step. Recalling<br />
from Eq. (24) that $\tilde{\mathbf{D}}_n$ is the purified $\bar{\mathbf{D}}_n$,<br />
$$\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}(n+1) = E_{\mathrm{SCF}}(\tilde{\mathbf{D}}_n) - E_{\mathrm{SCF}}(\mathbf{D}_n) \tag{73}$$<br />
becomes a realistic measure of the energy change in the<br />
DSM part of the iteration. Similarly,<br />
$$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}(n+1) = E_{\mathrm{SCF}}(\mathbf{D}_{n+1}) - E_{\mathrm{SCF}}(\tilde{\mathbf{D}}_n) \tag{74}$$<br />
becomes a realistic measure of the change in the RH part.<br />
Clearly, the sum of Eqs. (73) and (74) is equal to the total<br />
change $\Delta E_{\mathrm{SCF}}$. These exact energy changes should be compared<br />
with the energy changes in the local models $\Delta E^{\mathrm{RH}}$ and<br />
$\Delta E^{\mathrm{DSM}}$ given in Eqs. (27) and (68), respectively, also listed<br />
in the table. Note that, to obtain $E_{\mathrm{SCF}}(\tilde{\mathbf{D}})$, we must carry<br />
out an additional energy calculation, which is here done only<br />
for the purpose of this analysis.<br />
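As a quick arithmetic check of this decomposition, the following Python sketch adds the DSM and RH contributions for a few iterations, using magnitudes copied from Table II, and compares the sums with the listed total changes.<br />
The helper name `decomposition_errors` is hypothetical, and any small residual is assumed to reflect rounding in the last printed digit.<br />

```python
# (iteration, total change, DSM part, RH part): magnitudes from Table II
rows = [
    (2, 45.45858825211, 8.95768890498, 36.50089934714),
    (3, 59.81037380731, 12.93651600370, 46.87385780361),
    (4, 63.34486220663, 24.25263285599, 39.09222935064),
    (5, 30.22875461345, 12.81783382045, 17.41092079300),
    (6, 11.56061105704, 5.64904464510, 5.91156641194),
]


def decomposition_errors(table):
    """Return |total - (DSM + RH)| for each tabulated iteration."""
    return {it: abs(total - (dsm + rh)) for it, total, dsm, rh in table}


errors = decomposition_errors(rows)
for it, err in errors.items():
    print(f"iteration {it}: |total - (DSM + RH)| = {err:.1e}")
```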
For the DSM method, we have also indicated in Table II<br />
how the trust-region optimization was terminated (Exit DSM):<br />
M indicates that a minimum was determined in the full<br />
space; PM indicates that a minimum was obtained in the<br />
reduced space with the direction corresponding to the negative<br />
Hessian eigenvalue projected out; and L indicates that<br />
the iterations were terminated because the maximum step<br />
length $k$ was reached. For the RH steps, we have also listed<br />
the level-shift parameter $\mu^{\mathrm{RH}}_{\mathrm{opt}}$ and the corresponding overlap<br />
$a(\mu^{\mathrm{RH}}_{\mathrm{opt}})$ of Eq. (22).<br />
The TRSCF iterations converge linearly, with a reduction<br />
in the error of about a factor 2–4 at each iteration.<br />
Moreover, the energy lowerings of the local models $\Delta E^{\mathrm{RH}}$<br />
and $\Delta E^{\mathrm{DSM}}$ are in good agreement with the actual SCF energy<br />
changes, in the local as well as in the global part of the<br />
optimization. Both the predicted and the actual energy<br />
changes are negative in all iterations. In the global region,<br />
$\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ is usually significantly larger than $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}$, whereas,<br />
in the local region, they have similar sizes.<br />
Except for three iterations in the global part of the SCF<br />
optimization, the DSM trust-region method finds a minimum<br />
within the step-length limit $k$. In the intermediate region, we<br />
encounter components of the step vector that cannot be<br />
trusted and have been projected out as described in Sec.<br />
II B 3. The DSM iterations then reach a minimum in the orthogonal<br />
subspace.<br />
TABLE II. Convergence details for the TRSCF calculation on the rhodium complex using the AhlrichsVDZ basis combined with STO-3G on Rh. Energies are given<br />
in atomic units.<br />
It. $\Delta E_{\mathrm{SCF}}$ $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}$ $\Delta E^{\mathrm{DSM}}$ $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ $\Delta E^{\mathrm{RH}}$ $\mu^{\mathrm{RH}}_{\mathrm{opt}}$ $a(\mu^{\mathrm{RH}}_{\mathrm{opt}})$ Exit DSM<br />
1 18.94647615033 0.00000000009 0.00000000000 18.94647615024 19.21320649447 17.47 0.99382<br />
2 45.45858825211 8.95768890498 7.10309975657 36.50089934714 38.75977508968 14.44 0.98630 M<br />
3 59.81037380731 12.93651600370 8.85502694483 46.87385780361 51.53623100635 11.68 0.97940 M<br />
4 63.34486220663 24.25263285599 21.63716388564 39.09222935064 48.71127100240 7.28 0.97288 L<br />
5 30.22875461345 12.81783382045 12.23686585427 17.41092079300 21.38161936631 2.63 0.97384 L<br />
6 11.56061105704 5.64904464510 4.74940263974 5.91156641194 7.60366893231 0.90 0.97552 L<br />
7 4.61334906659 1.90220393646 1.51155035145 2.71114513013 3.30373325651 0.24 0.97792 M<br />
8 2.16270415323 0.44637212140 0.44849600108 1.71633203184 1.49977814394 0.07 0.97876 M<br />
9 0.60805181167 0.29078332276 0.21298647367 0.31726848890 0.60770324492 1.30 0.99823 M<br />
10 0.16667264229 0.00294157325 0.00194422453 0.16373106904 0.22325882198 0.70 0.99934 PM<br />
11 0.05893002647 0.00782290321 0.00662821837 0.05110712327 0.03977595787 0.00 0.99955 PM<br />
12 0.01821537974 0.00935849099 0.00823957093 0.00885688875 0.00980424864 0.00 0.99989 PM<br />
13 0.00829012952 0.00417695835 0.00382848541 0.00411317118 0.00413942925 0.00 0.99995 PM<br />
14 0.00336772651 0.00246626574 0.00222734467 0.00090146077 0.00176102559 0.00 0.99998 PM<br />
15 0.00144190516 0.00106346997 0.00091468267 0.00037843519 0.00066804948 0.00 1.00000 PM<br />
16 0.00049317801 0.00040627140 0.00039284830 0.00008690661 0.00013209160 0.00 1.00000 PM<br />
17 0.00005633666 0.00003203569 0.00002863768 0.00002430097 0.00003124073 0.00 1.00000 PM<br />
18 0.00001495119 0.00000990523 0.00000917530 0.00000504595 0.00000926762 0.00 1.00000 PM<br />
19 0.00000549749 0.00000312992 0.00000277915 0.00000236757 0.00000276315 0.00 1.00000 M<br />
20 0.00000196603 0.00000126150 0.00000121565 0.00000070454 0.00000067573 0.00 1.00000 M<br />
21 0.00000038264 0.00000022841 0.00000020736 0.00000015423 0.00000016335 0.00 1.00000 M<br />
22 0.00000008720 0.00000004496 0.00000004404 0.00000004225 0.00000004536 0.00 1.00000 M<br />
23 0.00000002788 0.00000001171 0.00000001049 0.00000001617 0.00000001603 0.00 1.00000 M<br />
24 0.00000001286 0.00000000813 0.00000000800 0.00000000472 0.00000000514 0.00 1.00000 M<br />
25 0.00000000294 0.00000000131 0.00000000127 0.00000000163 0.00000000186 0.00 1.00000 M<br />
26 0.00000000119 0.00000000073 0.00000000072 0.00000000045 0.00000000056 0.00 1.00000 M<br />
27 0.00000000035 0.00000000019 0.00000000019 0.00000000016 0.00000000022 0.00 1.00000 M<br />
In the beginning of the SCF optimization, large level<br />
shifts are applied in the RH diagonalization to ensure a continuous<br />
development of the MOs. Thus, in the first few iterations,<br />
the overlap constant $a(\mu^{\mathrm{RH}}_{\mathrm{opt}})$ is significantly larger than<br />
the minimum accepted overlap of 0.975. However, the level-shift<br />
parameter decreases during the subsequent SCF iterations<br />
until, in the local region, no level shift is required and<br />
conventional RH iterations are carried out. To summarize,<br />
the TRSCF method gives a monotonic and significant energy<br />
lowering both in the RH and in the DSM part of the optimization.<br />
B. The water calculation<br />
To demonstrate the performance of the TRSCF method<br />
in a simpler case, we consider optimizations of H<sub>2</sub>O with the<br />
OH bonds stretched to twice the equilibrium value (195.10<br />
pm). In Figs. 6(a) and 6(b), we have plotted the errors in the<br />
energy during TRSCF, DIIS, and QRHF optimizations in the<br />
cc-pVDZ basis.<sup>15</sup> In Fig. 6(a), the initial orbitals<br />
are the Hückel orbitals as implemented in the DALTON program.<br />
With these initial orbitals, the TRSCF and DIIS methods<br />
converge in a very similar manner to within a threshold<br />
of $10^{-10}$ in ten iterations. In this case, therefore, gradient<br />
information is sufficient for convergence. Although the<br />
QRHF method outperforms the TRSCF and DIIS methods in<br />
terms of iterations, this is of no practical value since, in each<br />
QRHF step, about the same number of new Fock matrices<br />
are needed to solve the Newton equations as is required to<br />
find the optimized Hartree–Fock wave function with the<br />
TRSCF and DIIS methods.<br />
FIG. 6. The convergence of calculations on water with stretched bonds<br />
using the cc-pVDZ basis and (a) a Hückel start guess and (b) a one-electron<br />
Hamiltonian start guess. The error in the total energy is given for the<br />
TRSCF, the standard DIIS, and the QRHF methods as a function of the<br />
iteration number.<br />
In Fig. 6(b), we have plotted the error of the energy in<br />
H<sub>2</sub>O optimizations starting with the orbitals that diagonalize<br />
the one-electron Hamiltonian. In this case, convergence to<br />
$10^{-10}$ is reached in 13 iterations with the TRSCF method and<br />
in 18 iterations with the DIIS method. The main reason for<br />
the better performance of the TRSCF algorithm is that, in the<br />
global region, it gives a significant energy lowering in each<br />
step, whereas the DIIS algorithm shows a much less systematic<br />
behavior.<br />
IV. CONCLUSION<br />
A conventional SCF optimization consists of a sequence<br />
of iterations, each of which begins with a Roothaan–Hall<br />
(RH) diagonalization step, where a Fock/KS matrix is diagonalized<br />
to obtain an improved density matrix, followed by an<br />
averaging step, where the optimal density matrix is determined<br />
in the subspace of the density matrices of the previous<br />
RH diagonalization steps. In this paper, we have introduced a<br />
trust-region SCF (TRSCF) algorithm, where improvements<br />
have been made to both the diagonalization and the averaging<br />
steps. In both steps, local energy model functions are<br />
constructed which have the same gradient as the true energy<br />
function $E_{\mathrm{SCF}}$ but approximate Hessians. Recognizing the<br />
locality of these energy functions, trust regions are introduced<br />
as regions where they represent a good approximation<br />
to $E_{\mathrm{SCF}}$ and only steps inside these trust regions are allowed.<br />
For the density-subspace minimization step, an energy<br />
function is constructed and minimized with respect to the<br />
coefficients of the linear combination of the previous density<br />
matrices. Its functional form is based on a purified averaged<br />
density matrix that is idempotent to first order. The advantage<br />
of this model over EDIIS is the built-in density<br />
purification, which helps to avoid problems arising from<br />
non-idempotency. In addition, information about the Hessian<br />
is extracted and used, leading to a monotonic and stable convergence.<br />
The RH diagonalization step corresponds to a minimization<br />
of an energy function $E^{\mathrm{RH}}$ that represents the sum of the<br />
orbital energies of the occupied MOs. Since this very simple<br />
energy function is a local model function for $E_{\mathrm{SCF}}$, large<br />
steps cannot be trusted. To generate steps to the boundary of<br />
the trust region, level-shifted RH equations are solved where<br />
the level shifts are determined in a systematic and general<br />
manner, leading to a decrease in the model energy at each<br />
iteration. If sufficiently small steps are taken, a similar decrease<br />
is obtained in the SCF energy.<br />
In the TRSCF algorithm, a few diagonalizations are required<br />
in each SCF iteration to obtain solutions for the level-shifted<br />
RH equations in order to determine the optimal density<br />
matrix. The number of diagonalizations may be reduced<br />
in the local SCF region by solving the RH equations with zero level<br />
shift, with little consequence for the convergence. In the local<br />
SCF region, one may also safely use the DIIS algorithm if<br />
desired.<br />
The advantages of the TRSCF algorithm are demonstrated<br />
by calculations on a rhodium complex and on a water<br />
molecule with stretched bonds. In the rhodium-complex optimization,<br />
the TRSCF algorithm converges monotonically<br />
and fast, with a significant decrease in the energy in both the<br />
RH part and DSM part at each iteration. By contrast, convergence<br />
is not obtained with the DIIS method for this complex.<br />
For the simpler water molecule, the TRSCF and DIIS methods<br />
behave in a more similar manner, the TRSCF method<br />
converging slightly faster than the DIIS method when the<br />
initial orbitals are obtained by diagonalizing the one-electron<br />
Hamiltonian. With the Hückel guess, the water convergence<br />
is essentially obtained in the same number of steps for<br />
the TRSCF and DIIS methods. In short, it appears that the<br />
TRSCF algorithm, and its use of local energy model functions<br />
to obtain significant reductions in $E_{\mathrm{SCF}}$ in each iteration,<br />
constitutes a significant step towards a black-box optimization<br />
of SCF wave functions.<br />
ACKNOWLEDGMENTS<br />
This work has been supported by the Danish Natural<br />
Research Council (Grant No. 21-02-0467) and the Carlsbergfondet.<br />
We also acknowledge support from the Danish Center<br />
for Scientific Computing (DCSC). D.Y. acknowledges<br />
support from the Robert A. Welch Foundation, Grant No.<br />
A-770.<br />
1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).<br />
2 G. G. Hall, Proc. R. Soc. London, Ser. A 205, 541 (1951).<br />
3 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory (Wiley, Chichester, 2000).<br />
4 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); J. Comput. Chem. 3, 556 (1982).<br />
5 K. N. Kudin, G. E. Scuseria, and E. Cances, J. Chem. Phys. 116, 8255 (2002).<br />
6 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).<br />
7 R. W. Nunes and D. Vanderbilt, Phys. Rev. B 50, 17611 (1994).<br />
8 A. D. Daniels and G. E. Scuseria, Phys. Chem. Chem. Phys. 2, 2173 (2000).<br />
9 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 399 (1997).<br />
10 E. A. Hylleraas and B. Undheim, Z. Phys. 65, 759 (1930).<br />
11 J. K. L. MacDonald, Phys. Rev. 43, 830 (1933).<br />
12 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987).<br />
13 G. B. Bacskay, Chem. Phys. 61, 385 (1981).<br />
14 H. J. Aa. Jensen and P. Jørgensen, J. Chem. Phys. 80, 1204 (1984).<br />
15 T. H. Dunning, J. Chem. Phys. 90, 1007 (1989).<br />
16 A. Schafer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992).<br />
17 DALTON, a molecular electronic structure program, Release 1.2 (2001), written by T. Helgaker, H. J. Aa. Jensen, P. Jørgensen et al. (http://www.kjemi.uio.no/software/dalton).<br />
Part 1<br />
The Trust-region Self-consistent Field Method in Kohn-Sham Density-functional Theory,<br />
L. Thøgersen, J. Olsen, A. Köhn, P. Jørgensen, P. Sałek, and T. Helgaker,<br />
J. Chem. Phys. 123, 074103 (2005)
THE JOURNAL OF CHEMICAL PHYSICS 123, 074103 (2005)<br />
The trust-region self-consistent field method<br />
in Kohn–Sham density-functional theory<br />
Lea Thøgersen,<sup>a)</sup> Jeppe Olsen, Andreas Köhn, and Poul Jørgensen<br />
Department of Chemistry, University of Århus, DK-8000 Århus C, Denmark<br />
Paweł Sałek<br />
Laboratory of Theoretical Chemistry, The Royal Institute of Technology, Roslagstullbacken 15,<br />
Stockholm, S-10691 Sweden<br />
Trygve Helgaker<br />
Department of Chemistry, University of Oslo, P.O. Box 1033 Blindern, N-0315 Norway<br />
(Received 20 May 2005; accepted 7 June 2005; published online 22 August 2005)<br />
The trust-region self-consistent field (TRSCF) method is extended to the optimization of the Kohn–<br />
Sham energy. In the TRSCF method, both the Roothaan–Hall step and the density-subspace<br />
minimization step are replaced by trust-region optimizations of local approximations to the Kohn–<br />
Sham energy, leading to a controlled, monotonic convergence towards the optimized energy.<br />
Previously the TRSCF method has been developed for optimization of the Hartree–Fock energy,<br />
which is a simple quadratic function in the density matrix. However, since the Kohn–Sham energy<br />
is a nonquadratic function of the density matrix, the local energy functions must be generalized for<br />
use with the Kohn–Sham model. Such a generalization, which contains the Hartree–Fock model as<br />
a special case, is presented here. For comparison, a rederivation of the popular direct inversion in<br />
the iterative subspace (DIIS) algorithm is performed, demonstrating that the DIIS method may be<br />
viewed as a quasi-Newton method, explaining its fast local convergence. In the global region the<br />
convergence behavior of DIIS is less predictable. The related energy DIIS technique is also<br />
discussed and shown to be inappropriate for the optimization of the Kohn–Sham energy. © 2005<br />
American Institute of Physics. [DOI: 10.1063/1.1989311]<br />
I. INTRODUCTION<br />
Computational methods rigorously based on the laws of<br />
quantum mechanics are becoming an ever more important<br />
component of scientific and technological progress in many<br />
branches of natural science, including biochemistry and materials<br />
science. Quantum-chemical codes, in particular, are<br />
today routinely used to perform calculations on molecules<br />
containing hundreds of atoms. Furthermore, with the advent<br />
of density-functional theory (DFT) methods, molecules with<br />
more complex electronic structure and larger parts of potential<br />
surfaces may be calculated than with the Hartree–Fock<br />
method. Most of these calculations are performed by nonspecialists,<br />
not trained in quantum chemistry or in numerical<br />
simulations. An important challenge is thus to develop<br />
quantum-chemical techniques that allow the user to focus on<br />
the physical and chemical interpretations of the results of the<br />
calculations by eliminating or at least minimizing the need to<br />
understand the details of the numerical algorithms.<br />
a) Electronic mail: lea@chem.au.dk<br />
A central numerical task of the Hartree–Fock wave-function<br />
theory and Kohn–Sham DFT is the minimization of<br />
the electronic energy function with respect to the density<br />
matrix of a single-determinant reference wave function. In<br />
its original formulation, the self-consistent field (SCF)<br />
method for optimizing Hartree–Fock and Kohn–Sham energies<br />
$E_{\mathrm{SCF}}$ consists of a sequence of Roothaan–Hall<br />
iterations. 1,2 At each iteration, the Fock/Kohn–Sham matrix<br />
is first constructed from the current approximate atomic-orbital<br />
(AO) density matrix; next, an improved AO density<br />
matrix is generated from the molecular orbitals (MOs) obtained<br />
by diagonalization of this Fock/Kohn–Sham matrix.<br />
Unfortunately, this simple SCF scheme converges only in<br />
simple cases. To improve upon its convergence, the optimization<br />
is modified by constructing the Fock/Kohn–Sham matrix<br />
not directly from the AO density matrix of the last diagonalization<br />
but rather from an averaged density matrix,<br />
calculated in the subspace of the density matrices of the current<br />
and previous iterations. In practice, the averaged AO<br />
density matrix is calculated by the direct inversion in the iterative<br />
subspace (DIIS) method of Pulay, 3 nowadays implemented<br />
in most electronic-structure programs. In the DIIS<br />
method, the averaged density matrix is a linear combination<br />
of density matrices, where the expansion coefficients are obtained<br />
by minimizing the norm of the corresponding linear<br />
combination of the gradients.<br />
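The gradient-norm minimization described above can be illustrated with a minimal sketch. It assumes the commonly used normalization in which the weights sum to one (the parameterization used later in this paper differs), and the three "gradient" vectors are invented toy data, not SCF gradients from a real calculation:<br />

```python
import numpy as np

def diis_coefficients(errors):
    """Weights c_i minimizing ||sum_i c_i e_i|| subject to sum_i c_i = 1."""
    m = len(errors)
    # B_ij = <e_i, e_j>; np.vdot flattens matrix-valued error vectors.
    B = np.array([[np.vdot(ei, ej) for ej in errors] for ei in errors])
    # Lagrangian stationarity: B c - lam * 1 = 0 together with sum(c) = 1.
    A = np.zeros((m + 1, m + 1))
    A[:m, :m] = B
    A[:m, m] = -1.0
    A[m, :m] = -1.0
    rhs = np.zeros(m + 1)
    rhs[m] = -1.0
    return np.linalg.solve(A, rhs)[:m]

# Toy vectors standing in for the gradients of three stored iterations.
errors = [np.array([0.40, 0.10, -0.20]),
          np.array([-0.10, 0.30, 0.05]),
          np.array([0.05, -0.02, 0.20])]

c = diis_coefficients(errors)
residual = sum(ci * ei for ci, ei in zip(c, errors))
print("weights:", c)
print("norm of combined gradient:", np.linalg.norm(residual))
print("norms of stored gradients:", [np.linalg.norm(e) for e in errors])
```

The same weights would then be applied to the stored density matrices to form the averaged density matrix.<br />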
Over the years, several attempts have been made to improve<br />
upon the DIIS method. In particular, Kudin et al. have<br />
proposed the energy DIIS (EDIIS) method, 4 where the<br />
gradient-norm minimization is replaced by a minimization of<br />
an approximation to the true energy function $E_{\mathrm{SCF}}$, with the<br />
expansion coefficients of the averaged density matrix<br />
used as variational parameters. For the special case of two<br />
density matrices such an approach was first developed by<br />
Karlström. 5<br />
0021-9606/2005/1237/074103/17/$22.50<br />
123, 074103-1<br />
© 2005 American Institute of Physics<br />
Downloaded 23 Aug 2005 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp
074103-2 Thøgersen et al. J. Chem. Phys. 123, 074103 2005<br />
Recently, we introduced the trust-region self-consistent<br />
field (TRSCF) method 6 for SCF density-matrix optimizations.<br />
In the TRSCF method, the diagonalization step [trust-region<br />
Roothaan–Hall (TRRH)] and the density-optimization<br />
step [trust-region density-subspace minimization (TRDSM)]<br />
are realized as minimizations of local energy model functions<br />
of $E_{\mathrm{SCF}}$. The local energy functions are expanded about<br />
the current AO density matrix and have the same gradients as<br />
the true energy $E_{\mathrm{SCF}}$ but approximate Hessians. In the course<br />
of the SCF optimization, each step is restricted to be within<br />
the trust region of the current model, that is, within the region<br />
where the model accurately represents the true energy<br />
function. In TRDSM the step length is controlled through a<br />
standard trust-region optimization 7 and in TRRH the<br />
step length is controlled through a level shift. 8 In this manner,<br />
a reliable and systematic energy lowering of $E_{\mathrm{SCF}}$ is ensured<br />
at each iteration.<br />
In the first implementation of the TRSCF method, the<br />
focus was on the optimization of the Hartree–Fock energy. In<br />
this paper, the focus is on the optimization of the Kohn–<br />
Sham energy. In the Kohn–Sham theory, the energy difference<br />
between the highest occupied MO and lowest unoccupied<br />
MO (the HOMO-LUMO gap) is usually much smaller<br />
than that in the Hartree–Fock theory, making the optimization<br />
more difficult. Here, we investigate the consequences of<br />
this smaller HOMO-LUMO gap for the global and local convergence<br />
characteristics for the Roothaan–Hall optimization<br />
step. In the Hartree–Fock theory, the energy function is quadratic<br />
in the density matrix, whereas, in the Kohn–Sham<br />
theory, it becomes a nonquadratic function because of the<br />
exchange-correlation contribution to the energy. In our previous<br />
implementation of the TRSCF method, the model<br />
function used to determine the averaged density matrix was<br />
specially designed for the Hartree–Fock theory, assuming<br />
that the energy depends quadratically on the density matrix.<br />
For the Kohn–Sham theory, the model function must be generalized.<br />
Such a generalization is presented here.<br />
In this paper, the DIIS algorithm is also rederived to<br />
understand better when it can safely be applied. In particular,<br />
we find that the DIIS method may be viewed as a quasi-<br />
Newton method in the local region, explaining its fast local<br />
convergence. The convergence characteristics of the DIIS<br />
method in the global region are less predictable.<br />
Recently, and along the same lines as our TRRH method,<br />
Francisco et al. introduced their globally convergent trust-region<br />
methods for SCF, 9 where the standard fixed-point<br />
Roothaan–Hall step is replaced by a trust-region optimization<br />
of a model energy function. Any acceleration scheme,<br />
such as DIIS, EDIIS, and the TRDSM method, can then be<br />
combined with this method.<br />
After an introduction to the SCF problem in Sec. II, we<br />
examine the Roothaan–Hall scheme in Sec. III. In particular,<br />
we identify the model energy function that is effectively being<br />
optimized in the diagonalization step, demonstrating how<br />
convergence can be improved upon by level shifting. In Sec.<br />
IV, we consider the density-matrix averaging step. We establish<br />
the model energy function of the weights of the density<br />
matrices and perform an order analysis of the resulting<br />
scheme, demonstrating that it represents a balanced approximation;<br />
next, we compare our local energy function with the<br />
EDIIS function, showing that the latter misses a term that is<br />
necessary for calculating the correct gradient. After a brief<br />
discussion of configuration shifts in Sec. V, we present in<br />
Sec. VI a rederivation of the DIIS algorithm, establishing its<br />
equivalence with the quasi-Newton method in the local region.<br />
Section VII contains some convergence examples for<br />
the DFT calculations, using the TRSCF algorithm and some<br />
of its alternatives. Finally, Sec. VIII contains some concluding<br />
remarks.<br />
II. THE KOHN–SHAM ENERGY AND THE<br />
ROOTHAAN–HALL METHOD<br />
For a closed-shell system with $N/2$ electron pairs, the<br />
Kohn–Sham energy (excluding the nuclear-nuclear repulsion<br />
contribution) is given by 10<br />
$$E_{\mathrm{KS}}(D) = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,D\,G(D) + E_{\mathrm{XC}}(D). \tag{1}$$<br />
Here $D$ is the scaled one-electron density matrix in the AO<br />
basis, $D = \tfrac{1}{2} D_{\mathrm{AO}}$; $h$ is the one-electron Hamiltonian matrix in<br />
this basis; and the elements of $G(D)$ are given by<br />
$$G_{\mu\nu}(D) = 2 \sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \gamma \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}, \tag{2}$$<br />
where $g_{\mu\nu\rho\sigma}$ are the two-electron AO integrals. The first term<br />
in Eq. (2) represents the Coulomb contribution and the second<br />
term the contribution from exact exchange, with $\gamma = 1$ in<br />
the Hartree–Fock theory, $\gamma = 0$ in the pure DFT, and $\gamma \neq 0$ in<br />
the hybrid DFT. The exchange-correlation energy $E_{\mathrm{XC}}(D)$ in<br />
Eq. (1) is a functional of the electron density. In the local-density<br />
approximation (LDA), the exchange-correlation energy<br />
is local in the density, whereas, in the generalized gradient<br />
approximation (GGA), it is also local in the squared<br />
density gradient, that is, it may be expressed as<br />
$$E_{\mathrm{XC}}(D) = \int f(\rho(x), \zeta(x))\, dx. \tag{3}$$<br />
Here the electron density $\rho(x)$ and its squared gradient norm<br />
$\zeta(x)$ are given by<br />
$$\rho(x) = \chi^{T}(x)\, D\, \chi(x), \tag{4a}$$<br />
$$\zeta(x) = \nabla\rho(x) \cdot \nabla\rho(x), \tag{4b}$$<br />
where $\chi(x)$ is a column vector containing the AOs. We note<br />
that the exchange-correlation energy density $f(\rho(x), \zeta(x))$ in<br />
Eq. (3) is a nonlinear and nonquadratic function of $\rho(x)$ and<br />
$\zeta(x)$. In the following, we shall therefore rely on an expansion<br />
of $E_{\mathrm{XC}}(D)$ around some reference density matrix $D_0$,<br />
$$E_{\mathrm{XC}}(D) = E_{\mathrm{XC}}(D_0) + (D - D_0)^{T} E_{\mathrm{XC}}^{(1)} + \tfrac{1}{2} (D - D_0)^{T} E_{\mathrm{XC}}^{(2)} (D - D_0) + \cdots, \tag{5}$$<br />
where the derivatives $E_{\mathrm{XC}}^{(n)}$ have been evaluated at $D = D_0$ and<br />
where, for convenience, we have used a vector-matrix notation<br />
for $D$, $E_{\mathrm{XC}}^{(1)}$, and $E_{\mathrm{XC}}^{(2)}$.<br />
The first derivative of E KS D with respect to the density<br />
matrix D is then given by<br />
$$E_{\mathrm{KS}}^{(1)}(D) = \frac{\partial E_{\mathrm{KS}}(D)}{\partial D} = 2F(D), \tag{6}$$<br />
where we have introduced the Fock/Kohn–Sham matrix,<br />
$$F(D) = h + G(D) + \tfrac{1}{2} E_{\mathrm{XC}}^{(1)}(D). \tag{7}$$<br />
We note that, for the energy in Eq. (1) to be a valid Kohn–Sham<br />
energy, the density matrix $D$ must satisfy the symmetry,<br />
trace, and idempotency conditions,<br />
$$D^{T} = D, \tag{8a}$$<br />
$$\mathrm{Tr}\,DS = \frac{N}{2}, \tag{8b}$$<br />
$$DSD = D, \tag{8c}$$<br />
where S is the AO overlap matrix. Therefore, we cannot<br />
carry out a free minimization of the total energy in Eq. 1,<br />
but must restrict ourselves to those changes in the density<br />
matrix that comply with these requirements.<br />
The Kohn–Sham energy $E_{\mathrm{KS}}$ is traditionally optimized<br />
self-consistently by fixed-point iterations. From the current<br />
approximation $D_0$ to the density matrix, the Kohn–Sham matrix<br />
$F(D_0)$ is calculated from Eq. (7), followed by the solution<br />
of the Roothaan–Hall generalized eigenvalue<br />
equations: 1,2<br />
$$F(D_0)\, C_{\mathrm{occ}} = S\, C_{\mathrm{occ}}\, \varepsilon, \tag{9}$$<br />
where $C_{\mathrm{occ}}$ is the set of occupied MOs and $\varepsilon$ is a diagonal<br />
matrix containing the associated eigenvalues (orbital energies).<br />
An improved density matrix is next calculated from the<br />
occupied MOs as<br />
$$D = C_{\mathrm{occ}} C_{\mathrm{occ}}^{T}, \tag{10}$$<br />
and the Roothaan–Hall fixed-point iteration is established by<br />
constructing the Kohn–Sham matrix $F(D)$ from this density<br />
matrix, followed by diagonalization according to Eq. (9).<br />
Note that, since<br />
$$C_{\mathrm{occ}}\, U U^{T} C_{\mathrm{occ}}^{T} = C_{\mathrm{occ}} C_{\mathrm{occ}}^{T}, \tag{11}$$<br />
where $U$ is unitary, the Kohn–Sham density matrix in Eq.<br />
(10) and hence the energy are invariant to unitary transformations<br />
among the occupied MOs.<br />
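A single Roothaan–Hall diagonalization step can be sketched as follows. This is an illustrative sketch only: the Kohn–Sham and overlap matrices are random toy matrices rather than molecular data, the generalized eigenvalue problem is solved through symmetric orthogonalization (one of several possible routes), and the function name is an assumption. The final lines check the symmetry, trace, and idempotency conditions and the invariance to rotations among the occupied MOs:<br />

```python
import numpy as np

def roothaan_hall_step(F, S, n_occ):
    """Solve F C = S C eps and return D = C_occ C_occ^T with the occupied MOs."""
    # Symmetric (Loewdin) orthogonalization: X = S^(-1/2), so X S X = I.
    s_val, s_vec = np.linalg.eigh(S)
    X = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T
    # Ordinary symmetric eigenproblem in the orthonormalized basis.
    eps, C_orth = np.linalg.eigh(X @ F @ X)
    C = X @ C_orth                 # MO coefficients in the AO basis
    C_occ = C[:, :n_occ]           # eigh sorts ascending: lowest n_occ orbitals
    return C_occ @ C_occ.T, C_occ, eps

# Toy matrices (not a real molecule): symmetric F, positive-definite S.
rng = np.random.default_rng(0)
n_basis, n_occ = 6, 2
A = rng.standard_normal((n_basis, n_basis))
F = 0.5 * (A + A.T)
B = rng.standard_normal((n_basis, n_basis))
S = np.eye(n_basis) + 0.1 * (B @ B.T) / n_basis

D, C_occ, eps = roothaan_hall_step(F, S, n_occ)

# Rotate the occupied MOs among themselves with an orthogonal U.
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
D_rotated = (C_occ @ U) @ (C_occ @ U).T

print("symmetric:  ", np.allclose(D, D.T))
print("trace N/2:  ", np.isclose(np.trace(D @ S), n_occ))
print("idempotent: ", np.allclose(D @ S @ D, D))
print("invariant:  ", np.allclose(D, D_rotated))
```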
The naive Roothaan–Hall fixed-point iteration outlined<br />
above converges only in simple cases. To improve upon this<br />
scheme, the new Kohn–Sham matrix is usually not calculated<br />
directly from the density matrix obtained by diagonalization<br />
of the previous Kohn–Sham matrix, but rather from<br />
the density matrix obtained by diagonalizing some linear<br />
combinations of the current and n previous Kohn–Sham matrices,<br />
$$\bar{F} = F_0 + \sum_{i=0}^{n} c_i F(D_i). \tag{12}$$<br />
Typically, the coefficients $c_i$ are obtained by the DIIS method<br />
as the weights of an improved density matrix,<br />
$$\bar{D} = D_0 + \sum_{i=0}^{n} c_i D_i. \tag{13}$$<br />
Upon diagonalization of $\bar{F}$ according to Eq. (9), the new<br />
density matrix is obtained from Eq. (10), thereby establishing<br />
the iterations. In general, the averaged density matrix in<br />
Eq. (13) is not idempotent and therefore does not represent a<br />
valid density matrix; moreover, since the Kohn–Sham matrix<br />
(unlike the Fock matrix) is nonlinear in the density matrix,<br />
the averaged Kohn–Sham matrix in Eq. (12) is different from<br />
$F(\bar{D})$. For these reasons, we cannot associate the averaged<br />
Kohn–Sham matrix in Eq. (12) uniquely with a valid Kohn–Sham<br />
matrix. Usually, this does not matter much since the<br />
subsequent diagonalization of the Kohn–Sham matrix nevertheless<br />
produces a valid density matrix according to Eq. (10).<br />
In the following, we shall disregard the complications arising<br />
from the use of the averaged Kohn–Sham matrix in Eq. (12),<br />
noting that the errors introduced by this approach may easily<br />
be corrected for, if necessary.<br />
In the remainder of this paper, we discuss the TRSCF<br />
method, which differs from the traditional SCF scheme by<br />
the consistent use of trust-region techniques for optimization<br />
control, both in the Roothaan–Hall diagonalization step in<br />
Eq. 9 and in the construction of the averaged density matrix<br />
in Eq. 13. In particular, the traditional Roothaan–Hall eigenvalue<br />
problem is replaced by a level-shifted eigenvalue<br />
problem, where the level shift is determined from trust-region<br />
considerations, resulting in the TRRH step. Similarly,<br />
the averaged density matrix is determined by a TRDSM<br />
technique rather than by the traditional DIIS method. As we<br />
shall see, the combined use of the TRRH and TRDSM<br />
schemes in the TRSCF method leads to a highly efficient and<br />
robust SCF scheme, characterized, in its most robust implementation,<br />
by a monotonic convergence towards the optimized<br />
Kohn–Sham energy.<br />
III. TRUST-REGION ROOTHAAN–HALL OPTIMIZATION<br />
A. The trust-region Roothaan–Hall method<br />
We begin by noting that the solution of the traditional<br />
Roothaan–Hall eigenvalue problem in Eq. (9) may be regarded<br />
as the minimization of the sum of the energies of the<br />
occupied MOs, 11<br />
$$E_{\mathrm{RH}}(D) = 2 \sum_i \varepsilon_i = 2\,\mathrm{Tr}\,F_0 D, \tag{14}$$<br />
subject to MO orthonormality constraints,<br />
$$C_{\mathrm{occ}}^{T} S\, C_{\mathrm{occ}} = I_{N/2}, \tag{15}$$<br />
where $F_0$ is typically obtained as a weighted sum of the<br />
Kohn–Sham matrices such as $\bar{F}$ in Eq. (12). Since Eq. (14)<br />
represents a crude model of the true Kohn–Sham energy<br />
(with the same first-order term but different zero- and<br />
second-order terms, as discussed in Sec. III B), it has a rather<br />
small trust radius. A global minimization of $E_{\mathrm{RH}}(D)$, as accomplished<br />
by the solution of the Roothaan–Hall eigenvalue<br />
problem in Eq. (9), may therefore easily lead to steps that are<br />
longer than the trust radius and hence unreliable. To avoid<br />
such steps, we shall impose on the optimization of Eq. (14)<br />
the constraint that the new density matrix $D$ does not differ<br />
much from the old matrix $D_0$, that is, the $S$ norm of the<br />
density difference should be equal to a small number $\Delta$,<br />
$$\| D - D_0 \|_S^2 = \mathrm{Tr}\,(D - D_0) S (D - D_0) S = -2\,\mathrm{Tr}\,D_0 S D S + N = \Delta. \tag{16}$$<br />
The optimization of Eq. (14) subject to the constraints in<br />
Eqs. (15) and (16) may be carried out by introducing the<br />
Lagrangian<br />
$$L = 2\,\mathrm{Tr}\,F_0 D - 2\mu \left( \mathrm{Tr}\,D S D_0 S - \tfrac{1}{2}(N - \Delta) \right) - 2\,\mathrm{Tr}\,\eta \left( C_{\mathrm{occ}}^{T} S\, C_{\mathrm{occ}} - I_{N/2} \right), \tag{17}$$<br />
where $\mu$ is the undetermined multiplier associated with the<br />
constraint in Eq. (16), whereas the symmetric matrix $\eta$ contains<br />
the multipliers associated with the MO orthonormality<br />
constraints. Differentiating this Lagrangian with respect to<br />
the MO coefficients and setting the result equal to zero, we<br />
arrive at the level-shifted Roothaan–Hall equations,<br />
$$(F_0 - \mu S D_0 S)\, \tilde{C}_{\mathrm{occ}}(\mu) = S\, \tilde{C}_{\mathrm{occ}}(\mu)\, \eta. \tag{18}$$<br />
Since the density matrix in Eq. (10) is invariant to unitary<br />
transformations among the occupied MOs in $\tilde{C}_{\mathrm{occ}}(\mu)$, we<br />
may transform this eigenvalue problem to the canonical basis,<br />
$$(F_0 - \mu S D_0 S)\, C_{\mathrm{occ}}(\mu) = S\, C_{\mathrm{occ}}(\mu)\, \varepsilon(\mu), \tag{19}$$<br />
where the diagonal matrix $\varepsilon(\mu)$ contains the orbital energies.<br />
Note that, since $D_0 S$ projects onto the part of $C_{\mathrm{occ}}$ that is<br />
occupied in $D_0$ (see Ref. 11), the level-shift parameter $\mu$<br />
shifts only the energies of the occupied MOs. Therefore, the<br />
role of $\mu$ is to modify the difference between the energies of<br />
the occupied and virtual MOs, in particular, the HOMO-LUMO<br />
gap.<br />
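The effect of the level shift on the size of the density change can be illustrated with a small numerical sketch. It assumes random toy matrices in place of a real Kohn–Sham matrix and reference density, and the function names are hypothetical. For each value of the level-shift parameter (called mu in the code) the shifted eigenvalue problem is solved and the squared S norm of the density change is evaluated as N minus twice the trace of D_0 S D S:<br />

```python
import numpy as np

def density_from_matrix(F, S, n_occ):
    """Aufbau density from the generalized eigenproblem F C = S C eps."""
    s_val, s_vec = np.linalg.eigh(S)
    X = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T    # X = S^(-1/2)
    _, C_orth = np.linalg.eigh(X @ F @ X)
    C_occ = (X @ C_orth)[:, :n_occ]
    return C_occ @ C_occ.T

def level_shifted_density(F0, D0, S, n_occ, mu):
    """Density obtained by diagonalizing F0 - mu * S D0 S."""
    return density_from_matrix(F0 - mu * (S @ D0 @ S), S, n_occ)

def s_norm_change(D, D0, S, n_elec):
    """Squared S norm of D - D0, written as N - 2 Tr(D0 S D S)."""
    return n_elec - 2.0 * np.trace(D0 @ S @ D @ S)

# Toy data (assumed for illustration, not a molecular calculation).
rng = np.random.default_rng(1)
n_basis, n_occ = 6, 2
n_elec = 2 * n_occ
B = rng.standard_normal((n_basis, n_basis))
S = np.eye(n_basis) + 0.1 * (B @ B.T) / n_basis
A_old = rng.standard_normal((n_basis, n_basis))
A_new = rng.standard_normal((n_basis, n_basis))
D0 = density_from_matrix(0.5 * (A_old + A_old.T), S, n_occ)  # reference density
F0 = 0.5 * (A_new + A_new.T)                                  # toy Kohn-Sham matrix

shifts = [0.0, 0.5, 2.0, 10.0, 1.0e4]
deltas = [s_norm_change(level_shifted_density(F0, D0, S, n_occ, mu), D0, S, n_elec)
          for mu in shifts]
for mu, delta in zip(shifts, deltas):
    print(f"mu = {mu:10.1f}   Delta = {delta:.6f}")
```

In this sketch the density change shrinks as the level shift grows, which is the property used to keep the step within the trust region.<br />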
Clearly, the success of the TRRH method will depend on<br />
our ability to make a judicious choice of the level-shift parameter<br />
$\mu$ in Eq. (19). In our standard TRRH implementation,<br />
we determine $\mu$ by requiring that $D(\mu)$ does not differ<br />
much from $D_0$ in the sense of Eq. (16), thereby ensuring a<br />
continuous and controlled development of the density matrix<br />
from the initial guess to the converged one. In the following<br />
sections we discuss how $\mu$ is determined in this standard<br />
implementation.<br />
In view of the relative crudeness of the $E_{\mathrm{RH}}(D)$ model, a<br />
more robust approach consists of performing a line search<br />
along the path defined by $\mu$ to obtain the minimum of the<br />
Kohn–Sham energy $E_{\mathrm{KS}}(D(\mu))$. Strictly speaking, this optimization<br />
is not a line search but rather a one-parameter optimization.<br />
One-parameter optimizations have previously<br />
been used by Seeger and Pople 12 to stabilize convergence of<br />
the RH procedure.<br />
For $\mu \to \infty$, Eq. (19) becomes equivalent to solving the<br />
eigenvalue equation,<br />
$$S D_0 S\, C_{\mathrm{occ}}^{0} = S\, C_{\mathrm{occ}}^{0}\, \omega, \tag{20}$$<br />
where $\omega$ has eigenvalue 1 for the set of orbitals that are<br />
occupied in $D_0$ and eigenvalue 0 for the set of virtual orbitals.<br />
Equation (20) thus effectively divides the molecular orbitals<br />
into a set that is occupied and a set that is unoccupied,<br />
where the density $D_0$ is obtained from the occupied set,<br />
$$D_0 = C_{\mathrm{occ}}^{0} (C_{\mathrm{occ}}^{0})^{T}. \tag{21}$$<br />
Since $F_0$ is the gradient of $E_{\mathrm{KS}}$ at $D_0$, the step from Eq. (19)<br />
for large $\mu$ is in the steepest-descent direction and will therefore<br />
give a decrease in the Kohn–Sham energy compared to<br />
the energy at $D_0$. However, this TRRH line-search (TRRH-LS)<br />
algorithm is more expensive than the standard method,<br />
requiring the repeated construction of the Kohn–Sham matrix<br />
at each SCF iteration.<br />
B. Comparison of the Roothaan–Hall and Kohn–Sham<br />
energy functions<br />
To understand better our strategy for determining the<br />
level-shift parameter $\mu$ in the Kohn–Sham energy optimizations,<br />
we here examine the Roothaan–Hall model energy of<br />
Eq. (14) in more detail, comparing it with the true Kohn–Sham<br />
energy of Eq. (1). Expanding the Kohn–Sham and<br />
Roothaan–Hall energies about the reference density matrix<br />
$D_0$ and neglecting the differences between $F_0$ and $F(D_0)$<br />
noted in Sec. II, we obtain<br />
$$E_{\mathrm{KS}}(D) = E_{\mathrm{KS}}(D_0) + 2\,\mathrm{Tr}\,F(D_0)(D - D_0) + \mathrm{Tr}\,(D - D_0)\,G(D - D_0) + E_{\mathrm{XC}}(D) - E_{\mathrm{XC}}(D_0) - \mathrm{Tr}\,(D - D_0)\,E_{\mathrm{XC}}^{(1)}(D_0), \tag{22}$$<br />
$$E_{\mathrm{RH}}(D) = E_{\mathrm{RH}}(D_0) + 2\,\mathrm{Tr}\,F(D_0)(D - D_0). \tag{23}$$<br />
These expansions have the same first-order term $2\,\mathrm{Tr}\,F(D_0)(D - D_0)$<br />
but different zero- and second-order terms. In an<br />
orthonormal MO basis, we may express any valid density<br />
matrix $D$ in terms of the reference density matrix $D_0$ as<br />
$$D(K) = \exp(-K)\, D_0 \exp(K), \tag{24}$$<br />
where the antisymmetric rotation matrix may be written in<br />
the form<br />
$$K = \begin{pmatrix} 0 & -\kappa^{T} \\ \kappa & 0 \end{pmatrix}. \tag{25}$$<br />
The diagonal block matrices representing rotations among<br />
the occupied MOs and among the virtual MOs are zero since<br />
the density matrix in Eq. (10) is invariant to such rotations<br />
[see Eq. (11)]. In terms of $K$, the first-order Roothaan–Hall<br />
and Kohn–Sham energies may be written as<br />
$$2\,\mathrm{Tr}\,F(D_0)(D - D_0) = 2\,\mathrm{Tr}\,F(D_0) \left[ \exp(-K)\, D_0 \exp(K) - D_0 \right] \tag{26}$$<br />
and thus share a series of higher-order terms in $K$. If these<br />
shared higher-order terms are larger than the higher-order<br />
terms that occur only in the Kohn–Sham energy in Eq. (22),<br />
then the energy changes predicted by the Roothaan–Hall<br />
function in Eq. (23) will be a good approximation to the<br />
changes in the Kohn–Sham energy, even for large<br />
rotations $K$.<br />
Let us now compare the derivatives of the Roothaan–Hall<br />
and Kohn–Sham energies with respect to the orbital-rotation<br />
parameters $\kappa_{ai}$ (in this paper, i, j, k, and l denote the<br />
occupied indices and a, b, c, and d denote the virtual indices).<br />
As already established, the two energy functions have<br />
the same gradients,<br />
$$\left[ E_{\mathrm{KS}}^{(1)} \right]_{ai} = \left. \frac{\partial E_{\mathrm{KS}}(\kappa)}{\partial \kappa_{ai}} \right|_{\kappa = 0} = -4 F_{ai}, \tag{27a}$$<br />
$$\left[ E_{\mathrm{RH}}^{(1)} \right]_{ai} = \left. \frac{\partial E_{\mathrm{RH}}(\kappa)}{\partial \kappa_{ai}} \right|_{\kappa = 0} = -4 F_{ai}. \tag{27b}$$<br />
The Hessians are most conveniently expressed in a basis<br />
where the occupied-occupied and virtual-virtual blocks of<br />
the Kohn–Sham matrix are diagonal,<br />
$$F_{ab} = \delta_{ab}\, \varepsilon_a, \tag{28a}$$<br />
$$F_{ij} = \delta_{ij}\, \varepsilon_i. \tag{28b}$$<br />
Since, at convergence where $F$ is fully diagonal, the diagonal<br />
elements $\varepsilon_a$ and $\varepsilon_i$ become the orbital energies, we shall refer<br />
to these as the pseudo-orbital energies or sometimes just the<br />
orbital energies. In this basis, the Hessians of the two energy<br />
functions become<br />
$$\left[ E_{\mathrm{KS}}^{(2)} \right]_{aibj} = \left. \frac{\partial^2 E_{\mathrm{KS}}(\kappa)}{\partial \kappa_{ai}\, \partial \kappa_{bj}} \right|_{\kappa = 0} = 4 \delta_{ij} \delta_{ab} (\varepsilon_a - \varepsilon_i) + M_{aibj}, \tag{29a}$$<br />
$$\left[ E_{\mathrm{RH}}^{(2)} \right]_{aibj} = \left. \frac{\partial^2 E_{\mathrm{RH}}(\kappa)}{\partial \kappa_{ai}\, \partial \kappa_{bj}} \right|_{\kappa = 0} = 4 \delta_{ij} \delta_{ab} (\varepsilon_a - \varepsilon_i), \tag{29b}$$<br />
where<br />
$$M_{aibj} = 16 g_{aibj} - 4\gamma (g_{abij} + g_{ajib}) + \left[ E_{\mathrm{XC}}^{(2)}(D) \right]_{aibj}. \tag{30}$$<br />
Clearly, the Roothaan–Hall Hessian in Eq. (29b) is positive<br />
definite whenever the energies of the occupied orbitals are<br />
lower than the energies of the virtual orbitals, that is, whenever<br />
the HOMO-LUMO gap is positive. Furthermore, if the<br />
differences $\varepsilon_a - \varepsilon_i$ in the Hessians are large compared to $M_{aibj}$<br />
in Eq. (30), then $E_{\mathrm{RH}}^{(2)}$ is a good approximation to $E_{\mathrm{KS}}^{(2)}$.<br />
C. Quadratically convergent trust-region optimization<br />
To minimize the Roothaan–Hall energy in Eq. (14), consider<br />
the second-order expansion in the orbital-rotation parameters<br />
$\kappa$,<br />
$$E_{\mathrm{RH}}^{(2)}(\kappa) = E_{\mathrm{RH}} + \kappa^{T} E_{\mathrm{RH}}^{(1)} + \tfrac{1}{2} \kappa^{T} E_{\mathrm{RH}}^{(2)} \kappa. \tag{31}$$<br />
The unconstrained Newton step is obtained by setting the<br />
gradient equal to zero,<br />
$$\frac{\partial E_{\mathrm{RH}}^{(2)}(\kappa)}{\partial \kappa} = E_{\mathrm{RH}}^{(1)} + E_{\mathrm{RH}}^{(2)} \kappa = 0. \tag{32}$$<br />
Solution of these equations yields the Newton step, with its<br />
fast second-order convergence in the local region. In the global<br />
region, far away from the true minimum, it is not reasonable<br />
to accept large steps since the expansion in Eq. (31) is<br />
only a valid approximation to $E_{\mathrm{RH}}(D)$ for $\|\kappa\| < h$, where $h$ is<br />
the trust radius. Furthermore, if $E_{\mathrm{RH}}^{(2)}$ is indefinite, the Newton<br />
step in Eq. (32) may not reduce the energy. Therefore, if the<br />
Hessian is not positive definite or if the Newton step is too<br />
large, we solve instead a modified set of equations, where we<br />
minimize Eq. (31) subject to the constraint $\|\kappa\| = h$. To accomplish<br />
this, we introduce an undetermined multiplier $\mu$<br />
and set up the Lagrangian<br />
$$L(\kappa, \mu) = E_{\mathrm{RH}}^{(2)}(\kappa) + \tfrac{1}{2} \mu \left( \kappa^{T} \kappa - h^2 \right), \tag{33}$$<br />
whose stationary points are determined from the equation<br />
$$\frac{\partial L(\kappa, \mu)}{\partial \kappa} = E_{\mathrm{RH}}^{(1)} + E_{\mathrm{RH}}^{(2)} \kappa + \mu \kappa = 0, \tag{34}$$<br />
leading to the level-shifted Newton step,<br />
$$\kappa(\mu) = -\left( E_{\mathrm{RH}}^{(2)} + \mu I \right)^{-1} E_{\mathrm{RH}}^{(1)}. \tag{35}$$<br />
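The multiplier $\mu$ is chosen such that $\|\kappa\| = h$ and such that the<br />
energy change predicted by $E_{\mathrm{RH}}^{(2)}(\kappa)$ is negative. Consider the<br />
first- and second-order changes of the Roothaan–Hall energy,<br />
$$E_{\mathrm{RH}}^{(1)}(\kappa) - E_{\mathrm{RH}} = \kappa^{T} E_{\mathrm{RH}}^{(1)} = -\kappa^{T} \left( E_{\mathrm{RH}}^{(2)} + \mu I \right) \kappa, \tag{36a}$$<br />
$$E_{\mathrm{RH}}^{(2)}(\kappa) - E_{\mathrm{RH}} = \kappa^{T} E_{\mathrm{RH}}^{(1)} + \tfrac{1}{2} \kappa^{T} E_{\mathrm{RH}}^{(2)} \kappa = -\tfrac{1}{2} \kappa^{T} \left( E_{\mathrm{RH}}^{(2)} + \mu I \right) \kappa - \tfrac{1}{2} \mu\, \kappa^{T} \kappa. \tag{36b}$$<br />
If the Hessian $E_{\mathrm{RH}}^{(2)}$ is positive definite, both corrections are negative for<br />
$\mu > 0$; if $E_{\mathrm{RH}}^{(2)}$ is indefinite, they are negative for $\mu > -\lambda_1$,<br />
where $\lambda_1$ is the lowest (negative) eigenvalue (i.e., the HOMO-LUMO<br />
gap). In general, therefore, we choose $\mu$ such that<br />
$\mu > \max(0, -\lambda_1)$. As discussed in Ref. 6, it is always possible<br />
to find a level-shift parameter that satisfies this requirement.<br />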
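A minimal sketch of the level-shifted Newton step follows. It assumes a small explicit gradient vector and Hessian matrix (toy numbers, not Roothaan–Hall quantities from a calculation), uses simple bisection on the level shift to meet the step-length condition (a production trust-region code would typically use a more refined root search), and does not treat the special case in which the gradient has no component along the lowest Hessian eigenvector:<br />

```python
import numpy as np

def level_shifted_newton(g, H, h, tol=1e-10):
    """Return (kappa, mu) with kappa = -(H + mu I)^(-1) g and ||kappa|| <= h."""
    lowest = np.linalg.eigvalsh(H)[0]

    def step(mu):
        return -np.linalg.solve(H + mu * np.eye(len(g)), g)

    # Plain Newton step if the Hessian is positive definite and the step is short.
    if lowest > 0.0 and np.linalg.norm(step(0.0)) <= h:
        return step(0.0), 0.0
    # Otherwise search mu > max(0, -lowest eigenvalue), where ||kappa(mu)||
    # decreases monotonically with mu, until ||kappa(mu)|| = h.
    lo = max(0.0, -lowest)
    hi = lo + 1.0
    while np.linalg.norm(step(hi)) > h:
        hi = lo + 2.0 * (hi - lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > h:
            lo = mid
        else:
            hi = mid
    return step(hi), hi

# Toy example with an indefinite Hessian (one negative eigenvalue).
H = np.diag([-1.0, 2.0, 5.0])
g = np.array([1.0, 1.0, 1.0])
h = 0.3

kappa, mu = level_shifted_newton(g, H, h)
predicted = g @ kappa + 0.5 * kappa @ H @ kappa   # second-order energy change
print("mu =", mu)
print("step length =", np.linalg.norm(kappa))
print("predicted energy change =", predicted)
```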
D. The quadratically convergent SCF method<br />
It is possible to optimize the Hartree–Fock and Kohn–Sham<br />
energies in Eq. (1) directly, without invoking the<br />
Roothaan–Hall energy function in Eq. (14). In the second-order<br />
trust-region Newton method, the optimization then<br />
consists of a sequence of level-shifted Newton iterations. At<br />
each iteration, the linear equation in Eq. (35) is solved, replacing<br />
$E_{\mathrm{RH}}^{(1)}$ and $E_{\mathrm{RH}}^{(2)}$ by $E_{\mathrm{KS}}^{(1)}$ and $E_{\mathrm{KS}}^{(2)}$, respectively. The<br />
resulting optimization scheme is known as the quadratically<br />
convergent SCF (QC-SCF) method. 13,14 The method is quadratically<br />
convergent in the local region and has a dynamic<br />
update of the trust region as discussed by Fletcher. 7<br />
E. The level-shift parameter in the TRRH method<br />
1. The global region<br />
A TRRH diagonalization step determined with $\mu = 0$ in<br />
Eq. (19) corresponds to the global minimum of $E_{\mathrm{RH}}(D)$.<br />
Therefore, when we impose the constraint in Eq. (16) on the<br />
difference between the old and new density matrices, then<br />
the step-size control is applied to a global optimization of<br />
$E_{\mathrm{RH}}(D)$. By contrast, in the quadratically convergent trust-region<br />
optimization of $E_{\mathrm{RH}}(\kappa)$ in Eq. (35), step-size control<br />
is applied to a local model of $E_{\mathrm{RH}}(\kappa)$, that is, to the optimization<br />
of the second-order Taylor expansion of the energy<br />
$E_{\mathrm{RH}}^{(2)}(\kappa)$ in Eq. (31) inside the trust region.<br />
In the quadratically convergent trust-region method, we<br />
direct the step towards the minimum by choosing the level-shift<br />
parameter $\mu$ in Eq. (35) such that the lowest diagonal<br />
element of the Hessian $\varepsilon_a^{\mathrm{LUMO}} - \varepsilon_i^{\mathrm{HOMO}} + \mu$ becomes positive.<br />
Alternatively, in the Kohn–Sham diagonalization step in Eq.<br />
(19), we may ensure positive definiteness by monitoring the<br />
dependence of the pseudo-orbital energies on the level-shift<br />
parameter $\mu$ in Eq. (19), adjusting it such that the<br />
HOMO-LUMO gap,<br />
$$\Delta\varepsilon_{ai}(\mu) = \varepsilon_a^{\mathrm{LUMO}}(\mu) - \varepsilon_i^{\mathrm{HOMO}}(\mu), \tag{37}$$<br />
becomes positive. The configuration that defines the HOMO-LUMO<br />
gap is identified from the eigenvalues of Eq. (20) that<br />
are equal to one. Insisting on a smooth development of the<br />
MOs from those that are occupied in $D_0$ to those that are<br />
obtained by diagonalizing Eq. (19), we restrict $\mu$ to the interval<br />
$\mu_{\min} \le \mu < \infty$, where $\mu_{\min}$ is the smallest value for<br />
which the HOMO-LUMO gap is positive. In addition, the<br />
step must be constrained such that Eq. (16) is fulfilled. In<br />
passing, we note that the reference density matrix $D_0$ may<br />
not always be idempotent, for example, it may be $\bar{D}$ of Eq.<br />
(13), in which case its eigenvalues are not exactly 1. In such<br />
cases, the matrix<br />
$$\bar{D}_0^{\mathrm{idem}} = C_{\mathrm{occ}}^{0} (C_{\mathrm{occ}}^{0})^{T} \tag{38}$$<br />
constructed from the eigenvectors of Eq. (20) with $D_0$ replaced<br />
by $\bar{D}$ represents a purification of $\bar{D}$.<br />
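The purification just described can be sketched numerically. The example assumes toy matrices: two valid density matrices are generated from random symmetric matrices and averaged with weights 0.7 and 0.3 (arbitrary illustrative values), and the function names are hypothetical. The averaged matrix is not idempotent, whereas the matrix built from the dominant eigenvectors is:<br />

```python
import numpy as np

def inverse_sqrt(S):
    """S^(-1/2) for a symmetric positive-definite overlap matrix."""
    s_val, s_vec = np.linalg.eigh(S)
    return s_vec @ np.diag(s_val ** -0.5) @ s_vec.T

def aufbau_density(F, S, n_occ):
    """Valid density matrix from the lowest n_occ solutions of F C = S C eps."""
    X = inverse_sqrt(S)
    _, V = np.linalg.eigh(X @ F @ X)
    C_occ = (X @ V)[:, :n_occ]
    return C_occ @ C_occ.T

def purify(D_bar, S, n_occ):
    """Idempotent density from the n_occ dominant solutions of S D_bar S C = S C w."""
    X = inverse_sqrt(S)
    w, V = np.linalg.eigh(X @ (S @ D_bar @ S) @ X)
    C_occ = (X @ V)[:, -n_occ:]        # eigh sorts ascending: take the largest n_occ
    return C_occ @ C_occ.T, w[-n_occ:]

# Toy data (assumed for illustration only).
rng = np.random.default_rng(2)
n_basis, n_occ = 6, 2
B = rng.standard_normal((n_basis, n_basis))
S = np.eye(n_basis) + 0.1 * (B @ B.T) / n_basis
A1 = rng.standard_normal((n_basis, n_basis))
A2 = rng.standard_normal((n_basis, n_basis))
D_a = aufbau_density(0.5 * (A1 + A1.T), S, n_occ)
D_b = aufbau_density(0.5 * (A2 + A2.T), S, n_occ)

D_bar = 0.7 * D_a + 0.3 * D_b          # averaged density, not idempotent
D_idem, occupations = purify(D_bar, S, n_occ)

print("averaged matrix idempotent:", np.allclose(D_bar @ S @ D_bar, D_bar))
print("purified matrix idempotent:", np.allclose(D_idem @ S @ D_idem, D_idem))
print("dominant eigenvalues:", occupations)
```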
The constraint on the change in the AO density in Eq.<br />
(16) refers to a change which may arise not only from small<br />
changes in many MOs but also from large changes in a few<br />
MOs or even in a single MO. In the TRRH algorithm, we<br />
shall require that the changes in the individual MOs are all<br />
small. Expanding the MO $\varphi_i^{\mathrm{new}}$, obtained by diagonalization<br />
of Eq. (19), in the old MOs, we obtain<br />
$$\varphi_i^{\mathrm{new}} = \sum_j \langle \varphi_j^{\mathrm{old}} | \varphi_i^{\mathrm{new}} \rangle\, \varphi_j^{\mathrm{old}} + \sum_a \langle \varphi_a^{\mathrm{old}} | \varphi_i^{\mathrm{new}} \rangle\, \varphi_a^{\mathrm{old}}, \tag{39}$$<br />
where the first summation is over the occupied MOs and the<br />
second over the virtual MOs. The squared norm of the projection<br />
of $\varphi_i^{\mathrm{new}}$ onto the MO space associated with $D_0$ is<br />
therefore<br />
$$a_i^{\mathrm{orb}} = \sum_j \langle \varphi_j^{\mathrm{old}} | \varphi_i^{\mathrm{new}} \rangle^2. \tag{40}$$<br />
To ensure small individual MO changes at each iteration (to<br />
within a unitary transformation of the occupied MOs), we<br />
shall therefore require<br />
$$a_{\min}^{\mathrm{orb}} = \min_i a_i^{\mathrm{orb}} \ge A_{\min}^{\mathrm{orb}}, \tag{41}$$<br />
where $A_{\min}^{\mathrm{orb}}$ is close to 1. This constraint also ensures that the<br />
HOMO-LUMO gap in Eq. (37) stays positive.<br />
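The overlap criterion can be sketched in a few lines. The example assumes an orthonormal toy basis with two occupied MOs, one of which is rotated into a virtual orbital by an angle theta; the threshold value 0.975 is a hypothetical choice made for the illustration, since the text only states that it is close to 1:<br />

```python
import numpy as np

def occupied_overlaps(C_new, C_old, S):
    """Squared norm of each new occupied MO projected onto the old occupied space."""
    overlap = C_old.T @ S @ C_new          # element (j, i) = <phi_j^old | phi_i^new>
    return np.sum(overlap ** 2, axis=0)

n_basis = 4
S = np.eye(n_basis)                        # orthonormal toy basis
unit = np.eye(n_basis)
C_old = unit[:, :2].copy()                 # two occupied MOs

theta = 0.3                                # rotation of MO 2 into the first virtual
C_new = C_old.copy()
C_new[:, 1] = np.cos(theta) * unit[:, 1] + np.sin(theta) * unit[:, 2]

a_orb = occupied_overlaps(C_new, C_old, S)
a_min = a_orb.min()
A_MIN = 0.975                              # hypothetical threshold "close to 1"

print("a_i^orb:", a_orb)
print("step accepted:", a_min >= A_MIN)
```

Because the sum runs over all old occupied MOs, the result does not change when the old occupied MOs are rotated among themselves, which matches the "to within a unitary transformation" qualification.<br />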
The Hessians of $E_{\mathrm{RH}}$ and $E_{\mathrm{KS}}$ in Eq. (29) both contain<br />
the orbital-energy difference term, while the Hessian of $E_{\mathrm{KS}}$<br />
also contains the terms $M_{aibj}$ of Eq. (30). When $\mu$ is large<br />
compared to the $M_{aibj}$ terms, the step generated by the level-shifted<br />
diagonalization in Eq. (19) is then of the same quality<br />
as that generated by a quadratically convergent trust-region<br />
optimization of $E_{\mathrm{KS}}$. However, since the step-size control in<br />
Eq. (22) is imposed on the global optimization, the quality of<br />
the step may be further improved relative to that obtained in<br />
a QC-SCF optimization of the Kohn–Sham energy. When the<br />
level shift is determined in the global region such that<br />
$a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}}$, we often see not just this one orbital but many for<br />
which $a_i^{\mathrm{orb}} \approx A_{\min}^{\mathrm{orb}}$. In this way a large number of orbitals<br />
change significantly.<br />
2. The local region<br />
To investigate the local convergence of the TRRH algorithm<br />
in Eq. (19), we first note that, in the local region near<br />
convergence, the gradient in Eq. (6) and thus the blocks $F_{ov}$<br />
and $F_{vo}$ between the occupied and virtual orbitals in the<br />
Kohn–Sham matrix in the representation of Eq. (28),<br />
$$F = \begin{pmatrix} \varepsilon_o & F_{ov} \\ F_{vo} & \varepsilon_v \end{pmatrix}, \tag{42}$$<br />
are small, see Eq. (27). Writing the unitary transformation of<br />
$F$ generated by $K$ in Eq. (25) as<br />
$$\exp(K)\, F \exp(-K) = \begin{pmatrix} \varepsilon_o & F_{ov} \\ F_{vo} & \varepsilon_v \end{pmatrix} + \begin{pmatrix} -\kappa^{T} F_{vo} & -\kappa^{T} \varepsilon_v \\ \kappa\, \varepsilon_o & \kappa F_{ov} \end{pmatrix} + \begin{pmatrix} -F_{ov} \kappa & \varepsilon_o \kappa^{T} \\ -\varepsilon_v \kappa & F_{vo} \kappa^{T} \end{pmatrix} + O(\kappa^2), \tag{43}$$<br />
we find that, to first order, the block diagonalization of the<br />
Kohn–Sham matrix may be accomplished by solving the following<br />
set of linear equations:<br />
$$F_{vo} + \kappa\, \varepsilon_o - \varepsilon_v \kappa = 0. \tag{44}$$<br />
Since these equations are identical to the Newton equation in<br />
Eq. (32), we conclude that, in the local region where the<br />
higher-order terms in $\kappa$ may be neglected, the block diagonalization<br />
of the Kohn–Sham matrix is equivalent to the solution<br />
of the equation<br />
$$\kappa = -\left( E_{\mathrm{RH}}^{(2)} \right)^{-1} E_{\mathrm{RH}}^{(1)}. \tag{45}$$<br />
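As a worked check of this equivalence, the two sets of equations can be written out element by element, using only the Roothaan–Hall gradient, whose elements are $-4F_{ai}$, the diagonal occupied and virtual blocks with pseudo-orbital energies $\varepsilon_i$ and $\varepsilon_a$, and the diagonal Roothaan–Hall Hessian $4(\varepsilon_a - \varepsilon_i)$:<br />

```latex
\begin{align}
\left[ F_{vo} + \kappa\,\varepsilon_o - \varepsilon_v\,\kappa \right]_{ai}
  &= F_{ai} - (\varepsilon_a - \varepsilon_i)\,\kappa_{ai} = 0, \\
\left[ E_{\mathrm{RH}}^{(1)} + E_{\mathrm{RH}}^{(2)} \kappa \right]_{ai}
  &= -4 F_{ai} + 4 (\varepsilon_a - \varepsilon_i)\,\kappa_{ai} = 0 .
\end{align}
```

Both give $\kappa_{ai} = F_{ai} / (\varepsilon_a - \varepsilon_i)$, so the first-order rotation becomes large whenever an occupied-virtual energy difference is small.<br />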
Let these equations determine the step of iteration $n$ and<br />
expand the Kohn–Sham gradient at iteration $n+1$ about iteration<br />
point $n$,<br />
$$E_{\mathrm{KS},n+1}^{(1)} = E_{\mathrm{KS},n}^{(1)} + E_{\mathrm{KS},n}^{(2)} \kappa_n + O(\kappa^2) = E_{\mathrm{KS},n}^{(1)} - E_{\mathrm{KS},n}^{(2)} \left( E_{\mathrm{RH},n}^{(2)} \right)^{-1} E_{\mathrm{RH},n}^{(1)} + O(\kappa^2). \tag{46}$$<br />
Using Eqs. (27) and (29), we then obtain<br />
$$E_{\mathrm{KS},n+1}^{(1)} = E_{\mathrm{KS},n}^{(1)} - \left( E_{\mathrm{RH},n}^{(2)} + M_n \right) \left( E_{\mathrm{RH},n}^{(2)} \right)^{-1} E_{\mathrm{KS},n}^{(1)} = -M_n \left( E_{\mathrm{RH},n}^{(2)} \right)^{-1} E_{\mathrm{KS},n}^{(1)}, \tag{47}$$<br />
having neglected terms of order $\kappa^{2}$. Therefore, if<br />
$M_n\left(E^{(2)}_{\mathrm{RH},n}\right)^{-1}$ has eigenvalues larger than 1, a simple TRRH<br />
sequence will diverge. This is particularly a problem in<br />
Kohn–Sham theory, where the HOMO-LUMO gap (the lowest<br />
eigenvalue of $E^{(2)}_{\mathrm{RH}}$) is often small compared to the contribution<br />
from $M$. To improve the local convergence,<br />
we may increase the HOMO-LUMO gap by level shifting,<br />
thereby reducing the magnitude of the eigenvalues of<br />
$M_n\left(E^{(2)}_{\mathrm{RH},n}\right)^{-1}$. We note that, when the simple TRRH sequence<br />
diverges, the TRSCF algorithm may still converge, as TRRH<br />
mainly serves to provide a new density and TRDSM then<br />
optimizes the combination of the various densities.<br />
Downloaded 23 Aug 2005 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp
074103-7 Trust-region self-consistent-field J. Chem. Phys. 123, 074103 2005<br />
FIG. 1. (a) The HOMO-LUMO gap $\Delta\varepsilon_{ai}(\mu)$, (b) the minimum overlap $a^{\mathrm{orb}}_{\min}(\mu)$ of<br />
the new occupied orbitals with the previous set of occupied orbitals, and (c)<br />
the changes in the model energy $\Delta E^{\mathrm{RH}}(\mu)$ (full line) and the Kohn–Sham energy<br />
$\Delta E^{\mathrm{RH}}_{\mathrm{KS}}(\mu)$ (dashed line), all as functions of the level-shift parameter $\mu$ in the TRRH step<br />
of the seventh iteration of the zinc complex calculation seen in Fig. 5.<br />
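The divergence criterion that follows from Eq. 47 can be illustrated with a minimal one-parameter sketch. It is assumed here, purely for illustration, that $E^{(2)}_{\mathrm{RH}}$ reduces to a scalar HOMO-LUMO gap, that $M$ is a scalar `m`, and that a level shift `mu` simply adds to the gap. The function name and all numbers are hypothetical and are not taken from the zinc complex calculation.<br />

```python
def gradient_sequence(gap, m, mu, g0=1.0, steps=6):
    """Iterate the scalar analogue of Eq. 47: g_{n+1} = -m / (gap + mu) * g_n.

    Assumption: the level shift mu is modelled as a plain increase of the gap.
    """
    factor = -m / (gap + mu)
    sequence = [g0]
    for _ in range(steps):
        sequence.append(factor * sequence[-1])
    return sequence


# Small gap compared to m: |m / gap| = 2.5 > 1, so the gradient grows.
unshifted = gradient_sequence(gap=0.2, m=0.5, mu=0.0)
# Level shifting: |m / (gap + mu)| is about 0.42 < 1, so the gradient decays.
shifted = gradient_sequence(gap=0.2, m=0.5, mu=1.0)

print([round(abs(g), 4) for g in unshifted])
print([round(abs(g), 4) for g in shifted])
```

In this toy model the unshifted sequence grows by a factor of 2.5 per iteration, whereas the shifted sequence contracts, which mirrors the qualitative argument above.<br />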
F. Examples of the trust-region<br />
Roothaan–Hall algorithm<br />
To illustrate how the TRRH algorithm is employed in the<br />
different parts of a Kohn–Sham energy optimization, we here<br />
consider how the level-shift parameter $\mu$ is determined in two<br />
iterations of the zinc complex calculation depicted in Sec.<br />
VII, Fig. 5. We first consider iteration 7, which is in the<br />
global region of the optimization, and then proceed to iteration<br />
22, as an example of a step in the local region.<br />
In Figs. 1(a) and 1(b), we have plotted the HOMO-LUMO<br />
gap $\Delta\varepsilon_{ai}(\mu)$ of Eq. 37 and the overlap parameter $a^{\mathrm{orb}}_{\min}(\mu)$<br />
of Eq. 41, respectively, as functions of the level-shift parameter<br />
$\mu$. The corresponding changes in the Kohn–Sham<br />
energy $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}(\mu)$ (dashed line) and in the Roothaan–Hall model<br />
energy $\Delta E^{\mathrm{RH}}(\mu)$ (full line) of Eqs. 22 and 23 are plotted<br />
in Fig. 1(c). We note that the change in the Kohn–Sham<br />
energy has been calculated as<br />
$$\Delta E^{\mathrm{RH}}_{\mathrm{KS}}(\mu) = E_{\mathrm{KS}}\big(D(\mu)\big) - E_{\mathrm{KS}}\big(\bar{D}_0^{\mathrm{idem}}\big), \tag{48}$$<br />
where $D(\mu)$ and $\bar{D}_0^{\mathrm{idem}}$ are the density matrices calculated<br />
from the solutions to the eigenvalue problems in Eqs. 19<br />
and 20, respectively.<br />
FIG. 2. (a) The HOMO-LUMO gap $\Delta\varepsilon_{ai}(\mu)$, (b) the minimum overlap $a^{\mathrm{orb}}_{\min}(\mu)$ of<br />
the new occupied orbitals with the previous set of occupied orbitals, and (c)<br />
the changes in the model energy $\Delta E^{\mathrm{RH}}(\mu)$ (full line) and the Kohn–Sham energy<br />
$\Delta E^{\mathrm{RH}}_{\mathrm{KS}}(\mu)$ (dashed line), all as functions of the level-shift parameter $\mu$ in the TRRH step<br />
of the 22nd iteration of the zinc complex calculation seen in Fig. 5.<br />
In Fig. 1(a), we see that, in iteration 7, $\Delta\varepsilon_{ai}(\mu)$ is linear<br />
for $\mu > 2.2$, as the density matrix changes smoothly with<br />
decreasing $\mu$ from that of Eq. 20 to that obtained by applying<br />
the Aufbau principle to the solution of Eq. 19. For $\mu < 2.2$,<br />
the occupied and virtual orbitals defined by the previous<br />
density interchange. The value of $\mu = 5.078$ used in this<br />
iteration was chosen from the requirement $a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min} = 0.98$<br />
in Eq. 41, restricting the new orbital component to 0.02.<br />
Figure 1(c) shows that an even lower energy would have<br />
been obtained by reducing the level shift to about 2.4, but it<br />
would be very difficult to identify this optimal value of $\mu$<br />
without constructing additional Kohn–Sham matrices, since<br />
the Roothaan–Hall model energy is not accurate for small $\mu$.<br />
In short, the identification of $\mu$ from the overlap requirement<br />
$a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min}$ appears to be a good and secure way to control the<br />
step sizes in the optimization.<br />
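As a hedged illustration of how a level shift could be located from an overlap requirement of the form $a^{\mathrm{orb}}_{\min}(\mu) = A^{\mathrm{orb}}_{\min}$, the sketch below bisects on $\mu$. The search strategy, the toy overlap model `a_min_model`, and all parameter values are assumptions made for this example. The text above does not specify how the requirement is solved in the actual implementation, and a real overlap profile need not be monotonic.<br />

```python
import math


def a_min_model(mu, gap=0.1, coupling=1.0):
    """Toy overlap of a new occupied orbital with the old one (hypothetical model).

    The virtual admixture is taken as coupling / (gap + mu), so the overlap
    approaches 1 for large level shifts.
    """
    mixing = coupling / (gap + mu)
    return 1.0 / math.sqrt(1.0 + mixing * mixing)


def level_shift_for_overlap(a_min, target=0.98, mu_lo=0.0, mu_hi=100.0, tol=1e-10):
    """Bisect for the smallest mu with a_min(mu) >= target.

    Assumes a_min increases monotonically with mu on [mu_lo, mu_hi].
    """
    if a_min(mu_lo) >= target:
        return mu_lo  # the unshifted step already meets the requirement
    while mu_hi - mu_lo > tol:
        mid = 0.5 * (mu_lo + mu_hi)
        if a_min(mid) >= target:
            mu_hi = mid
        else:
            mu_lo = mid
    return mu_hi


mu = level_shift_for_overlap(a_min_model)
print(f"mu = {mu:.4f}, a_min = {a_min_model(mu):.6f}")
```

When the overlap already exceeds the target at zero level shift, the sketch simply returns zero, so the criterion no longer fixes the level shift.<br />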
Figures 2(a)–2(c) are equivalent to Figs. 1(a)–1(c), but<br />
for iteration 22 in the local part of the optimization. Notably,<br />
the linear regime of $\Delta\varepsilon_{ai}(\mu)$ in Fig. 2(a) now extends to<br />
include $\mu = 0$, which corresponds to an unconstrained<br />
Roothaan–Hall step. Also, since $a^{\mathrm{orb}}_{\min} = 1.0000$ for $\mu = 0$, we<br />
can no longer determine the level shift from the overlap criterion<br />
$a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min}$. From Fig. 2(c), we see that $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}(\mu)$ (dashed<br />
line) takes on its minimum value at $\mu = 1.3$; for smaller $\mu$,<br />
the energy increases, giving a total increase of $6.0\cdot10^{-5}\,E_h$<br />
for $\mu = 0$.<br />
The TRRH energy increase in the local part of the SCF<br />
optimization is particularly prominent for the DFT calculations.<br />
In the Hartree–Fock calculations, the TRRH model<br />
energy describes the SCF energy equally well in the local<br />
and global regions of the optimization. To avoid the increase<br />
in energy, we could add a constant minimum level shift, but<br />
this may in some cases slow down the convergence. Typically,<br />
the increase in the Kohn–Sham energy in the TRRH<br />
steps in the local region of the optimization is compensated<br />
by a larger energy decrease in the TRDSM step, ensuring<br />
an overall decrease in the Kohn–Sham energy in the iteration.<br />
In Table I, we have listed the values of several parameters<br />
characterizing the TRRH steps in the TRSCF iterations<br />
of the zinc complex calculation. In the first 17 iterations, the<br />
constraint $a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min}$ is active and determines the level-shift<br />
parameter. Note that, in the global region, $\Delta E^{\mathrm{RH}}$ is a reasonably<br />
good approximation to $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$. After iteration 17, the<br />
local region of the Kohn–Sham energy optimization is approached<br />
and $\Delta E^{\mathrm{RH}}$ is no longer a good approximation to<br />
$\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$. In this region, the Kohn–Sham energy increases in the TRRH steps and it<br />
is the TRDSM algorithm that ensures the convergence of the calculation<br />
(see Sec. VII, Table IV).<br />
TABLE I. Convergence details for the TRRH steps in the TRSCF calculation<br />
on the zinc complex in Fig. 5. Energies given in a.u.<br />
Iteration $\mu$ $a^{\mathrm{orb}}_{\min}$ $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ $\Delta E^{\mathrm{RH}}$<br />
2 22.57 0.994 −8.366 865 −8.411 913<br />
3 26.71 0.980 −20.122 850 −20.895 267<br />
4 30.54 0.980 −31.041 569 −35.286 269<br />
5 19.21 0.980 −27.278 985 −31.363 274<br />
6 10.31 0.980 −15.101 958 −18.277 717<br />
7 5.07 0.980 −10.675 155 −13.082 691<br />
8 2.96 0.980 −6.749 189 −7.197 438<br />
9 2.18 0.981 −3.181 254 −4.589 630<br />
10 4.68 0.980 0.394 694 −3.712 621<br />
11 1.40 0.980 −1.676 644 −2.885 580<br />
12 1.40 0.980 −1.743 634 −1.775 556<br />
13 0.93 0.980 −0.402 427 −0.843 260<br />
14 0.78 0.980 −0.376 675 −0.622 386<br />
15 0.54 0.981 −0.211 002 −0.227 722<br />
16 0.15 0.982 0.029 066 −0.199 268<br />
17 0.07 0.980 0.010 452 −0.068 243<br />
18 0.00 0.991 0.043 376 −0.037 071<br />
19 0.00 0.997 0.012 644 −0.009 493<br />
20 0.00 0.999 0.001 104 −0.000 931<br />
21 0.00 0.999 0.000 352 −0.000 249<br />
22 0.00 0.999 0.000 059 −0.000 049<br />
23 0.00 0.999 0.000 010 −0.000 006<br />
24 0.00 1.000 0.000 000 −0.000 000<br />
IV. TRUST-REGION DENSITY-SUBSPACE<br />
MINIMIZATION IN DFT<br />
After a sequence of Roothaan–Hall iterations, we<br />
have determined a set of density matrices $D_i$ and a corresponding<br />
set of Kohn–Sham matrices $F_i = F(D_i)$. The<br />
question then arises as to how to make the best use of the<br />
information contained in these collected density and<br />
Kohn–Sham matrices.<br />
A. Parametrization of the DSM density matrix<br />
Taking $D_0$ as the reference density matrix, we write the<br />
improved density matrix as a linear combination of the current<br />
and previous density matrices,<br />
$$\bar{D} = D_0 + \sum_{i=0}^{n} c_i D_i, \tag{49}$$<br />
which, ideally, should satisfy the symmetry, trace, and idempotency<br />
conditions in Eq. 8 of a valid Kohn–Sham density<br />
matrix. Whereas the symmetry condition in Eq. 8a is trivially<br />
satisfied for any such linear combination, the trace condition<br />
in Eq. 8b holds only for combinations that satisfy the<br />
restriction<br />
$$\sum_{i=0}^{n} c_i = 0, \tag{50}$$<br />
leading to a set of $n+1$ constrained parameters $c_i$ with<br />
$0 \le i \le n$. Alternatively, an unconstrained set of $n$ parameters $c_i$<br />
with $1 \le i \le n$ can be used, with $c_0$ defined so that the trace<br />
condition is fulfilled,<br />
$$c_0 = -\sum_{i=1}^{n} c_i. \tag{51}$$<br />
In terms of these independent parameters, the density matrix<br />
$\bar{D}$ becomes<br />
$$\bar{D} = D_0 + D_+, \tag{52}$$<br />
where we have introduced the notations<br />
$$D_+ = \sum_{i=1}^{n} c_i D_{i0}, \tag{53a}$$<br />
$$D_{i0} = D_i - D_0. \tag{53b}$$<br />
Unlike the symmetry and trace conditions in Eqs. 8a<br />
and 8b, the idempotency condition in Eq. 8c is in general<br />
not fulfilled for linear combinations of $D_i$. Still, for any averaged<br />
density matrix $\bar{D}$ in Eq. 52 that does not fulfill the<br />
idempotency condition, we may generate a purified density<br />
matrix with a smaller idempotency error by the<br />
transformation (Ref. 15)<br />
$$\tilde{D} = 3\bar{D}S\bar{D} - 2\bar{D}S\bar{D}S\bar{D}. \tag{54}$$<br />
The purification of the density matrix has previously been<br />
used in connection with minimization of energy<br />
functions (Refs. 16–19).<br />
Introducing the idempotency correction,<br />
$$D_\delta = \tilde{D} - \bar{D}, \tag{55}$$<br />
we may then write the purified averaged density matrix in<br />
the form<br />
$$\tilde{D} = D_0 + D_+ + D_\delta. \tag{56}$$<br />
In the following, we shall analyze the relative magnitudes of<br />
the terms $D_+$ and $D_\delta$ entering Eq. 56.<br />
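The parametrization of Eq. 52 and the purification of Eq. 54 can be tried out on a two-orbital toy problem. The sketch below assumes an orthonormal basis ($S = I$), a reference density $D_0 = \mathrm{diag}(1, 0)$, and a single rotated density $D_1$. The helper names and the values of `c` and `kappa` are hypothetical choices for this illustration.<br />

```python
import math


def matmul(A, B):
    """Product of two small square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lincomb(a, A, b, B):
    """Elementwise a*A + b*B."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def frobenius(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def purify(D):
    """McWeeny step of Eq. 54 for S = I: 3 D^2 - 2 D^3."""
    D2 = matmul(D, D)
    return lincomb(3.0, D2, -2.0, matmul(D2, D))


def idempotency_error(D):
    return frobenius(lincomb(1.0, matmul(D, D), -1.0, D))


kappa, c = 0.2, 0.4
co, si = math.cos(kappa), math.sin(kappa)
D0 = [[1.0, 0.0], [0.0, 0.0]]
D1 = [[co * co, -si * co], [-si * co, si * si]]  # rotated, still idempotent

D_bar = lincomb(1.0 - c, D0, c, D1)  # D0 + c (D1 - D0), Eq. 52
D_tilde = purify(D_bar)              # Eq. 54

print("trace of D_bar:", D_bar[0][0] + D_bar[1][1])
print("idempotency error before:", idempotency_error(D_bar))
print("idempotency error after: ", idempotency_error(D_tilde))
```

The trace of the averaged density is conserved by construction, while its idempotency error is nonzero and is reduced substantially by one purification step.<br />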
B. Order analysis of the purified averaged<br />
density matrix<br />
For simplicity, we shall work in the orthonormal MO<br />
basis that diagonalizes the reference density matrix,<br />
$$D_0 = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \tag{57}$$<br />
and consider the case with only one additional density matrix<br />
$D_1$. According to Eq. 24, an antisymmetric matrix $K$ of the<br />
form in Eq. 25 exists such that<br />
$$D_1 = \exp(-K)\,D_0\exp(K) = D_0 + \begin{pmatrix} -\kappa^{T}\kappa & -\kappa^{T} \\ -\kappa & \kappa\kappa^{T} \end{pmatrix} + O(\kappa^{3}), \tag{58}$$<br />
giving rise to the following averaged density matrix:<br />
$$\bar{D} = D_0 + cD_{10} = D_0 + \begin{pmatrix} -c\kappa^{T}\kappa & -c\kappa^{T} \\ -c\kappa & c\kappa\kappa^{T} \end{pmatrix} + O(c\kappa^{3}). \tag{59}$$<br />
The idempotency error of $\bar{D}$ is given by<br />
$$\bar{D}\bar{D} - \bar{D} = (c^{2} - c)\begin{pmatrix} \kappa^{T}\kappa & 0 \\ 0 & \kappa\kappa^{T} \end{pmatrix} + O(c\kappa^{4}), \tag{60}$$<br />
showing that $\bar{D}$ is idempotent only to first order in $\kappa$. To<br />
reduce the idempotency error, we subject $\bar{D}$ to the purification<br />
in Eq. 54, obtaining<br />
$$\tilde{D} = 3\bar{D}^{2} - 2\bar{D}^{3} = D_0 + \begin{pmatrix} -c^{2}\kappa^{T}\kappa & -c\kappa^{T} \\ -c\kappa & c^{2}\kappa\kappa^{T} \end{pmatrix} + O(c\kappa^{3}). \tag{61}$$<br />
Finally, comparing Eqs. 59 and 61, we obtain<br />
$$\tilde{D} = \bar{D} + O(c\kappa^{2}), \tag{62}$$<br />
demonstrating that the impure and purified average density<br />
matrices differ by terms proportional to $c\kappa^{2}$. Since the<br />
McWeeny purification in Eq. 54 converges quadratically,<br />
we conclude that the idempotency error of Eq. 62 is proportional<br />
to $c^{2}\kappa^{4}$.<br />
In a more general analysis, we would not assume an<br />
orthonormal basis and we would also include several density<br />
matrices $D_i = \exp(-K_i)\,D_0\exp(K_i)$. The essential result is<br />
then that we may write Eq. 56 as<br />
$$\tilde{D} = D_0 + \sum_{i=1}^{n} c_i D_{i0} + O\!\left(\sum_{i=1}^{n} c_i D_{i0}^{2}\right), \tag{63}$$<br />
where we have used the fact that $D_{i0}$ is proportional to $\kappa_i$.<br />
We conclude that while $D_+$ is linear in $c_i$ and $D_{i0}$, the idempotency<br />
correction $D_\delta$ to $\bar{D}$ is linear in $c_i$ but quadratic in $D_{i0}$.<br />
The conclusions to be derived from this analysis are summarized<br />
in Table II.<br />
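The orders quoted in this analysis can be checked numerically for the smallest possible case. The sketch below assumes one occupied and one virtual orbital, so that $\kappa$ is a scalar and $D_1$ is a plain rotation of $D_0 = \mathrm{diag}(1, 0)$. The helper names and the chosen values of `c` and `kappa` are hypothetical. Halving $\kappa$ should reduce $\tilde{D} - \bar{D}$ by roughly a factor of 4 and the idempotency error of $\tilde{D}$ by roughly a factor of 16.<br />

```python
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lincomb(a, A, b, B):
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def frobenius(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def averaged_density(c, kappa):
    """D_bar = D0 + c (D1 - D0) with D1 = exp(-K) D0 exp(K) for a 2x2 rotation."""
    co, si = math.cos(kappa), math.sin(kappa)
    D0 = [[1.0, 0.0], [0.0, 0.0]]
    D1 = [[co * co, -si * co], [-si * co, si * si]]
    return lincomb(1.0 - c, D0, c, D1)


def purify(D):
    """Eq. 54 with S = I."""
    D2 = matmul(D, D)
    return lincomb(3.0, D2, -2.0, matmul(D2, D))


def errors(c, kappa):
    """Return (||D_tilde - D_bar||, ||D_tilde^2 - D_tilde||)."""
    D_bar = averaged_density(c, kappa)
    D_tilde = purify(D_bar)
    difference = frobenius(lincomb(1.0, D_tilde, -1.0, D_bar))
    idempotency = frobenius(lincomb(1.0, matmul(D_tilde, D_tilde), -1.0, D_tilde))
    return difference, idempotency


diff_1, idem_1 = errors(c=0.4, kappa=0.10)
diff_2, idem_2 = errors(c=0.4, kappa=0.05)
print("ratio of D_tilde - D_bar:      ", diff_1 / diff_2)
print("ratio of idempotency errors:   ", idem_1 / idem_2)
```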
TABLE II. Comparison of the properties of the unpurified density $\bar{D}$ and the<br />
purified density $\tilde{D}$.<br />
Property $\bar{D}$ $\tilde{D}$<br />
Differences $\bar{D} - D_0 = O(c\kappa)$ $\tilde{D} - \bar{D} = O(c\kappa^{2})$<br />
Idempotency error $\bar{D}S\bar{D} - \bar{D} = O(c\kappa^{2})$ $\tilde{D}S\tilde{D} - \tilde{D} = O(c^{2}\kappa^{4})$<br />
Trace error $\mathrm{Tr}\,\bar{D}S - N/2 = 0$ $\mathrm{Tr}\,\tilde{D}S - N/2 = O(c^{2}\kappa^{4})$<br />
C. Construction of the DSM energy function<br />
Having established a useful parametrization of the averaged<br />
density matrix in Eq. 52 and having considered its<br />
purification in Eq. 54, let us now consider how to determine<br />
the best set of coefficients $c_i$. Expanding the energy for<br />
the purified averaged density matrix in Eq. 56 around the<br />
reference density matrix $D_0$, we obtain to second order<br />
$$E(\tilde{D}) = E(D_0) + (D_+ + D_\delta)^{T}E_0^{(1)} + \tfrac{1}{2}(D_+ + D_\delta)^{T}E_0^{(2)}(D_+ + D_\delta). \tag{64}$$<br />
To evaluate the terms containing $E_0^{(1)}$ and $E_0^{(2)}$, we make the<br />
identifications,<br />
$$E_0^{(1)} = 2F_0, \tag{65}$$<br />
$$E_0^{(2)}D_+ = 2F_+ + O(D_+^{2}), \tag{66}$$<br />
which follow from Eq. 6 and from the second-order Taylor<br />
expansion of $E_0^{(1)}$ about $D_0$, and where we have generalized<br />
the notation in Eq. 53a to the Kohn–Sham matrix<br />
$F_+ = \sum_{i=1}^{n} c_i F_{i0}$. Ignoring the terms quadratic in $D_\delta$ in Eq. 64<br />
and quadratic in $D_+$ in Eq. 66, we then obtain for the DSM<br />
energy,<br />
$$E^{\mathrm{DSM}}(c) = E(D_0) + 2\,\mathrm{Tr}\,D_+F_0 + \mathrm{Tr}\,D_+F_+ + 2\,\mathrm{Tr}\,D_\delta F_0 + 2\,\mathrm{Tr}\,D_\delta F_+. \tag{67}$$<br />
Finally, for a more compact notation, we introduce the<br />
weighted Kohn–Sham matrix,<br />
$$\bar{F} = F_0 + F_+ = F_0 + \sum_{i=1}^{n} c_i F_{i0}, \tag{68}$$<br />
and find that the DSM energy may be written in the form<br />
$$E^{\mathrm{DSM}}(c) = E(\bar{D}) + 2\,\mathrm{Tr}\,D_\delta\bar{F}, \tag{69}$$<br />
where the first term is quadratic in the expansion coefficients<br />
$c_i$,<br />
$$E(\bar{D}) = E(D_0) + 2\,\mathrm{Tr}\,D_+F_0 + \mathrm{Tr}\,D_+F_+, \tag{70}$$<br />
and the second, idempotency-correction term is quartic in<br />
these coefficients:<br />
$$2\,\mathrm{Tr}\,D_\delta\bar{F} = \mathrm{Tr}\,\big(6\bar{D}S\bar{D} - 4\bar{D}S\bar{D}S\bar{D} - 2\bar{D}\big)\bar{F}. \tag{71}$$<br />
The derivatives of $E^{\mathrm{DSM}}(c)$ are straightforwardly obtained<br />
by inserting the expansions of $\bar{F}$ and $\bar{D}$, using the independent<br />
parameter representation.<br />
TABLE III. Convergence details for the TRDSM steps in the TRSCF calculation<br />
on the zinc complex in Fig. 5. Energies given in a.u.<br />
Iteration $\|D_\delta\|_S^{2}$ $\|D_+\|_S^{2}$ $\Delta E^{\mathrm{DSM}}_{\mathrm{KS}}$ $\Delta E^{\mathrm{DSM}}$<br />
3 1.612 753 6.129 310 −48.255 717 −49.742 656<br />
4 1.488 082 12.140 844 −105.996 850 −111.554 301<br />
5 0.206 716 1.594 214 −43.136 482 −41.110 879<br />
6 1.504 099 3.162 679 −26.390 457 −26.511 025<br />
7 0.096 714 1.468 925 −14.755 377 −14.499 582<br />
8 0.110 282 1.525 848 −7.711 220 −7.278 600<br />
9 0.086 759 1.569 113 −5.289 340 −5.165 696<br />
10 0.423 825 1.614 867 −2.684 359 −3.500 173<br />
11 0.196 628 1.002 744 −1.053 899 −1.126 867<br />
12 0.111 409 0.867 238 −1.054 903 −0.936 180<br />
13 0.093 520 0.729 574 −0.658 907 −0.621 180<br />
14 0.054 596 0.324 338 −0.293 889 −0.238 992<br />
15 0.045 721 0.201 434 −0.213 251 −0.170 060<br />
16 0.026 474 0.242 928 −0.104 012 −0.096 482<br />
17 0.011 746 0.071 203 −0.100 694 −0.093 602<br />
18 0.001 512 0.022 758 −0.043 180 −0.042 748<br />
19 0.000 687 0.040 675 −0.057 441 −0.056 819<br />
20 0.000 122 0.011 897 −0.016 501 −0.016 416<br />
21 0.000 025 0.001 164 −0.001 471 −0.001 453<br />
22 0.000 001 0.000 308 −0.000 428 −0.000 427<br />
23 0.000 000 0.000 050 −0.000 076 −0.000 076<br />
24 0.000 000 0.000 009 −0.000 012 −0.000 012<br />
25 0.000 000 0.000 000 −0.000 000 −0.000 000<br />
D. Optimization of the DSM energy<br />
The energy function $E^{\mathrm{DSM}}(c)$ in Eq. 69 provides an<br />
excellent approximation to the exact Kohn–Sham energy<br />
$E_{\mathrm{KS}}(c)$ about $D_0$, with an error cubic in $D_+$. It can be optimized<br />
by the trust-region method, as described in Ref. 6,<br />
yielding an improved density matrix $\tilde{D}$, from which the<br />
Kohn–Sham matrix of the next TRRH iteration is constructed.<br />
However, to avoid the expensive calculation of the<br />
Kohn–Sham matrix from $\tilde{D}$, we use instead in our TRDSM<br />
implementation the averaged Kohn–Sham matrix in Eq. 68.<br />
As in the TRRH step in Sec. III A, the averaged density<br />
matrix $\bar{D}$ may also be determined by a line search. Here, the<br />
line search is made in the direction defined by the first step<br />
of the TRDSM algorithm, that is, the step at the expansion<br />
point $D_0$. As in the TRRH step, such a line search is guaranteed<br />
to reduce the Kohn–Sham energy. We denote this line<br />
search algorithm TRDSM-LS.<br />
In the DSM scheme, we assume that the idempotency<br />
correction $D_\delta = \tilde{D} - \bar{D}$ is small relative to $D_+ = \bar{D} - D_0$, both<br />
when discarding the terms quadratic in $D_\delta$ in Eq. 64 and<br />
when constructing the Kohn–Sham matrix from $\bar{D}$ rather<br />
than from $\tilde{D}$ in the subsequent Roothaan–Hall iteration. As is<br />
seen from Eq. 63, this assumption holds if the old density<br />
matrices $D_i$ are similar to $D_0$. Formally, therefore, we should<br />
include in the TRDSM only density matrices that are similar<br />
to $D_0$. In particular, if the orbital occupations change in the<br />
course of the Roothaan–Hall iterations, we should discard all<br />
density matrices that represent the old occupations.<br />
To demonstrate the validity of the assumption that $D_\delta$ is<br />
small compared to $D_+$, we have in Table III listed<br />
$\|D_\delta\|_S^{2} = \mathrm{Tr}\,D_\delta SD_\delta S$ and $\|D_+\|_S^{2} = \mathrm{Tr}\,D_+SD_+S$ at each iteration of the<br />
zinc complex calculation of Sec. VII. From Fig. 3, where the<br />
ratio $\|D_\delta\|_S^{2}/\|D_+\|_S^{2}$ is plotted, we see that, apart from iteration<br />
6, this ratio is always smaller than 0.3 and that it rapidly<br />
converges to zero in the local region. The neglect of the<br />
terms that are quadratic in $D_\delta$ in the TRDSM method is thus<br />
well justified. In Table III, we have also listed the model<br />
energy change $\Delta E^{\mathrm{DSM}}$ and the actual energy change $\Delta E^{\mathrm{DSM}}_{\mathrm{KS}}$,<br />
obtained as the difference between the Kohn–Sham energies<br />
calculated from the idempotent $\bar{D}$ obtained as in Eq. 38<br />
and from $D_0$: $\Delta E^{\mathrm{DSM}}_{\mathrm{KS}} = E_{\mathrm{KS}}(\bar{D}_0^{\mathrm{idem}}) - E_{\mathrm{KS}}(D_0)$. Clearly,<br />
$E^{\mathrm{DSM}}(c)$ is an extremely good representation of $E_{\mathrm{KS}}(c)$ for<br />
the step sizes taken by the TRDSM algorithm, as expected<br />
since $E^{\mathrm{DSM}}(c)$ and $E_{\mathrm{KS}}(c)$ differ in terms that are cubic in $D_+$.<br />
FIG. 3. The ratio between the norms of the idempotency correction to the<br />
density $\|D_\delta\|_S^{2} = \|\tilde{D} - \bar{D}\|_S^{2}$ and the density change $\|D_+\|_S^{2} = \|\bar{D} - D_0\|_S^{2}$ in the<br />
TRDSM steps of the zinc complex calculation seen in Fig. 5.<br />
E. Comparison of the DSM and EDIIS energies<br />
Neglecting the idempotency correction in the DSM energy<br />
in Eq. 69, we are left with $E(\bar{D})$. In the Hartree–Fock<br />
theory, this remaining term may be expressed in several<br />
equivalent ways. First, it may be written as the energy of the<br />
weighted density matrix,<br />
$$E_{\mathrm{HF}}(\bar{D}) = 2\,\mathrm{Tr}\,h\bar{D} + \mathrm{Tr}\,\bar{D}G(\bar{D}), \tag{72}$$<br />
where the weighted density matrix is defined as (note the<br />
difference from Eq. 49)<br />
$$\bar{D} = \sum_{i=0}^{n} d_i D_i, \qquad \sum_{i=0}^{n} d_i = 1. \tag{73}$$<br />
In their development of the EDIIS method, Kudin et al. (Ref. 4)<br />
suggested the alternative form<br />
$$E^{\mathrm{EDIIS}}(\bar{D}) = \sum_{i=0}^{n} d_i E_{\mathrm{SCF}}(D_i) - \frac{1}{2}\sum_{i,j=0}^{n} d_i d_j\,\mathrm{Tr}\,F_{ij}D_{ij}, \tag{74}$$<br />
where $E_{\mathrm{SCF}}(D)$ may be the Hartree–Fock energy or the<br />
Kohn–Sham energy. In the Hartree–Fock theory, Eqs. 70,<br />
72, and 74 are equivalent since the Fock matrix is linear<br />
in the density matrix. By contrast, in the DFT, where the<br />
Kohn–Sham matrix contains terms that are nonlinear in the<br />
density matrix, these expressions are not equivalent. Below,<br />
we discuss some of the consequences of their nonequivalence<br />
in the DFT.<br />
Eliminating $d_0 = 1 - \sum_{i=1}^{n} d_i$ from Eq. 74, we may express<br />
the EDIIS energy in the independent representation of<br />
Eqs. 52 and 53,<br />
$$E^{\mathrm{EDIIS}}(\bar{D}) = E_{\mathrm{SCF}}(D_0) + \sum_{i=1}^{n} d_i\big[E_{\mathrm{SCF}}(D_i) - E_{\mathrm{SCF}}(D_0)\big] - \sum_{i=1}^{n} d_i\,\mathrm{Tr}\,F_{i0}D_{i0} + \sum_{i,j=1}^{n} d_i d_j\,\mathrm{Tr}\,F_{j0}D_{j0} - \frac{1}{2}\sum_{i,j=1}^{n} d_i d_j\,\mathrm{Tr}\,F_{ij}D_{ij}. \tag{75}$$<br />
Comparing this expression with $E(\bar{D})$ of Eq. 70, we find<br />
that they have the same values at the expansion point $D_0$ but<br />
that their first derivatives differ since<br />
$$\left.\frac{\partial E(\bar{D})}{\partial c_k}\right|_{c=0} = 2\,\mathrm{Tr}\,F_0D_{k0}, \tag{76a}$$<br />
$$\left.\frac{\partial E^{\mathrm{EDIIS}}(\bar{D})}{\partial d_k}\right|_{d=0} = E_{\mathrm{SCF}}(D_k) - E_{\mathrm{SCF}}(D_0) - \mathrm{Tr}\,F_{k0}D_{k0}. \tag{76b}$$<br />
In the Hartree–Fock theory, it is easy to see that Eqs. 76a<br />
and 76b are identical.<br />
The DSM gradient is<br />
$$\frac{\partial E^{\mathrm{DSM}}(c)}{\partial c_k} = \frac{\partial E(\bar{D})}{\partial c_k} + 2\,\frac{\partial\,\mathrm{Tr}\,D_\delta\bar{F}}{\partial c_k}. \tag{77}$$<br />
Since $E^{\mathrm{DSM}}$ is equal to $E_{\mathrm{KS}}$ to first order, we have that<br />
$$\left.\frac{\partial E^{\mathrm{DSM}}(c)}{\partial c_k}\right|_{c=0} = \left.\frac{\partial E_{\mathrm{KS}}}{\partial c_k}\right|_{c=0}. \tag{78}$$<br />
The EDIIS gradient at the expansion point is thus not equal<br />
to the KS gradient, since the last, nonzero term in Eq. 77 (the<br />
term resulting from the idempotency correction) is missing.<br />
Furthermore, the correct gradient in the DSM can only be obtained<br />
in the DFT if Eq. 76a and not Eq. 76b is used. It is thus<br />
incorrect to use Eq. 76b in the DFT even though Eqs. 76a<br />
and 76b are equivalent in Hartree–Fock.<br />
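The relation between Eqs. 76a and 76b can be checked numerically on a toy model. In the sketch below, the two-electron part is modelled by the linear map $G(D) = W\,\mathrm{Tr}(WD)$ with a symmetric matrix $W$, and a nonlinear term $a\,(\mathrm{Tr}\,WD)^{3}$ stands in for an exchange-correlation contribution. Both choices, the matrices `H` and `W`, and all numbers are assumptions made only for this illustration and do not represent a real functional. The Fock-like matrix follows the convention $E^{(1)} = 2F$ of Eq. 65.<br />

```python
import math

H = [[-1.0, 0.2], [0.2, -0.5]]  # toy one-electron matrix h (hypothetical numbers)
W = [[0.7, 0.1], [0.1, 0.3]]    # toy symmetric matrix defining G(D) = W Tr(W D)


def tr(A, B):
    """Trace of the product of two 2x2 matrices."""
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))


def lincomb(a, A, b, B):
    """Elementwise a*A + b*B."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def energy(D, a):
    """E(D) = 2 Tr hD + Tr D G(D) + a (Tr W D)^3; a = 0 is the Hartree-Fock-like case."""
    t = tr(W, D)
    return 2.0 * tr(H, D) + t * t + a * t ** 3


def fock(D, a):
    """F(D) = E'(D) / 2 for the toy energy above."""
    t = tr(W, D)
    return lincomb(1.0, H, t + 1.5 * a * t * t, W)


def gradient_76a(D0, Dk, a):
    """Right-hand side of Eq. 76a: 2 Tr F0 Dk0."""
    return 2.0 * tr(fock(D0, a), lincomb(1.0, Dk, -1.0, D0))


def gradient_76b(D0, Dk, a):
    """Right-hand side of Eq. 76b: E(Dk) - E(D0) - Tr Fk0 Dk0."""
    Dk0 = lincomb(1.0, Dk, -1.0, D0)
    Fk0 = lincomb(1.0, fock(Dk, a), -1.0, fock(D0, a))
    return energy(Dk, a) - energy(D0, a) - tr(Fk0, Dk0)


co, si = math.cos(0.3), math.sin(0.3)
D0 = [[1.0, 0.0], [0.0, 0.0]]
Dk = [[co * co, -si * co], [-si * co, si * si]]

for a in (0.0, 0.5):
    print("a =", a, " Eq. 76a - Eq. 76b =", gradient_76a(D0, Dk, a) - gradient_76b(D0, Dk, a))
```

For $a = 0$ the two expressions agree to rounding error, whereas for $a \neq 0$ they differ, in line with the statement that the equivalence holds only when the Fock matrix is linear in the density matrix.<br />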
V. CONFIGURATION SHIFT<br />
IN THE TRSCF ALGORITHM<br />
Since the TRSCF method has been designed for a<br />
smooth and controlled convergence of the density matrix, it<br />
does not allow for the abrupt changes in the orbitals associated<br />
with configuration shifts. Nevertheless, it may sometimes<br />
be advantageous to allow such shifts, as illustrated in<br />
Fig. 4, where we compare two cadmium complex calculations<br />
(see Sec. VII for details). The “no-shift” optimization<br />
proceeds carefully, allowing only small changes in the density<br />
matrix at each iteration, whereas the “do-shift” optimization<br />
is more daring, accepting abrupt configuration shifts<br />
that reduce the total energy.<br />
In Fig. 4(a), we have plotted the error in the energy at<br />
each iteration of the two optimizations. The first 13 iterations<br />
are identical; the optimizations are in the global region and<br />
the level shift is determined from the requirement<br />
$a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min} = 0.98$. In iteration 14, the two optimizations differ. To<br />
understand the reasons for these differences, we have in Fig.<br />
4(b) plotted $a^{\mathrm{orb}}_{\min}(\mu)$ and in Fig. 4(c) $\Delta E^{\mathrm{RH}}(\mu)$ (full line) and<br />
$\Delta E^{\mathrm{RH}}_{\mathrm{KS}}(\mu)$ (dashed line) as functions of $\mu$. For $\mu = 0.25$, there is an<br />
abrupt shift in $a^{\mathrm{orb}}_{\min}$ from 0.99 to 0.00, representing a configuration<br />
shift where the LUMO for $\mu > 0.25$ becomes the<br />
HOMO for $\mu < 0.25$. From Fig. 4(c), we see that this shift<br />
lowers the Kohn–Sham total energy. Because of the abrupt<br />
change in $a^{\mathrm{orb}}_{\min}$ at $\mu = 0.25$, we are unable to identify<br />
$a^{\mathrm{orb}}_{\min} = 0.98$. In the no-shift calculation, $\mu$ is chosen larger<br />
than 0.25, whereas, in the do-shift calculation, the undamped<br />
Roothaan–Hall step is taken with $\mu = 0$.<br />
FIG. 4. The TRSCF cadmium complex calculation described in Sec. VII. (a)<br />
The convergence without abrupt configuration shift and with abrupt<br />
configuration shift. (b) and (c) contain details of the TRRH step in<br />
iteration 14: (b) the minimum overlap $a^{\mathrm{orb}}_{\min}(\mu)$ for the new occupied orbitals<br />
with the previous set of occupied orbitals and (c) the changes in the model<br />
energy $\Delta E^{\mathrm{RH}}(\mu)$ (full line) and the actual energy $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}(\mu)$ (dashed line), all as functions of the<br />
level-shift parameter $\mu$.<br />
As the DSM energy model assumes small changes in the<br />
density matrix, the density matrices of all previous iterations<br />
are discarded in iteration 14 of the do-shift calculation, and a<br />
rapid convergence to the optimized state is seen from that<br />
point. In the no-shift calculation, an $a^{\mathrm{orb}}_{\min}(\mu)$ profile similar to<br />
that of iteration 14 is obtained in the next few iterations. In<br />
these iterations, the lowest Hessian eigenvalue is −0.95 a.u.<br />
and the optimization proceeds towards a stationary point.<br />
Finally, in iteration 22, the TRSCF algorithm identifies this<br />
stationary point as a saddle point, moves out of this region,<br />
and converges rapidly to the same minimum as the do-shift<br />
optimization.<br />
As this example illustrates, it is important to recognize<br />
and accept a favorable configuration shift. A configuration<br />
shift may be recognized when an $a^{\mathrm{orb}}_{\min}(\mu)$ profile has an<br />
abrupt change where on the right-hand side $a^{\mathrm{orb}}_{\min}$ is close to 1<br />
and on the left-hand side $a^{\mathrm{orb}}_{\min}$ is close to 0. To maintain the<br />
high degree of control characteristic of the TRSCF method,<br />
the energy of the new configuration is checked before the<br />
shift is accepted, at the cost of an additional Kohn–Sham<br />
matrix build. As seen from Fig. 4(a), this check is well worth<br />
the effort, saving more than ten iterations, and thus it is made<br />
an integrated part of our TRSCF implementation.<br />
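The recognition rule described above can be sketched as a scan over a sampled overlap profile. The function name, the thresholds `high` and `low`, and the sample data are hypothetical choices for this illustration. The energy check of the new configuration mentioned above is not part of the sketch.<br />

```python
def find_configuration_shift(mu_values, a_min_values, high=0.9, low=0.1):
    """Return (mu_left, mu_right) bracketing an abrupt overlap jump, or None.

    A jump is flagged when the overlap is at most `low` at one sampled level
    shift and at least `high` at the next larger one.
    """
    samples = sorted(zip(mu_values, a_min_values))
    for (mu_left, a_left), (mu_right, a_right) in zip(samples, samples[1:]):
        if a_left <= low and a_right >= high:
            return mu_left, mu_right
    return None


# Illustrative profile with a jump between mu = 0.2 and mu = 0.3.
mu_grid = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
abrupt = [0.00, 0.00, 0.00, 0.99, 0.992, 0.995]
smooth = [0.90, 0.93, 0.95, 0.97, 0.98, 0.99]

print(find_configuration_shift(mu_grid, abrupt))
print(find_configuration_shift(mu_grid, smooth))
```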
VI. THE DIIS METHOD VIEWED<br />
AS A QUASI-NEWTON METHOD<br />
Since its introduction by Pulay in 1980, the DIIS method<br />
has been extensively and successfully used to accelerate the<br />
convergence of SCF optimizations. We here present a rederivation<br />
of the DIIS method to demonstrate that, in the iterative<br />
subspace of density matrices, it is equivalent to a quasi-<br />
Newton method. From this observation, we conclude that, in<br />
the local region of the SCF optimization, the DIIS steps can<br />
be used safely and will lead to fast convergence. We also discuss<br />
the convergence of the DIIS algorithm in the global region, where it<br />
is much less predictable.<br />
We assume that, in the course of the SCF optimization,<br />
we have determined a set of $n+1$ AO density matrices<br />
$D_0, D_1, D_2, \ldots, D_n$ and the associated Kohn–Sham or Fock<br />
matrices $F(D_0), F(D_1), F(D_2), \ldots, F(D_n)$. Since the electronic<br />
gradient $g(D)$ is given by (Ref. 11)<br />
$$g(D) = 4\big(SDF(D) - F(D)DS\big), \tag{79}$$<br />
we also have available the corresponding gradients<br />
$g(D_0), g(D_1), g(D_2), \ldots, g(D_n)$. We now wish to determine a<br />
corrected density matrix,<br />
$$\bar{D} = D_0 + \sum_{i=1}^{n} c_i D_{i0}, \qquad D_{i0} = D_i - D_0, \tag{80}$$<br />
that minimizes the norm of the gradient $\|g(\bar{D})\|$. For this purpose,<br />
we parameterize the density matrix in terms of an antisymmetric<br />
matrix $X = -X^{T}$ and the current density matrix<br />
$D_0$ as (Ref. 11)<br />
$$D(X) = \exp(-XS)\,D_0\exp(SX). \tag{81}$$<br />
With each old density matrix $D_i$, we now associate an antisymmetric<br />
matrix $X_i$ such that<br />
$$D_i = \exp(-X_iS)\,D_0\exp(SX_i) = D_0 + [D_0, X_i]_S + O(X_i^{2}). \tag{82}$$<br />
Introducing the averaged antisymmetric matrix,<br />
$$\bar{X} = \sum_{i=1}^{n} c_i X_i, \tag{83}$$<br />
we obtain<br />
$$D(\bar{X}) = D_0 + \sum_{i=1}^{n} c_i [D_0, X_i]_S + O(\bar{X}^{2}), \tag{84}$$<br />
where we have used the S-commutator expansion of $D(\bar{X})$<br />
analogous to Eq. 82. Our task is hence to determine $\bar{X}$ in<br />
Eq. 83 such that $D(\bar{X})$ minimizes the gradient norm<br />
$\|g(D(\bar{X}))\|$. In passing, we note that, whereas $\bar{D}$ is not in<br />
general idempotent and therefore not a valid density matrix,<br />
$D(\bar{X})$ is a valid, idempotent density matrix for all choices of<br />
$c_i$.<br />
Expanding the gradient in Eq. 79 about the current density<br />
matrix $D_0$, we obtain<br />
$$g(D(\bar{X})) = g(D_0) + H(D_0)\bar{X} + O(\bar{X}^{2}), \tag{85}$$<br />
where $H(D)$ is the Jacobian matrix. Neglecting the higher-order<br />
terms, our task is therefore to minimize the norm of the<br />
gradient,<br />
$$g(c) = g(D_0) + \sum_{i=1}^{n} c_i H(D_0)X_i, \tag{86}$$<br />
with respect to the elements of $c$. For an estimate of<br />
$H(D_0)X_i$, we truncate the expansion,<br />
$$g(D_i) = g(D_0) + H(D_0)X_i + O(X_i^{2}), \tag{87}$$<br />
and obtain the quasi-Newton condition,<br />
$$g(D_i) - g(D_0) = H(D_0)X_i. \tag{88}$$<br />
Inserting this condition into Eq. 86, we obtain<br />
$$g(c) = g(D_0) + \sum_{i=1}^{n} c_i\big[g(D_i) - g(D_0)\big] = \sum_{i=0}^{n} c_i\,g(D_i), \tag{89}$$<br />
where we have introduced the parameter $c_0 = 1 - \sum_{i=1}^{n} c_i$. The<br />
minimization of the gradient norm $\|g(c)\|$ may therefore be carried out as<br />
a least-squares minimization of $g(c)$ in Eq. 89 subject to the<br />
constraint<br />
$$\sum_{i=0}^{n} c_i = 1. \tag{90}$$<br />
If we consider $g(D_i)$ as an error vector for the density matrix<br />
$D_i$, this procedure becomes identical to the DIIS method.<br />
From Eq. 86 we also see that DIIS may be viewed as a<br />
minimization of the residual for the Newton equation in the<br />
subspace of the density matrix differences $D_i - D_0$, $i = 1, \ldots, n$,<br />
where the quasi-Newton condition is used to set up the subspace<br />
equations. Since the quasi-Newton steps are reliable<br />
only in the local region of the optimization, we conclude that<br />
the DIIS method can be used safely only in this region, when<br />
the electronic Hessian is positive definite.<br />
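The constrained least-squares problem of Eqs. 89 and 90 can be sketched in a few lines. The sketch below uses the standard Lagrange-multiplier formulation with the matrix of gradient overlaps $B_{ij} = g(D_i)\cdot g(D_j)$. The gradients are represented as plain vectors, the small Gaussian-elimination helper and the sample numbers are assumptions for this illustration, and no safeguards against ill-conditioned subspaces are included.<br />

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def diis_coefficients(gradients):
    """Minimize ||sum_i c_i g_i||^2 subject to sum_i c_i = 1 (Eqs. 89 and 90)."""
    m = len(gradients)
    B = [[sum(a * b for a, b in zip(gi, gj)) for gj in gradients] for gi in gradients]
    # Stationarity of the Lagrangian: B c + lambda * 1 = 0 together with 1^T c = 1.
    A = [row + [1.0] for row in B] + [[1.0] * m + [0.0]]
    rhs = [0.0] * m + [1.0]
    return solve(A, rhs)[:m]


# Two hypothetical gradient vectors g(D_0) and g(D_1).
g0 = [1.0, 0.2]
g1 = [-0.5, 0.1]
c = diis_coefficients([g0, g1])
residual = [c[0] * a + c[1] * b for a, b in zip(g0, g1)]
print("coefficients:", c)
print("residual gradient:", residual)
```

The resulting coefficients would then be used to combine the stored density (or Kohn–Sham) matrices in the same proportions.<br />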
The optimal combination of the density matrices is obtained<br />
in the DIIS method by carrying out a least-squares<br />
minimization of the gradient norm subject to the constraint in<br />
Eq. 90. However, since a small gradient norm in the global<br />
region does not necessarily imply a low Kohn–Sham energy,<br />
the DIIS convergence may be unpredictable. Furthermore,<br />
we may encounter regions where the gradient norms are<br />
similar but the energies different. The DIIS method may then<br />
diverge, not being able to identify the density matrix of lowest<br />
energy, as illustrated in Sec. VII.<br />
VII. APPLICATIONS<br />
In this section, we give numerical examples to illustrate<br />
the convergence characteristics of the Kohn–Sham TRSCF<br />
calculations, comparing with the DIIS and QC-SCF calculations.<br />
Comparisons are also made with the TRSCF-LS technique,<br />
where the TRRH-LS and TRDSM-LS line-search<br />
methods of Secs. III A and IV D are combined to set up an<br />
expensive but highly robust method, in which the lowest<br />
Kohn–Sham energy is identified by a line search at each step.<br />
In Sec. VII A, we discuss the calculations on the zinc complex<br />
in Fig. 5, where Zn$^{2+}$ is complexed with<br />
ethylenediamine-N,N′-disuccinic acid (EDDS), an isomer<br />
of ethylenediamine tetra-acetic acid (EDTA). Next, in Sec.<br />
VII B, we consider the calculations on five different systems.<br />
All calculations have been carried out with a local version of<br />
the DALTON program package (Ref. 20). Unless otherwise indicated,<br />
the starting orbitals have been obtained by diagonalization of<br />
the one-electron Hamiltonian.<br />
FIG. 5. The convergence of different algorithms in an LDA/6-31G computation<br />
with core Hamiltonian start guess for the zinc complex depicted in the<br />
lower left corner. The algorithms are QC-SCF, DIIS, TRSCF,<br />
and TRSCF-LS.<br />
A. Calculations on the zinc complex<br />
In Fig. 5, we have plotted the error in the Kohn–Sham<br />
energy at each iteration of LDA/6-31G calculations on the<br />
zinc complex. The standard TRSCF method performs<br />
almost as well as the very smooth but much more expensive<br />
TRSCF-LS method , giving a somewhat higher energy<br />
between iterations 13 and 22. By contrast, the DIIS method<br />
shows no sign of converging; after 100 iterations, the<br />
Kohn–Sham gradient norm is still about 20. Whereas the<br />
smooth TRSCF convergence arises because Hessian information<br />
is used to ensure downhill TRRH and TRDSM steps<br />
at each iteration, no such information is employed in the<br />
DIIS method. Finally, the QC-SCF method converges<br />
but exceedingly slow—even after 90 iterations it has not<br />
reached the quadratically convergent local region! The difficulties<br />
experienced with the QC-SCF method illustrate<br />
clearly that the use of Hessian information by itself is no<br />
guarantee of fast convergence.<br />
More details about the TRSCF zinc complex calculation<br />
are given in Tables I–V and in Figs. 1–3 and 6, partly discussed<br />
in Secs. III F and IV D. In Table IV, we have listed<br />
the changes in the Kohn–Sham energy generated separately<br />
in the TRRH ($\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$) and TRDSM ($\Delta E^{\mathrm{DSM}}_{\mathrm{KS}}$) steps at each<br />
SCF iteration, and likewise the norms of the changes in the<br />
TABLE IV. Convergence details for the TRSCF calculation on the zinc complex in Fig. 5. Energies given in a.u.<br />
Iteration  $\Delta E_{\mathrm{KS}}$  $\Delta E^{\mathrm{DSM}}_{\mathrm{KS}}$  $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$  $\lVert \bar{D}_n - D_n \rVert_S^2$ (DSM)  $\lVert D_{n+1} - \bar{D}_n \rVert_S^2$ (RH)<br />
2 −8.366 865 0.000 000 −8.366 865 0.000 000 0.197 607<br />
3 −68.378 567 −48.255 717 −20.122 850 6.129 310 1.141 536<br />
4 −137.038 420 −105.996 850 −31.041 569 12.140 844 1.265 250<br />
5 −70.415 468 −43.136 482 −27.278 985 1.594 214 1.031 844<br />
6 −41.492 416 −26.390 457 −15.101 958 3.162 679 1.467 802<br />
7 −25.430 533 −14.755 377 −10.675 155 1.468 925 1.364 944<br />
8 −14.460 409 −7.711 220 −6.749 189 1.525 848 1.249 827<br />
9 −8.470 594 −5.289 340 −3.181 254 1.569 113 1.040 337<br />
10 −2.289 664 −2.684 359 0.394 694 1.614 867 0.817 844<br />
11 −2.730 543 −1.053 899 −1.676 644 1.002 744 1.060 298<br />
12 −2.798 537 −1.054 903 −1.743 634 0.867 238 0.632 009<br />
13 −1.061 335 −0.658 907 −0.402 427 0.729 574 0.410 434<br />
14 −0.670 565 −0.293 889 −0.376 675 0.324 338 0.351 715<br />
15 −0.424 253 −0.213 251 −0.211 002 0.201 434 0.203 170<br />
16 −0.074 945 −0.104 012 0.029 066 0.242 928 0.302 723<br />
17 −0.090 241 −0.100 694 0.010 452 0.071 203 0.175 917<br />
18 0.000 195 −0.043 180 0.043 376 0.022 758 0.126 709<br />
19 −0.044 797 −0.057 441 0.012 644 0.047 885 0.032 787<br />
20 −0.015 396 −0.016 501 0.001 104 0.011 897 0.002 976<br />
21 −0.001 118 −0.001 471 0.000 352 0.001 164 0.000 668<br />
22 −0.000 368 −0.000 428 0.000 059 0.000 308 0.000 111<br />
23 −0.000 066 −0.000 076 0.000 010 0.000 050 0.000 019<br />
24 −0.000 011 −0.000 012 0.000 000 0.000 009 0.000 001<br />
25 −0.000 000 −0.000 000 0.000 000 0.000 000 0.000 000<br />
Downloaded 23 Aug 2005 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp
074103-14 Thøgersen et al. J. Chem. Phys. 123, 074103 2005<br />
TABLE V. The density of each iteration compared to the optimized one.<br />
Iteration  $\lVert D_{\mathrm{conv}} - D_n \rVert_S^2$  $a^{\mathrm{orb}}_{\min}(\mathrm{conv}, n)$<br />
2 66.952 673 0.0965<br />
3 65.174 713 0.0955<br />
4 56.502 973 0.0927<br />
5 51.210 143 0.1017<br />
6 48.482 773 0.1411<br />
7 42.682 641 0.1394<br />
8 35.617 332 0.1992<br />
9 26.551 913 0.3183<br />
10 18.298 431 0.4094<br />
11 14.152 342 0.4983<br />
12 9.767 169 0.6927<br />
13 6.184 621 0.6859<br />
14 3.844 299 0.9187<br />
15 2.240 436 0.9194<br />
16 1.018 810 0.9771<br />
17 0.200 374 0.9952<br />
18 0.064 181 0.9984<br />
19 0.043 906 0.9967<br />
20 0.011 531 0.9996<br />
21 0.001 092 0.9999<br />
22 0.000 309 0.9999<br />
23 0.000 053 0.9999<br />
24 0.000 009 0.9999<br />
25 0.000 000 0.9999<br />
density matrix in the TRRH ($\lVert D_{n+1} - \bar{D}_n \rVert_S^2$)<br />
and TRDSM ($\lVert \bar{D}_n - D_n \rVert_S^2$)<br />
steps. Remarkably, the TRDSM step consistently<br />
reduces the energy more than the TRRH step. Indeed,<br />
after iteration 15, each TRRH step increases rather<br />
than decreases the energy. Apparently, in the local region, the<br />
role of the TRRH step is reduced to that of improving the<br />
variational space of the subsequent TRDSM step. From the<br />
table, we also see that the largest changes in the density<br />
matrix are generated by the TRDSM step rather than by the<br />
TRRH step.<br />
For the TRRH and TRDSM steps, we have at each iteration<br />
determined the overlap $a_i^{\mathrm{orb}}$ in Eq. (40) of each generated<br />
occupied orbital $\phi_i^{\mathrm{new}}$ with the previous orbitals $\phi_j^{\mathrm{old}}$. In Fig.<br />
6, the number of orbitals at each iteration with $a_i^{\mathrm{orb}} < 0.98$<br />
(i.e., with large rotations) is illustrated in a bar chart. As we<br />
require $a_i^{\mathrm{orb}} \geq 0.98$ in the Roothaan–Hall steps, the TRRH<br />
FIG. 6. The number of occupied orbitals in the TRRH and TRDSM steps<br />
with an overlap less than 0.98 to the previous set of occupied orbitals for<br />
each step in the SCF iteration.<br />
bars simply represent the number of orbitals with $a_i^{\mathrm{orb}} = 0.98$.<br />
In the TRDSM step, however, no such restrictions<br />
are imposed and a large number of orbitals with $a_i^{\mathrm{orb}} < 0.98$<br />
are observed. Indeed, in the first few DSM steps, overlaps as<br />
small as 0.76 occur, leading to far larger changes than those<br />
accepted in the Roothaan–Hall step, emphasizing the important<br />
role played by the TRDSM step in achieving orbital<br />
reorganizations in a controlled manner.<br />
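The overlap diagnostic discussed above can be sketched in a few lines. Eq. (40) is not reproduced in this section, so the definition used here, $a_i^{\mathrm{orb}} = \sum_j \langle \phi_i^{\mathrm{new}} | \phi_j^{\mathrm{old}} \rangle^2$, is an assumption, as are the function names and the toy orbitals.

```python
import numpy as np

def orbital_overlaps(c_new, c_old, s):
    """Squared projection of each new occupied orbital onto the old occupied space.

    c_new, c_old: (n_basis, n_occ) MO coefficients with S-orthonormal columns.
    s: (n_basis, n_basis) AO overlap matrix.
    Assumed definition: a_i = sum_j <phi_i^new | phi_j^old>^2.
    """
    m = c_new.T @ s @ c_old          # m[i, j] = <phi_i^new | phi_j^old>
    return np.sum(m**2, axis=1)

def count_large_rotations(c_new, c_old, s, threshold=0.98):
    """Number of new occupied orbitals whose overlap falls below the threshold."""
    return int(np.sum(orbital_overlaps(c_new, c_old, s) < threshold))

# Toy case in an orthonormal basis (S = 1): the second occupied orbital is
# rotated by 30 degrees into the virtual space, the first is left untouched.
s = np.eye(4)
c_old = np.eye(4)[:, :2]
theta = np.radians(30.0)
c_new = c_old.copy()
c_new[:, 1] = np.cos(theta) * np.eye(4)[:, 1] + np.sin(theta) * np.eye(4)[:, 2]

a = orbital_overlaps(c_new, c_old, s)
n_large = count_large_rotations(c_new, c_old, s)
```

Counting the entries of `a` below 0.98 at every step is what a bar chart such as Fig. 6 would display under this assumed definition.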
In Table V, we have listed the norm of the difference<br />
between the current density matrix $D_n$ at each iteration and<br />
the final converged density matrix $D_{\mathrm{conv}}$; also, we have listed<br />
$a^{\mathrm{orb}}_{\min}(\mathrm{conv}, n)$, which is the smallest overlap, in the sense of<br />
Eq. (41), of the current occupied orbitals with the converged<br />
ones. Clearly, very large changes occur in the density matrix<br />
and the orbitals in the course of the optimization, in particular,<br />
during the first 17 iterations; in the remaining iterations,<br />
only small adjustments are made. In spite of the large overall<br />
changes made to the orbitals, they have been accomplished<br />
in a controlled and reliable manner.<br />
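As a small illustration of the distance measure tabulated in Table V, the sketch below evaluates a squared norm in the metric of the overlap matrix. The definition $\lVert A \rVert_S^2 = \mathrm{Tr}(ASAS)$ is assumed here, since the norm is defined outside this section; the matrices are toy data.

```python
import numpy as np

def s_norm_squared(a, s):
    """Assumed S-metric norm: ||A||_S^2 = Tr(A S A S) for a symmetric matrix A."""
    return float(np.trace(a @ s @ a @ s))

# Toy example in an orthonormal basis (S = 1) with two occupied orbitals.
s = np.eye(4)
d_n = np.diag([1.0, 1.0, 0.0, 0.0])      # current density, idempotent (D S D = D)
d_conv = np.diag([1.0, 0.0, 1.0, 0.0])   # "converged" density, idempotent

dist2 = s_norm_squared(d_conv - d_n, s)

# For idempotent densities with n_occ occupied orbitals, the same number follows
# from how much of the occupied space the two densities share.
n_occ = 2
shared = float(np.trace(d_conv @ s @ d_n @ s))
dist2_from_overlap = 2.0 * n_occ - 2.0 * shared
```

With this definition the distance vanishes only when the two occupied spaces coincide, which is the behavior seen in the last rows of Table V.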
In Fig. 7, we have plotted the errors for the same LDA/<br />
6-31G optimization as in Fig. 5, but with the starting orbitals<br />
obtained from a Hückel calculation rather than from the diagonalization<br />
of the one-electron Hamiltonian. Convergence<br />
is now faster, with the TRSCF-LS and TRSCF methods<br />
behaving in the same smooth manner as before. More<br />
importantly, with this improved starting guess, the DIIS<br />
method converges in almost the same number of iterations<br />
as the TRSCF method, although less smoothly.<br />
Finally, in Fig. 8, we have the same plot as in Fig. 7, but<br />
in the STO-3G rather than 6-31G basis (still with a Hückel<br />
guess. Somewhat surprisingly, convergence is more difficult<br />
in this smaller basis. Indeed, after 100 iterations, the DIIS<br />
method has not yet converged, with a Kohn–Sham gradient<br />
norm as large as 10. The standard TRSCF method<br />
still converges, but now in a less smooth manner than the<br />
TRSCF-LS method. As mentioned in Sec. III E 2, when<br />
the HOMO-LUMO gap is particularly small, it may sometimes<br />
be necessary to enforce a minimum TRRH level shift<br />
to achieve convergence. Indeed, in the TRSCF optimization<br />
in Fig. 8, we require a level shift of at least 0.1 throughout the calculation.<br />
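To make the role of the level shift concrete, the following sketch performs one level-shifted Roothaan–Hall diagonalization. The working equation $(F - \mu S D S)C = S C \epsilon$ is assumed, since it is given outside this section, and the matrices are random toy data rather than a real Kohn–Sham matrix; all names are illustrative.

```python
import numpy as np

def level_shifted_rh_step(f, s, d_old, n_occ, mu):
    """Diagonalize F - mu * S D_old S in the S metric and occupy the lowest orbitals.

    Assumed working equation: (F - mu S D S) C = S C eps, with D S D = D.
    Returns the new density D = C_occ C_occ^T.
    """
    s_val, s_vec = np.linalg.eigh(s)
    x = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T      # Loewdin S^(-1/2)
    f_shifted = f - mu * (s @ d_old @ s)
    _, c_prime = np.linalg.eigh(x @ f_shifted @ x)
    c_occ = (x @ c_prime)[:, :n_occ]                  # back-transformed occupied MOs
    return c_occ @ c_occ.T

def retained_occupation(d_new, d_old, s):
    """Tr(D_new S D_old S): equals n_occ when the occupied spaces coincide."""
    return float(np.trace(d_new @ s @ d_old @ s))

rng = np.random.default_rng(0)
n_basis, n_occ = 6, 2
a = rng.normal(size=(n_basis, n_basis))
s = np.eye(n_basis) + 0.1 * (a @ a.T) / n_basis       # symmetric positive definite
g0 = rng.normal(size=(n_basis, n_basis))
g1 = rng.normal(size=(n_basis, n_basis))
f_old = 0.5 * (g0 + g0.T)
f_new = 0.5 * (g1 + g1.T)

d_old = level_shifted_rh_step(f_old, s, np.zeros((n_basis, n_basis)), n_occ, 0.0)
kept_unshifted = retained_occupation(
    level_shifted_rh_step(f_new, s, d_old, n_occ, 0.0), d_old, s)
kept_shifted = retained_occupation(
    level_shifted_rh_step(f_new, s, d_old, n_occ, 1000.0), d_old, s)
```

A large shift keeps the new occupied space close to the old one, so enforcing a minimum shift limits the step size when the HOMO-LUMO gap is small.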
B. Calculations on a variety of molecules<br />
In Fig. 9, we have plotted the errors in the energy at each<br />
SCF iteration, for a variety of molecules at the LDA level of<br />
theory: the zinc complex from Fig. 5 in the 6-31G basis<br />
set; the rhodium complex from Ref. 6 in the Ahlrichs-VDZ<br />
basis (Ref. 21) with STO-3G on the rhodium atom; a cadmium<br />
complex with an imidazole ring in the STO-3G basis;<br />
the CH$_3$CHO molecule in the cc-pVTZ basis (Ref. 22); and the<br />
H$_2$O molecule in the cc-pVTZ basis.<br />
For the TRSCF-LS method, convergence is smooth for<br />
all systems, as expected. Likewise, in the TRSCF calculations<br />
with no restrictions enforced on the TRRH level-shift<br />
parameter, convergence is still good although not as smooth<br />
as in the TRSCF-LS calculations. The behavior of the DIIS<br />
method is somewhat more erratic, in particular, in the global<br />
region; in the local region, it converges as well as the TRSCF<br />
FIG. 7. The convergence of different algorithms in an LDA/6-31G computation<br />
with Hückel start guess for the zinc complex in Fig. 5. The algorithms<br />
are DIIS, TRSCF, and TRSCF-LS.<br />
method. These observations are in agreement with our discussion<br />
in Sec. VI. The DIIS zinc complex calculation does<br />
not converge as discussed above.<br />
In Fig. 9, we have also included the results from the<br />
DIIS-TRRH optimizations. These calculations differ from<br />
the DIIS calculations in that we have used a level-shift parameter<br />
in the Roothaan–Hall diagonalization step; alternatively,<br />
DIIS-TRRH may be viewed as different from TRSCF<br />
in that we have replaced the TRDSM steps by DIIS steps.<br />
Somewhat surprisingly, only the water calculation converges<br />
with the DIIS-TRRH method. To understand this behavior,<br />
we note that, in the global region, the TRRH method typically<br />
produces gradients that do not change much, even<br />
though large changes may occur in the energy. In such cases,<br />
the DIIS method may stall, not being able to identify a good<br />
combination of density matrices.<br />
This behavior is illustrated in Table VI, where we have<br />
listed the gradient norm and Kohn–Sham energy of the first<br />
six iterations of the cadmium complex calculation in Fig. 9.<br />
The TRSCF and DIIS-TRRH gradients stay almost the same<br />
during these iterations, stalling the DIIS-TRRH optimization<br />
but not the TRSCF optimization, whose energy decreases in<br />
each iteration. In the pure DIIS optimization, by contrast, the<br />
gradient changes significantly from iteration to iteration; at<br />
the same time, the energy decreases at each iteration except<br />
the fifth, where also the gradient norm increases. Eventually,<br />
DIIS enters the local region with its rapid rate of convergence<br />
although we note, in the DIIS panel in Fig. 9, a sudden,<br />
large increase in the energy for the cadmium complex<br />
FIG. 9. The convergence in LDA calculations for a variety of molecules<br />
using the TRSCF-LS, TRSCF, DIIS, and DIIS-TRRH approaches, respectively.<br />
The molecules are a zinc complex, a rhodium complex,<br />
a cadmium complex, CH$_3$CHO, and H$_2$O.<br />
calculation in iterations 10 and 11. However, these<br />
changes are accompanied by large increases in the gradient<br />
norm, allowing DIIS to recover safely.<br />
VIII. CONCLUSIONS<br />
FIG. 8. The convergence of different algorithms in an LDA/STO-3G computation<br />
with Hückel start guess for the zinc complex in Fig. 5. The algorithms<br />
are DIIS, TRSCF, and TRSCF-LS.<br />
In this paper, the trust-region SCF (TRSCF) algorithm<br />
introduced in Ref. 6 has been further developed to make it<br />
applicable to the optimization of the Kohn–Sham energy. In<br />
the TRSCF method, both the Roothaan–Hall step and the<br />
density-subspace minimization (DSM) step are replaced by<br />
optimizations of local energy models of the Hartree–Fock/<br />
Kohn–Sham energy $E_{\mathrm{SCF}}$. These local models have the same<br />
gradient as the true energy $E_{\mathrm{SCF}}$ but an approximate Hessian.<br />
By restricting the steps of the TRSCF algorithm to the trust<br />
region of these local models, that is, to the region where the<br />
local models approximate $E_{\mathrm{SCF}}$ well, smooth and fast convergence<br />
to the optimized energy may be obtained.<br />
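TABLE VI. The gradient norm $\lVert g \rVert = \lVert 4(SDF - FDS) \rVert$ in the first six iterations of the cadmium complex calculations<br />
seen in Fig. 9.<br />
Iteration  DIIS: $E_{\mathrm{KS}}$  DIIS: $\lVert g \rVert$  DIIS-TRRH: $E_{\mathrm{KS}}$  DIIS-TRRH: $\lVert g \rVert$  TRSCF: $E_{\mathrm{KS}}$  TRSCF: $\lVert g \rVert$<br />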
1 −5597.0 7.8 −5597.0 7.8 −5597.0 7.8<br />
2 −5502.3 14.9 −5598.4 7.2 −5598.3 7.1<br />
3 −5602.1 9.7 −5600.3 8.5 −5603.7 9.3<br />
4 −5628.5 2.1 −5599.9 7.7 −5611.1 9.1<br />
5 −5627.4 3.5 −5599.9 7.8 −5616.8 7.7<br />
6 −5628.8 0.8 −5600.2 8.1 −5622.7 7.5<br />
conv no conv conv<br />
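The gradient norm reported in Table VI can be evaluated directly from the Kohn–Sham matrix F, the density matrix D, and the overlap matrix S. The sketch below is a minimal illustration on toy matrices; the use of the Frobenius norm is an assumption, as the caption does not specify which matrix norm is meant.

```python
import numpy as np

def scf_gradient(f, d, s):
    """Gradient matrix 4(SDF - FDS), following the expression in the caption of Table VI."""
    return 4.0 * (s @ d @ f - f @ d @ s)

def gradient_norm(f, d, s):
    """Frobenius norm of the gradient matrix (choice of norm is an assumption)."""
    return float(np.linalg.norm(scf_gradient(f, d, s)))

# Toy example in an orthonormal basis (S = 1).
s = np.eye(4)
f = np.diag([-2.0, -1.0, 1.0, 2.0])
d_converged = np.diag([1.0, 1.0, 0.0, 0.0])   # built from the two lowest eigenvectors of f

# Rotate the second occupied orbital into the first virtual orbital.
theta = 0.3
c = np.array([0.0, np.cos(theta), np.sin(theta), 0.0])
d_rotated = np.diag([1.0, 0.0, 0.0, 0.0]) + np.outer(c, c)

g_converged = gradient_norm(f, d_converged, s)
g_rotated = gradient_norm(f, d_rotated, s)
```

The norm vanishes when D commutes with F in the S metric, that is, at self-consistency, and is nonzero for the rotated density.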
In the previous implementation of the TRSCF algorithm,<br />
the focus was on the optimization of the Hartree–Fock energy.<br />
As the Kohn–Sham energy is nonquadratic in the density<br />
matrix, the local DSM energy model has been generalized<br />
and is now expanded about the current density matrix<br />
$D_0$ in the subspace of the density matrices $D_i$ of the previous<br />
iterations. To satisfy the idempotency condition, the energy<br />
model function is parametrized in terms of a purified averaged<br />
density matrix. The local energy function is correct to<br />
second order in $D_i - D_0$ and can be set up solely in terms of<br />
the density matrices and Kohn–Sham matrices of the previous<br />
iterations. In the Hartree–Fock theory, the new local energy<br />
model is identical to the one previously used in TRSCF<br />
optimizations.<br />
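An average of idempotent density matrices is generally not idempotent, which is why a purification enters the model. The sketch below uses the McWeeny purification $\tilde{D} = 3DSD - 2DSDSD$ (cf. Ref. 15) as an assumed form of the purification step; the two-dimensional densities are toy data.

```python
import numpy as np

def mcweeny_purify(d, s):
    """One McWeeny purification step in a nonorthogonal basis: 3 DSD - 2 DSDSD."""
    dsd = d @ s @ d
    return 3.0 * dsd - 2.0 * dsd @ s @ d

def idempotency_error(d, s):
    """Frobenius norm of DSD - D, zero for an idempotent density."""
    return float(np.linalg.norm(d @ s @ d - d))

# Average two slightly different one-electron projectors (S = 1 for simplicity).
s = np.eye(2)
theta = 0.2
v1 = np.array([1.0, 0.0])
v2 = np.array([np.cos(theta), np.sin(theta)])
d_avg = 0.5 * (np.outer(v1, v1) + np.outer(v2, v2))

d_pur = mcweeny_purify(d_avg, s)
err_before = idempotency_error(d_avg, s)
err_after = idempotency_error(d_pur, s)
```

For an averaged density that is already close to idempotent, a single step reduces the idempotency error substantially while leaving the trace essentially unchanged in this example.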
The EDIIS function is discussed in the context of the<br />
proposed model. In the Hartree–Fock theory, the EDIIS function<br />
is obtained from our proposed energy function by neglecting<br />
terms that result from the purification of the density<br />
matrix; the EDIIS function therefore does not reproduce the<br />
Hartree–Fock gradient at the expansion point. In the DFT,<br />
the EDIIS function is inappropriate for other reasons as well.<br />
A rederivation of the original DIIS algorithm is also performed<br />
to understand when it can safely be applied. In particular,<br />
it is shown that the DIIS method may be viewed as a<br />
quasi-Newton method, thus explaining its fast local convergence.<br />
In the global region, its behavior is less predictable,<br />
although we note that its gradient-norm minimization mechanism<br />
usually allows it to recover safely from sudden, large<br />
increases in the total energy brought on by the Roothaan–<br />
Hall iterations.<br />
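The gradient-norm minimization mentioned above can be illustrated with the textbook Pulay formulation of DIIS (Ref. 3): find coefficients that sum to one and minimize the norm of the averaged error vector. This is a generic sketch, not the rederivation given in the paper, and the error vectors are made-up numbers.

```python
import numpy as np

def diis_coefficients(errors):
    """Solve the DIIS equations: minimize |sum_i c_i e_i|^2 subject to sum_i c_i = 1."""
    n = len(errors)
    b = np.zeros((n + 1, n + 1))
    for i in range(n):
        for j in range(n):
            b[i, j] = np.vdot(errors[i], errors[j])   # B_ij = <e_i | e_j>
    b[:n, n] = -1.0                                   # Lagrange-multiplier border
    b[n, :n] = -1.0
    rhs = np.zeros(n + 1)
    rhs[n] = -1.0
    return np.linalg.solve(b, rhs)[:n]

# Two hypothetical error (gradient) vectors from previous iterations.
errors = [np.array([1.0, 0.2]), np.array([-0.5, 0.1])]
c = diis_coefficients(errors)
combined = sum(ci * ei for ci, ei in zip(c, errors))
```

The criterion only looks at the gradient; nothing in it refers to the energy, which is consistent with the observation that DIIS cannot tell apart densities with similar gradients but different energies.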
The TRSCF scheme is tested both in a computationally<br />
demanding, robust line-search implementation TRSCF-LS,<br />
and in our standard implementation, where only the Fock/<br />
Kohn–Sham matrices of previous iterations are used. Our<br />
test calculations indicate not only that the TRSCF-LS<br />
method is a highly stable and robust method, but also that the<br />
standard TRSCF implementation converges rapidly in most<br />
cases, with little degradation relative to the TRSCF-LS<br />
scheme.<br />
Relative to these schemes, the DIIS method is somewhat<br />
more erratic since it makes no use of Hessian information<br />
and therefore cannot predict reliably what directions will reduce<br />
the total energy. For example, in situations where the<br />
energy changes in the course of the iterations but the gradient<br />
does not, the DIIS algorithm is unable to identify the density<br />
matrix with the lowest energy and may diverge. Nevertheless,<br />
the DIIS method handles most optimizations amazingly<br />
well, which is particularly impressive in view of its very<br />
simplicity; never have so few lines of code done so much<br />
good for so many calculations. In general, however, it is<br />
outperformed by the TRSCF method, which introduces Hessian<br />
information at little extra cost, and is well founded in<br />
the global as well as local regions of the optimization.<br />
The current formulation of TRSCF requires a few diagonalizations<br />
in each TRRH step, and to obtain linear scaling<br />
these diagonalizations should be avoided. An even more efficient<br />
algorithm may be obtained if the Roothaan–Hall and<br />
DSM steps are integrated in such a manner that the information<br />
from the previous density matrices is directly used in<br />
the Roothaan–Hall optimization step. Work along these lines<br />
is in progress.<br />
ACKNOWLEDGMENTS<br />
We thank Peter Taylor, Ditte Jørgensen, and Stephan<br />
Sauer for providing some of the test examples. This work has<br />
been supported by the Danish Natural Research Council. We<br />
also acknowledge support from the Danish Center for Scientific<br />
Computing (DCSC).<br />
1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).<br />
2 G. G. Hall, Proc. R. Soc. London A205, 541 (1951).<br />
3 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); J. Comput. Chem. 3, 556 (1982).<br />
4 K. N. Kudin, G. E. Scuseria, and E. Cancès, J. Chem. Phys. 116, 8255 (2002).<br />
5 G. Karlström, Chem. Phys. Lett. 67, 348 (1979).<br />
6 L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker, J. Chem. Phys. 121, 16 (2004).<br />
7 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987).<br />
8 V. R. Saunders and I. H. Hillier, Int. J. Quantum Chem. 7, 699 (1973).<br />
9 J. B. Francisco, J. M. Martínez, and L. Martínez, J. Chem. Phys. 121, 22 (2004).<br />
10 W. Koch and M. C. Holthausen, A Chemist’s Guide to Density Functional Theory (Wiley-VCH, Weinheim, 2000).<br />
11 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory (Wiley & Sons, Ltd., Chichester, 2000).<br />
12 R. Seeger and J. A. Pople, J. Chem. Phys. 65, 265 (1976).<br />
13 G. B. Bacskay, Chem. Phys. 61, 385 (1981); J. Phys. (France) 35, 639 (1982).<br />
14 P. Jørgensen, P. Swanstrøm, and D. Yeager, J. Chem. Phys. 78, 347 (1983).<br />
15 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).<br />
16 X. P. Li, W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).<br />
17 J. M. Millam and G. E. Scuseria, J. Chem. Phys. 106, 5569 (1997).<br />
18 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 339 (1997).<br />
19 X. Li, J. M. Millam, G. E. Scuseria, M. J. Frisch, and H. B. Schlegel, J. Chem. Phys. 119, 7651 (2003).<br />
20 T. Helgaker, H. J. Jensen, P. Jørgensen et al., DALTON, a molecular electronic structure program, Release 2.0, 2004; http://www.kjemi.uio.no/software/dalton<br />
21 A. Schäfer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992).<br />
22 T. H. Dunning, J. Chem. Phys. 90, 1007 (1989).<br />
Part 3<br />
A Coupled Cluster and Full Configuration Interaction Study of CN and CN$^-$,<br />
L. Thøgersen and J. Olsen,<br />
Chem. Phys. Lett. 393, 36 (2004)
Chemical Physics Letters 393 (2004) 36–43<br />
www.elsevier.com/locate/cplett<br />
A coupled cluster and full configuration interaction<br />
study of CN and CN$^-$<br />
Lea Thøgersen, Jeppe Olsen *<br />
Department of Chemistry, Theoretical Chemistry, University of Aarhus, DK-8000 Aarhus, Denmark<br />
Received 30 April 2004; in final form 27 May 2004<br />
Abstract<br />
Full configuration interaction (FCI) and coupled cluster (CC) calculations are carried out for the CN radical and CN$^-$ using the<br />
cc-pVDZ and an augmented cc-pVDZ basis set. In addition, CC calculations including up to quadruple excitations are carried out<br />
using the cc-pVTZ basis. At the FCI level, the equilibrium distance is 1.1969 Å, the harmonic frequency is 2020.1 cm$^{-1}$, the<br />
electronic contribution to the atomization energy is 667 kJ/mol and the vertical electron affinity is 0.12962 $E_h$. The contributions<br />
from quadruple and quintuple excitations to the harmonic frequency are found to be 20 and 5 cm$^{-1}$, respectively. The quadruple<br />
excitations give a contribution of 4 kJ/mol to the atomization energy and 0.00013 $E_h$ to the vertical electron affinity. None of the<br />
calculations indicate that the convergence of the CC hierarchy is slower for open-shell than for closed-shell systems.<br />
© 2004 Elsevier B.V. All rights reserved.<br />
1. Introduction<br />
* Corresponding author. Fax: +45-861-961-99.<br />
E-mail address: jeppe@chem.au.dk (J. Olsen).<br />
The last decade has witnessed significant improvements<br />
in the reliability of ab initio quantum chemical<br />
predictions of spectroscopical and thermochemical data.<br />
For closed shell molecules, equilibrium geometries [1],<br />
harmonic frequencies [2] and reaction enthalpies [3,4]<br />
may often be calculated with an accuracy that is equal to<br />
or better than the experimental accuracy. Of central<br />
importance for this progress has been the development<br />
of hierarchies of basis sets [5] and of CC methods<br />
[6–8]. The coupled cluster (CC) method mostly used for<br />
accurate calculations is the CCSD(T) method [9] which<br />
augments the CC method including single and double<br />
excitations (CCSD) [10] with a perturbative estimate of<br />
triples contributions. For closed shell molecules, the<br />
CCSD(T) method often exaggerates the contributions<br />
from triple excitations [11]. As the signs of the triple and<br />
quadruple corrections usually are identical, CCSD(T)<br />
often gives results that are better than the CC method<br />
including all single, double, and triple excitations<br />
(CCSDT). The CCSD(T) method therefore often provides<br />
results in surprisingly good agreement with the<br />
much more expensive CC method including up to quadruple<br />
excitations (CCSDTQ) [12]. Using triple-ζ basis<br />
sets, the CCSD(T) method is especially accurate for<br />
properties like internuclear distances and frequencies, as<br />
the remaining basis-set errors and correlation errors<br />
here usually are of opposite signs [1].<br />
For open-shell molecules, CC methods with and<br />
without spin-adaptation have been developed [7,13], and<br />
the accuracy of CC calculations often matches the accuracy<br />
obtained for closed shell molecules. In a study of<br />
the atomization energies of 11 small molecules [2], Feller<br />
and Sordo did not observe any systematic difference<br />
between the accuracies obtained for closed- and open-shell<br />
molecules when the CCSDT method is used. The<br />
performance of methods including perturbative estimates<br />
of triple excitations, such as the CCSD(T) method, is<br />
less convincing for open-shell molecules. In a systematic<br />
study of the performance of the CCSD(T) method for<br />
the calculation of spectroscopical constants for 33 small<br />
radicals [14], it was observed that the CCSD(T) method<br />
did not provide constants that were significantly more<br />
accurate than those obtained with the CCSD method.<br />
Several workers have suggested other methods combining<br />
CCSD with the perturbative treatment of triple<br />
0009-2614/$ - see front matter © 2004 Elsevier B.V. All rights reserved.<br />
doi:10.1016/j.cplett.2004.06.001
L. Thøgersen, J. Olsen / Chemical Physics Letters 393 (2004) 36–43 37<br />
excitations, but these alternative corrections do not<br />
systematically perform better than the CCSD(T) method<br />
[15].<br />
The Schrödinger equation within the Born–Oppenheimer<br />
approximation may be solved in a given one-electron<br />
basis set using full configuration interaction<br />
(FCI) calculations. In an FCI calculation, the wave<br />
function includes all Slater determinants with correct<br />
spin, symmetry and number of electrons. For a given<br />
basis-set, FCI calculations eliminate the error due to<br />
truncation of the many-electron basis, and provide<br />
therefore important benchmarks for approximate orbital-based<br />
methods. As the number of determinants in<br />
the FCI expansion increases exponentially with the<br />
number of basis functions and electrons, FCI calculations<br />
may only be carried out for small molecules using<br />
basis sets of double- or triple-ζ quality. For small closed<br />
shell molecules, a number of FCI calculations have been<br />
published [16,17], and these have given additional insight<br />
into the accuracy of standard correlation methods.<br />
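The steep growth of the FCI expansion is easy to quantify when point-group symmetry is ignored: the number of determinants with fixed numbers of alpha and beta electrons is a product of two binomial coefficients. The sketch below assumes, for illustration only, that cc-pVDZ gives 28 basis functions for CN, that the two 1s core orbitals are kept doubly occupied as described in the computational section, and that the remaining 9 electrons are distributed as 5 alpha and 4 beta.

```python
from math import comb

def n_determinants(n_orbitals, n_alpha, n_beta):
    """Number of Slater determinants for fixed alpha/beta occupations (no symmetry)."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

# Assumed setup for CN in cc-pVDZ: 28 basis functions minus 2 frozen core orbitals.
n_active_orbitals = 28 - 2
n_cn = n_determinants(n_active_orbitals, 5, 4)

# Doubling the orbital space with the same electrons shows the rapid growth.
growth = n_determinants(2 * n_active_orbitals, 5, 4) / n_cn
```

Spatial symmetry reduces the count actually needed in a calculation, so the number obtained here is an upper bound under the stated assumptions.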
For open-shell molecules, the number of FCI calculations<br />
is more limited. Except for a recent FCI investigation<br />
of the geometry of the CCH radical [18], no FCI<br />
calculations have been published for open-shell molecules<br />
with eight or more valence electrons using a correlation-consistent<br />
basis-set [5]. The present study fills<br />
this gap by providing an FCI benchmark for the open-shell<br />
molecule CN using the cc-pVDZ basis [5]. This<br />
molecule is sufficiently small to allow FCI calculations<br />
at numerous geometries, allowing the determination of<br />
the FCI results for the equilibrium bond length, harmonic<br />
frequency, and dissociation energy, as well as the<br />
complete potential curve. We will furthermore study the<br />
convergence of the CC energy as a function of the excitation-level<br />
to see if an open-shell molecule exhibits the<br />
same convergence pattern as previously determined for<br />
closed-shell molecules [19–23]. The vertical electron affinity<br />
will also be examined using CC and FCI calculations.<br />
As the cc-pVDZ basis does not provide accurate<br />
geometries or energetics [8], we will obtain the equilibrium<br />
geometry, harmonic frequency and dissociation<br />
energy using the cc-pVTZ basis set [5] and CC calculations<br />
including up to quadruple excitations. We hope<br />
that the data obtained here will assist in the analysis of<br />
the accuracy of various open-shell perturbation and CC<br />
methods, and especially the methods supplementing<br />
CCSD with perturbative estimates of triple excitations.<br />
2. Computational methods<br />
The FCI and CC calculations were carried out using<br />
the LUCIA program [24]. The algorithms for performing<br />
configuration interaction calculations are based on extensive<br />
modifications of the algorithms originally published<br />
in [25]. The CC code allows arbitrary excitation<br />
levels out from a single closed shell or high-spin open<br />
shell determinant. In contrast to the initial general CC<br />
codes [19], the present codes [26] exhibit the same scaling<br />
as the standard spin–orbital codes using explicitly coded<br />
contractions. Another set of general CC codes with the<br />
right scaling has been developed by Kállay and coworkers<br />
[20,21], and a less efficient general CC code has<br />
been developed by Hirata and Bartlett [22].<br />
All calculations kept the lowest two sigma-orbitals,<br />
corresponding to 1s(C) and 1s(N), doubly occupied. The<br />
open-shell configuration interaction and CC calculations<br />
used orbitals from restricted Hartree–Fock calculations.<br />
No spin-adaptation was done in the open-shell<br />
CC calculations. The integrals and HF-orbitals were<br />
obtained using the DALTON program [27].<br />
In the following, the different spaces of determinants<br />
or excitations are denoted SD, SDT, SDTQ, SDTQ5,<br />
SDTQ56, and SDTQ567 for the spaces including up to<br />
2-, 3-, 4-, 5-, 6-, and 7-fold excitations from the occupied spin–orbitals, respectively.<br />
For open-shell molecules, an alternative way of classifying<br />
excitations is to consider changes in orbital-occupations<br />
instead of spin–orbital occupations [28]. All CI<br />
calculations in the following are based on changes of<br />
orbital-occupations, whereas we will discuss CC calculations<br />
based on both divisions of excitations. Excitation<br />
spaces based on changes of spin–orbital occupations will<br />
be denoted (spin–orb), whereas the spaces based on<br />
changes of orbital occupations will be denoted (orb).<br />
Thus, the CCSD(spin–orb) excitation space contains all<br />
single and double spin–orbital excitations.<br />
Using the cc-pVDZ basis, FCI, CI, and CC calculations<br />
were carried out. To examine the contributions<br />
from quadruple excitations in a larger basis, CCSD,<br />
CCSDT, and CCSDTQ calculations were performed<br />
with the cc-pVTZ basis. For calculations of the electron<br />
affinity, the aug-cc-pVDZ [29] basis set without diffuse<br />
d-functions was used for CN and CN$^-$. The latter basis<br />
is in the following called the aug′-cc-pVDZ basis.<br />
3. Results<br />
3.1. Convergence of CC and CI at the experimental<br />
equilibrium geometry<br />
At the experimental equilibrium distance (1.1718 Å)<br />
[30], the FCI wave function and energy were obtained<br />
with an energy convergence threshold of $10^{-9}$ $E_h$. The<br />
FCI energy was obtained as −92.493262415 $E_h$. At the<br />
same internuclear distance, single reference CI and CC<br />
energies were obtained with excitation levels from 2 to 7.<br />
In Table 1, we give the deviations of the CI, CC(orb)<br />
and CC(spin–orb) energies from the FCI energy. Fig. 1<br />
is a single-logarithmic plot of these deviations.<br />
The coupled-cluster energies using orbital-occupations<br />
to define the excitation level are slightly below the
Table 1<br />
Deviations of single reference CI- and CC-energies ($E_h$) from the FCI energy for CN<br />
Largest exc. level  $E_{\mathrm{CI}} - E_{\mathrm{FCI}}$  $E_{\mathrm{CC}}(\mathrm{orb}) - E_{\mathrm{FCI}}$  $E_{\mathrm{CC}}(\text{spin–orb}) - E_{\mathrm{FCI}}$<br />
2 0.038240 0.015534 0.016517<br />
3 0.022604 0.001563 0.001637<br />
4 0.002391 0.000207 0.000230<br />
5 0.000583 0.000019 0.000021<br />
6 0.000031 0.000001 0.000002<br />
7 0.000002 – –<br />
Fig. 1. The deviations ($E_h$) of CI and CC energies from the FCI energy as a function of excitation level for CN using the cc-pVDZ basis set. [Plot: deviation from FCI energy (logarithmic axis, 0.1 to 1e-06) versus excitation level (2 to 6); curves: coupled cluster (spin–orb), coupled cluster (orb), configuration interaction.]<br />
energies using the smaller spaces based on spin–orbital<br />
occupations. However, the differences between the two<br />
choices are not significant compared to the deviations<br />
from the FCI energy. Up to CCSDTQ5, the differences<br />
between the two forms constitute at most 10% of the<br />
deviation from the FCI energy. For the CCSDTQ56<br />
expansions, the large difference between the two deviations<br />
in Table 1 is caused by roundoff errors. Including<br />
an additional digit, the CCSDTQ56 deviations are<br />
0.0000015 and 0.0000013 $E_h$ for the spin–orbital and<br />
orbital based divisions, respectively.<br />
The CI-curves exhibit the behavior predicted by<br />
perturbation theory [31]: the even-order excitations give<br />
significantly larger reductions in the deviations than the<br />
odd-order excitations. For CC expansions, perturbation<br />
theory also predicts that adding even order excitations<br />
give larger reductions in the deviations than adding odd<br />
order excitations [8,31]. This is not observed in Fig. 1, as<br />
the deviations of the CC energies nearly form straight<br />
lines. Comparing the convergence of the CI and CC<br />
hierarchies, it is observed that the CCSDT deviation is<br />
slightly smaller than the CISDTQ error, and that the CC<br />
energy obtained using up to n-fold excitations is as accurate<br />
as the CI energy using up to (n+1)-fold excitations,<br />
but less accurate than the CI energy using up to<br />
(n+2)-fold excitations. To obtain an accuracy of 1 m$E_h$<br />
or less, one must include up to quadruple excitations for<br />
the CC expansion, and up to quintuple excitations for<br />
the CI expansion.<br />
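The statements in this paragraph can be checked directly against the numbers in Table 1. The short script below uses the CI and CC(orb) deviations from that table; only the variable and function names are introduced here.

```python
# Deviations from the FCI energy (E_h), copied from Table 1.
ci = {2: 0.038240, 3: 0.022604, 4: 0.002391, 5: 0.000583, 6: 0.000031, 7: 0.000002}
cc_orb = {2: 0.015534, 3: 0.001563, 4: 0.000207, 5: 0.000019, 6: 0.000001}

def cc_between_ci(n):
    """True if CC with n-fold excitations beats CI(n+1) but not CI(n+2)."""
    return ci[n + 2] < cc_orb[n] < ci[n + 1]

def lowest_level_below(errors, threshold):
    """Smallest excitation level whose deviation is at or below the threshold."""
    return min(level for level, err in errors.items() if err <= threshold)

bracketing = {n: cc_between_ci(n) for n in range(2, 6)}

# Error reduction factor gained when going from level n - 1 to level n in CI.
ci_reduction = {n: ci[n - 1] / ci[n] for n in range(3, 8)}

cc_level_1mEh = lowest_level_below(cc_orb, 1.0e-3)
ci_level_1mEh = lowest_level_below(ci, 1.0e-3)
```

The reduction factors in `ci_reduction` are noticeably larger at the even levels 4 and 6 than at the odd levels 3 and 5, in line with the perturbation-theory argument for CI.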
The convergence patterns for CI and CC discussed<br />
above are very similar to the convergence patterns previously<br />
reported for N$_2$ [23]. The similarity between the<br />
convergences of CN and N$_2$ is more than qualitative. If<br />
one combines the deviation curve for N$_2$ [23] with the<br />
present deviation curve for CN in a single figure, the two<br />
deviation curves are virtually identical. The deviations<br />
of the CCSDT energies are thus 0.00156 $E_h$ and 0.00163<br />
$E_h$ for CN and N$_2$, respectively, and for a given excitation<br />
level the deviations for CN and N$_2$ differ by at<br />
most 10%.
L. Thøgersen, J. Olsen / Chemical Physics Letters 393 (2004) 36–43 39<br />
From the above comparisons, it may be concluded<br />
that the open-shell nature of CN does not lead to slower<br />
convergence of the CC hierarchy than previously observed<br />
for N 2 . However, it should be noted that the<br />
convergence of the CC hierarchy for N 2 is rather slow<br />
compared to the convergence observed for, e.g., H 2 O [19]<br />
and F 2 .<br />
3.2. The potential curve for CN<br />
FCI calculations were carried out at a number of<br />
internuclear distances. To obtain accurate spectroscopic<br />
constants, the CC energies were converged to 10⁻⁹ E h<br />
for internuclear distances close to the experimental value.<br />
For the remaining geometries the energy was converged<br />
to 10⁻⁶ E h . The obtained FCI energies are listed<br />
in Table 2. The graph of the FCI potential curve is given<br />
in Fig. 2.<br />
Table 2<br />
FCI energies (E h ) as a function of internuclear distance (Å) for CN<br />
using the cc-pVDZ basis<br />
R E R E<br />
0.9 −92.169732 1.2118 −92.494065103<br />
1.0 −92.384032 1.2169 −92.493765608<br />
1.0918 −92.469313943 1.2369 −92.491837833<br />
1.1318 −92.485677652 1.2518 −92.489691918<br />
1.1518 −92.490432414 1.30 −92.479361<br />
1.1569 −92.491327096 1.40 −92.447147<br />
1.1718 −92.493262415 1.50 −92.408657<br />
1.1769 −92.493704946 1.60 −92.370048<br />
1.1869 −92.494267979 1.7577 −92.316388<br />
1.1918 −92.494402963 2.05065 −92.255688<br />
1.1919 −92.494404785 2.3436 −92.241450<br />
1.1969 −92.494449358 2.9295 −92.240346<br />
1.2019 −92.494404774 3.5154 −92.239697<br />
1.2069 −92.494274026<br />
To associate the various internuclear distances with a<br />
degree of bond-breaking it is useful to examine the coefficient<br />
of the Hartree–Fock determinant in the FCI<br />
wave-function. Around the equilibrium geometry, the<br />
weight of the HF-determinant is about 0.92. Increasing<br />
the internuclear distance leads to a steady lowering of<br />
this weight and at 1.3 and 1.8 Å, the weights are 0.79<br />
and 0.57, respectively. From 1.8 to 2.0 Å the weight<br />
drops sharply so the weight at 2.0 Å is 0.25 and at 2.5 Å<br />
less than 0.04. We may therefore say that the bond is<br />
half broken at 1.8 Å and broken at 2.5 Å.<br />
In addition to FCI calculations, CCSD(orb),<br />
CCSDT(orb) and CCSDTQ(orb) calculations were<br />
performed at the various internuclear distances up to 1.8 Å.<br />
Although it is possible to converge the CC equations<br />
for larger distances, we find this of less interest, due to<br />
the breakdown of the single-reference approximation. In<br />
Fig. 3, we plot the deviations of the CCSDT and<br />
CCSDTQ energies from the FCI energy, and in Table 3,<br />
we list the non-parallelity error (NPE), i.e., the difference<br />
between the largest and smallest deviation from the<br />
FCI energy.<br />
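The non-parallelity error defined above can be sketched in a few lines of Python. The energies below are hypothetical placeholders chosen only for illustration; they are not the CC or FCI energies of this study, and the function name is likewise an assumption.

```python
def non_parallelity_error(e_method, e_fci):
    """Return max minus min of (E_method - E_FCI) over a common set of geometries."""
    if len(e_method) != len(e_fci) or not e_method:
        raise ValueError("need two non-empty energy lists of equal length")
    deviations = [em - ef for em, ef in zip(e_method, e_fci)]
    return max(deviations) - min(deviations)


# Hypothetical energies (E_h) at three geometries; placeholders, not data from the paper.
e_fci_demo = [-92.4900, -92.4944, -92.4794]
e_cc_demo = [-92.4888, -92.4929, -92.4770]

npe_demo = non_parallelity_error(e_cc_demo, e_fci_demo)
print(f"NPE = {npe_demo:.4f} E_h")
```

A small NPE means that the approximate curve is nearly parallel to the FCI curve, even if the absolute deviation is sizeable.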
At the equilibrium distance, both deviation curves in<br />
Fig. 3 have a positive curvature. For internuclear distances<br />
larger than the equilibrium distance, both the<br />
CCSDT and CCSDTQ deviation curves are nearly<br />
Fig. 2. The FCI potential curve for CN using the cc-pVDZ basis (FCI energy versus internuclear distance). The energies are in Hartrees and the inter-nuclear distances are in Å.
Fig. 3. The difference between the CCSDT and CCSDTQ energies and the FCI energy for CN using the cc-pVDZ basis (deviation from FCI energy versus internuclear distance). The energies are in Hartrees<br />
and the inter-nuclear distances are in Å.<br />
Table 3<br />
Non-parallelity error (NPE) (E h ) for CCSD, CCSDT, and CCSDTQ<br />
Method NPE<br />
CCSD 0.042326<br />
CCSDT 0.006355<br />
CCSDTQ 0.001742<br />
linear functions of the internuclear distance. Actually,<br />
the slope of the CCSDT deviation is smaller for larger<br />
internuclear distances than for the equilibrium distance.<br />
The analogous CCSDT- and CCSDTQ-curves for the<br />
nitrogen molecule exhibit maxima for an internuclear<br />
distance around 1.5 Å (3 au) [23].<br />
3.3. Spectroscopical constants for CN<br />
Equilibrium geometries and harmonic frequencies<br />
were obtained for the CCSD, CCSDT, CCSDTQ and<br />
FCI methods using quartic interpolation of the energies.<br />
The harmonic frequency for a given method was evaluated<br />
at the equilibrium geometry of this method. In<br />
Table 4 we list the obtained equilibrium distances and<br />
frequencies. In addition, the table contains the CCSD,<br />
CCSDT and CCSDTQ results for the cc-pVTZ basis.<br />
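As an illustration of the quartic interpolation described above, the following Python sketch passes a quartic through five equally spaced FCI points from Table 2 and locates its minimum. The choice of these five points, the isotope masses (12C and 14N) and the conversion constants are assumptions of this sketch; the text does not state which points or masses were actually used.

```python
import math

# FCI/cc-pVDZ energies (E_h) from Table 2 at five equally spaced distances (Å).
R0, H = 1.1969, 0.005                      # central point and grid spacing in Å
E = [-92.494267979, -92.494404785, -92.494449358,
     -92.494404774, -92.494274026]         # R0-2H, R0-H, R0, R0+H, R0+2H

# Assumed constants and isotope masses (12C, 14N); not stated in the text.
BOHR = 0.529177210903                      # Å per bohr
AMU = 1822.888486                          # electron masses per atomic mass unit
HARTREE_CM = 219474.6313632                # cm^-1 per E_h
MU = 12.0 * 14.003074 / (12.0 + 14.003074) * AMU

# Coefficients of the quartic p(u) = sum_k c_k u^k through the five points,
# with u = (R - R0) / H, written out from the five-point stencils.
fm2, fm1, f0, fp1, fp2 = (e - E[2] for e in E)
c1 = (fm2 - 8 * fm1 + 8 * fp1 - fp2) / 12
c2 = (-fm2 + 16 * fm1 - 30 * f0 + 16 * fp1 - fp2) / 24
c3 = (-fm2 + 2 * fm1 - 2 * fp1 + fp2) / 12
c4 = (fm2 - 4 * fm1 + 6 * f0 - 4 * fp1 + fp2) / 24


def d1(u):
    """First derivative of the quartic with respect to u."""
    return c1 + 2 * c2 * u + 3 * c3 * u**2 + 4 * c4 * u**3


def d2(u):
    """Second derivative of the quartic with respect to u."""
    return 2 * c2 + 6 * c3 * u + 12 * c4 * u**2


u = 0.0
for _ in range(20):                        # Newton iterations on p'(u) = 0
    u -= d1(u) / d2(u)

r_e = R0 + u * H                           # equilibrium distance in Å
k = d2(u) / H**2 * BOHR**2                 # force constant in E_h / bohr^2
omega_e = math.sqrt(k / MU) * HARTREE_CM   # harmonic frequency in cm^-1
print(f"R_e = {r_e:.4f} Å, omega_e = {omega_e:.0f} cm^-1")
```

Under these assumptions the result should land close to the FCI entries of Table 4; small differences can arise from the assumed masses and from the selection of grid points.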
We will first discuss the results obtained using the cc-pVDZ<br />
basis. The CC calculations using orbital-based<br />
excitation spaces are slightly more accurate than those<br />
using spin–orbital-based excitation spaces, but the differences<br />
are small compared to the size of the deviations.<br />
We will therefore discuss only the spin–orbital based<br />
Table 4<br />
Equilibrium distance R eq (Å) and harmonic frequency ω e (cm⁻¹) for CN<br />
Method Basis R eq ω e<br />
CCSD(orb) cc-pVDZ 1.1860 2111<br />
CCSDT(orb) cc-pVDZ 1.1946 2043<br />
CCSDTQ(orb) cc-pVDZ 1.1964 2025<br />
CCSD(spin–orb) cc-pVDZ 1.1855 2114<br />
CCSDT(spin–orb) cc-pVDZ 1.1944 2046<br />
CCSDTQ(spin–orb) cc-pVDZ 1.1964 2026<br />
FCI cc-pVDZ 1.1969 2020.1<br />
CCSD(spin–orb) cc-pVTZ 1.1688 2136<br />
CCSDT(spin–orb) cc-pVTZ 1.1783 2067<br />
CCSDTQ(spin–orb) cc-pVTZ 1.1804 2045<br />
Expt. 1.1718 2069<br />
excitation spaces. Since the deviation curves for the CC<br />
energies are increasing functions, the CC equilibrium<br />
distances are necessarily shorter than the FCI equilibrium<br />
distance. The causes of the errors of the harmonic<br />
frequencies will be discussed in detail below. At the<br />
CCSD level, the distance is 0.01 Å shorter than the FCI<br />
value and the harmonic frequency is about 90 cm⁻¹<br />
larger than the FCI value, stressing the inaccuracy of<br />
this method for predicting equilibrium properties. The<br />
errors are significantly reduced by the CCSDT method<br />
with errors of 0.0025 Å and 26 cm⁻¹ for the equilibrium<br />
distance and frequency, respectively. The errors are<br />
further reduced by about a factor of five by using the<br />
CCSDTQ instead of the CCSDT method. At the<br />
CCSDTQ level, the equilibrium geometry is only 0.0005<br />
Å smaller than the FCI value, but the frequency is 6<br />
cm⁻¹ too large. The deviations of the various CC<br />
methods obtained here for CN are very similar to the<br />
previously obtained deviations for N 2 . Thus, it has<br />
previously been reported that the contribution from<br />
connected quadruple excitations to the harmonic frequency<br />
for this molecule [23,32] is 20 cm⁻¹.<br />
It is currently not feasible to obtain FCI energies for<br />
CN in the cc-pVTZ basis with an accuracy that is<br />
sufficient to obtain the frequency with an accuracy of<br />
1 cm⁻¹ or less. One can instead estimate the convergence<br />
by examining the changes in the constants through the<br />
CC hierarchy. It is seen from Table 4 that the changes<br />
between the CCSDT and CCSDTQ results are very<br />
similar in the cc-pVDZ and cc-pVTZ basis sets. In both<br />
basis sets, the quadruple excitations increase the distance<br />
by 0.0020 Å and reduce the harmonic frequency<br />
by about 20 cm⁻¹. This suggests that it may be feasible<br />
to obtain the quadruple corrections to these constants in<br />
rather small basis sets. It should be noted that although<br />
the quadruple corrections to the properties are rather<br />
constant, the quadruples corrections to the raw energies<br />
are very different in the two basis sets.<br />
The errors of the harmonic frequencies arise from<br />
two sources. First of all, the positive curvatures of the<br />
CC deviation curves around the equilibrium geometries<br />
lead to CC frequencies that are larger than the FCI<br />
frequency. Furthermore, as the third derivative of the<br />
energies with respect to the distance in general is large<br />
and negative, the somewhat shorter internuclear distances<br />
obtained with the CC methods than with FCI<br />
lead also to frequencies that are too large. These two<br />
sources of errors may be analyzed in the cc-pVDZ basis<br />
by evaluating the CC frequencies at the FCI equilibrium<br />
geometry. For the orbital based methods one then obtains<br />
the frequencies 2035, 2027 and 2022 cm⁻¹ for the<br />
CCSD, CCSDT and CCSDTQ methods. Whereas the<br />
CCSDT frequency evaluated at the optimized CCSDT<br />
distance deviates from the FCI frequency by 23 cm⁻¹,<br />
the CCSDT frequency evaluated at the FCI geometry<br />
deviates by only 7 cm⁻¹. Although the errors connected<br />
with the positive curvatures of the deviation<br />
curves are not vanishing, the major errors of the frequencies<br />
seem to arise from the errors of the equilibrium<br />
distances.<br />
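The two error sources can be separated numerically from the values quoted above. The sketch below treats the error at the FCI geometry as the curvature contribution and assigns the remainder to the geometry shift; this simple additive split, and the function name, are assumptions of the sketch rather than a procedure stated in the text.

```python
OMEGA_FCI = 2020.1  # cm^-1, FCI/cc-pVDZ (Table 4)

# Orbital-based CC frequencies (cm^-1) in the cc-pVDZ basis.
at_own_geometry = {"CCSD": 2111, "CCSDT": 2043, "CCSDTQ": 2025}   # Table 4
at_fci_geometry = {"CCSD": 2035, "CCSDT": 2027, "CCSDTQ": 2022}   # quoted in the text


def split_error(method):
    """Return (total, curvature part, geometry part) of the frequency error."""
    total = at_own_geometry[method] - OMEGA_FCI
    curvature = at_fci_geometry[method] - OMEGA_FCI
    return total, curvature, total - curvature


for m in at_own_geometry:
    total, curv, geom = split_error(m)
    print(f"{m:7s} total {total:5.1f}  curvature {curv:5.1f}  geometry {geom:5.1f}")
```

For all three methods the geometry part exceeds the curvature part, in line with the conclusion drawn above.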
The experimental values for the equilibrium distance<br />
and the harmonic frequency are 1.1718 Å and 2069<br />
cm⁻¹, respectively [30]. Comparing the results obtained<br />
using the cc-pVTZ basis to the experimental values, it is<br />
observed that the CCSDT results are in better agreement<br />
with experiment than the CCSDTQ results. A<br />
better estimate of the importance of the quadruples<br />
corrections may be obtained using CCSDT results for<br />
large basis sets. Feller and Sordo [2] have calculated the<br />
CCSDT spectroscopic constants for CN using the aug-cc-pVQZ<br />
basis and obtained the equilibrium distance<br />
1.1739 Å and the harmonic frequency of 2082 cm⁻¹.<br />
Adding our quadruples correction to these CCSDT results<br />
gives an equilibrium geometry of 1.1759 Å and a<br />
harmonic frequency of 2060 cm⁻¹. To obtain spectroscopic<br />
constants that are significantly more accurate<br />
than the CCSDT results, other corrections, most importantly<br />
core-correlation contributions, must be included<br />
together with the quadruple excitations.<br />
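The additive quadruples correction used above can be reproduced from Table 4. The sketch below evaluates the CCSDTQ minus CCSDT differences in both basis sets and adds them to the large-basis CCSDT values; the text does not say which basis set supplied the correction, so both are shown, and the variable names are assumptions.

```python
# Spin-orbital-based results from Table 4: (R_eq in Å, omega_e in cm^-1).
ccsdt = {"cc-pVDZ": (1.1944, 2046), "cc-pVTZ": (1.1783, 2067)}
ccsdtq = {"cc-pVDZ": (1.1964, 2026), "cc-pVTZ": (1.1804, 2045)}

# CCSDT/aug-cc-pVQZ values of Feller and Sordo as quoted in the text.
R_LARGE, OMEGA_LARGE = 1.1739, 2082


def quadruples_correction(basis):
    """Return (delta R, delta omega) = CCSDTQ - CCSDT in the given basis."""
    (r_t, w_t), (r_q, w_q) = ccsdt[basis], ccsdtq[basis]
    return r_q - r_t, w_q - w_t


for basis in ccsdt:
    d_r, d_w = quadruples_correction(basis)
    print(f"{basis}: dR = {d_r:+.4f} Å, d_omega = {d_w:+d} cm^-1, "
          f"corrected R = {R_LARGE + d_r:.4f} Å, omega = {OMEGA_LARGE + d_w} cm^-1")
```

The corrections from the two basis sets differ by about 0.0001 Å and 2 cm⁻¹, so either choice agrees with the values quoted above to within that margin.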
3.4. Atomization energy<br />
It has previously been reported that quadruple and<br />
even quintuple excitations may be important to obtain<br />
atomization energies with high accuracy [3,4,12]. In<br />
Table 5, we list the atomization energies using the<br />
CCSD, CCSDT, CCSDTQ, and FCI approaches with<br />
the cc-pVDZ basis and the CCSD, CCSDT, and<br />
CCSDTQ approaches with the cc-pVTZ basis set. All<br />
molecular calculations were carried out at the experimental<br />
equilibrium distance.<br />
It is again noticed that there is no significant difference<br />
between the results obtained using the CC(orb)<br />
and CC(spin–orb) approaches. The two approaches<br />
differ by only 0.1 kJ/mol at the CCSDT and CCSDTQ<br />
levels.<br />
The quadruple excitations change the atomization<br />
energy by 4 kJ/mol with both the cc-pVDZ and the cc-pVTZ<br />
basis sets. These results are in agreement with<br />
previous calculations of the contributions from connected<br />
quadruple excitations [4]. From the difference<br />
between CCSDTQ and the FCI atomization energy, it is<br />
seen that the quintuple excitations contribute 0.5 kJ/mol<br />
to the atomization energy. The above contributions from<br />
quadruple and quintuple excitations are very similar to<br />
the results previously reported for N 2 [3]. The contribution<br />
from higher excitations to the atomization energy<br />
of CN has previously been studied by Feller and Sordo<br />
[2]. They obtained a significantly smaller contribution<br />
from quadruple excitations, 0.3 kcal/mol or 1.2 kJ/mol.<br />
There are several experimental measurements of the<br />
atomization energies, and Feller and Sordo [2] quote<br />
Table 5<br />
The electronic contribution to the dissociation energy D e (kJ/mol) for CN<br />
Method Basis D e<br />
CCSD(orb) cc-pVDZ 631.6<br />
CCSDT(orb) cc-pVDZ 663.0<br />
CCSDTQ(orb) cc-pVDZ 666.5<br />
CCSD(spin–orb) cc-pVDZ 629.2<br />
CCSDT(spin–orb) cc-pVDZ 662.9<br />
CCSDTQ(spin–orb) cc-pVDZ 666.4<br />
FCI cc-pVDZ 667.0<br />
CCSD(spin–orb) cc-pVTZ 674.2<br />
CCSDT(spin–orb) cc-pVTZ 714.4<br />
CCSDTQ(spin–orb) cc-pVTZ 718.5
values in the range 745–762 kJ/mol for the experimental<br />
electronic contribution. Adding our estimate of the quadruples<br />
correction to the estimated CCSDT limit of 748 kJ/mol<br />
obtained by Feller and Sordo gives a value of 752 kJ/mol<br />
for the electronic atomization energy of CN.<br />
3.5. The vertical electron affinity<br />
An FCI calculation for the CN anion using the aug′-cc-pVDZ<br />
basis was carried out at the experimental<br />
equilibrium geometry of the radical. The FCI calculation<br />
contains about 20 billion Slater determinants, and<br />
sparsity of the CI vectors was used only to reduce disc storage,<br />
not computation time. This FCI calculation<br />
represents one of the largest FCI calculations we have hitherto<br />
carried out. The FCI energy for the anion was<br />
obtained as −92.627391(2) E h . Combining this energy<br />
with the FCI energy of −92.497766 E h for the radical in<br />
the same basis set leads to an FCI value of 0.12962 E h<br />
for the vertical electron affinity. CC expansions using<br />
spin–orbital occupations for restrictions of excitations<br />
were also carried out for the radical and the anion in the<br />
aug′-cc-pVDZ basis and the resulting electron affinities<br />
are given in Table 6.<br />
As the differences between the CC calculations using<br />
orbital and spin–orbital restrictions already have been<br />
shown to be small, no orbital-restricted calculations<br />
were carried out. Already at the CCSD level, the calculated<br />
electron affinity differs from the FCI affinity by<br />
less than 1 mE h , and at the CCSDT level the calculated<br />
electron affinity differs from the FCI result by less than<br />
0.1 mE h . The deviations of the CC energies from the<br />
FCI energies for the radical and the anion are also listed<br />
in Table 6. It is seen that the high accuracy of the CC<br />
affinities is caused by cancellation of the errors of the<br />
radical and anion – the deviation of the affinity is<br />
roughly an order of magnitude smaller than the deviation<br />
of the individual energies. It is also interesting to see<br />
that the electron affinity converges from above – the CC<br />
affinities are larger than the FCI affinity. As seen from<br />
the other columns of Table 6, the CC expansion converges<br />
slightly faster for the anion than for the radical.<br />
The faster convergence of the anion may seem surprising<br />
as the anion contains one more electron than the radical<br />
but is probably caused by CN being slightly more<br />
multiconfigurational than the anion. The electron affinity<br />
of CN calculated using CC calculations in large<br />
basis sets has been the subject of several recent studies<br />
[33,34]. These studies also found small contributions to<br />
the electron affinity from triple excitations.<br />
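The arithmetic behind the FCI electron affinity and the error cancellation can be checked with a short Python sketch. The assignment of the last two columns of Table 6 to the radical and the anion is inferred here from the statement that the anion deviations are the smaller ones, and the eV conversion factor is an assumed constant.

```python
HARTREE_EV = 27.211386       # eV per E_h (assumed conversion factor)

E_RADICAL_FCI = -92.497766   # E_h, CN radical
E_ANION_FCI = -92.627391     # E_h, CN anion

ea_fci = E_RADICAL_FCI - E_ANION_FCI
print(f"FCI vertical EA = {ea_fci:.6f} E_h = {ea_fci * HARTREE_EV:.3f} eV")

# Table 6: method -> (EA error, radical energy error, anion energy error), all in E_h.
# The radical/anion column assignment is an inference, see the lead-in.
table6 = {
    "CCSD": (0.00063, 0.01529, 0.01466),
    "CCSDT": (0.00014, 0.00154, 0.00140),
    "CCSDTQ": (0.00003, 0.00020, 0.00016),
}
for method, (ea_err, rad_err, an_err) in table6.items():
    print(f"{method:7s} EA error {ea_err:.5f}  "
          f"radical minus anion error {rad_err - an_err:.5f}")
```

The difference of the two energy errors reproduces the tabulated affinity error to within the rounding of the last digit, which is the cancellation described above.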
4. Conclusion<br />
Full configuration interaction calculations using the<br />
cc-pVDZ basis and CC calculations using the cc-pVDZ<br />
and cc-pVTZ basis sets have been carried out for the CN<br />
radical at various geometries. Single reference configuration<br />
interaction calculations were also carried out<br />
using the cc-pVDZ basis at the experimental internuclear<br />
distance. At the CCSDT level, the energies differ<br />
from the FCI energy by 1.5 mE h , and at the CCSDTQ<br />
level, the energies are 0.2 mE h from the FCI energy. The<br />
CC energies converge toward the FCI energy in an approximately<br />
linear fashion with a decrease in the deviation<br />
by about a factor of 10 for each added excitation<br />
level. This is in contrast to an analysis based on perturbation<br />
theory, predicting that adding even orders<br />
gives larger decreases in the deviations than adding odd<br />
orders. The observed convergence for CN in the cc-pVDZ<br />
basis is very similar to the convergence previously<br />
reported for N 2 , indicating that the open-shell nature of<br />
CN does not affect the convergence. A comparison of<br />
the FCI and CC energies at various internuclear distances<br />
reveals that the deviations of the CC approaches<br />
do not occur suddenly for large internuclear distances.<br />
The deviations are instead nearly linear functions of the<br />
internuclear distance.<br />
At the FCI level, the equilibrium geometry and harmonic<br />
frequency are obtained as 1.1969 Å and 2020.1<br />
cm⁻¹, respectively. The CCSDT and CCSDTQ frequencies<br />
are 25 and 5 cm⁻¹ above the FCI value, respectively.<br />
The quadruple corrections to both the<br />
equilibrium distance and the harmonic frequency were<br />
found to be nearly identical in the cc-pVDZ and cc-pVTZ<br />
basis sets. The major errors of the CC frequencies<br />
come from the errors of the distances where these are<br />
evaluated.<br />
For the electronic contribution to the atomization<br />
energy, a value of 667.0 kJ/mol is obtained at the FCI<br />
level using the cc-pVDZ basis set. The CCSDT and<br />
CCSDTQ atomization energies are 4 and 0.5 kJ/mol<br />
below the FCI atomization energy, respectively. The<br />
quadruple contributions in the cc-pVDZ and cc-pVTZ<br />
Table 6<br />
The vertical electron affinity (E h ) of CN calculated in the aug′-cc-pVDZ basis<br />
Method; EA; EA − EA(FCI); E(CN) − E FCI (CN); E(CN⁻) − E FCI (CN⁻)<br />
CCSD(spin–orb) 0.13025 0.00063 0.01529 0.01466<br />
CCSDT(spin–orb) 0.12977 0.00014 0.00154 0.00140<br />
CCSDTQ(spin–orb) 0.12966 0.00003 0.00020 0.00016<br />
FCI 0.12962
basis are determined as 3.5 and 4.1 kJ/mol, respectively,<br />
indicating that a reliable estimate of quadruple contributions<br />
may be obtained using rather small basis sets.<br />
The FCI vertical electron affinity is obtained in the<br />
aug′-cc-pVDZ basis as 0.12962 E h . Due to extensive<br />
cancellations of errors, the FCI affinity is accurately<br />
calculated at the CCSD and CCSDT levels with a contribution<br />
from quadruple and higher excitations of<br />
0.00014 E h . The CC hierarchy approaches the FCI affinity<br />
from above, as the deviations for the anion are<br />
slightly smaller than for the radical.<br />
Acknowledgements<br />
The work has been supported by the Danish Research<br />
Council (Grant No. 9901973). The calculations<br />
were carried out at the centre for supercomputing at<br />
University of Aarhus (CSCAA). The support from the<br />
Danish Centre for Supercomputing (DCSC) is gratefully<br />
acknowledged.<br />
References<br />
[1] F. Pawlowski, P. Jørgensen, J. Olsen, F. Hegelund, T. Helgaker,<br />
J. Gauss, K.L. Bak, J.F. Stanton, J. Chem. Phys. 116 (2002) 6482.<br />
[2] D. Feller, J.A. Sordo, J. Chem. Phys. 113 (2000) 485.<br />
[3] T. Helgaker, W. Klopper, A. Halkier, K.L. Bak, P. Jørgensen,<br />
J. Olsen, in: J. Cioslowski, (Ed.), Understanding Chemical<br />
Reactivity, vol. 22, Kluwer, Dordrecht, p. 1, 2001.<br />
[4] A.D. Boese, M. Oren, O. Atasoylu, J.M.L. Martin, M. Kallay,<br />
J. Gauss, J. Chem. Phys. 120 (2004) 4129.<br />
[5] T.H. Dunning Jr., J. Chem. Phys. 90 (1989) 1007.<br />
[6] R.J. Bartlett, in: D.R. Yarkony (Ed.), Modern Electronic Structure<br />
Theory, Part I, 1047, World Scientific, Singapore, 1995.<br />
[7] J. Paldus, X. Li, Adv. Chem. Phys. 110 (1999) 1.<br />
[8] T. Helgaker, P. Jørgensen, J. Olsen, Molecular Electronic-Structure<br />
Theory, Wiley, 2000.<br />
[9] K. Raghavachari, G.W. Trucks, J.A. Pople, M. Head-Gordon,<br />
Chem. Phys. Lett. 157 (1989) 479.<br />
[10] G.D. Purvis, R.J. Bartlett, J. Chem. Phys. 76 (1982) 1910.<br />
[11] K.L. Bak, P. Jørgensen, J. Olsen, T. Helgaker, W. Klopper,<br />
J. Chem. Phys. 112 (2000) 9229.<br />
[12] T.A. Ruden, T.U. Helgaker, P. Jørgensen, J. Olsen, Chem. Phys.<br />
Lett. 371 (2003) 62.<br />
[13] P.G. Szalay, J. Gauss, J. Chem. Phys. 107 (1997) 9028.<br />
[14] E.F.C. Byrd, D. Sherrill, M. Head-Gordon, J. Phys. Chem. A. 105<br />
(2001) 9736.<br />
[15] S.R. Gwaltney, M. Head-Gordon, J. Chem. Phys. 115 (2001)<br />
2014.<br />
[16] J. Olsen, P. Jørgensen, H. Koch, A. Balkova, R.J. Bartlett,<br />
J. Chem. Phys. 104 (1996) 8007.<br />
[17] H. Larsen, J. Olsen, P. Jørgensen, O. Christiansen, J. Chem. Phys.<br />
113 (2000) 6677.<br />
[18] P.G. Szalay, L.S. Thøgersen, J. Olsen, M. Kallay, J. Gauss,<br />
J. Phys. Chem. A. 108 (2004) 3030.<br />
[19] J. Olsen, J. Chem. Phys. 113 (2000) 7140.<br />
[20] M. Kallay, P.R. Surjan, J. Chem. Phys. 113 (2000) 1359.<br />
[21] M. Kallay, P.R. Surjan, J. Chem. Phys. 115 (2001) 2945.<br />
[22] S. Hirata, R.J. Bartlett, Chem. Phys. Lett. 321 (2000) 216.<br />
[23] J.W. Krogh, J. Olsen, Chem. Phys. Lett. 344 (2001) 578.<br />
[24] LUCIA, a general CI and CC code written by J. Olsen, University<br />
of Aarhus with contributions from H. Larsen, M. Fülscher.<br />
[25] J. Olsen, B.O. Roos, P. Jørgensen, H.J.Aa. Jensen, J. Chem. Phys.<br />
89 (1988) 2185.<br />
[26] J. Olsen, unpublished.<br />
[27] T. Helgaker et al., DALTON, an ab initio electronic structure<br />
program, Release 1.2. see http://www.kjemi.uio.no/software/dalton/dalton.html,<br />
2001.<br />
[28] X. Li, J. Paldus, J. Chem. Phys. 101 (1994) 8812.<br />
[29] R.A. Kendall, T.H. Dunning, R.J. Harrison, J. Chem. Phys. 96<br />
(1992) 6796.<br />
[30] K.P. Huber, G. Herzberg, Molecular Spectra and Molecular<br />
Structure V. Constants of Diatomic Molecules, Van Nostrand<br />
Reinhold, New York, 1979.<br />
[31] W. Kutzelnigg, Theoret. Chim. Acta. 80 (1991) 349.<br />
[32] S.A. Kucharski, J.D. Watts, R.J. Bartlett, Chem. Phys. Lett. 302<br />
(1999) 295.<br />
[33] P. Neogrady, M. Medved, I. Cernusak, M. Urban, Mol. Phys. 100<br />
(2002) 541.<br />
[34] J.A. Sordo, J. Chem. Phys. 114 (2001) 1974.
Part 3<br />
Equilibrium Geometry of the Ethynyl (CCH) Radical,<br />
P. G. Szalay, L. Thøgersen, J. Olsen, M. Kállay and J. Gauss,<br />
J. Phys. Chem. A 108, 3030 (2004).
3030 J. Phys. Chem. A 2004, 108, 3030-3034<br />
Equilibrium Geometry of the Ethynyl (CCH) Radical †<br />
Péter G. Szalay, ‡ Lea S. Thøgersen, § Jeppe Olsen, § Mihály Kállay, | and Jürgen Gauss* ,|<br />
Department of Theoretical Chemistry, Eötvös Loránd University, H-1518 Budapest, P.O. Box 32, Hungary,<br />
Department of Chemistry, Aarhus University, DK-8000 Aarhus C, Denmark, and Institut für Physikalische<br />
Chemie, Universität Mainz, D-55099 Mainz, Germany<br />
Received: September 27, 2003; In Final Form: January 15, 2004<br />
The equilibrium geometry of the ethynyl (CCH) radical has been obtained using the results of high-level<br />
quantum chemical calculations and the available experimental data. In a purely quantum chemical approach,<br />
the best theoretical estimates (1.208 Å for r CC and 1.061-1.063 Å for r CH ) have been obtained from<br />
CCSD(T), CCSDT, MR-AQCC, and full CI calculations with basis sets up to core-polarized pentuple-zeta quality.<br />
In a mixed theoretical-experimental approach, empirical equilibrium geometrical parameters (1.207 Å for<br />
r CC and 1.069 Å for r CH ) have been obtained from a least-squares fit to the experimental rotational constants<br />
of four isotopomers of CCH which have been corrected for vibrational effects using computed vibration-interaction<br />
constants. These geometrical parameters lead to a consistent picture with remaining discrepancies<br />
between theory and experiment of 0.001 Å for the CC and 0.006-0.008 Å for the CH distances, respectively.<br />
The corresponding r s and r 0 geometries are shown not to be representative for the true equilibrium structure<br />
of CCH.<br />
I. Introduction<br />
Considerable effort has been devoted to the determination<br />
of the structure of the ethynyl (CCH) radical in its ²Σ⁺ electronic<br />
ground state from the experimental 1 and the theoretical side. 2-7<br />
Presently, experimental values for ground-state rotational constants<br />
(B 0 ) for four isotopomers of CCH have been determined.<br />
For CCH, a value of 43 674.528 94(115) MHz has been reported<br />
by Müller et al. 8 in agreement with earlier measured values. 9-11<br />
For ¹³CCH and C¹³CH, values of 42 077.462(1) and 42 631.382(1)<br />
MHz have been obtained by McCarthy et al. 12 in excellent<br />
agreement with a previous report of Bogey et al. 1,13 Finally,<br />
for the deuterated form CCD, a value of 36 068.0310(96) MHz<br />
has been reported by Bogey et al. 14<br />
On the basis of the available experimental rotational constants,<br />
Bogey et al. 1 determined a so-called substitution (r s ) structure.<br />
However, the obtained bond distances are not in satisfactory<br />
agreement with corresponding calculated equilibrium values; 2-7<br />
in particular, the CH distance was unusually short (1.046 Å vs<br />
calculated values of 1.062-1.070 Å). As has been already<br />
pointed out by Bogey et al., 14 the observed discrepancy is<br />
probably due to the large amplitude bending motion in CCH<br />
which is not adequately accounted for in the substitution<br />
approach 15 that provides the r s structure. Thus, determination<br />
of the true equilibrium geometry is necessary to get a reliable<br />
picture of the structure of the ethynyl radical.<br />
Although the available rotational constants form a solid basis<br />
for the experimental determination of the r 0 and r s geometry,<br />
respectively, there is not enough experimental information<br />
available to determine the equilibrium geometry. In particular,<br />
the vibrational contributions to the rotational constants, which<br />
in principle can be determined via the complete set of vibration-rotation<br />
interaction constants, 16 cannot be obtained from the<br />
available experimental data.<br />
† Part of the special issue “Fritz Schaefer Festschrift”.<br />
‡ Eötvös Loránd University.<br />
§ Aarhus University.<br />
| Universität Mainz.<br />
As has been suggested long ago by Pulay et al. 17 and more<br />
recently by others, 18,19 quantum chemical calculations can be<br />
used to provide the lacking information. With computed<br />
vibration-rotation interaction constants (α r ), it is possible to<br />
correct experimental rotation constants for vibrational effects<br />
and to obtain the corresponding equilibrium values<br />
$$B_e = B_0 + \frac{1}{2}\sum_r \alpha_r \qquad (1)$$<br />
with the sum running over all vibrational degrees of freedom.<br />
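A minimal Python sketch of eq 1 follows. The B 0 value is the experimental constant for CCH quoted in the Introduction; the vibration-rotation interaction constants are hypothetical placeholders, not computed values from this work, and the function name is an assumption.

```python
def equilibrium_rotational_constant(b0, alphas):
    """Eq 1: B_e = B_0 + (1/2) * sum of alpha_r over all vibrational degrees of freedom."""
    return b0 + 0.5 * sum(alphas)


B0_CCH = 43674.52894  # MHz, experimental ground-state value quoted in the Introduction

# Hypothetical alpha_r values in MHz (placeholders only): two stretches and
# the doubly degenerate bend, which is entered once per component.
alphas_demo = [200.0, 300.0, -150.0, -150.0]

b_e_demo = equilibrium_rotational_constant(B0_CCH, alphas_demo)
print(f"B_e = {b_e_demo:.3f} MHz")
```

Because the sum runs over all vibrational degrees of freedom, a linear triatomic such as CCH contributes four terms, with the degenerate bending mode counted twice.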
The accuracy of such a mixed experimental-theoretical (or<br />
empirical) procedure for the determination of equilibrium<br />
geometries has recently been investigated by Pawlowski et al. 20<br />
for a set of 18 closed-shell molecules. It was concluded in this<br />
study that errors in the determined empirical bond lengths are<br />
below 0.001 Å, if the vibrational corrections to the rotational<br />
constants are calculated at a sufficiently high level such as the<br />
coupled-cluster singles and doubles (CCSD) level 21 augmented<br />
by a perturbative treatment of triple excitations (CCSD(T)) 22<br />
together with the cc-pVQZ set from Dunning’s correlation-consistent<br />
basis-set hierarchy. 23 Although it is not clear whether<br />
the same accuracy can be achieved for open-shell systems, this<br />
combined experimental-theoretical procedure opens an interesting<br />
possibility for the determination of a reliable equilibrium<br />
geometry for CCH.<br />
Alternatively, accurate equilibrium geometries can be obtained<br />
via a purely theoretical approach. Such an approach can and<br />
should take advantage of existing hierarchies of methods for<br />
the treatment of electron correlation and establish basis-set<br />
convergence by using basis-set sequences such as, for example,<br />
the correlation-consistent sets developed by Dunning and coworkers.<br />
23,24 As has been shown by Helgaker et al. 25 and more<br />
recently also by Bak et al. 26 such a procedure can lead to an<br />
accuracy of 0.002-0.003 Å in bond distances if CCSD(T)<br />
calculations together with sufficiently large basis sets are carried<br />
out. Again, this conclusion is mainly valid for closed-shell<br />
10.1021/jp036885t CCC: $27.50 © 2004 American Chemical Society<br />
Published on Web 02/17/2004
Equilibrium Geometry of Ethynyl Radical J. Phys. Chem. A, Vol. 108, No. 15, 2004 3031<br />
molecules and needs to be checked for open-shell systems, for<br />
which some further complications are expected. 27,28 Concerning<br />
the use of multireference methods, a recent study on more than<br />
60 electronic (closed- and open-shell) states of various diatomic<br />
molecules found that approaches such as, for example, the<br />
multireference-averaged quadratic coupled-cluster (MR-AQCC)<br />
method, 29,30 provide bond distances with an accuracy close to<br />
0.001 Å. As multireference methods together with a careful<br />
selection of the reference space offer a well-balanced treatment<br />
for both open- and closed-shell molecules, such calculations<br />
should be considered useful complements to single-reference-based<br />
CC calculations.<br />
The aim of the present paper is to provide an accurate<br />
equilibrium geometry for the electronic ground state of the<br />
ethynyl radical by using both procedures outlined above. The<br />
accuracy and reliability of the theoretically determined values<br />
will be carefully investigated via benchmark calculations up to<br />
the full configuration interaction (FCI) level. Calculated vibrational<br />
corrections to the rotational constants are used to derive<br />
equilibrium geometrical parameters from the available experimental<br />
rotational constants. The accuracy achieved is judged<br />
by a comparison of the results obtained with the two procedures.<br />
II. Computational Methods<br />
Theoretical determinations of the equilibrium geometry of<br />
CCH have been carried out using various coupled-cluster (CC)<br />
approaches and, to investigate possible multireference effects,<br />
the multireference configuration interaction (MR-CI) and multireference-averaged<br />
quadratic coupled-cluster (MR-AQCC)<br />
methods.<br />
Using the CC ansatz, calculations have been performed at<br />
two levels beyond the coupled-cluster singles and doubles<br />
(CCSD) 21 approximation, namely, at the CCSD(T) level which<br />
includes connected triple excitations perturbatively on top of a<br />
CCSD calculation 22,31 and at the CCSDT level 32-34 which<br />
includes a full treatment of triple excitations. Both unrestricted<br />
Hartree-Fock (UHF) and restricted open-shell Hartree-Fock<br />
(ROHF) reference functions have been used in the CC calculations.<br />
The MR-AQCC method can be considered an approximately<br />
“extensive” version of the MR-CISD (multireference configuration<br />
interaction with single and double excitations) method.<br />
MR-AQCC and MR-CISD calculations have been carried out<br />
with different reference (active) spaces. The n e factor in the<br />
MR-AQCC calculations was chosen to be 9, that is, the core<br />
electrons are not considered in the size-extensivity correction<br />
(for details, see ref 30).<br />
The hierarchy of correlation-consistent basis sets cc-pVXZ 23<br />
and cc-pCVXZ 24 has been used with X = D, T, Q, and 5.<br />
Since the size of CCH renders FCI calculations with small<br />
basis sets possible, FCI calculations (with a restricted open-shell<br />
HF reference) have been carried out for the geometry of<br />
CCH employing the cc-pVDZ basis set. These benchmark<br />
results are used to calibrate the corresponding CC and MR-<br />
AQCC results.<br />
Geometry optimizations have been carried out with analytically<br />
evaluated gradients in the case of the CCSD(T) 31,35-37 and<br />
MR-AQCC calculations, 38,39 while in all other cases the<br />
equilibrium geometry has been determined using purely numerical<br />
methods.<br />
The vibration-rotation interaction constants which are needed<br />
to subtract the vibrational contribution from the experimental<br />
rotational constants have been obtained at the UHF-CCSD(T)<br />
and ROHF-CCSD(T) levels using cc-pVTZ, cc-pCVTZ, cc-pVQZ,<br />
and cc-pCVQZ basis sets 23,24 at the geometry optimized<br />
at the same level. The required quantities (for the relevant<br />
computational expressions, see, for example, ref 16) have been<br />
determined using analytic derivative techniques, that is, the<br />
harmonic force field was determined using either analytic<br />
gradients (ROHF-CCSD(T)) 31 or analytic second derivatives<br />
(UHF-CCSD(T)), 40,41 and the cubic force field has been<br />
subsequently determined via numerical differentiation as described<br />
in refs 19 and 42. In addition, to check the reliability<br />
of the obtained force fields, UHF-CCSDT calculations of the<br />
vibration-rotation interaction constants (within the frozen-core<br />
approximation) have been carried out employing our recently<br />
implemented general CC analytic second derivatives. 43<br />
CC calculations have been performed with the Austin-Mainz<br />
version of the ACES II program system. 44 The COLUMBUS<br />
suite of programs 39,45 was used for the MR-AQCC and the<br />
LUCIA code 46 for the FCI calculations. The CCSDT force field<br />
calculations have been carried out using the generalized CI/CC code<br />
developed by one of us 47-49 which has been interfaced to the<br />
ACES II program.<br />
III. Results and Discussions<br />
III.A. Choice of Reference Space in the Multireference<br />
Treatments. The ²Σ⁺ ground state of CCH has a dominant<br />
configuration of (1σ)²(2σ)²(3σ)²(4σ)²(1π)⁴5σ. An appropriate<br />
reference space for the description of this electronic state within<br />
a MR-AQCC treatment has to be selected in a careful manner.<br />
In the present work, four different reference spaces have been<br />
tested with respect to their performance for the equilibrium<br />
geometry of CCH. In particular, the convergence of the<br />
calculated geometrical parameters with increase of the reference<br />
space is investigated.<br />
The smallest reference space is of complete active space<br />
(CAS) type and denoted by “5 × 5”, indicating that five<br />
electrons are distributed within five orbitals, namely the open-shell<br />
5σ orbital and the pairs of π and π* orbitals (1π and 2π). The<br />
next CAS reference space, denoted by “5 × 6”, considers in<br />
addition the virtual 6σ orbital, while the largest CAS space (“5<br />
× 8”) includes three virtual orbitals (6σ, 7σ, and 8σ). Finally,<br />
to investigate the effect of including further “active” electrons,<br />
the “5 × 6” space has been augmented by single and double<br />
excitations involving the 3σ and/or 4σ orbital (in the following<br />
denoted by “5 × 6 + 2d”). Note that in all considered cases,<br />
the orbitals have been taken from MCSCF calculations using<br />
the same space. All single and double excitations out of the<br />
reference configurations have been included in the correlation<br />
treatment within the MR-CISD and MR-AQCC calculations.<br />
As the focus of these initial calculations is just the convergence<br />
of the results with respect to the chosen reference space, the<br />
calculations have been performed at the cc-pVDZ and cc-pVTZ<br />
basis-set levels.<br />
TABLE 1: Comparison of Geometrical Parameters (in Å) for the ²Σ⁺ State of CCH with Respect to the Chosen Reference Space in the MR-CISD and MR-AQCC Treatments a<br />
method | 5 × 5 | 5 × 6 | 5 × 8 | 5 × 6 + 2d<br />
r CC<br />
MR-AQCC/cc-pVDZ (fc) | 1.2369 | 1.2376 | 1.2379 | 1.2371<br />
MR-CISD/cc-pVTZ (ae) | 1.2093 | 1.2102 | 1.2102 | 1.2123<br />
MR-AQCC/cc-pVTZ (ae) | 1.2121 | 1.2129 | 1.2131 | 1.2126<br />
r CH<br />
MR-AQCC/cc-pVDZ (fc) | 1.0794 | 1.0797 | 1.0807 | 1.0799<br />
MR-CISD/cc-pVTZ (ae) | 1.0546 | 1.0548 | 1.0552 | 1.0558<br />
MR-AQCC/cc-pVTZ (ae) | 1.0573 | 1.0575 | 1.0580 | 1.0580<br />
a fc = frozen-core calculations, ae = all-electron calculations.
3032 J. Phys. Chem. A, Vol. 108, No. 15, 2004 Szalay et al.<br />
TABLE 2: Comparison of Geometrical Parameters (in Å) for the ²Σ⁺ State of CCH as Obtained at the CCSD(T), CCSDT, and MR-AQCC Levels Using Different Basis Sets a<br />
basis set | r CC: UHF-CCSD(T), ROHF-CCSD(T), UHF-CCSDT, ROHF-CCSDT, MR-AQCC b | r CH: UHF-CCSD(T), ROHF-CCSD(T), UHF-CCSDT, ROHF-CCSDT, MR-AQCC b<br />
cc-pVDZ (fc) 1.2318 1.2353 1.2352 1.2354 1.2376 1.0797 1.0801 1.0801 1.0802 1.0797<br />
cc-pVTZ (fc) 1.2120 1.2153 1.2150 1.2151 1.2173 1.0643 1.0646 1.0645 1.0645 1.0638<br />
cc-pVQZ (fc) 1.2081 1.2113 1.2110 1.2110 1.2133 1.0642 1.0645 1.0644 1.0644 1.0635<br />
cc-pV5Z (fc) 1.2072 1.2104 1.2098 1.2123 1.0639 1.0642 1.0642 1.0632<br />
cc-pCVTZ (ae) 1.2087 1.2119 1.2132 1.0642 1.0645 1.0627<br />
cc-pCVQZ (ae) 1.2052 1.2083 1.2096 1.0630 1.0632 1.0613<br />
cc-pCV5Z (ae) 1.2043 1.2074 1.0626 1.0629<br />
a fc = frozen-core calculations, ae = all-electron calculations. b 5 × 6 reference space.<br />
The corresponding results are compiled in Table 1. The most<br />
significant observation is that there is a faster convergence of<br />
the bond distance with increase of the reference space in the<br />
MR-AQCC than in the MR-CISD calculations, as the MR-<br />
AQCC results seem to be much less sensitive to the choice of<br />
reference space. While the optimized bond distances obtained<br />
with the two methods are very close when the largest reference<br />
space (5 × 6 + 2d) is used, there are noticeable differences for<br />
the smaller reference spaces. For these, the MR-AQCC results<br />
are much closer to the “5 × 6 + 2d” values than the<br />
corresponding MR-CISD results. In particular, the inclusion of<br />
additional electrons in the reference space seems to be less<br />
important when using the MR-AQCC ansatz. The results in<br />
Table 1 thus indicate that the use of a “5 × 6” active space<br />
seems to be a safe and economical choice for large-scale MR-<br />
AQCC calculations on the ²Σ⁺ state of CCH. The remaining<br />
error due to higher excitations is estimated to be about 0.001-<br />
0.002 Å.<br />
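The sensitivity to the reference space can be made explicit with a few lines of arithmetic. The following is a minimal sketch, not part of the original work: it takes the cc-pVTZ (ae) r CC values from Table 1 and reports how far each smaller reference space lies from the “5 × 6 + 2d” value. The names R_CC, deviations_from_largest, and max_abs_deviation are illustrative choices.<br />

```python
# r CC values (in Å) copied from Table 1 for the cc-pVTZ (ae) calculations.
# Tuple order follows the table columns: 5x5, 5x6, 5x8, 5x6+2d.
R_CC = {
    "MR-CISD": (1.2093, 1.2102, 1.2102, 1.2123),
    "MR-AQCC": (1.2121, 1.2129, 1.2131, 1.2126),
}


def deviations_from_largest(values):
    """Return each smaller-space value minus the 5x6+2d value, in Å."""
    reference = values[-1]
    return [round(v - reference, 4) for v in values[:-1]]


def max_abs_deviation(values):
    """Largest absolute deviation from the 5x6+2d value, in Å."""
    return max(abs(d) for d in deviations_from_largest(values))


if __name__ == "__main__":
    for method, values in R_CC.items():
        print(method, deviations_from_largest(values), max_abs_deviation(values))
```

With these inputs the MR-CISD deviations reach 0.003 Å, whereas the MR-AQCC deviations stay within 0.0005 Å, in line with the observation made above.<br />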
III.B. Comparison of MR-AQCC and CC Results. In Table<br />
2 the CC and CH bond lengths obtained at CCSD(T), CCSDT,<br />
and MR-AQCC levels using different basis sets are compared.<br />
Focusing first on the coupled-cluster results, it is observed<br />
that, independent of the chosen basis set, the CC distances<br />
obtained at the UHF-CCSD(T) level are about 0.003 Å shorter<br />
than the corresponding CCSDT values, while the corresponding<br />
ROHF-CCSD(T) bond lengths are essentially identical to both<br />
the UHF- and ROHF-CCSDT values. This unexpected difference<br />
between the UHF and ROHF results is investigated in a<br />
forthcoming article 28 where the failure of UHF-CCSD(T) is<br />
traced back to a rapid change of the underlying UHF wave<br />
function at certain bond distances. It will be shown in ref 28<br />
that this breakdown of the UHF-CCSD(T) approach occurs for<br />
the ethynyl radical at distances close to the equilibrium<br />
geometry, and thus, the UHF-CCSD(T) results must be considered<br />
unreliable. Interestingly, the full CCSDT approach seems<br />
to be able to recover from these deficiencies of the underlying<br />
UHF reference functions and provides results which are essentially<br />
independent of the chosen reference functions.<br />
For the CC distances the differences between ROHF-CCSD(T)<br />
and CCSDT are essentially negligible. When considering<br />
in addition the MR-AQCC calculations (obtained with the “5<br />
× 6” reference), we note that the MR-AQCC value for the CC<br />
distance is even longer than the corresponding CCSDT value<br />
(by about 0.002 Å). It is essentially impossible at this point to<br />
decide whether the CCSDT or the MR-AQCC results should<br />
be considered more accurate. 50 The good agreement between the ROHF-<br />
CCSD(T) and CCSDT results also suggests that ROHF-CCSD(T) can<br />
be safely used with the larger basis sets where CCSDT is not<br />
practical.<br />
For the CH distance, all considered approaches yield essentially<br />
the same result.<br />
TABLE 3: Comparison of Geometrical Parameters (in Å)<br />
for the ²Σ⁺ State of CCH at the CCSD(T), CCSDT, and<br />
MR-AQCC Levels with Corresponding FCI Calculations a<br />
III.C. Comparison with Full Configuration Interaction<br />
Results. To judge the accuracy of MR-AQCC and CCSDT,<br />
benchmark calculations at the FCI level using the cc-pVDZ basis<br />
have been performed. The corresponding results are summarized<br />
in Table 3. As these results show, the CH bond distances<br />
obtained by any approach are in excellent agreement (differences<br />
are less than 0.0005 Å), while for the CC bond distance the<br />
FCI result falls between the corresponding CCSDT and MR-<br />
AQCC values. This means that in comparison with FCI the<br />
CCSDT value is about 0.001 Å too short, while MR-AQCC is<br />
about 0.001 Å too long. Both methods thus exhibit errors which<br />
are acceptable for our purpose.<br />
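The comparison with FCI amounts to a signed difference per method. As a rough sketch (the dictionary and function names are illustrative, and the numbers are the cc-pVDZ frozen-core bond lengths of Table 3), the deviations can be tabulated as follows.<br />

```python
# Bond lengths (r CC, r CH) in Å from Table 3 (cc-pVDZ, frozen core).
TABLE3 = {
    "ROHF-CCSD(T)": (1.2353, 1.0801),
    "UHF-CCSD(T)": (1.2318, 1.0797),
    "UHF-CCSDT": (1.2352, 1.0801),
    "ROHF-CCSDT": (1.2354, 1.0802),
    "MR-AQCC": (1.2376, 1.0797),
}
FCI = (1.2367, 1.0802)


def signed_errors(table, reference):
    """Return method -> (error in r CC, error in r CH) relative to FCI, in Å.

    A negative value means the bond is shorter than the FCI bond.
    """
    ref_cc, ref_ch = reference
    return {
        method: (round(r_cc - ref_cc, 4), round(r_ch - ref_ch, 4))
        for method, (r_cc, r_ch) in table.items()
    }


if __name__ == "__main__":
    for method, (err_cc, err_ch) in signed_errors(TABLE3, FCI).items():
        print(f"{method:13s} r CC {err_cc:+.4f}  r CH {err_ch:+.4f}")
```

The sign convention makes the bracketing visible: the CCSDT entries come out negative for r CC and the MR-AQCC entry positive.<br />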
III.D. Basis-Set Convergence. After discussing the issue of<br />
electron correlation, we will now turn our interest to the basis-set<br />
effects. Results obtained with both the cc-pVXZ and cc-pCVXZ<br />
sequence of basis sets have been given in Table 2. In<br />
the cc-pVXZ calculations, when employing the frozen-core<br />
approximation, smooth convergence of the geometrical parameters<br />
is observed. When going from cc-pVDZ to cc-pV5Z, both<br />
bond distances are reduced, the CC distance by about 0.025 Å<br />
and the CH distance by about 0.016 Å. The differences between<br />
the cc-pVQZ and cc-pV5Z results, at about 0.001 and 0.0003<br />
Å, are already rather small, so that the cc-pV5Z results can be<br />
considered as nearly converged. However, the cc-pVXZ calculations<br />
do not incorporate core-correlation effects. To consider<br />
these properly, all-electron calculations using the core-valence<br />
correlating cc-pCVXZ sets have been carried out. As for the<br />
cc-pVXZ sequence, monotonic convergence is observed for the<br />
geometrical parameters within this basis-set sequence and the<br />
differences between quadruple- and pentuple-zeta results are<br />
again small. From the results, it is further seen that core<br />
correlation together with the additional consideration of core<br />
polarization functions reduces the CC bond distance by about<br />
0.003-0.004 Å, while the CH distance, as one might expect, is<br />
less affected and shortened by only 0.001-0.002 Å.<br />
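The basis-set trends quoted above can be recomputed directly from Table 2. The sketch below is illustrative only: it assumes the values are read from the ROHF-CCSD(T) columns of Table 2 (the only CC columns that are complete for all basis sets), and the names FROZEN_CORE, ALL_ELECTRON, successive_changes, and core_effect are not from the original paper.<br />

```python
# ROHF-CCSD(T) bond lengths (r CC, r CH) in Å, read from Table 2.
# Keys are the cardinal labels X of cc-pVXZ (fc) and cc-pCVXZ (ae).
FROZEN_CORE = {
    "D": (1.2353, 1.0801),
    "T": (1.2153, 1.0646),
    "Q": (1.2113, 1.0645),
    "5": (1.2104, 1.0642),
}
ALL_ELECTRON = {
    "T": (1.2119, 1.0645),
    "Q": (1.2083, 1.0632),
    "5": (1.2074, 1.0629),
}


def successive_changes(series):
    """Change in (r CC, r CH) between consecutive basis sets, in Å."""
    keys = list(series)
    return {
        f"{a}->{b}": tuple(round(y - x, 4) for x, y in zip(series[a], series[b]))
        for a, b in zip(keys, keys[1:])
    }


def core_effect(frozen, all_electron):
    """cc-pCVXZ (ae) minus cc-pVXZ (fc) for each shared X, in Å."""
    return {
        x: tuple(round(ae - fc, 4) for fc, ae in zip(frozen[x], all_electron[x]))
        for x in all_electron
    }


if __name__ == "__main__":
    print(successive_changes(FROZEN_CORE))
    print(successive_changes(ALL_ELECTRON))
    print(core_effect(FROZEN_CORE, ALL_ELECTRON))
```

The successive changes shrink monotonically in both sequences, and the r CC core effect comes out between 0.003 and 0.004 Å for X = T, Q, and 5.<br />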
Unfortunately, because of program limitations, it was not<br />
possible to perform MR-AQCC calculations using the largest<br />
cc-pCV5Z basis. However, the rather systematic difference<br />
between the CCSD(T) and MR-AQCC results enables a<br />
method | r CC | r CH<br />
ROHF-CCSD(T) | 1.2353 | 1.0801<br />
UHF-CCSD(T) | 1.2318 | 1.0797<br />
UHF-CCSDT | 1.2352 | 1.0801<br />
ROHF-CCSDT | 1.2354 | 1.0802<br />
MR-AQCC b | 1.2376 | 1.0797<br />
FCI | 1.2367 | 1.0802<br />
a All calculations with cc-pVDZ and core orbitals frozen in the electron-correlation treatment. b 5 × 6 reference space.
Equilibrium Geometry of Ethynyl Radical J. Phys. Chem. A, Vol. 108, No. 15, 2004 3033<br />
TABLE 4: Calculated Vibrational Corrections ∆B = B e - B 0 (in MHz) to the Rotational Constants of Different Isotopomers of CCH from UHF- and ROHF-based CC Calculations<br />
isotopomer | CCSD(T)/cc-pVTZ | CCSD(T)/cc-pVQZ | CCSD(T)/cc-pCVTZ | CCSD(T)/cc-pCVQZ | CCSDT(fc)/cc-pVTZ a<br />
UHF Reference Function<br />
CCH | 368.27 | 334.70 | 379.67 | 355.74 | 583.64<br />
¹³CCH | 355.08 | 322.65 | 366.09 | 342.76 | 564.26<br />
C¹³CH | 366.25 | 333.16 | 377.31 | 353.24 | 580.54<br />
CCD | 168.07 | 151.12 | 175.33 | 167.52 | 258.47<br />
ROHF Reference Function<br />
CCH | 531.16 | 479.58 | 568.24 | 495.37<br />
¹³CCH | 513.21 | 463.24 | 549.12 | 478.15<br />
C¹³CH | 528.13 | 476.98 | 564.57 | 491.72<br />
CCD | 237.85 | 214.59 | 257.20 | 230.11<br />
a fc = frozen-core calculation.<br />
prediction of the corresponding value based on MR-AQCC/cc-pCVQZ<br />
and ROHF-CCSD(T)/cc-pCV5Z calculations. As the<br />
use of the pentuple- instead of the quadruple-ζ set decreases<br />
CC and CH bond distances by about 0.0009 and 0.0004 Å,<br />
respectively, the estimated MR-AQCC/cc-pCV5Z values are<br />
about 1.2087 and 1.0609 Å.<br />
The influence of diffuse functions has been investigated at<br />
the UHF-CCSD(T) level. It was found that the changes amount<br />
to less than 0.0003 Å when going from cc-pCVQZ to aug-cc-pCVQZ.<br />
III.E. Best Theoretical Estimates. On the basis of the<br />
previous sections, we are now able to give a best theoretical<br />
estimate for the equilibrium geometry of CCH. There are two<br />
(almost) independent procedures: one uses the MR-AQCC data<br />
while the other uses the CC data. At the MR-<br />
AQCC level, the best directly calculated geometry has been<br />
obtained with the cc-pCVQZ basis set (r e (CC) = 1.2096 Å and r e (CH)<br />
= 1.0613 Å). This geometry should be “improved” by<br />
the FCI correction obtained at the cc-pVDZ level, that is, by<br />
-0.0009 and 0.0005 Å as well as corrected for the remaining<br />
basis-set effect, that is, by -0.0009 Å and -0.0004 Å, for CC<br />
and CH, respectively (see above). Assuming additivity of these<br />
corrections, this leads to final values of 1.2078 and 1.0614 Å<br />
for the CC and CH bond distance, respectively. A similar<br />
extrapolation procedure starting from the ROHF-CCSD(T)/cc-pCV5Z<br />
results (1.2074 and 1.0628 Å) and employing corrections<br />
due to full CCSDT (-0.0003 Å and -0.0001 Å) and FCI<br />
(0.0013 and 0.0000 Å) leads to a final estimate of 1.2084 and<br />
1.0627 Å for the two distances. The discrepancy of 0.001 to<br />
0.002 Å between the values obtained with these two extrapolation<br />
schemes is an indication of the accuracy of our theoretical<br />
results.<br />
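The two additive schemes described above reduce to a starting geometry plus a sum of increments. A minimal sketch of that bookkeeping follows; the function name additive_estimate is an illustrative choice, and all numbers are the starting values and corrections quoted in the text.<br />

```python
def additive_estimate(start, *corrections):
    """Add independent corrections to a starting (r CC, r CH) pair, in Å.

    Additivity of the corrections is an assumption, as stated in the text.
    """
    return tuple(
        round(value + sum(c[i] for c in corrections), 4)
        for i, value in enumerate(start)
    )


# Scheme 1: MR-AQCC/cc-pCVQZ + FCI correction (cc-pVDZ) + remaining basis-set effect.
mr_aqcc_estimate = additive_estimate(
    (1.2096, 1.0613),
    (-0.0009, 0.0005),   # FCI minus MR-AQCC at the cc-pVDZ level
    (-0.0009, -0.0004),  # quadruple- to pentuple-zeta change
)

# Scheme 2: ROHF-CCSD(T)/cc-pCV5Z + full-CCSDT correction + FCI correction.
cc_estimate = additive_estimate(
    (1.2074, 1.0628),
    (-0.0003, -0.0001),  # CCSDT minus CCSD(T)
    (0.0013, 0.0000),    # FCI minus CCSDT
)

if __name__ == "__main__":
    print("MR-AQCC based:", mr_aqcc_estimate)
    print("CC based:     ", cc_estimate)
    print("difference:   ", tuple(round(a - b, 4) for a, b in zip(cc_estimate, mr_aqcc_estimate)))
```

The sketch reproduces the final values of 1.2078 and 1.0614 Å for the MR-AQCC-based scheme and 1.2084 and 1.0627 Å for the CC-based scheme.<br />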
It is noteworthy that our best theoretical estimates<br />
are in excellent agreement with recent recommendations for the<br />
equilibrium geometry of CCH by Peterson and Dunning 7 based<br />
on CCSD(T) calculations. The corresponding values are 1.2076<br />
and 1.0619 Å.<br />
III.F. Analysis of Experimental Rotational Constants.<br />
After establishing a theoretical estimate for the equilibrium<br />
geometry of CCH, we now focus on the analysis of the<br />
experimental rotational constants using computed vibrational<br />
corrections. These corrections to B, that is, ∆B = B e - B 0 , have<br />
been obtained at the UHF- and ROHF-CCSD(T) level using<br />
the cc-pVXZ and cc-pCVXZ sets with X = T and Q. The<br />
calculated ∆B values are compiled in Table 4 and amount to<br />
about 150-590 MHz, that is, about 0.5 to 1.5% of the values<br />
of the corresponding rotational constants for the considered<br />
isotopomer and thus are non-negligible. However, large discrepancies<br />
are seen between the vibrational corrections computed<br />
with UHF and ROHF reference functions. We thus<br />
decided to check the reliability of the CCSD(T) force fields<br />
via corresponding CCSDT calculations using the cc-pVTZ basis<br />
set. As is seen from Table 4, the CCSDT calculations suggest<br />
that the UHF-CCSD(T) force fields (as the corresponding<br />
geometries) should be considered unreliable and that only the<br />
ROHF-CCSD(T) approach yields vibrational corrections in good<br />
agreement with the CCSDT approach. On the basis of these<br />
calculations, we refrain from discussing the UHF-CCSD(T)<br />
results any further and solely discuss the corresponding ROHF-<br />
CCSD(T) results in the following.<br />
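The size of the UHF/ROHF discrepancy can be quantified relative to the CCSDT benchmark. The following sketch uses only the cc-pVTZ columns of Table 4; the names DELTA_B_VTZ, relative_deviation, and compare_to_ccsdt are illustrative, and the comparison mixes the CCSD(T) entries with the frozen-core CCSDT entry exactly as the table lists them.<br />

```python
# Vibrational corrections ∆B (MHz) from Table 4, cc-pVTZ columns only.
# Tuple order: UHF-CCSD(T), ROHF-CCSD(T), UHF-CCSDT(fc).
DELTA_B_VTZ = {
    "CCH": (368.27, 531.16, 583.64),
    "13CCH": (355.08, 513.21, 564.26),
    "C13CH": (366.25, 528.13, 580.54),
    "CCD": (168.07, 237.85, 258.47),
}


def relative_deviation(value, reference):
    """Signed deviation of value from reference as a fraction of reference."""
    return (value - reference) / reference


def compare_to_ccsdt(table):
    """Return isotopomer -> (UHF-CCSD(T) deviation, ROHF-CCSD(T) deviation)."""
    return {
        isotopomer: (relative_deviation(uhf, ref), relative_deviation(rohf, ref))
        for isotopomer, (uhf, rohf, ref) in table.items()
    }


if __name__ == "__main__":
    for isotopomer, (uhf_dev, rohf_dev) in compare_to_ccsdt(DELTA_B_VTZ).items():
        print(f"{isotopomer:6s} UHF {uhf_dev:+.1%}  ROHF {rohf_dev:+.1%}")
```

For every isotopomer the ROHF-CCSD(T) correction lies within 10% of the CCSDT value, while the UHF-CCSD(T) correction falls short by more than 30%.<br />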
For the least-squares fit of the geometrical parameters to the<br />
rotational constants, the most recent B 0 values from refs 8, 12,<br />
and 14, as given in the Introduction, have been used together<br />
with the vibrational corrections compiled in Table 4. The<br />
resulting empirical equilibrium geometries are summarized in<br />
Table 5. According to the values reported there, an “empirical”<br />
equilibrium geometry of r CC = 1.207 Å and r CH = 1.069 Å can<br />
be given with 0.002 Å as a conservative error estimate 51 based<br />
on the convergence of the results.<br />
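To illustrate how two bond lengths can be fitted to the rotational constants of four isotopomers, the sketch below runs a plain Gauss-Newton least-squares fit for a linear C-C-H chain. It is not the fitting program used in this work. The physical constants and atomic masses are standard values supplied here, the function names are hypothetical, and the target constants are synthetic: they are generated from the empirical geometry quoted above, because the experimental B 0 values are listed in the Introduction rather than in this section. In a real application each target would be B e = B 0 + ∆B.<br />

```python
import math

H_PLANCK = 6.62607015e-34   # Planck constant in J s (exact SI value)
AMU = 1.66053906660e-27     # atomic mass constant in kg (CODATA 2018)
# B[MHz] = B_FACTOR / I[amu Å^2] for a linear molecule.
B_FACTOR = H_PLANCK / (8 * math.pi ** 2 * AMU * 1e-20) / 1e6

# Atomic masses in amu (standard values, not taken from the paper).
MASS = {"H": 1.00782503, "D": 2.01410178, "12C": 12.0, "13C": 13.00335484}
# Atom order along the axis: terminal C, central C, H (or D).
ISOTOPOMERS = {
    "CCH": ("12C", "12C", "H"),
    "13CCH": ("13C", "12C", "H"),
    "C13CH": ("12C", "13C", "H"),
    "CCD": ("12C", "12C", "D"),
}


def rotational_constant(atoms, r_cc, r_ch):
    """Rotational constant (MHz) of a linear C-C-H chain with bond lengths in Å."""
    masses = [MASS[a] for a in atoms]
    z = [0.0, r_cc, r_cc + r_ch]
    z_cm = sum(m * zi for m, zi in zip(masses, z)) / sum(masses)
    inertia = sum(m * (zi - z_cm) ** 2 for m, zi in zip(masses, z))
    return B_FACTOR / inertia


def fit_geometry(targets, guess=(1.20, 1.06), max_steps=20, h=1e-6):
    """Gauss-Newton fit of (r CC, r CH) to target rotational constants (MHz)."""
    r_cc, r_ch = guess
    names = list(targets)

    def model(a, b):
        return [rotational_constant(ISOTOPOMERS[n], a, b) for n in names]

    for _ in range(max_steps):
        residual = [c - targets[n] for c, n in zip(model(r_cc, r_ch), names)]
        # Central-difference Jacobian columns with respect to r CC and r CH.
        j_cc = [(p - m) / (2 * h)
                for p, m in zip(model(r_cc + h, r_ch), model(r_cc - h, r_ch))]
        j_ch = [(p - m) / (2 * h)
                for p, m in zip(model(r_cc, r_ch + h), model(r_cc, r_ch - h))]
        # Solve the 2x2 normal equations (J^T J) d = -J^T r.
        a11 = sum(x * x for x in j_cc)
        a12 = sum(x * y for x, y in zip(j_cc, j_ch))
        a22 = sum(y * y for y in j_ch)
        g1 = sum(x * r for x, r in zip(j_cc, residual))
        g2 = sum(y * r for y, r in zip(j_ch, residual))
        det = a11 * a22 - a12 * a12
        d_cc = -(a22 * g1 - a12 * g2) / det
        d_ch = -(a11 * g2 - a12 * g1) / det
        r_cc, r_ch = r_cc + d_cc, r_ch + d_ch
        if max(abs(d_cc), abs(d_ch)) < 1e-10:
            break
    return r_cc, r_ch


if __name__ == "__main__":
    # Synthetic targets generated from r CC = 1.207 Å and r CH = 1.069 Å.
    synthetic = {n: rotational_constant(a, 1.207, 1.069)
                 for n, a in ISOTOPOMERS.items()}
    print(fit_geometry(synthetic))
```

Because the synthetic targets are noise-free, the fit should return the generating geometry; with real data the residuals would indicate the quality of the fit.<br />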
A comparison of the empirical equilibrium geometry with<br />
our best theoretical estimates shows that the remaining discrepancies<br />
are in the range of 0.001 to 0.002 Å for the CC and<br />
0.006 to 0.008 Å for the CH distances. It appears that the<br />
empirical value for the CC distance is slightly shorter and the<br />
CH distance is longer than the corresponding theoretical values.<br />
While these discrepancies can possibly be traced back to<br />
remaining deficiencies in the theoretical treatment, another, and<br />
maybe more likely, possibility is that these differences point to<br />
so far unexplored limitations in the perturbational treatment of<br />
the vibrational corrections (note that there is a low-lying Π state<br />
which interacts with the electronic ground state through the<br />
bending motion).<br />
Nevertheless, the current study leads to a satisfactory agreement<br />
between theory and experiment and thus provides a<br />
consistent picture with respect to the equilibrium geometry.<br />
Concerning previous efforts to determine the geometry of<br />
CCH, we note that the r s (as well as the r 0 ) structures are rather<br />
TABLE 5: Comparison of Geometrical Parameters (in Å) for the ²Σ⁺ State of CCH Obtained from Theory and Experiment<br />
structure r CC r CH method ref<br />
r e 1.2064 1.0678 from exptl B 0 with ∆B(ROHF-CCSD(T)/cc-pVTZ) this work<br />
r e 1.2076 1.0657 from exptl B 0 with ∆B(ROHF-CCSD(T)/cc-pVQZ) this work<br />
r e 1.2056 1.0689 from exptl B 0 with ∆B(ROHF-CCSD(T)/cc-pCVTZ) this work<br />
r e 1.2075 1.0651 from exptl B 0 with ∆B(ROHF-CCSD(T)/cc-pCVQZ) this work<br />
r e 1.2050 1.0703 from exptl B 0 with ∆B(UHF-CCSDT(fc)/cc-pVTZ) this work<br />
r e 1.2078 1.0614 est from MR-AQCC this work<br />
r e 1.2084 1.0627 est from CCSDT this work<br />
r 0 1.2193 1.0457 from exptl B 0 this work<br />
r s 1.21652 1.04653 from exptl B 0 1<br />
r e 1.2076 1.0619 est from CCSD(T) 7
different (compare Table 5). Both of them deviate by about<br />
0.005 Å in the CC and by about 0.015 Å in the CH distance<br />
from the equilibrium geometries obtained in this work. Apparently,<br />
contrary to what is often claimed, the substitution approach leading<br />
to the r s structure is not able to eliminate vibrational effects in<br />
the case of CCH, and thus, the r s and r 0 structures turn out to be<br />
very similar. Our observation supports the speculation in ref 1<br />
that the significantly too short CH distance is due to insufficient<br />
account of vibrational effects, and in particular of the low-frequency<br />
bending motion, a well-known artifact of the substitution<br />
approach to molecular structures.<br />
IV. Conclusions<br />
Equilibrium geometrical parameters for the ²Σ⁺ state of the<br />
ethynyl radical have been obtained using two approaches. The<br />
first purely theoretical procedure based on extensive CC, MR-<br />
AQCC, and FCI calculations yields values of 1.208 Å for the<br />
CC distance and 1.061-1.063 Å for the CH distance, while<br />
the second approach based on the analysis of experimental<br />
rotational constants using computed vibrational corrections<br />
provides values of 1.207 and 1.069 Å. The observed differences<br />
between the two approaches of 0.001-0.002 Å for CC and<br />
0.006-0.008 Å for CH are somewhat larger than expected.<br />
Among possible causes for this discrepancy, we consider<br />
limitations in the perturbational treatment of the vibrational<br />
corrections to the rotational constants. The r s and r 0 geometries<br />
for CCH are, because of a missing or insufficient treatment of<br />
these corrections, far away from the true equilibrium geometry.<br />
Acknowledgment. The authors acknowledge fruitful discussions<br />
with Professor J. F. Stanton (University of Texas,<br />
Austin). This work has been supported by the Hungarian<br />
Scientific Research Foundation (OTKA, Grants T032980 and<br />
M042110), the Deutsche Forschungsgemeinschaft, the Fonds<br />
der Chemischen Industrie, and the Danish Centre for Supercomputing<br />
(DCSC). This research is part of an effort by a task<br />
group of the International Union of Pure and Applied Chemistry<br />
to determine structures, vibrational frequencies, and thermodynamic<br />
functions of free radicals of importance in atmospheric<br />
chemistry.<br />
References and Notes<br />
(1) Bogey, M.; Demuynck, C.; Destombes, J. L. Mol. Phys. 1989, 66,<br />
955.<br />
(2) Hillier, I. H.; Kendrick, J.; Guest, M. F. Mol. Phys. 1975, 30, 1133.<br />
(3) Shih, S.; Peyerimhoff, S. D.; Buenker, R. J. J. Mol. Spectrosc. 1977,<br />
64, 167.<br />
(4) Shih, S.; Peyerimhoff, S. D.; Buenker, R. J. J. Mol. Spectrosc. 1979,<br />
74, 124.<br />
(5) Fogarasi, G.; Boggs, J. E.; Pulay, P. Mol. Phys. 1983, 50, 139.<br />
(6) Kraemer, W. P.; Roos, B. O.; Bunker, P. R.; Jensen, P. J. Mol.<br />
Spectrosc. 1986, 120, 236.<br />
(7) Peterson, K. A.; Dunning, T. H. J. Chem. Phys. 1997, 106, 4119.<br />
(8) Müller, H.; Klaus, T.; Winnewisser, G. Astron. Astrophys. 2000,<br />
357, L65.<br />
(9) Sastry, K. V. L. N.; Helminger, P.; Charo, A.; Herbst, E.; Delucia,<br />
F. C. Astrophys. J. 1981, 251, L119.<br />
(10) Gottlieb, C. A.; Gottlieb, E. W.; Thaddeus, P. Astrophys. J. 1983,<br />
264, 740.<br />
(11) Saykally, R. J.; Veseth, L.; Evenson, K. M. J. Chem. Phys. 1984,<br />
80, 2247.<br />
(12) McCarthy, M. C.; Gottlieb, C. A.; Thaddeus, P. J. Mol. Spectrosc.<br />
1995, 173, 303.<br />
(13) Note that there is a trivial misprint in ref 1 for the former value.<br />
Bogey, M. Université Lille, France. Private communication, 1999.<br />
(14) Bogey, M.; Demuynck, C.; Destombes, J. L. Astron. Astrophys.<br />
1985, 144, L15.<br />
(15) Costain, C. C. J. Chem. Phys. 1958, 82, 5053.<br />
(16) See, for example: Mills, I. M. In Molecular Spectroscopy: Modern<br />
Research; Rao, K. N., Matthews, C. W., Eds.; Academic: New York, 1972;<br />
p. 115.<br />
(17) Pulay, P.; Meyer, W.; Boggs, J. E. J. Chem. Phys. 1978, 68, 5077.<br />
(18) McCarthy, M. C.; Gottlieb, C. A.; Thaddeus, P.; Horn, M.;<br />
Botschwina, P. J. Chem. Phys. 1995, 103, 7820.<br />
(19) Stanton, J. F.; Lopreore, C. L.; Gauss, J. J. Chem. Phys. 1998, 108,<br />
7190.<br />
(20) Pawlowski, F.; Jørgensen, P.; Olsen, J.; Hegelund, F.; Helgaker,<br />
T.; Gauss, J.; Bak, K. L.; Stanton, J. F. J. Chem. Phys. 2002, 116, 6482.<br />
(21) Purvis, G. D.; Bartlett, R. J. J. Chem. Phys. 1982, 76, 1910.<br />
(22) Raghavachari, K.; Trucks, G. W.; Head-Gordon, M.; Pople, J. A.<br />
Chem. Phys. Lett. 1989, 157, 479.<br />
(23) Dunning, T. H. J. Chem. Phys. 1989, 90, 1007.<br />
(24) Woon, D. E.; Dunning, T. H. J. Chem. Phys. 1993, 99, 1914.<br />
(25) Helgaker, T.; Gauss, J.; Jørgensen, P.; Olsen, J. J. Chem. Phys.<br />
1997, 106, 6430.<br />
(26) Bak, K. L.; Gauss, J.; Jørgensen, P.; Olsen, J.; Helgaker, T.; Stanton,<br />
J. F. J. Chem. Phys. 2001, 114, 6548.<br />
(27) See, for example: Byrd, E. F. C.; Sherrill, C. D.; Head-Gordon,<br />
M. J. Phys. Chem. A 2001, 105, 9736.<br />
(28) Szalay, P. G.; Vazquez, J.; Stanton, J. F. Material to be submitted<br />
for publication.<br />
(29) Szalay, P. G.; Bartlett, R. J. Chem. Phys. Lett. 1993, 214, 481.<br />
(30) Szalay, P. G.; Bartlett, R. J. J. Chem. Phys. 1995, 103, 3600.<br />
(31) Watts, J. D.; Gauss, J.; Bartlett, R. J. J. Chem. Phys. 1993, 98,<br />
8718.<br />
(32) Noga, J.; Bartlett, R. J. J. Chem. Phys. 1987, 86, 7041.<br />
(33) Scuseria, G. E.; Schaefer, H. F. Chem. Phys. Lett. 1988, 152, 382.<br />
(34) Watts, J. D.; Bartlett, R. J. J. Chem. Phys. 1990, 93, 6104.<br />
(35) Gauss, J.; Stanton, J. F.; Bartlett, R. J. J. Chem. Phys. 1991, 95,<br />
2623.<br />
(36) Watts, J. D.; Gauss, J.; Bartlett, R. J. Chem. Phys. Lett. 1992, 200,<br />
1.<br />
(37) Gauss, J.; Lauderdale, W. J.; Stanton, J. F.; Watts, J. D.; Bartlett,<br />
R. J. Chem. Phys. Lett. 1991, 182, 207.<br />
(38) Shepard, R.; Lischka, H.; Szalay, P. G.; Kovar, T.; Ernzerhof, M.<br />
J. Chem. Phys. 1992, 96, 2085.<br />
(39) Lischka, H.; Shepard, R.; Pitzer, R. M.; Shavitt, I.; Dallos, M.;<br />
Müller, T.; Szalay, P. G.; Seth, M.; Kedziora, G.; Yabushita, S.; Zhang,<br />
Z. Phys. Chem. Chem. Phys. 2001, 3, 664.<br />
(40) Gauss, J.; Stanton, J. F. Chem. Phys. Lett. 1997, 276, 70.<br />
(41) Szalay, P. G.; Gauss, J.; Stanton, J. F. Theor. Chem. Acc. 1998,<br />
100, 5.<br />
(42) Stanton, J. F.; Gauss, J. Int. Rev. Phys. Chem. 2000, 19, 61.<br />
(43) Kállay, M.; Gauss, J. J. Chem. Phys., in press.<br />
(44) Stanton, J. F.; Gauss, J.; Watts, J. D.; Lauderdale, W. J.; Bartlett,<br />
R. J. Int. J. Quantum Chem. Symp. 1992, 26, 879.<br />
(45) Lischka, H.; Shepard, R.; Shavitt, I.; Brown, F. B.; Pitzer, R. M.;<br />
Ahlrichs, R.; Böhm, H.-J.; Chang, A. H. H.; Comeau, D. C.; Gdanitz, R.;<br />
Dachsel, H.; Dallos, M.; Erhard, C.; Ernzerhof, M.; Gawboy, G.; Höchtl,<br />
P.; Irle, S.; Kedziora, G.; Kovar, T.; Müller, T.; Parasuk, V.; Pepper, M.;<br />
Scharf, P.; Schiffer, H.; Schindler, M.; Schüler, M.; Stahlberg, E.; Szalay,<br />
P. G.; Zhao, J.-G. COLUMBUS, An ab Initio Electronic Structure Program,<br />
release 5.8, 2001.<br />
(46) Olsen, J. LUCIA, a Full CI, Restricted Active Space Program;<br />
Aarhus University: Denmark, with contributions from H. Larsen.<br />
(47) Kállay, M.; Surján, P. R. J. Chem. Phys. 2000, 113, 1359.<br />
(48) Kállay, M.; Surján, P. R. J. Chem. Phys. 2001, 115, 2945.<br />
(49) Kállay, M.; Gauss, J.; Szalay, P. G. J. Chem. Phys. 2003, 119, 2991.<br />
(50) It should be mentioned here that our MR-AQCC results are in<br />
excellent agreement with previous MR-CI calculations by Peterson and<br />
Dunning. 7 Their best values at the MR-CI level (augmented by a Davidson<br />
correction) using a full valence active space using a pV5Z basis for carbon<br />
and a pVQZ basis for hydrogen of 1.2116 and 1.0643 Å are of comparable<br />
quality as our MR-AQCC/pV5Z(fc) values of 1.2123 and 1.0632 Å.<br />
(51) Note that the residuals in the least-squares fit were in all cases<br />
smaller than 1.5 MHz.