Freie Universität Berlin
Institut für Mathematik
Arnimallee 6
14195 Berlin

Master's thesis:

A variational approach for conformation dynamics

Feliks Nüske
July 6, 2012

Dr. Frank Noé, FU Berlin


I hereby declare under oath that I have written this Master's thesis independently and without impermissible outside help. I have used no sources or aids other than those indicated, and I have marked all literal and paraphrased quotations as such. This thesis has not been submitted in the same or a similar form to any examination board.


Table of contents

I. Introduction
II. The variational principle
    II.1. The transfer operator
    II.2. Markov state models
    II.3. Variational principle
    II.4. The Ritz method and the Roothan-Hall method
III. Diffusion processes and application in one dimension
    III.1. Brownian motion and stochastic differential equations
    III.2. Diffusion process
    III.3. Numerical considerations
    III.4. Diffusion in a quadratic potential
    III.5. Two-well potential
IV. Application to molecules
    IV.1. The example system
    IV.2. Simulation
    IV.3. Application of the Roothan-Hall method
V. Summary and outlook
    V.1. Future work
        V.1.1. Application to larger systems
        V.1.2. Choice of basis functions
        V.1.3. Numerical stability
        V.1.4. Importance of sampling
A. Appendix
    A.1. Diffusion in a quadratic potential
Bibliography


I. Introduction

Stochastic dynamical systems play an important role in many branches of mathematics and its applications. Loosely speaking, such a system is a superposition of a deterministic dynamical system with random fluctuations, often called "noise". A typical example is the simulation of molecules (molecular dynamics, MD). In principle, one could model a molecule as a classical physical system by assigning to each atom its three spatial and three momentum coordinates and propagating these coordinates using some dynamical model. Since the molecule can typically exchange heat with its surroundings, energy is not conserved, which means that classical Hamiltonian dynamics is not a suitable dynamical model. Instead, the exchange of heat is usually modelled by some random influence. As an example, one can use Smoluchowski dynamics: if we denote the number of atoms by N and the 3N-dimensional position vector by x, the time evolution is given by the stochastic differential equation

    ẋ(t) = −(1/(mγ)) ∇V(x(t)) + √(2D) dW_t.    (I.1)

Here, V is the potential energy function of the system, D is the diffusion constant, γ denotes friction and W_t is a 3N-dimensional Brownian motion. We will come back to this model in more detail at a later point, but note that this equation defines a stochastic process instead of a deterministic dynamical system. Instead of asking for a deterministic position in the 3N-dimensional state space, we ask for the probability of finding the system in a certain region of the state space.

Molecular systems very often display so-called conformations, or metastable states. This means that while the system still oscillates and fluctuates, the overall geometry remains the same for long times, [Schütte, Huisinga, Deuflhard, Fischer, 1999]. Only occasionally can transitions from one conformation to another be observed. An example of this phenomenon is shown in Figure I.1. A quantity of interest would be the average waiting time until such a transition occurs, because it would help to understand the overall behaviour of such a molecule.

Figure I.1.: An internal coordinate of a small molecule transitioning between two different conformations, [Weber, 2010, ch. 1.1].

It has also been shown by [Schütte, Huisinga, Deuflhard, Fischer, 1999] that the time evolution of probability densities described above can be computed by the action of a linear integral operator, called the propagator. Moreover, the spectral properties of this operator can be related to average waiting times, the quantities of interest. Therefore, estimating eigenvalues and eigenfunctions of the propagator is an interesting task. Markov state models, [Prinz et al, 2011], have frequently been used to this end. The main idea is to discretize the state space into a finite number of sets, and then approximate the true propagator by a finite-dimensional matrix operator. The eigenvalues of this matrix can be used as an approximation of the real eigenvalues. However, the quality of such an approximation largely depends on the choice of discretization. Finding a good discretization requires the use of clustering techniques to group the data. In the following, we present a different approach to approximating the dominant eigenvalues and eigenfunctions, based not on partition-of-unity membership functions, but on the use of smooth functions in combination with a variational principle. The main idea is to use the shape of the molecular energy function to choose the basis functions needed for the approximation. We hope that this can be done without the application of a clustering method to sampled data and thus help to avoid many numerical instabilities. We will develop the theory in chapter II, then apply the method to one-dimensional examples of a diffusion process in chapter III, and finally tackle a higher-dimensional problem in chapter IV. In the end, we hope that this might lead to a robust and computationally affordable method to compute eigenvalues and relevant time scales of stochastic processes stemming from real-world examples.


II. The variational principle

II.1. The transfer operator

We consider a time-dependent stochastic process X_t, t ≥ 0, on a usually high-dimensional and continuous state space Ω. Let the process satisfy the Markov property, that is, for all x_0, x_1, ..., x_n and y ∈ Ω, t_0 > t_1 > ... > t_n ≥ 0 and τ > 0, we have

    P(X_{t_0+τ} = y | X_{t_0} = x_0, X_{t_1} = x_1, ..., X_{t_n} = x_n) = P(X_{t_0+τ} = y | X_{t_0} = x_0).    (II.1)

In other words, the future behaviour of the process only depends on the current state and not on the past. A more detailed and more precise formulation of the Markov property can be found in [Behrends, 2011, ch. 2]. Let us consider processes which are time-homogeneous, meaning that the conditional probability P(X_{t_0+τ} = y | X_{t_0} = x) in the above equation does not depend on t_0, but only on the time difference τ and on the positions x, y. In this case, the process allows us to define a function p:

    p(x, y; τ) := P(X_{t+τ} = y | X_t = x) = P(X_τ = y | X_0 = x).    (II.2)

We shall call this function the transition kernel. It enables us to understand the evolution of probability densities in time. Suppose that, at time t = 0, the system is distributed according to a distribution ρ_0. The probability that, after time τ > 0, the system transitioned from a point x to another point y is given by the product of the probability of being at x at time t = 0 and the conditional probability to go from x to y in time τ. Consequently, the density ρ_τ belonging to time τ, evaluated at y, is obtained from integrating this product over all possible states x:

    ρ_τ(y) = ∫_Ω dx p(x, y; τ) ρ_0(x).    (II.3)

Furthermore, we assume the process to be sufficiently ergodic, which means in essence that the system cannot be decomposed into two dynamically independent components, and every state of the system will be visited infinitely often over an infinite run of the process. In this case, there is a unique probability distribution µ: Ω → R which is invariant in time, i.e.

    ∫_Ω dx p(x, y; τ) µ(x) = µ(y),    (II.4)

regardless of τ, [Sarich, Noé, Schütte, 2010, ch. 2.2]. The function µ is called the invariant measure, the invariant density or the stationary distribution of the process.

Also, we assume the transition kernel to satisfy the detailed balance condition, i.e.

    µ(x) p(x, y; τ) = µ(y) p(y, x; τ)  ∀x, y ∈ Ω.    (II.5)

Detailed balance is an important assumption that is justified in many applications from physics. In a system not satisfying Equation II.5, there would be a pathway where one direction is preferred compared to the other. This would allow the process to produce work, that is, to convert thermal energy into work. This is impossible in physical systems due to the second law of thermodynamics, [Prinz et al, 2011, sec. 2.A].

We see that Equation II.3 defines the action of a lag time dependent integral operator P(τ) on suitable probability densities ρ. Let us call it the propagator of the system, which has the following important property:

Lemma II.1 (Chapman-Kolmogorov equation): For any τ_1, τ_2 > 0, we have

    P(τ_1 + τ_2) = P(τ_1) P(τ_2).    (II.6)

Proof. First, we observe that the Markov property and time-homogeneity imply, for x, z ∈ Ω,

    p(x, z; τ_1 + τ_2) = P(X_{τ_1+τ_2} = z | X_0 = x) = (1/P(X_0 = x)) P(X_{τ_1+τ_2} = z, X_0 = x)    (II.7)
                       = (1/P(X_0 = x)) ∫_Ω dy P(X_{τ_1+τ_2} = z | X_{τ_2} = y) P(X_{τ_2} = y, X_0 = x)    (II.8)
                       = ∫_Ω dy p(y, z; τ_1) P(X_{τ_2} = y, X_0 = x)/P(X_0 = x) = ∫_Ω dy p(y, z; τ_1) p(x, y; τ_2).    (II.9)

Using this, we show that for a function ρ,

    P(τ_1 + τ_2)ρ(z) = ∫_Ω dx p(x, z; τ_1 + τ_2) ρ(x) = ∫_Ω dx ∫_Ω dy p(y, z; τ_1) p(x, y; τ_2) ρ(x)    (II.10)
                     = ∫_Ω dy p(y, z; τ_1) P(τ_2)ρ(y) = P(τ_1)P(τ_2)ρ(z).    (II.11)
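To make Equations II.3 and II.6 concrete, here is a minimal numerical sketch on a finite state space (anticipating Example 1 below), written in Python with numpy; the three-state matrix is an arbitrary illustrative choice, not taken from the text.

```python
import numpy as np

# Row-stochastic transition matrix on three states; the numbers are an
# arbitrary illustrative choice, not taken from the text.
P_tau = np.array([[0.90, 0.08, 0.02],
                  [0.10, 0.85, 0.05],
                  [0.02, 0.08, 0.90]])

# Discrete analogue of Equation II.3: propagate a density by one lag time.
rho_0 = np.array([1.0, 0.0, 0.0])
rho_tau = rho_0 @ P_tau

# Chapman-Kolmogorov (Equation II.6): propagating twice with lag tau is the
# same as propagating once with lag 2*tau, i.e. P(2 tau) = P(tau) P(tau).
assert np.allclose((rho_0 @ P_tau) @ P_tau, rho_0 @ (P_tau @ P_tau))
```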


Let us additionally assume that P(τ)ρ → ρ as τ → 0 for all suitable functions ρ, and let us define P(0) := id. This means that Lemma II.1 is also valid for all τ_1, τ_2 ≥ 0. Before we go on to study the propagator, let us introduce a slight modification to it. Denoting by L²_µ(Ω) the space of all functions v: Ω → R which are square-integrable with respect to the weight function µ, i.e. for which ∫_Ω v²(x) µ(x) dx is finite, equipped with the weighted scalar product ⟨v | w⟩_µ = ∫_Ω dx v(x) w(x) µ(x), we make the following definition:

Definition II.2 (Transfer operator): The transfer operator T(τ) acting on a function u ∈ L²_µ(Ω) is defined by:

    T(τ)u(y) := (1/µ(y)) ∫_Ω dx p(x, y; τ) µ(x) u(x).    (II.12)

The domain of definition is chosen to have this operator act on a Hilbert space. We note the most important properties of T(τ):

Lemma II.3: If the transition kernel p(x, y; τ) is given by a smooth and bounded probability density, the transfer operator T(τ) is linear, bounded, compact and self-adjoint.

Proof. Linearity follows directly from the definition. In order to prove boundedness, let us first use detailed balance to find:

    ⟨T(τ)v | T(τ)v⟩_µ = ∫_Ω dy [T(τ)v(y)]² µ(y)    (II.13)
                      = ∫_Ω dy (1/µ(y)²) [∫_Ω dx p(x, y; τ) µ(x) v(x)]² µ(y)    (II.14)
                      = ∫_Ω dy µ(y) [∫_Ω dx p(y, x; τ) v(x)]².    (II.15)

Before we proceed, we make the following general observation. If π is some probability density on Ω and f: Ω → R a function, we have

    0 ≤ ω²[f]_π = E[f²]_π − (E[f]_π)²,    (II.16)

where we use ω²[·]_π to denote the variance of f with respect to the distribution π. Hence,

    (∫_Ω dx f(x) π(x))² = (E[f]_π)² ≤ E[f²]_π = ∫_Ω dx f²(x) π(x).    (II.17)

In Equation II.15, p(y, x; τ) is, by assumption, a smooth probability density in x. We can consequently apply the above estimate to the squared integral and obtain

    ⟨T(τ)v | T(τ)v⟩_µ ≤ ∫_Ω dy µ(y) ∫_Ω dx p(y, x; τ) v²(x)    (II.18)
                      = ∫_Ω dx v²(x) ∫_Ω dy µ(y) p(y, x; τ) = ∫_Ω dx v²(x) µ(x) = ‖v‖²_µ.    (II.19)

We have thus found that T(τ) is a well-defined and bounded operator on L²_µ(Ω) with operator norm less than or equal to one. By inserting the constant function 1, which satisfies 1(x) = 1 ∀x ∈ Ω, Equation II.4 immediately yields that T(τ)1 = 1 and consequently ‖T(τ)‖ = 1. As p(x, y; τ) is bounded for all x, y, we also have p(x, y; τ) ∈ L²_{µ×µ}(Ω × Ω), therefore T(τ) is a Hilbert-Schmidt operator and thus compact, [Werner, 2002, ch. 6, p. 272]. Finally, self-adjointness also follows from the detailed balance condition:

    ⟨T(τ)u | v⟩_µ = ∫_Ω dy [(1/µ(y)) ∫_Ω dx p(x, y; τ) µ(x) u(x)] µ(y) v(y)    (II.20)
                  = ∫_Ω dx ∫_Ω dy p(y, x; τ) µ(y) u(x) v(y)    (II.21)
                  = ∫_Ω dx [(1/µ(x)) ∫_Ω dy p(y, x; τ) µ(y) v(y)] µ(x) u(x) = ⟨u | T(τ)v⟩_µ.    (II.22)

The transfer operator defines a compact and self-adjoint map on the Hilbert space L²_µ(Ω). From functional analysis, we know that it has rich spectral properties, see [Werner, 2002, ch. 6, p. 241]: The spectrum is real-valued and, because of ‖T(τ)‖ = 1, contained within the interval [−1, 1]. All non-zero spectral values are discrete and are indeed eigenvalues with corresponding eigenfunctions. The eigenvalues can be ordered into a series λ_i with λ_i → 0 as i → ∞. The Hilbert space L²_µ(Ω) can be decomposed into the direct sum of the eigenfunctions' linear span and the operator's kernel. If one extends the linear span by an orthonormal basis of the kernel, we obtain an orthonormal basis ψ_i of the full Hilbert space such that the action of T(τ) completely decomposes into the action on each of these functions. If we write v ∈ L²_µ(Ω) as v = Σ_{i=1}^∞ ⟨v | ψ_i⟩_µ ψ_i, we then find that

    T(τ)v = Σ_{i=1}^∞ ⟨v | ψ_i⟩_µ T(τ)ψ_i = Σ_{i=1}^∞ λ_i ⟨v | ψ_i⟩_µ ψ_i,    (II.23)

where possibly λ_i = 0 holds from some index i on. As we have already noticed in the course of proving Lemma II.3, λ_1 = 1 is an eigenvalue with corresponding eigenfunction ψ_1 = 1. In many applications, it turns out that this eigenvalue is simple and dominant, meaning that its multiplicity is one and that −1 is not an eigenvalue of T(τ). Additionally, there usually is a number of further eigenvalues 1 > λ_2 > ... > λ_m which are strictly smaller but "close" to one, whereas the remaining spectrum is contained in a ball around zero which is essentially bounded away from these eigenvalues, see again [Sarich, Noé, Schütte, 2010, ch. 2.2].


Let us from now on assume this to be the case, and let us call these eigenvalues the dominant eigenvalues and their corresponding functions ψ_2, ..., ψ_m the dominant eigenfunctions.

Consider now the map M: L²_µ(Ω) → L²_{µ⁻¹}(Ω) defined by Mψ(x) := µ(x)ψ(x) for any function ψ ∈ L²_µ(Ω). Here, L²_{µ⁻¹}(Ω) is the L²-space with weight function µ⁻¹(x) = 1/µ(x), where we assumed the stationary density to be non-zero, having in mind that in our later contexts it will always be given by the Boltzmann distribution. M is linear and well-defined; indeed, if ψ ∈ L²_µ(Ω), then

    ⟨Mψ | Mψ⟩_{µ⁻¹} = ∫_Ω dx µ(x)ψ(x) µ(x)ψ(x) µ⁻¹(x) = ‖ψ‖²_µ.

Moreover, M is a bijective map, which can rapidly be checked, and because of

    ⟨Mψ | φ⟩_{µ⁻¹} = ∫_Ω dx µ(x)ψ(x) φ(x) µ⁻¹(x) = ⟨ψ | M⁻¹φ⟩_µ,    (II.24)

M is also unitary. Looking back at Equation II.3 and Definition II.2, we find that on L²_{µ⁻¹}(Ω), P(τ) = M T(τ) M⁻¹. The propagator differs from the transfer operator by no more than a unitary transformation, and it thus inherits the self-adjointness and compactness properties we have just derived for T(τ). In particular, if ψ_i ∈ L²_µ(Ω) is a transfer operator eigenfunction, then φ_i := Mψ_i = µψ_i is a propagator eigenfunction corresponding to the same eigenvalue, and vice versa. The Hilbert space L²_{µ⁻¹}(Ω) decomposes in the same way as L²_µ(Ω) does, only replacing the orthonormal basis {ψ_i} by {φ_i}.

Example 1: If Ω = {x_1, ..., x_N} is finite, it is enough to specify the conditional transition probabilities p_ij(τ) = P(X_τ = x_j | X_0 = x_i). A probability density simply becomes a vector p_τ = (p_1(τ), ..., p_N(τ)), assigning a probability p_i(τ) to each state x_i and satisfying Σ_{i=1}^N p_i(τ) = 1. Probability vectors are propagated in time by multiplication with the matrix P(τ) = (p_ij(τ)) from the left, since

    p_i(τ) = P(X_τ = x_i) = Σ_{j=1}^N P(X_τ = x_i | X_0 = x_j) P(X_0 = x_j) = Σ_{j=1}^N p_ji(τ) p_j(0).    (II.25)

Consequently, p_τ = p_0 P(τ). The linear propagator simply becomes the matrix P(τ)ᵀ. The eigenfunctions φ_i are the left-hand eigenvectors of the matrix P(τ); in particular, the stationary distribution is a vector π := φ_1, which satisfies πP(τ) = π. Since the transformation M acts on a function by pointwise multiplication with the stationary density, M can be written as a matrix Π, which is diagonal and carries the vector π on its diagonal. Since, with this notation, reversibility can be written as ΠP(τ) = P(τ)ᵀΠ, we find that

    T(τ) = M⁻¹ P(τ)ᵀ M = Π⁻¹ P(τ)ᵀ Π = Π⁻¹ Π P(τ) = P(τ).    (II.26)

We see that propagator eigenvectors, which are left-hand eigenvectors of the P-matrix, become transfer operator eigenvectors upon transposing and multiplication with Π⁻¹, and thus become right-hand eigenvectors of the P-matrix. In the finite-dimensional case, the difference between the propagator and the transfer operator is just a matter of left-hand and right-hand eigenvectors of the same matrix.
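Example 1 can be checked numerically. The following sketch is our own illustration (it borrows the four-state matrix of Example 2 below): it computes the stationary vector π and verifies that Πψ_i is a left eigenvector of P(τ) whenever ψ_i is a right eigenvector, which rests on the reversibility relation ΠP(τ) = P(τ)ᵀΠ.

```python
import numpy as np

# Four-state transition matrix from Example 2 below (row-stochastic).
P = np.array([[0.7571, 0.2429, 0.0,    0.0   ],
              [0.1609, 0.8306, 0.0085, 0.0   ],
              [0.0,    0.0084, 0.8294, 0.1622],
              [0.0,    0.0,    0.2413, 0.7587]])

# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi /= pi.sum()
Pi = np.diag(pi)

# Transfer operator eigenvectors = right eigenvectors of P; multiplying
# by Pi turns them into propagator (left) eigenvectors, phi_i = Pi psi_i.
w_r, Psi = np.linalg.eig(P)
Phi = Pi @ Psi
# Check phi_i^T P = lambda_i phi_i^T, which uses reversibility Pi P = P^T Pi.
assert np.allclose(Phi.T @ P, np.diag(w_r) @ Phi.T)
```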


The spectral decomposition, the Chapman-Kolmogorov equation (Lemma II.1) and the assumption P(τ)φ → φ as τ → 0 for all φ ∈ L²_{µ⁻¹}(Ω) show us that the i-th eigenvalue λ_i(τ) is a continuous function of τ and satisfies λ_i(τ_1 + τ_2) = λ_i(τ_1) λ_i(τ_2) as well as λ_i(0) = 1. Therefore, it can be written as λ_i(τ) = e^{−κ_i τ} for some κ_i ≥ 0, that is, it has an exponential decay rate. Clearly, κ_1 = 0, and κ_2, ..., κ_m are close to zero. Looking at the action of P(τ) on some function ρ once more shows that

    P(τ)ρ = Σ_{i=1}^∞ ⟨ρ | φ_i⟩_{µ⁻¹} P(τ)φ_i = Σ_{i=1}^∞ e^{−κ_i τ} ⟨ρ | φ_i⟩_{µ⁻¹} φ_i.    (II.27)

If the time lag τ is very close to zero, almost all terms in the above sum will contribute. If τ becomes larger, however, there is a range of time lags τ where, for i > m, all exponential decay terms e^{−κ_i τ} have essentially vanished, whereas all terms with i ≤ m still contribute. If τ increases even further, namely if τ ≫ 1/κ_2, there are no contributions except the very first term. For a probability distribution ρ, we have ⟨ρ | φ_1⟩_{µ⁻¹} = ⟨ρ | µ⟩_{µ⁻¹} = 1, so P(τ)ρ ≈ µ. The system is then distributed according to its equilibrium distribution and has basically "forgotten" the initial deviation from equilibrium expressed by ρ. Each dominant eigenvalue defines a time scale 1/κ_i and a corresponding slow process which equilibrates only if one waits much longer than this time scale. The slow processes are the link between eigenvalues and metastable states. A slow process usually corresponds to an equilibration process between two metastable regions, and the implied time scale 1/κ_i = −τ/log λ_i(τ) defines an average transition time. For this reason, the approximation of dominant eigenvalues is of such importance.

Example 2: Consider a four-state system with stochastic transition matrix

    P = ( 0.7571  0.2429  0       0
          0.1609  0.8306  0.0085  0
          0       0.0084  0.8294  0.1622
          0       0       0.2413  0.7587 ).    (II.28)

Clearly, most of the transitions will occur either between states x_1 and x_2 or between x_3 and x_4, whereas a transition from x_2 to x_3 will be a very rare event. Grouped together, x_1, x_2 and x_3, x_4 form two metastable states. The eigenvalues are λ_1 = 1, λ_2 = 0.9900, λ_3 = 0.5964 and λ_4 = 0.5894. Clearly, there is one dominant eigenvalue related to the slow transition process between the two metastable states.

Example 3: In chapter III, we will discuss the diffusion process of a one-dimensional particle in an energy landscape V. The particle is subject to a force F, which equals the negative derivative of the energy function, but is additionally perturbed by random fluctuations. For the potential function V shown in Figure II.1a, the particle will spend most of the time around one of the two minima of V, and will just occasionally cross the barrier separating them. Figure II.1b shows a trajectory of the process, where the metastable behaviour is clearly visible. This is another typical example of metastable states; we will investigate it in more detail in the next chapter.
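The numbers quoted in Example 2, and the implied time scales t_i = −τ/log λ_i(τ), can be reproduced with a few lines of numpy (taking τ as one simulation step):

```python
import numpy as np

P = np.array([[0.7571, 0.2429, 0.0,    0.0   ],
              [0.1609, 0.8306, 0.0085, 0.0   ],
              [0.0,    0.0084, 0.8294, 0.1622],
              [0.0,    0.0,    0.2413, 0.7587]])

# Eigenvalues in decreasing order; compare 1, 0.9900, 0.5964, 0.5894.
lam = np.sort(np.linalg.eigvals(P).real)[::-1]

# Implied time scales t_i = -tau / log(lambda_i(tau)), tau = 1 step.
tau = 1.0
t = -tau / np.log(lam[1:])
print(lam, t)  # t_2 belongs to the slow transition between the two metastable pairs
```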


Figure II.1.: The potential energy function V displaying two metastable states around x = −1 and x = 1 (a), and a sample trajectory of the corresponding diffusion process (b).

II.2. Markov state models

Markov state models (MSMs), [Prinz et al, 2011], are a widely used technique to approximate the eigenvalues of stochastic processes. The idea is to partition the state space into finitely many disjoint sets A_1, ..., A_s with ∪_{i=1}^s A_i = Ω, and only consider the transitions between any two of these sets. This means that instead of noticing every transition between any two points of the full continuous state space, we only take care of transitions between the sets A_i. Let us compute the conditional transition probability p_ij(τ) between two sets A_i and A_j, over time τ:

    p_ij(τ) = P(X_τ ∈ A_j | X_0 ∈ A_i) = P(X_τ ∈ A_j, X_0 ∈ A_i) / P(X_0 ∈ A_i)    (II.29)
            = [∫_{A_i} dx ∫_{A_j} dy µ(x) p(x, y; τ)] / [∫_{A_i} dx µ(x)].    (II.30)

The matrix P(τ) with entries p_ij(τ) is a row-stochastic matrix which defines a Markov process on the finite state space {A_1, ..., A_s}, and it is called the MSM transfer matrix. Probability densities on this space, i.e. probability vectors p = (p_1, ..., p_s) with Σ_{i=1}^s p_i = 1, are transported by multiplication with P(τ) from the left, i.e. p_τ = p_0 P(τ). As an approximation to the true eigenvalues of the continuous propagator, one simply computes the eigenvalues of P(τ). To this end, one has to find the matrix entries p_ij(τ). These are usually estimated from simulation trajectories: Let X_0, ..., X_N be a trajectory of length N, with a simulation time step ∆t = τ. One estimates µ_i := ∫_{A_i} dx µ(x) as the number of simulation steps X_k with X_k ∈ A_i, divided by N. The transition probability ∫_{A_i} ∫_{A_j} dx dy µ(x) p(x, y; τ) is estimated by the number of transitions from A_i to A_j, divided by the total number of transitions, in this case N − 1. In practice, it is often useful to choose τ as some general integer multiple of the simulation window ∆t, i.e. τ = n∆t, with n not necessarily equal to one. In this case, only transitions over n simulation time steps are taken into account, which usually leads to a higher approximation quality, as we shall see later.

It is very important to note that the Markov process defined by the MSM transfer matrix is an approximation of the continuous dynamics: The definition of the transfer matrix in Equation II.30 overlooks all transitions within one of the sets A_i. But these transitions are important, because the probability to cross over to some other set A_j very much depends on the current position within set A_i. Making use of the MSM therefore automatically results in a systematic error that affects the accuracy of the estimated eigenvalues. It is quite easy to show that this error decays exponentially proportional to the second eigenvalue. In their 2010 paper, Sarich, Noé and Schütte show that the pre-factor can be split up into two independent components. One of them depends on the lag time only, whereas the other one comes from the choice of discretization, [Sarich, Noé, Schütte, 2010, theorem 3.1]. The discretization is therefore of great importance, and a good choice can increase the approximation quality a lot.

Example 4: The four-state system of Equation II.28 is the MSM approximation of the diffusion process shown in Figure II.1. We chose the four sets to be the intervals [−2, −1], [−1, 0], [0, 1] and [1, 2]. The transition matrix P = P(τ) was estimated from a sample trajectory of 20 million steps. The lag time dependent improvement of the approximation quality can be visualized by looking at the second implied time scale t_2 = −τ/log(λ_2(τ)). The faster the estimated time scale converges, the higher the quality of the approximation. Figure II.2 shows how the estimated time scale improves upon increasing the time lag. The four-state discretization is a very simple one. We will achieve much faster convergence towards the correct time scale in the next chapter.

We would like to make use of two more aspects emphasized in this work. First, the construction of the MSM can be seen as the projection of the transfer operator T(τ) onto a finite subspace of the Hilbert space L²_µ(Ω), namely the linear span of indicator functions corresponding to the sets A_i. Second, as pointed out in chapter 3.4 of that article, it is not necessary to restrict this projection to indicator functions; the error estimate is still true if T(τ) is projected onto a much more general subspace of L²_µ(Ω). This is, in a sense, the starting point of our work. Knowing that the discretization is crucial to the approximation quality, we will try to use the linear span of smooth functions as the subspace referred to above, and hope that this might work equally well or even better in some cases. In order to formulate our method, we will need a variational principle, which will be derived in the next section.
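A minimal sketch of the estimation procedure described above, with our own function name and the plain row-normalization of the count matrix used in the text (no reversibility constraint is enforced):

```python
import numpy as np

def estimate_msm(dtraj, n_sets, lag):
    """Estimate the row-stochastic MSM transfer matrix P(tau), tau = lag
    simulation steps, from a trajectory of set indices in {0,...,n_sets-1}."""
    C = np.zeros((n_sets, n_sets))
    # Count transitions over exactly `lag` steps.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1.0
    # Row-normalize the count matrix (assumes every set was visited).
    return C / C.sum(axis=1, keepdims=True)

# Usage for Example 4: discretize a 1d trajectory `traj` into the four
# intervals [-2,-1], [-1,0], [0,1], [1,2] and estimate P(tau).
# dtraj = np.digitize(traj, [-1.0, 0.0, 1.0])
# P = estimate_msm(dtraj, n_sets=4, lag=10)
```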


Figure II.2.: Improvement of the estimated time scale t_2 upon increasing the lag time τ, displayed in multiples of the simulation time step.

II.3. Variational principle

We start with the following definition:

Definition II.4: Let f and g be observables of the state space, i.e. functions f, g: Ω → R. The autocorrelation function acf(f, g; τ) is defined by:

    acf(f, g; τ) := ∫_Ω ∫_Ω dx dy f(x) p(x, y; τ) µ(x) g(y).    (II.31)

The autocorrelation function of f and g is the expectation value of the product f(x)g(y) with respect to the transition density µ(x)p(x, y; τ). Furthermore, we see from Equation II.31 that it also equals the Rayleigh coefficient ⟨T(τ)f | g⟩_µ = ⟨f | T(τ)g⟩_µ =: ⟨f | T(τ) | g⟩, because of

    ⟨f | T(τ) | g⟩ = ⟨T(τ)f | g⟩_µ = ∫_Ω dy [(1/µ(y)) ∫_Ω dx p(x, y; τ) µ(x) f(x)] µ(y) g(y)    (II.32)
                  = ∫_Ω ∫_Ω dx dy f(x) p(x, y; τ) µ(x) g(y) = acf(f, g; τ).    (II.33)
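Since µ(x)p(x, y; τ) is the joint distribution of a pair (X_t, X_{t+τ}) in equilibrium, acf(f, g; τ) can be estimated from a sampled trajectory as a time-lagged average; a sketch, assuming the time series f(X_k) and g(X_k) are given as arrays:

```python
import numpy as np

def acf_estimate(f_traj, g_traj, lag):
    """Estimate acf(f, g; tau) from the time series f(X_k), g(X_k) of an
    equilibrium trajectory, with tau = lag simulation steps."""
    # Time-lagged average over all pairs (X_k, X_{k+lag}); by ergodicity
    # this converges to the double integral in Definition II.4.
    return np.mean(f_traj[:-lag] * g_traj[lag:])
```

For a reversible process, the estimate can additionally be symmetrized by averaging acf(f, g; τ) and acf(g, f; τ).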


For a symmetric operator like T(τ), the Rayleigh coefficient is also called matrix element or simply expectation value. The next two results will reveal to us the significance of autocorrelation functions when it comes to approximating eigenvalues and eigenfunctions, [Noé, 2011, sec. 2.3]:

Lemma II.5: The transfer operator's eigenfunctions ψ_i satisfy the relation:

    acf(ψ_i, ψ_i; τ) = λ_i.    (II.34)

Proof. All we have to do is recognize the action of T(τ) on its eigenfunctions and use the orthogonality relation between those:

    acf(ψ_i, ψ_i; τ) = ⟨T(τ)ψ_i | ψ_i⟩_µ = λ_i ⟨ψ_i | ψ_i⟩_µ = λ_i.    (II.35)

Theorem II.6 (Variational principle): For any normalized function ψ ∈ L²_µ(Ω) which is orthogonal to ψ_1, we have:

    acf(ψ, ψ; τ) ≤ λ_2.    (II.36)

Proof. First, expand ψ in terms of the orthonormal basis {ψ_i}. Note that there is no overlap with ψ_1:

    acf(ψ, ψ; τ) = ⟨T(τ)ψ | ψ⟩_µ = Σ_{i,j=2}^∞ c_i c_j ⟨T(τ)ψ_i | ψ_j⟩_µ,    (II.37)

where c_i = ⟨ψ | ψ_i⟩_µ. Like in the previous proof, we use the action of the transfer operator on its eigenfunctions:

    acf(ψ, ψ; τ) = Σ_{i,j=2}^∞ c_i c_j λ_i ⟨ψ_i | ψ_j⟩_µ = Σ_{i=2}^∞ c_i² λ_i.    (II.38)

Finally, we recall that the eigenvalues are sorted in decreasing order, and that in an orthonormal basis expansion of a normalized function, the squares of the coefficients sum up to unity:

    acf(ψ, ψ; τ) ≤ λ_2 Σ_{i=2}^∞ c_i² = λ_2.    (II.39)
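Theorem II.6 can be illustrated with the four-state chain of Example 2 (which is reversible, so T(τ) = P(τ) by Example 1): every normalized vector orthogonal to ψ_1 = 1 in the µ-weighted scalar product has a Rayleigh coefficient of at most λ_2. A small randomized check:

```python
import numpy as np

P = np.array([[0.7571, 0.2429, 0.0,    0.0   ],
              [0.1609, 0.8306, 0.0085, 0.0   ],
              [0.0,    0.0084, 0.8294, 0.1622],
              [0.0,    0.0,    0.2413, 0.7587]])

# Stationary distribution mu and second eigenvalue lambda_2.
w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmax(np.real(w))]); mu /= mu.sum()
lam2 = np.sort(np.linalg.eigvals(P).real)[-2]

rng = np.random.default_rng(0)
for _ in range(1000):
    psi = rng.standard_normal(4)
    psi = psi - mu @ psi               # orthogonal to psi_1 = 1 in <.,.>_mu
    psi /= np.sqrt(mu @ psi**2)        # normalize: <psi, psi>_mu = 1
    rayleigh = mu @ (psi * (P @ psi))  # <T(tau) psi | psi>_mu with T = P
    assert rayleigh <= lam2 + 1e-12
```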


Remark II.7: Likewise, we can prove that for any normalized function ψ which is orthogonal to the first k eigenfunctions, the expression acf(ψ, ψ; τ) is bounded from above by λ_{k+1}.

Among all candidate functions, namely those which are orthogonal to the first eigenfunction ψ_1, the true second eigenfunction maximizes the Rayleigh coefficient. For some function ψ, ⟨T(τ)ψ | ψ⟩_µ can be viewed as a measure of how well it approximates the second eigenfunction, and that is what we will be trying to do: maximize the Rayleigh coefficient among a number of ansatz functions, and compare different choices of candidate functions. Variational methods of that sort are used in a number of different fields as well, e.g. for the approximation of electronic wavefunctions in quantum mechanics. In the next section, we show how the Rayleigh coefficient can be maximized within the linear span of some given test functions.

II.4. The Ritz method and the Roothan-Hall method

For all of the upcoming results, we follow [Szabo, Ostlund, 1989, ch. 1.3].

Theorem II.8 (Ritz method): Let χ_1, ..., χ_m ∈ L²_µ(Ω) be mutually orthonormal. The Rayleigh coefficient is maximized among these functions by the eigenvector b_1 corresponding to the greatest eigenvalue ξ_1 of the matrix eigenvalue problem

    H b_1 = ξ_1 b_1,    (II.40)

where H is the density matrix with entries

    h_ij = acf(χ_i, χ_j; τ) = ∫_Ω ∫_Ω dx dy χ_i(x) p(x, y; τ) µ(x) χ_j(y).    (II.41)

More precisely, the maximal Rayleigh coefficient is found by taking the linear combination of the functions χ_i with coefficients taken from b_1.

Proof. Let us first, for a function ψ̂ = Σ_{i=1}^m b_i χ_i, express the Rayleigh coefficient in terms of the b_i:

    acf(ψ̂, ψ̂; τ) = ⟨ψ̂ | T(τ) | ψ̂⟩ = Σ_{i,j=1}^m b_i b_j ⟨χ_i | T(τ) | χ_j⟩ = Σ_{i,j=1}^m b_i b_j h_ij.    (II.42)


Since the matrix elements h_ij are fixed, acf(ψ̂, ψ̂; τ) can be seen as a differentiable function of the coefficients b_i. We now need to maximize this function with respect to a constraint, namely, that the solution has unit norm:

    1 = ‖ψ̂‖²_µ = ⟨ψ̂ | ψ̂⟩_µ = Σ_{i,j=1}^m b_i b_j ⟨χ_i | χ_j⟩_µ = Σ_{i=1}^m b_i²,    (II.43)

as the basis functions χ_i were assumed to be orthonormal. In summary, we need to maximize the function

    F(b_1, ..., b_m) = Σ_{i,j=1}^m b_i b_j h_ij − ξ (Σ_{i=1}^m b_i² − 1),    (II.44)

where ξ is a Lagrange multiplier, with respect to the coefficients b_i. Since the H-matrix is symmetric, this results in the equations:

    0 = ∂F/∂b_i = 2 Σ_{j=1}^m h_ij b_j − 2ξ b_i,  i = 1, ..., m,    (II.45)
    ⇒ ξ b_i = Σ_{j=1}^m h_ij b_j,  i = 1, ..., m.    (II.46)

We have thus arrived at the eigenvalue equation H b = ξ b. Since H is a symmetric matrix, we can find m solutions b_i corresponding to real eigenvalues ξ_i. If we now compute the Rayleigh coefficient of the function ψ̂_i = Σ_{j=1}^m b_{i,j} χ_j generated from such a solution, we find:

    ⟨ψ̂_i | T(τ) | ψ̂_i⟩ = Σ_{j,k=1}^m b_{i,j} b_{i,k} ⟨χ_j | T(τ) | χ_k⟩ = Σ_{j,k=1}^m b_{i,j} b_{i,k} h_jk    (II.47)
                       = Σ_{j=1}^m ξ_i b_{i,j}² = ξ_i,    (II.48)

where we used the eigenvalue relation Σ_{k=1}^m b_{i,k} h_jk = ξ_i b_{i,j}. Thus, the solution corresponding to the largest eigenvalue ξ_1 maximizes the Rayleigh coefficient among the basis functions χ_i, as we claimed.

Lemma II.9: The second largest eigenvalue ξ_2 of Equation II.40 satisfies ξ_2 ≤ λ_2.

Proof. Let ψ̂_1 and ψ̂_2 be the functions generated from the eigenvectors b_1, b_2 and their eigenvalues ξ_1, ξ_2, respectively. Consider a linear combination ψ̂ := x ψ̂_1 + y ψ̂_2, for x, y ∈ R. If ψ̂ is normalized, we find, by orthonormality:

    1 = ⟨ψ̂ | ψ̂⟩_µ = x² ⟨ψ̂_1 | ψ̂_1⟩_µ + y² ⟨ψ̂_2 | ψ̂_2⟩_µ + 2xy ⟨ψ̂_1 | ψ̂_2⟩_µ = x² + y².    (II.49)


Moreover, as ψ̂_1 and ψ̂_2 diagonalize the H-matrix:

    ⟨ψ̂_1 | T(τ) | ψ̂_2⟩ = Σ_{i,j=1}^m b_{1,i} b_{2,j} ⟨χ_i | T(τ) | χ_j⟩ = Σ_{i,j=1}^m b_{1,i} b_{2,j} h_ij    (II.50)
                        = Σ_{i=1}^m ξ_2 b_{1,i} b_{2,i} = 0.    (II.51)

Therefore, computing the Rayleigh coefficient for ψ̂ yields:

    ⟨ψ̂ | T(τ) | ψ̂⟩ = x² ⟨ψ̂_1 | T(τ) | ψ̂_1⟩ + y² ⟨ψ̂_2 | T(τ) | ψ̂_2⟩ + 2xy ⟨ψ̂_1 | T(τ) | ψ̂_2⟩    (II.52)
                   = x² ξ_1 + y² ξ_2 = (1 − y²) ξ_1 + y² ξ_2 = ξ_1 − y² (ξ_1 − ξ_2).    (II.53)

Depending on the choice of y, this value is between ξ_2 and ξ_1. If we now had λ_2 < ξ_2, we could choose x and y such that ψ̂ is orthogonal to the first eigenfunction ψ_1; by Theorem II.6, the Rayleigh coefficient of this normalized function would then be bounded by λ_2 < ξ_2, contradicting the fact that it lies between ξ_2 and ξ_1. Hence ξ_2 ≤ λ_2.


The only problem left is the requirement that the basis functions were so far assumed to be orthonormal. However, this can easily be circumvented. Again, let χ_1, ..., χ_m ∈ L²_µ(Ω) be normalized basis functions as before, except that they are no longer assumed to be orthogonal. We can still derive a similar result:

Theorem II.10 (Roothan-Hall method): For basis functions as above, the linear combination that maximizes the Rayleigh coefficient is given by the solution b_1 corresponding to the greatest eigenvalue ξ_1 of the generalized eigenvalue problem

    H b_1 = ξ_1 S b_1,    (II.55)

where H is as in Theorem II.8 and S is the overlap matrix with entries

    s_ij = ⟨χ_i | χ_j⟩_µ.    (II.56)

Proof. Starting out the same way as in Theorem II.8, we first have to reformulate the normalization constraint. Equation II.43 then becomes:

    1 = ‖ψ̂‖²_µ = Σ_{i,j=1}^m b_i b_j ⟨χ_i | χ_j⟩_µ = Σ_{i,j=1}^m b_i b_j s_ij.    (II.57)

This results in an effective functional of the form:

    F(b_1, ..., b_m) = Σ_{i,j=1}^m b_i b_j h_ij − ξ (Σ_{i,j=1}^m b_i b_j s_ij − 1).    (II.58)

Maximization requires the solution of:

    0 = 2 Σ_{j=1}^m h_ij b_j − 2ξ Σ_{j=1}^m s_ij b_j    (II.59)
    ⇒ ξ Σ_{j=1}^m s_ij b_j = Σ_{j=1}^m h_ij b_j    (II.60)

for every i ∈ {1, ..., m}. This is the generalized eigenvalue equation stated above. Since H is symmetric and S is positive definite, this problem has m real eigenvalues ξ_i and corresponding eigenvectors b_i which are orthonormal with respect to the S-weighted scalar product, i.e. we have b_iᵀ S b_j = δ_ij. Therefore, similar to Equations II.47-II.48, we find for the corresponding functions ψ̂_i := Σ_{j=1}^m b_{i,j} χ_j:

    ⟨ψ̂_i | T(τ) | ψ̂_i⟩ = Σ_{j,k=1}^m b_{i,j} b_{i,k} ⟨χ_j | T(τ) | χ_k⟩ = Σ_{j,k=1}^m b_{i,j} b_{i,k} h_jk    (II.61)
                       = ξ_i Σ_{j,k=1}^m b_{i,j} b_{i,k} s_jk = ξ_i b_iᵀ S b_i = ξ_i.    (II.62)


So again, by taking the solution b_1 with maximal eigenvalue ξ_1, we have found the linear combination that maximizes the Rayleigh coefficient.

Clearly, the Ritz method is just a special case of the Roothan-Hall method, where S is given by the m × m identity matrix. With the Roothan-Hall method, we can use any basis set, orthonormal or not. Like the density matrix H, the overlap matrix S can also be estimated from finite sampling. Since we are mostly interested in the propagator eigenfunctions, we will weight the solution with the stationary distribution afterwards. In most cases, we will have to estimate µ from a sample trajectory as well. Our method consequently consists of the following steps (a small numerical sketch follows the list below):

(a) Compute a sufficiently long or sufficiently many short sampling trajectories of the process and check convergence.
(b) Estimate the stationary distribution from the sample.
(c) Estimate the H- and S-matrix from the sample.
(d) Solve the generalized eigenvalue problem from Theorem II.10.
(e) Weight the solution with the estimated stationary density.

Lastly, we see that the Markov state model approximation is also obtained from using the Ritz method with indicator functions. So, like in section II.2, choose a partition of the state space into s mutually disjoint sets A_1, ..., A_s, and let the candidate basis functions χ_1, ..., χ_s be given by the indicator functions of these sets, normalized by a pre-factor 1/√π_i, with π_i = ∫_{A_i} dx µ(x). Computing the density matrix H yields:

    h_ij = (1/√(π_i π_j)) ∫_{A_i} dx ∫_{A_j} dy µ(x) p(x, y; τ)    (II.63)
         = (√π_i/√π_j) P(X_τ ∈ A_j, X_0 ∈ A_i)/P(X_0 ∈ A_i) = (√π_i/√π_j) p_ij(τ).    (II.64)

The density matrix is the same as the MSM transfer matrix up to a similarity transformation with transformation matrix Π^{1/2}, which is the diagonal matrix with the square roots of the local stationary probabilities as diagonal entries. The Ritz method therefore generates the eigenvalues of the MSM transfer matrix and is equivalent to the construction of a Markov state model.
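The following sketch implements steps (c) and (d) for a one-dimensional trajectory with Gaussian basis functions; function names and conventions are our own. It assumes the trajectory from step (a) samples the stationary distribution, which replaces the explicit estimate of µ in steps (b) and (e); the basis functions are left unnormalized, which the overlap matrix S accounts for.

```python
import numpy as np
from scipy.linalg import eigh

def gaussian_basis(x, centers, sigma2):
    """Evaluate Gaussian basis functions chi_i at the points x (1d array)."""
    return np.exp(-(x[:, None] - centers[None, :])**2 / (2.0 * sigma2))

def roothan_hall(traj, centers, sigma2, lag):
    """Estimate dominant eigenvalues/coefficient vectors from an equilibrium
    trajectory via the generalized eigenvalue problem H b = xi S b."""
    chi = gaussian_basis(traj, centers, sigma2)
    # (c) time-lagged estimate of H and instantaneous estimate of S; H is
    # symmetrized because a finite-sample estimate is only approximately so.
    H = chi[:-lag].T @ chi[lag:] / (len(traj) - lag)
    H = 0.5 * (H + H.T)
    S = chi.T @ chi / len(traj)        # must be positive definite
    # (d) symmetric generalized eigenvalue problem, sorted in decreasing order.
    xi, b = eigh(H, S)
    return xi[::-1], b[:, ::-1]

# Usage: xi, b = roothan_hall(traj, np.linspace(-2, 2, 13), 0.5, lag=10)
# Implied time scale: t2 = -lag * dt / np.log(xi[1])
```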


III. Diffusion processes and application in one dimension

We are now prepared to apply the variational methods we have just derived to a number of example systems. Our guiding example will be a diffusion process. This is a system where a number of classical particles move under the influence of a deterministic force field, generated by some potential energy function. If there were no further influence, this system could be modelled by classical equations of motion. However, the particles are also subject to random fluctuations, which interfere with the deterministic motion. The fundamental concepts of such a system will be introduced below. As a result, the system trajectory can no longer be uniquely predicted. Instead, we have a stochastic process to which our theoretical results can be applied.

III.1. Brownian motion and stochastic differential equations

The fundamental model for the description of random fluctuations is Brownian motion or the Wiener process. In 1826-27, R. Brown studied the irregular behaviour of pollen grains in water. He and many others observed that the pathways taken by such a particle were highly irregular. Moreover, whereas the average distance from the starting point seemed to vanish, the mean squared fluctuation from the average seemed to grow linearly in time. These observations motivated the following mathematical model, [Evans, ch. 3].

Definition III.1: A one-dimensional stochastic process W_t on a probability space Ω, defined for t ≥ 0, is called a Brownian motion or a Wiener process if it satisfies:

(a) W_0 = 0 a.e.
(b) For any times t_1, ..., t_k, the increments W_{t_k} − W_{t_{k−1}}, ..., W_{t_2} − W_{t_1} are independent random variables.
(c) The increments W_t − W_s are normally distributed according to W_t − W_s ∼ N(0, t − s), ∀t > s ≥ 0.

An Rⁿ-valued stochastic process where each component is a one-dimensional Wiener process is called an n-dimensional Wiener process. In the 1920s, N. Wiener constructed a Brownian motion and thereby mathematically proved the existence of such a process. In fact, there is a number of different ways how this can be achieved, but the explicit realization is of little importance. All we need to know are the three conditions from the above definition and the most important properties. It can be shown that a Brownian motion defines a Markov process, and that the sample paths, i.e. the functions t → W_t, are continuous, but almost everywhere in Ω nowhere differentiable with respect to time, see Figure III.1 and [Behrends, 2011, ch. 5.2]. This model has turned out to be very suitable for many different phenomena, reaching way beyond the simulation of particles in a fluid. For instance, Brownian motion has frequently been used to model financial markets.

Figure III.1.: A sample path of the Wiener process. Clearly, the path is continuous but not differentiable as a function of time.

Despite the irregularity of its sample paths, Brownian motion can be used to model phenomena involving differential calculus.
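Property (c) of Definition III.1 translates directly into a simulation recipe: accumulate independent Gaussian increments of variance ∆t. A sketch of how sample paths like the one in Figure III.1 can be generated:

```python
import numpy as np

def wiener_path(n_steps, dt, rng=None):
    """Sample W_0, W_dt, ..., W_{n_steps*dt} from independent Gaussian
    increments W_{t+dt} - W_t ~ N(0, dt), with W_0 = 0."""
    if rng is None:
        rng = np.random.default_rng()
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))
```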


Suppose that the change of a quantity X(t) over a short time ∆t is given by

    ∆X(t) = ∆t F(X(t), t) + σ ∆W_t,    (III.1)

meaning that the change is composed of a deterministic function F, depending on the quantity X(t) and the time, plus a random fluctuation, modelled by Brownian motion. Here, σ > 0 is simply a control parameter governing the strength of the perturbation. Upon dividing by ∆t and passing to the limit ∆t → 0, one is tempted to write down a differential equation

    dX(t)/dt = F(X(t), t) + σ Ẇ_t,    (III.2)

where we would like to have Ẇ_t denote the time derivative of Brownian motion. We have just seen that such a derivative does not exist. However, Equation III.2 does make sense if it is transformed into an integral form. This involves the definition of a stochastic integral and the Ito formula, [Behrends, 2011, ch. 6, 7]. In particular, Brownian motion itself can be shown to solve Equation III.2 in this sense, if F = 0.

III.2. Diffusion process

Consider a classical particle of mass m moving under the influence of friction γ > 0 and of a force field F, generated by a potential energy function V, i.e. F = −∇V. By Newton's equation, the position vector x(t) ∈ R³ satisfies the relation

    m d²x(t)/dt² = −mγ dx(t)/dt − ∇V(x(t)).    (III.3)

Let us now suppose that there are random perturbations, caused, for instance, by collisions with small particles, which continuously alter the particle's motion. We therefore make Equation III.3 a stochastic differential equation by adding σ dW_t,

    m d²x(t)/dt² = −mγ dx(t)/dt − ∇V(x(t)) + σ dW_t,    (III.4)

keeping in mind that it is meant to be solved in an integral sense. It is known that σ = √(2mγ k_B T), where k_B = 1.3806488 × 10⁻²³ J/K is Boltzmann's constant and T is the system temperature. This equation is typically called the Langevin equation of our system; the solution x(t) =: X_t is a Markov process, called a diffusion process. Equation III.4 holds in the same way if we study a system of n particles; in this case, X_t is a stochastic process in 3n dimensions. If the friction parameter γ is large enough, it is justified to drop the second derivative on the left-hand side of Equation III.4, [Zwanzig, 2001, sec. 2.2]. Rearranging the terms modifies the equation to

    dx(t)/dt = −(1/(mγ)) ∇V(x(t)) + √(2D) dW_t,    (III.5)

where we have defined D := k_B T/(mγ). This equation is the Langevin equation in the high friction limit; the underlying dynamics is called Smoluchowski dynamics or also Brownian dynamics. It will be the interest of our studies in what follows.

The Langevin equation itself can hardly be solved, except in a few rare cases. However, it is helpful to restrict oneself to the study of probability densities. Instead of trying to find an expression for the stochastic process X_t itself, we are only interested in computing the average probability ρ(x, t) to find the particle in a position x at time t. This probability density also satisfies a differential equation, called the Smoluchowski equation:

Theorem III.2 (Smoluchowski equation): The probability ρ(x, t) to find a system obeying the Langevin Equation III.5 at position x at time t satisfies the differential equation:

    ∂ρ(x, t)/∂t = (1/(mγ)) ∇·(∇V(x) ρ(x, t)) + D ∆ρ(x, t).    (III.6)

Here, ∇· denotes the divergence of a vector field and ∆ the Laplace operator of a scalar field.

Proof. A derivation can be found in [Zwanzig, 2001, ch. 2.2].

Equation III.6 is a special case of a Fokker-Planck equation. This result enables us to find the stationary density for the diffusion process:

Theorem III.3: The stationary solution of the Smoluchowski Equation III.6 is the Boltzmann distribution

    µ(x) = (1/Z) exp(−βV(x)),    (III.7)

where β := 1/(k_B T) is the inverse temperature and Z := ∫_Ω dx exp(−βV(x)) is a normalization constant, called the partition function.

Proof. If ∂ρ/∂t = 0 holds, we have to check that

    −(1/(mγ)) ∇·(∇V(x) µ(x)) = D ∆µ(x).    (III.8)


We find:

    D ∆µ(x) = (k_B T/(mγZ)) Σ_i ∂/∂x_i [−β (∂V(x)/∂x_i) exp(−βV(x))]    (III.9)
            = (k_B T/(mγZ)) [−β ∆V(x) exp(−βV(x)) + Σ_i β² (∂V(x)/∂x_i)² exp(−βV(x))]    (III.10)
            = −(1/(mγ)) [∆V(x) µ(x) + ∇V(x)·∇µ(x)]    (III.11)
            = −(1/(mγ)) ∇·(∇V(x) µ(x)).    (III.12)

We note that µ has to be well-defined for this result to make sense, i.e. the integral defining Z has to exist. This mainly requires that V(x) → ∞ fast enough as |x| → ∞. If the state space is dynamically connected, which we will always assume, this can also be shown to be sufficient for the process to be ergodic and meet the requirements from chapter II, [Sarich, Noé, Schütte, 2010, ch. 2.2]. Therefore, we have just shown that µ is the stationary distribution of the diffusion process, and we can now apply the foregoing theory.

III.3. Numerical considerations

Knowing the invariant distribution of a diffusion process helps us a lot, because our computational result for the first eigenfunction can be compared to the function exp(−βV(x)), and should be almost equal to it up to a pre-factor. However, little more can be computed in advance, except for a few simple systems. In particular, the transition kernel p(x, y; τ) is unknown to us. But in order to apply the variational methods, we need to perform a numerical simulation of the process, which means that if the current simulation step is x ∈ Ω, we need to draw the next step with time lag ∆t from the distribution p(x, ·; ∆t). We therefore approximate this distribution by a very common procedure, the so-called Euler-Maruyama method. It consists of combining a deterministic explicit Euler step with a normally distributed random step. If the current simulation position is x, the deterministic Euler step is

    x → x − ∆t ∇V(x).    (III.13)

This approximates the position that the process would attain if it was unperturbed. But since it is subject to normally distributed noise which, after time ∆t, has variance 2D∆t, we add a normally distributed random number with standard deviation √(2D∆t). Altogether, the simulation step becomes

    x → x − ∆t ∇V(x) + √(2D∆t) η,    (III.14)

where η ∼ N(0, 1).
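A sketch of this scheme; following Equation III.13, the drift is written without a 1/(mγ) pre-factor (i.e. in units where mγ = 1), and grad_V is a user-supplied gradient of the potential:

```python
import numpy as np

def euler_maruyama(grad_V, x0, dt, n_steps, D, rng=None):
    """Simulate 1d Smoluchowski dynamics with the Euler-Maruyama scheme
    x -> x - dt * grad_V(x) + sqrt(2 D dt) * eta, eta ~ N(0, 1)."""
    if rng is None:
        rng = np.random.default_rng()
    noise = np.sqrt(2.0 * D * dt) * rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = x[k] - dt * grad_V(x[k]) + noise[k]
    return x
```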


Clearly, this is just an approximation of the true dynamics, but a good one as long as ∆t is chosen small enough. However, having in mind the difficulties arising from applying the explicit Euler method to unperturbed dynamical systems, it can also lead to serious trouble, a problem we will encounter in the next chapter.

We are left with the question of how to choose an appropriate basis set. In the previous chapter, we have seen that the eigenfunctions φ_i of the propagator and the eigenfunctions ψ_i of the transfer operator differ only by a factor of the stationary density. An approximation scheme using the Roothan-Hall method results in an estimate of the functions ψ_i. But since it is our goal to determine the propagator eigenfunctions φ_i, we can either start with a set of functions suitable to approximate the functions ψ_i, and weight the result with a factor µ afterwards, or we can go the other way round. That means to start with a number of functions suitable to estimate the φ_i-functions and divide them by µ prior to the computation. After the computation, we simply take the resulting linear combination of our original basis functions to estimate the functions φ_i. There is also a third way: If f ∈ L²(Ω) is an element of the unweighted L²-space, then f/√µ is in L²_µ(Ω) on the one hand, and √µ f is in L²_{µ⁻¹}(Ω) on the other hand. These functions are somehow "in between" the two domains, and they can also be used with the Roothan-Hall method. To this end, they have to be divided by the square root of µ before the computation, and the resulting linear combination has to be re-weighted by √µ afterwards.

Our approach to all of the following problems will be to make use of the shape of the energy function. The function V will always display a number of minima, which correspond to metastable regions of the state space. We will try to use local functions, like Gaussian hats, which are essentially non-zero only in a neighbourhood of the metastable regions. Since functions in L²_{µ⁻¹}(Ω) and L²(Ω) at least have to decay to zero if |x| becomes large, whereas functions in L²_µ(Ω) can be constant or even unbounded, we will prefer one of the latter two ways. Additionally, using unbounded functions can lead to instabilities if rare states outside the metastable regions are visited by a sampling trajectory, because these functions can become very large for such states. In the experiments, both of the two preferred ways seemed to perform well. All of the computations shown below were run using Gaussian functions weighted with the square root of the stationary density.

III.4. Diffusion in a quadratic potential

A simple but important example is diffusion in a quadratic potential V(x) = (1/2) b x², for x ∈ R and b > 0. Inserting this into Equation III.7 shows that

    µ(x) = (1/√(2πα)) exp(−x²/(2α)),

i.e. the stationary distribution becomes a Gaussian distribution with zero mean and variance α = k_B T/b. The potential and its invariant measure are depicted in Figure III.2.

Figure III.2.: Quadratic potential V(x) = 0.5x² and its invariant distribution.


It turns out that this is one of the rare cases where all important quantities can be determined analytically. Using stochastic integrals and Ito's formula, we derive an explicit expression for this process in Lemma A.1. Using this expression, we then show that if the process starts with a distribution ρ_0, the expectation value at time t is given by E[ρ_t] = e^{−θt} E[ρ_0], where we have defined θ := b/(mγ). Moreover, the variance evolves according to ω²[ρ_t] = e^{−2θt} ω²[ρ_0] + (1 − e^{−2θt}) α. Regardless of the initial distribution, the process quickly approaches the stationary Gaussian distribution. Even the complete set of eigenfunctions can be found analytically:

Lemma III.4: The propagator eigenfunctions are given by the appropriately scaled Hermite polynomials, multiplied with the invariant measure:

    φ_i(x) = √(α^{i−1}/(i−1)!) H_{i−1}(x/√α) µ(x).    (III.15)

The corresponding eigenvalues are λ_i(τ) = e^{−θ(i−1)τ}.

Proof. We also show the proof of this Lemma in the appendix.

Since there is only one potential minimum, there are no metastable states. Unless b is very small or mass and friction are very large, the global relaxation time t_2 = −τ/log λ_2(τ) = 1/θ = mγ/b is short, and the process quickly equilibrates to its invariant distribution.

Since all these analytic quantities are at hand, we can compare them to the results obtained from the Roothan-Hall method. As shown in Figure III.3, a relatively small numerical effort leads to a good approximation of the eigenfunctions as well as the corresponding eigenvalues and implied time scales. Though this example may be a simple one, it is still important. In molecular simulations, a harmonic potential is often used to model bonds between atoms, as well as bond angles. The harmonic potential will therefore frequently show up in applications.
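Given Lemma III.4, the exact eigenvalues and implied time scales can be tabulated directly; a tiny sketch with illustrative parameter values, reproducing the exact value t_2 = 1 quoted in Figure III.3:

```python
import numpy as np

b, m, gamma = 1.0, 1.0, 1.0                # illustrative parameter values
theta = b / (m * gamma)
tau = 0.05
lam = np.exp(-theta * np.arange(5) * tau)  # lambda_i(tau), i = 1, ..., 5
t = -tau / np.log(lam[1:])                 # t_i = 1 / (theta * (i - 1))
print(t)                                   # t_2 = m * gamma / b = 1
```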


Figure III.3.: Results obtained from applying the Roothan-Hall method to diffusion in a harmonic potential. A simulation of 20 million time steps of length ∆t = 10⁻³ was used with thirteen Gaussian test functions centred at −3, −2, −1, −0.8, −0.4, −0.2, 0, 0.2, 0.4, 0.8, 1, 2, 3. The variances were set to 0.5 for all centres between −0.8 and 0.8, and to 1 for the remaining functions. (a) First eigenfunction φ₁. (b) Second eigenfunction φ₂. (c) Second eigenvalue λ₂. (d) Estimated second implied time scale t₂. Exact value is t₂ = 1.


In molecular simulations, a harmonic potential is often used to model bonds between atoms, as well as bond angles. The harmonic potential will therefore frequently show up in applications.

III.5. Two-well potential

Another important example is a two-well potential. In general, this is a potential function displaying two minimum positions which are separated by a significant energy barrier, and which rises to infinity both to the left and to the right of the minima. For our calculations, we choose V(x) = k(x⁴ − 2x² + 1) for some k > 0, which has two minima at x = −1 and x = 1. The energy function and the invariant distribution are displayed in Figure III.4a and Figure III.4b. The two-well potential is a good model for certain molecular interactions where a certain degree of freedom favours two characteristic positions. The regions around the potential minima correspond to metastable states; we therefore expect one dominant slow process apart from the stationary process. We apply the Roothan-Hall method using thirteen Gaussian functions centred around the two minima; the centres and variances are listed in the caption of Figure III.4. In order to evaluate the results, we also computed the eigenvalues of an MSM transition matrix. Here, we used a fine discretization of the state space into 100 sets, most of them situated close to the potential minima. A comparison of the results can be seen in Figure III.4. Most importantly, we see that the implied time scales are estimated very well: both methods converge quickly to about the same value. The stationary distribution is very well approximated, and we obtain a convincing result for the second eigenfunction, clearly displaying a characteristic sign change between the two metastable regions. The computational effort required by the Roothan-Hall method is much smaller, since we only need to compute a 13 × 13 matrix instead of a 100 × 100 matrix.
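For reference, the MSM side of this comparison follows the usual count-and-normalize construction. The sketch below uses uniform bins for simplicity, whereas the discretization described above concentrates the sets near the minima; the function name and tolerances are illustrative.

```python
import numpy as np

def msm_eigenvalues(traj, n_sets, lag, lo=-2.0, hi=2.0):
    """Estimate a row-stochastic MSM transition matrix on a uniform grid
    and return its eigenvalues, sorted by modulus."""
    edges = np.linspace(lo, hi, n_sets - 1)            # n_sets bins over [lo, hi]
    states = np.digitize(traj, edges)
    C = np.zeros((n_sets, n_sets))
    np.add.at(C, (states[:-lag], states[lag:]), 1.0)   # transition counts
    C = 0.5 * (C + C.T)                                # symmetrize the counts
    T = C / np.maximum(C.sum(axis=1, keepdims=True), 1e-12)
    return np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
```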


Figure III.4.: Application of the Roothan-Hall method to the one-dimensional two-well potential. A simulation of 20 million time steps with ∆t = 10⁻³ was used with thirteen Gaussian hats centred at −2, −1.5, −1.2, −1, −0.8, −0.5, 0, 0.5, 0.8, 1, 1.2, 1.5, 2. The variances were set to 1 for the centres at x = −2, −1.5, 0, 1.5, 2 and to 0.5 for all others. We compare the results with a 100 set MSM discretization. (a) Potential function V(x). (b) First eigenfunction φ₁. (c) Second eigenvalue λ₂. (d) Second implied time scale t₂. (e) Second eigenfunction φ₂.


IV. Application to molecules

In this chapter, we are going to apply the variational method to diffusion processes in a higher dimensional state space. For systems with many degrees of freedom, the potential energy function often possesses multiple minima, and at times contains highly complicated interactions between several of the coordinates. Evaluation of the partition function becomes very difficult, and requires specialized algorithms like Markov chain Monte Carlo methods. Computing eigenvalues and characteristic time scales can be done using Markov state models, but the choice of adequate sets requires suitable clustering methods. This also leads to a huge computational effort if the state space is very high-dimensional. We have the hope that the use of variational methods can help to reduce this effort.

IV.1. The example system

Let us consider a system which is like a very much simplified small molecule. Let it consist of N atoms, with N being small, either equal to 4 or to 5 in what follows. Denote their position vectors by r_i ∈ ℝ³, i ∈ {1, ..., N}. Neighbouring atoms are connected by a bond. Denote the distance vectors between those atoms by r_ij := r_j − r_i and the distances by r_ij := ‖r_ij‖. For three neighbouring atoms, we define the bond angle θ_ijk by

θ_ijk := cos⁻¹( −⟨r_ij | r_jk⟩ / (r_ij r_jk) ),   (IV.1)

which is the angle between the two bonds meeting at atom j. Additionally, for four atoms, we can define the dihedral angle ψ_ijkl as the angle between the plane spanned by r_ij, r_jk and the one spanned by r_jk, r_kl. Using the normal vectors n_ijk and n_jkl, given by

n_ijk := r_ij × r_jk,   (IV.2)

which are perpendicular to the respective planes, we find that the dihedral is the angle between the two normal vectors:

ψ_ijkl = cos⁻¹( ⟨n_ijk | n_jkl⟩ / (‖n_ijk‖ ‖n_jkl‖) ).   (IV.3)
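These definitions translate directly into a few vector operations. A minimal sketch (the function names are ours, not those of the simulation program):

```python
import numpy as np

def bond_angle(r_i, r_j, r_k):
    """Bond angle theta_ijk of Equation IV.1."""
    r_ij, r_jk = r_j - r_i, r_k - r_j
    c = -np.dot(r_ij, r_jk) / (np.linalg.norm(r_ij) * np.linalg.norm(r_jk))
    return np.arccos(np.clip(c, -1.0, 1.0))

def dihedral_angle(r_i, r_j, r_k, r_l):
    """Dihedral angle psi_ijkl of Equations IV.2 and IV.3."""
    r_ij, r_jk, r_kl = r_j - r_i, r_k - r_j, r_l - r_k
    n_ijk, n_jkl = np.cross(r_ij, r_jk), np.cross(r_jk, r_kl)
    c = np.dot(n_ijk, n_jkl) / (np.linalg.norm(n_ijk) * np.linalg.norm(n_jkl))
    return np.arccos(np.clip(c, -1.0, 1.0))
```

Note that the arccos form returns values in [0, π]; the signed convention ψ ∈ [−π, π) used below can be obtained by attaching the sign of ⟨n_ijk | r_kl⟩.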


Figure IV.1.: Schematic drawing of the four atom system, corresponding either to system A or B. It shows the dihedral angle in blue. Created with [VMD, 1996].

A sketch of such a system is shown in Figure IV.1.

Let us now define a potential function V, which generates a force field acting on the molecule. A bond between two atoms is typically modelled like a spring. Therefore, we define harmonic potentials V_ij, depending on the distance r_ij, for each pair of connected atoms:

V_ij := ½ k_ij (d_ij − r_ij)².   (IV.4)

Here, k_ij is a constant defining the strength of the potential and d_ij is the distance of minimal energy. The system will drive the bond lengths towards the minimum distances d_ij. Similarly, we set up harmonic potentials V_ijk for each bond angle:

V_ijk := ½ k_ijk (d_ijk − θ_ijk)².   (IV.5)

The dihedral angles usually have a number of favoured positions. In order to model this, we define the dihedral potential as

V_ijkl := k_ijkl (1 − cos(n ψ_ijkl)),   (IV.6)

for ψ_ijkl ∈ [−π, π). The position ψ_ijkl = π is identical to ψ_ijkl = −π; consequently, it is enough to have V_ijkl defined on the above interval. This potential has n minimum positions, as displayed in Figure IV.2.
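The corresponding energy terms are equally direct to write down. A sketch mirroring Equations IV.4 to IV.6 (the Coulomb term of Equation IV.7 below follows the same pattern; all names are illustrative):

```python
import math

def bond_energy(r, d, k):
    """Harmonic bond term V_ij = 0.5 * k * (d - r)**2 (Equation IV.4)."""
    return 0.5 * k * (d - r) ** 2

def angle_energy(theta, d, k):
    """Harmonic bond angle term V_ijk = 0.5 * k * (d - theta)**2 (Equation IV.5)."""
    return 0.5 * k * (d - theta) ** 2

def dihedral_energy(psi, k, n):
    """Dihedral term V_ijkl = k * (1 - cos(n * psi)) (Equation IV.6)."""
    return k * (1.0 - math.cos(n * psi))
```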


Figure IV.2.: The potential energy function V(ψ_ijkl) = k_ijkl (1 − cos(nψ_ijkl)) for the dihedral angle, with k_ijkl = 1 and n = 2, n = 4, respectively.

If n = 2, the two minima are situated at ψ_ijkl = 0 and ψ_ijkl = −π. Lastly, we would also like to include Coulomb potentials V_C defined by

V_C(r_ij) := (1/(4πε₀)) (e₁ e₂ / r_ij),   (IV.7)

which describe the electrostatic repulsion or attraction between two atoms. Here, ε₀ is the electric constant, and e₁, e₂ are the charges of the two atoms. In our simulation, the charges will be equal to one elementary charge e = 1.602 · 10⁻¹⁹ C in absolute value, but can have opposite signs. The total potential energy function V is then the sum of all individual energies.

For our computations, we will consider three small systems, which we call system A, system B and system C. The first is a four atom system which only includes the three harmonic bond interactions, two bond angle interactions, and a dihedral potential with n = 2. The second one also consists of four atoms, but we set n = 4 for the dihedral and add an attractive Coulomb interaction between atoms one and four. In the last case, we have N = 5, four bonds, three bond angles and two dihedrals, one with n = 4 and the other with n = 2.

If we now add random perturbations and Smoluchowski dynamics, we obtain a diffusion process with invariant distribution

µ(x) = (1/Z) exp(−βV(x)),   (IV.8)

where Z is the partition function, as before, and x ∈ ℝ^{3N} is the full position vector containing all coordinates of the atoms.

It is important to point out once more that the stationary distribution and all the other eigenfunctions are defined on the 3N-dimensional Euclidean space. As such, they probably take a highly complicated shape.


Figure IV.3.: Statistical distribution of the two bond angles of a four atom system with Coulomb 1-4 attraction. The position of minimum energy is θ = 1.9373 = 111.00 deg. We see that the interaction causes a slight shift of both distributions, but because of the strong bonds, the Gaussian type distribution is well conserved. Clearly, the changes of the distributions caused by the interactions can be much more drastic, but still we hope that they will not altogether destroy the original shape.

At least for our examples, however, the state of the system as well as its potential energy is completely determined by the bond lengths, the bond angles and the dihedral angles. These coordinates are called the internal coordinates. Projected onto these internal coordinates, the eigenfunctions will probably have a much simpler form. Clearly, these projections will in general not be equal to the eigenfunctions we would find if a one-dimensional particle was propagated under the influence of the energy depending on that coordinate only, since the internal coordinates are not independent of each other (see Figure IV.3). But still, we have the hope that the projections somehow resemble these functions. This is the central idea of the following calculations: if the internal coordinates are not too heavily dependent on one another, we can try to apply the variational method with local functions, each dependent on just one internal coordinate, and placed near the minimum positions of the individual potential energies. Choosing these functions does not require the use of clustering techniques, but can be based on knowledge of the molecular energy function, which is known as an expression of the internal coordinates. Since the number of internal coordinates grows at most like O(N²), the same would be true for the number of basis functions needed. As before, we will stick to Gaussian functions in the square root weighted space throughout the following examples, and we will only use functions which depend on one of the dihedral coordinates.
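To state the construction explicitly: each basis function depends on the full configuration x only through a single dihedral angle, and is divided by √µ as described in Chapter III. A sketch (names are illustrative; since µ is only known up to the partition function Z, the weighting is implemented up to a constant factor, which cancels in the generalized eigenvalue problem):

```python
import numpy as np

def make_basis_function(center, sigma2, beta, potential_energy, dihedral_of):
    """Gaussian in one dihedral coordinate, divided by sqrt(mu), where
    mu(x) is proportional to exp(-beta * V(x))."""
    def f(x):
        psi = dihedral_of(x)                       # one internal coordinate of x
        g = np.exp(-(psi - center) ** 2 / (2.0 * sigma2))
        return g * np.exp(0.5 * beta * potential_energy(x))  # divide by sqrt(mu)
    return f
```

The factor exp(+βV/2) grows quickly for rare high-energy configurations, which is the kind of instability mentioned in Chapter III; it stays harmless as long as the Gaussians are essentially supported inside the metastable regions.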


Quantity                    Symbol   Unit             Value
Boltzmann constant          k_B      kJ/(mol·K)       8.3144621 · 10⁻³
Avogadro constant           N_A      mol⁻¹            6.02214129 · 10²³
Electric constant           ε₀       F·m⁻¹            8.854187817 · 10⁻¹²
Elementary charge           e        C                1.602176565 · 10⁻¹⁹
Atomic mass unit            u        kg               1.660538912 · 10⁻²⁷
Mass of one carbon atom     m        u                12.001
Reference distance          d_b      nm               0.153
Reference bond angle        d_a      deg              111.00
Temperature                 T        K                400 (A, B) or 600 (C)
Friction                    γ        ps⁻¹             200
Bond force constant         k_b      kJ/(mol·nm²)     2 · 10⁵
Angle force constant        k_a      kJ/(mol·rad²)    2 · 10³
Dihedral force constant     k_d      kJ/mol           5.92

Table IV.1.: Overview of constants and parameters used in the simulation.

IV.2. Simulation

In order to produce simulation trajectories of the above system, we implement a small Matlab-based molecular dynamics program. It allows us to define all necessary parameters, such as temperature, friction, mass of the atoms, potentials to be used, simulation time step and number of time steps. After initializing all these, it propagates the system in its Euclidean coordinates; that is, for each time step, it computes the gradient of the potential in Euclidean coordinates and adds a random perturbation. After each step, it computes the internal coordinates and saves them to a binary file. Optionally, one can also store the Euclidean coordinates as a binary file or as an .xyz-file to be used with an MD-viewer.

As for the settings, we try to make a realistic simulation and use physical units, in particular nm for distances, ps for time and kJ/mol for energies. We also try to use realistic values for the reference distances and angles, as well as for the force constants. However, we also wish for a certain type of behaviour to show up. First, we choose the bonds and the angles to be very strong, meaning that they should be distributed closely around the reference positions, which requires k_ij and k_ijk to be chosen sufficiently large for all bonds and all bond angles. We encounter a number of difficulties here. If k_ij is large, the Euler discretization can quickly lead to a blow-up. Consequently, we either have to make the simulation time step very small or friction large enough. Additionally, we want the dihedral angles to display a metastable behaviour, like the type of behaviour we have studied in the previous examples. This requirement limits the temperature, because a high temperature will flatten the energy landscape, and also keeps friction from being chosen too large, since this would lead to very long waiting times until a transition occurs. Furthermore, it keeps the time step from being made too small, since this would make simulations covering sufficiently many transitions very expensive.


It turns out to be quite difficult to find settings matching these requirements. In particular, we have to turn away from values used in real world examples a number of times. A complete list of the parameters we used is shown in Table IV.1.

IV.3. Application of the Roothan-Hall method

With a sufficiently long simulation trajectory at hand, we can turn to the approximation of eigenvalues and time scales. For system A, we expect one dominant slow process to show up. We start with a small basis set, consisting of no more than two Gaussian functions centred at the two minimum positions, in order to get a rough approximation of the eigenvalues. Furthermore, we apply the method to a large basis set of twenty-seven functions, most of them centred closely around the two minima, with variance equal to 0.1. Details about the basis sets used in this chapter are given in Table IV.2. As a reference, we also compute the eigenvalues and eigenvectors of two Markov state model discretizations. One of them just splits up the state space into two sets along the dihedral coordinate. The other one is a fine discretization involving 100 sets along the same coordinate. Results showing the estimated implicit time scales, the dominant eigenvalues, as well as the first two eigenfunctions, are depicted in Figure IV.4. We see that in both cases, the slow process can be resolved, and the implicit time scales converge as the lag time τ increases. As expected, the fine discretizations converge much faster than the coarse ones. Most importantly, we see that the large basis set Roothan-Hall approximation is as good as the fine MSM for most lag times. Also, both methods yield a very good approximation of the stationary density, compared to the statistical estimate obtained from sampling, and a convincing estimate of the second eigenfunction, displaying the characteristic sign change between the minima. This is a good result: it shows us that the variational method yields an approximation which is comparable to a fine MSM.

We also learn from our attempts that the choice of basis functions is very influential to the approximation quality. If the functions are too strongly peaked, they are not sufficiently correlated and the eigenvalues go down. However, if they are too widely spread, they correlate too much and the H-matrix is just poorly estimated. As a consequence, the computation returns eigenvalues which are greater than one or have non-zero imaginary parts. But as we said in the beginning of the chapter, the shape of the energy function can be used to choose the right functions and move in the right direction.
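A practical symptom of the second failure mode is poor conditioning of the overlap matrix S: strongly overlapping basis functions make S nearly singular, so that statistical noise in H is amplified. Such a diagnostic is not part of the method presented here, but a simple check of this kind can be sketched as follows:

```python
import numpy as np

def overlap_condition(S, rel_tol=1e-8):
    """Spectrum check of the estimated overlap matrix S: eigenvalues close
    to zero signal (almost) linearly dependent basis functions."""
    w = np.linalg.eigvalsh(S)                  # ascending; S is symmetric
    cond = w[-1] / w[0] if w[0] > 0 else np.inf
    n_redundant = int(np.sum(w < rel_tol * w[-1]))
    return cond, n_redundant
```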
For system B, the process will display four metastable regions, but they will be populated with different frequencies: the minimum which is closest to atom number one will be visited much more often because of the electrostatic attraction, while the others will be less populated. We therefore expect a number of dominant slow processes to appear, describing the transitions between different minima. Again, we apply the Roothan-Hall method with two different basis sets, a small one consisting of four functions, and a large one consisting of 35 functions, which are distributed closely around the four minima. We find that three dominant slow processes are present, and compare the estimates for implicit time scales and


eigenvalues with a four set and a 100 set MSM discretization. The results are shown in Figure IV.5 and Figure IV.6. They are similar to the previous example: both eigenvalues and time scales as well as the eigenfunctions are approximated comparably well by the large set Roothan-Hall method and the fine MSM.

These examples had in common that the metastable behaviour depended on only one single coordinate. As a last step, we take a look at system C, where more than one coordinate is relevant for the description of slow processes. For the Roothan-Hall method, we combine the two basis sets used in the previous examples; that means we have 35 basis functions which depend on the first dihedral coordinate and 27 which cover the second. We compare the results to those stemming from a Markov state model discretization which uses all possible combinations of 15 sets along each coordinate, i.e. 225 sets altogether. The second, third and fourth eigenvalue and time scale, estimated with both methods, are shown in Figure IV.7. We see that the time scales estimated with the Roothan-Hall method converge earlier. This is a good sign, because we are able to estimate the eigenvalues and time scales of the larger system by simply adding up the basis functions used for the smaller systems, without having to use all possible combinations of these functions.
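The difference between the two constructions is easy to state in code: the variational basis for system C concatenates the two one-coordinate sets, whereas the MSM discretization takes all products of the per-coordinate sets. A schematic sketch, with uniformly spaced stand-ins for the centre lists of Table IV.2:

```python
import numpy as np

centers_psi1 = np.linspace(-3.0, 3.0, 35)   # stand-in for the 35 centres (set from B)
centers_psi2 = np.linspace(-3.1, 3.1, 27)   # stand-in for the 27 centres (set from A)

# Additive variational basis: each function depends on one dihedral only.
basis_C = [("psi1", c) for c in centers_psi1] + [("psi2", c) for c in centers_psi2]
assert len(basis_C) == 35 + 27              # grows like the sum of the set sizes

# Product (MSM-style) discretization: one set per combination of bins.
sets_C = [(b1, b2) for b1 in range(15) for b2 in range(15)]
assert len(sets_C) == 15 * 15               # grows like the product
```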


System A
  Potentials: Three bonds, two bond angles, one dihedral with two minima.
  Small basis: Two Gaussians, centred at −π/2 and π/2, both with variance 0.3. To simplify the computation, we shifted the minima to those two positions.
  Large basis: 27 Gaussians, centred at −3.1, −3.0, −2.8, −2.6, −2.4, −2.2, −2.0, −1.5, −1.0, −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.1, all of them with variance 0.1.

System B
  Potentials: Three bonds, two bond angles, one dihedral with four minima, electrostatic attraction between atoms one and four.
  Small basis: Four Gaussians, centred at −3π/4, −π/4, π/4, 3π/4, all of them with variance 0.1. To simplify the computation, we shifted the minima to those four positions.
  Large basis: 35 Gaussians, centred at −3.0, −2.9, −2.8, −2.6, −2.2, −2.0, −1.8, −1.7, −1.6, −1.5, −1.4, −1.2, −1.0, −0.8, −0.4, −0.2, −0.1, 0, 0.1, 0.2, 0.4, 0.8, 1.0, 1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 2.0, 2.2, 2.6, 2.8, 2.9, 3.0, all of them with variance 0.1.

System C
  Potentials: Four bonds, three bond angles, two dihedrals, with n = 4 for the first one and n = 2 for the second one.
  Basis: The large set from B for the first dihedral and the large set from A for the second.

Table IV.2.: Description of the example systems and the basis sets used for them.


Figure IV.4.: Approximation results for system A. We used every third step of a 30 million step trajectory corresponding to a sampling time step ∆t = 10⁻³ ps. We compare the results obtained with the small and the large basis set with a 2 set and a 100 set MSM. The functions displayed were computed from the large basis sets. (a) Potential energy for the dihedral coordinate. (b) Projection of first eigenfunction φ₁, compared to direct estimate from the sample. (c) Second eigenvalue λ₂. (d) Second implied time scale t₂. (e) Projection of second eigenfunction φ₂.


Figure IV.5.: Approximation results for system B. Every fifth step of a 30 million step trajectory corresponding to a sampling time step ∆t = 10⁻³ ps was used. We compare the results obtained with the small and the large basis set to a 4 set and a 100 set MSM discretization. The functions displayed were computed using the large basis sets. (a) Potential energy for the dihedral angle. (b) Projection of first eigenfunction φ₁, compared with direct estimate from sampling. (c) Second eigenvalue λ₂. (d) Second implied time scale t₂. (e) Projection of second eigenfunction φ₂.


Figure IV.6.: Continuation of Figure IV.5. (a) Third eigenvalue λ₃. (b) Third implied time scale t₃. (c) Projection of third eigenfunction φ₃. (d) Fourth eigenvalue λ₄. (e) Fourth implied time scale t₄. (f) Projection of fourth eigenfunction φ₄.


Figure IV.7.: Approximation results for system C. Every fifth step of a 10 million step sampling trajectory with time step ∆t = 10⁻³ ps was used. We compare the Roothan-Hall method with a combination of the large basis sets of the previous examples to a 15 × 15 MSM discretization. (a) Second eigenvalue λ₂. (b) Second implied time scale t₂. (c) Third eigenvalue λ₃. (d) Third implied time scale t₃. (e) Fourth eigenvalue λ₄. (f) Fourth implied time scale t₄.


V. Summary and outlook

We have presented a method to approximate the eigenvalues and time scales of slow processes in stochastic dynamical systems, based on the variational principle (Theorem II.6) and the Roothan-Hall method (Theorem II.10). We have applied this method with Gaussian functions to one-dimensional and to a few simple higher dimensional examples of diffusion processes governed by Smoluchowski dynamics. So far, we have seen that in all cases a convincing estimate of the involved time scales was achieved. Comparison with results obtained from Markov state model discretizations showed that the variational method can reproduce their quality.

V.1. Future work

Future work on the subject will have to deal with a number of questions:

V.1.1. Application to larger systems

Can we, in a chain with many internal coordinates, obtain a convincing result by simply adding up local basis functions, each dependent on one coordinate and chosen on the basis of the molecular energy function? This assumes that the internal coordinates are sufficiently independent to allow this approximation. If it works, it would avoid the use of clustering techniques and result in a computationally affordable method to compute the dominant eigenvalues and relevant time scales. As a first test, we will have to try a five atom system with additional Coulomb interactions, and see if it works.


V.1.2. Choice of basis functions

So far, we have exclusively used Gaussian functions, and more or less guessed their shape parameters from the energy function. Can we make a more precise statement about the relation between the shape of the basis functions and the approximation quality, and what other types of functions can be used?

V.1.3. Numerical stability

We will have to address sources of numerical instabilities involved in our computations. As mentioned before, choosing basis functions which are either too uncorrelated or too strongly correlated can lead to meaningless results. Can we find a way to quantify these instabilities and determine how they are related to the choice of functions?

V.1.4. Importance of sampling

The method does not work without the computation of simulation trajectories. In the examples we studied, it was easy to compute trajectories which visit all the metastable regions frequently. But what does this mean for larger systems? How can we determine the minimum length of a sufficiently long trajectory, and how much does the use of finite simulations affect all the quantities involved in the method, like the H- and S-matrix?


A. Appendix

A.1. Diffusion in a quadratic potential

Let us prove the properties of the diffusion in a quadratic potential as introduced in Section III.4. First, let us derive an analytic expression for the problem's solution. On stochastic integrals and Ito's formula, see [Evans, ch. 4].

Lemma A.1: If the process is initially distributed according to X₀, then the distribution at time t is given by the stochastic integral

X_t = e^{−θt} X₀ + σ ∫₀ᵗ e^{−θ(t−s)} dB_s.   (A.1)

Proof. The proof can also be found on [Wikipedia]. Let X_t be the solution of the corresponding stochastic differential equation, i.e.

dX_t = −θ X_t dt + σ dB_t.   (A.2)

Consider the function f(X_t, t) := e^{θt} X_t. According to Ito's formula, f(X_t, t) satisfies the SDE

d(f(X_t, t)) = (∂f/∂t) dt + (∂f/∂x) dX_t + ½ (∂²f/∂x²) σ² dt   (A.3)
 = θ e^{θt} X_t dt − θ e^{θt} X_t dt + e^{θt} σ dB_t = e^{θt} σ dB_t.   (A.4)

The process f(X_t, t) is therefore given by the stochastic integral

e^{θt} X_t = f(X_t, t) = X₀ + σ ∫₀ᵗ e^{θs} dB_s.   (A.5)


Multiplying the last expression by e^{−θt} gives the desired result.

Now we can confirm the claims about the expectation and variance. The properties of stochastic integrals we use can again be found in [Evans, ch. 4].

Lemma A.2: The solution X_t from Lemma A.1 satisfies

E[X_t] = e^{−θt} E[X₀],   (A.6)
ω²[X_t] = e^{−2θt} ω²[X₀] + (1 − e^{−2θt}) α,   (A.7)

with α = k_B T / b = σ²/(2θ).

Proof. Directly using Equation A.1, we find

E[X_t] = e^{−θt} E[X₀] + σ E[ ∫₀ᵗ e^{−θ(t−s)} dB_s ] = e^{−θt} E[X₀],   (A.8)

since the expectation of stochastic integrals over scalar functions is always zero. Similarly, using that the expectation of the square of a stochastic integral over a scalar function equals the ordinary integral over the square of that function, we compute

E[X_t²] = e^{−2θt} E[X₀²] + 2σ e^{−θt} E[X₀] E[ ∫₀ᵗ e^{−θ(t−s)} dB_s ] + σ² E[ ( ∫₀ᵗ e^{−θ(t−s)} dB_s )² ]   (A.9)
 = e^{−2θt} E[X₀²] + σ² ∫₀ᵗ e^{−2θ(t−s)} ds   (A.10)
 = e^{−2θt} E[X₀²] + (σ²/(2θ)) (1 − e^{−2θt}).   (A.11)

Therefore,

ω²[X_t] = E[X_t²] − (E[X_t])² = e^{−2θt} E[X₀²] + α (1 − e^{−2θt}) − e^{−2θt} (E[X₀])²   (A.12)
 = e^{−2θt} ω²[X₀] + α (1 − e^{−2θt}).   (A.13)

Next, let us determine the transition kernel p(x, y; τ). If the current position of the process is known to be x ∈ Ω, its current distribution is a delta function centred at x. Consequently, by the preceding Lemma, the distribution at time τ > 0 has expectation value e^{−θτ} x and variance α(1 − e^{−2θτ}). We can verify that this distribution is a Gaussian function with precisely these shape parameters:


Lemma A.3: With ν := 1 − e^{−2θτ}, the transition function p(x, y; τ) is given by a Gaussian,

p(x, y; τ) = (1/√(2παν)) exp( −(y − e^{−θτ} x)² / (2αν) ).   (A.14)

Proof. We show that p satisfies the Smoluchowski Equation III.6 with initial condition p(x, ·; 0) = δ_x, which for the quadratic potential reads

∂p/∂τ = (1/(mγ)) ∂/∂y (b y p) + D ∂²p/∂y².

Since the initial condition is clearly fulfilled, we check that the differential equation holds for all τ > 0. In order to do so, we note that θ = b/(mγ) and D = θα, and we abbreviate m(τ) := e^{−θτ} x, so that p is a Gaussian in y with mean m and variance αν. On the one hand, differentiating with respect to τ and using dν/dτ = 2θ e^{−2θτ} and dm/dτ = −θ m, we find

∂p/∂τ = θ p [ −e^{−2θτ}/ν − m (y − m)/(αν) + e^{−2θτ} (y − m)²/(αν²) ].   (A.15)

On the other hand, using ∂p/∂y = −((y − m)/(αν)) p, we have

(1/(mγ)) ∂/∂y (b y p) + D ∂²p/∂y² = θ p [ 1 − y(y − m)/(αν) ] + θα p [ (y − m)²/(αν)² − 1/(αν) ]   (A.16)
 = θ p [ 1 − 1/ν − y(y − m)/(αν) + (y − m)²/(αν²) ].   (A.17)

Expanding y(y − m) = (y − m)² + m(y − m) and using 1 − ν = e^{−2θτ}, this becomes

θ p [ −e^{−2θτ}/ν − m(y − m)/(αν) + e^{−2θτ} (y − m)²/(αν²) ],   (A.18)

which coincides with Equation A.15. Since all terms agree, the equation holds.

Knowing this, we can prove the statement about the eigenfunctions of the propagator:


Proof of Lemma III.4. For convenience, we restrict the proof to the case α = 1 and leave out all pre-factors. The case i = 1 was already treated when we discussed the stationary distribution, so let us check the statement for i = 2. By partial integration, we find that

∫ dx p(x, y; τ) x exp(−x²/2) = −∫ dx p(x, y; τ) (d/dx) exp(−x²/2)   (A.19)
 = ∫ dx (∂p/∂x)(x, y; τ) exp(−x²/2)   (A.20)
 = (e^{−θτ}/ν) ∫ dx (y − e^{−θτ} x) p(x, y; τ) exp(−x²/2)   (A.21)
 = (e^{−θτ}/ν) y ∫ dx p(x, y; τ) exp(−x²/2) − (e^{−2θτ}/ν) ∫ dx p(x, y; τ) x exp(−x²/2).   (A.22)

Upon multiplication by ν, and using that ν + e^{−2θτ} = 1, the last term can be absorbed into the left-hand side, and we are left with

∫ dx p(x, y; τ) x exp(−x²/2) = e^{−θτ} y ∫ dx p(x, y; τ) exp(−x²/2) = e^{−θτ} y exp(−y²/2).   (A.23)

Having established the first two cases, we can prove the assertion for general i by induction. We will need the recursion relation for Hermite polynomials, which reads

H_{i+1}(x) = x H_i(x) − i H_{i−1}(x),   (A.24)

and also holds for the eigenfunctions φ_i. Furthermore, similarly to the first case, we can use that φ_{i+1}(x) = (−1)^i (d^i/dx^i) exp(−x²/2), which is one way to define the Hermite functions. The course of the proof is then quite the same as for i = 2:

∫ dx p(x, y; τ) φ_{i+1}(x) = (−1)^i ∫ dx p(x, y; τ) (d^i/dx^i) exp(−x²/2)   (A.25)
 = (−1)^{i−1} ∫ dx (e^{−θτ}(y − e^{−θτ} x)/ν) p(x, y; τ) (d^{i−1}/dx^{i−1}) exp(−x²/2)   (A.26)
 = (e^{−θτ}/ν) y ∫ dx p(x, y; τ) φ_i(x) − (e^{−2θτ}/ν) ∫ dx p(x, y; τ) x φ_i(x)   (A.27)
 = (e^{−θτ}/ν) λ_i y φ_i(y) − (e^{−2θτ}/ν) ∫ dx p(x, y; τ) [φ_{i+1}(x) + i φ_{i−1}(x)].   (A.28)

Again, multiplying the equation by ν and using ν + e^{−2θτ} = 1 absorbs the term containing φ_{i+1} into the left-hand side. We are left with

∫ dx p(x, y; τ) φ_{i+1}(x) = e^{−θτ} λ_i y φ_i(y) − i e^{−2θτ} ∫ dx p(x, y; τ) φ_{i−1}(x)   (A.29)
 = e^{−θτ} λ_i y φ_i(y) − i e^{−2θτ} λ_{i−1} φ_{i−1}(y).   (A.30)


By assumption, we have that e^{−θτ} λ_{i−1} = λ_i. With one more application of the recursion relation, we arrive at the final result:

∫ dx p(x, y; τ) φ_{i+1}(x) = e^{−θτ} λ_i [ y φ_i(y) − i φ_{i−1}(y) ]   (A.31)
 = e^{−θτ} λ_i φ_{i+1}(y) = λ_{i+1} φ_{i+1}(y),   (A.32)

which proves both the assertion about the eigenfunctions and the one about the corresponding eigenvalues. The pre-factors are then chosen to ensure that the eigenfunctions are normalized.


Acknowledgements

I would like to thank all those who have supported me while I was working on this thesis. In particular, thank you to Frank Noé, for offering me the opportunity to work on this subject and for all the time you spent helping me and giving me advice throughout the last year. I am also very grateful to Guillermo Pérez Hernández, who spent long hours helping me with the programming and many other problems. Thank you very much, Phillip Berndt, for all of the great conversations we had in the past years, and for answering all my questions at any time. Thank you, Tamara Wallenhauer, for all the support and encouragement you had for me, as well as for your patience with me. I would also like to say thanks to my family and many other of my friends for helping me through this time.


Bibliography

[Behrends, 2011] Ehrhard Behrends, Markovprozesse und stochastische Differentialgleichungen, FU Berlin, 2011, http://page.mi.fu-berlin.de/bhrnds/stochdg2011/

[Evans] Lawrence C. Evans, An Introduction to Stochastic Differential Equations, University of California, http://math.berkeley.edu/~evans/SDE.course.pdf

[Noé, 2011] Frank Noé, A Variational Approach to Modelling Slow Processes in Stochastic Dynamical Systems, FU Berlin, 2011, http://compmolbio.biocomputing-berlin.de/index.php/publications

[Prinz et al, 2011] Jan-Hendrik Prinz, Hao Wu, Marco Sarich, Bettina Keller, Martin Senne, Martin Held, John D. Chodera, Christof Schütte and Frank Noé, Markov models of molecular kinetics: Generation and validation, The Journal of Chemical Physics, 134, 174105, 2011.

[Sarich, Noé, Schütte, 2010] Marco Sarich, Frank Noé and Christof Schütte, On the Approximation Quality of Markov State Models, SIAM Multiscale Model. Simul., 8: 1154-1177, 2010.

[Schütte, Huisinga, Deuflhard, Fischer, 1999] Christof Schütte, Alexander Fischer, Wilhelm Huisinga and Peter Deuflhard, A Direct Approach to Conformational Dynamics based on Hybrid Monte Carlo, J. Comput. Phys., 151: 146-168, 1999.

[Szabo, Ostlund, 1989] Attila Szabo and Neil S. Ostlund, Modern Quantum Chemistry, 1st edition, revised, McGraw-Hill Publishing, New York, 1989.

[VMD, 1996] W. Humphrey, A. Dalke and K. Schulten, VMD - Visual Molecular Dynamics, J. Molec. Graphics, 14: 33-38, 1996.

[Weber, 2010] Markus Weber,


A Subspace Approach to Molecular Markov State Models via an Infinitesimal Generator, ZIB report 09-27, 2010, http://www.zib.de/weber/

[Werner, 2002] Dirk Werner, Funktionalanalysis, 4th edition, Springer-Verlag, Berlin, 2002.

[Wikipedia] Ornstein-Uhlenbeck process, http://en.wikipedia.org/wiki/Ornstein-Uhlenbeck_process

[Zwanzig, 2001] Robert Zwanzig, Nonequilibrium Statistical Mechanics, 1st edition, Oxford University Press, Oxford, 2001.
