Freie Universität Berlin
Institut für Mathematik
Arnimallee 6
14195 Berlin

Master's thesis:

A variational approach for conformation dynamics

Feliks Nüske
July 6, 2012

Dr. Frank Noé, FU Berlin


I hereby declare under oath that I have written this Master's thesis independently and without impermissible outside help. I have used no sources or aids other than those indicated, and I have marked all literal and paraphrased quotations as such. This thesis has not been submitted in the same or a similar form to any examination board.


Table of contents

I. Introduction
II. The variational principle
    II.1. The transfer operator
    II.2. Markov state models
    II.3. Variational principle
    II.4. The Ritz method and the Roothan-Hall method
III. Diffusion processes and application in one dimension
    III.1. Brownian motion and stochastic differential equations
    III.2. Diffusion process
    III.3. Numerical considerations
    III.4. Diffusion in a quadratic potential
    III.5. Two-well potential
IV. Application to molecules
    IV.1. The example system
    IV.2. Simulation
    IV.3. Application of the Roothan-Hall method
V. Summary and outlook
    V.1. Future work
        V.1.1. Application to larger systems
        V.1.2. Choice of basis functions
        V.1.3. Numerical stability
        V.1.4. Importance of sampling
A. Appendix
    A.1. Diffusion in a quadratic potential
Bibliography


I. Introduction

Stochastic dynamical systems play an important role in many branches of mathematics and its applications. Loosely speaking, such a system is a superposition of a deterministic dynamical system with random fluctuations, often called "noise". A typical example is the simulation of molecules (molecular dynamics, MD). In principle, one could model a molecule as a classical physical system by assigning to each atom its three spatial and three momentum coordinates and propagating these coordinates using some dynamical model. Since the molecule can typically exchange heat with its surroundings, energy is not conserved, which means that classical Hamiltonian dynamics is not a suitable dynamical model. Instead, the exchange of heat is usually modelled by some random influence. As an example, one can use Smoluchowski dynamics: if we denote the number of atoms by N and the 3N-dimensional position vector by x, the time evolution is given by the stochastic differential equation

    ẋ(t) = −(1/(mγ)) ∇V(x(t)) + √(2D) dW_t.    (I.1)

Here, V is the potential energy function of the system, D is the diffusion constant, γ denotes friction and W_t is a 3N-dimensional Brownian motion. We will come back to this model in more detail at a later point, but note that this equation defines a stochastic process instead of a deterministic dynamical system. Instead of asking for a deterministic position in the 3N-dimensional state space, we ask for the probability of finding the system in a certain region of the state space.

Molecular systems very often display so-called conformations, or metastable states. This means that while the system still oscillates and fluctuates, the overall geometry remains the same for long times, [Schütte, Huisinga, Deuflhard, Fischer, 1999]. Only occasionally can transitions from one conformation to another be observed. An example of this phenomenon is shown in Figure I.1. A quantity of interest would be the average waiting time until such a transition occurs, because it would help to understand the overall behaviour of such a molecule.

Figure I.1.: An internal coordinate of a small molecule transitioning between two different conformations, [Weber, 2010, ch. 1.1].

It has also been shown by [Schütte, Huisinga, Deuflhard, Fischer, 1999] that the time evolution of probability densities described above can be computed by the action of a linear integral operator, called the propagator. Moreover, the spectral properties of this operator can be related to average waiting times, the quantities of interest. Therefore, estimating eigenvalues and eigenfunctions of the propagator is an interesting task. Markov state models, [Prinz et al, 2011], have frequently been used to this end. The main idea is to discretize the state space into a finite number of sets, and then approximate the true propagator by a finite-dimensional matrix operator. The eigenvalues of this matrix can be used as an approximation of the real eigenvalues. However, the quality of such an approximation largely depends on the choice of discretization. Finding a good discretization requires the use of clustering techniques to group the data. In the following, we present a different approach to approximating the dominant eigenvalues and eigenfunctions, based not on partition-of-unity membership functions, but on the use of smooth functions in combination with a variational principle. The main idea is to use the shape of the molecular energy function to choose the basis functions needed for the approximation. We hope that this can be done without the application of a clustering method to sampled data and thus help to avoid many numerical instabilities. We will develop the theory in chapter II, then apply the method to one-dimensional examples of a diffusion process in chapter III, and finally tackle a higher-dimensional problem in chapter IV. In the end, we hope that this might lead to a robust and computationally affordable method to compute eigenvalues and relevant time scales of stochastic processes stemming from real-world examples.


II. The variational principle

II.1. The transfer operator

We consider a time-dependent stochastic process X_t, t ≥ 0, on a usually high-dimensional and continuous state space Ω. Let the process satisfy the Markov property, that is, for all x_0, x_1, ..., x_n and y ∈ Ω, t_0 > t_1 > ... > t_n ≥ 0 and τ > 0, we have

    P(X_{t_0+τ} = y | X_{t_0} = x_0, X_{t_1} = x_1, ..., X_{t_n} = x_n) = P(X_{t_0+τ} = y | X_{t_0} = x_0).    (II.1)

In other words, the future behaviour of the process only depends on the current state and not on the past. A more detailed and more precise formulation of the Markov property can be found in [Behrends, 2011, ch. 2]. Let us consider processes which are time-homogeneous, meaning that the conditional probability P(X_{t_0+τ} = y | X_{t_0} = x) in the above equation does not depend on t_0, but only on the time difference τ and on the positions x, y. In this case, the process allows us to define a function p:

    p(x, y; τ) := P(X_{t+τ} = y | X_t = x) = P(X_τ = y | X_0 = x).    (II.2)

We shall call this function the transition kernel. It enables us to understand the evolution of probability densities in time. Suppose that, at time t = 0, the system is distributed according to a distribution ρ_0. The probability that, after time τ > 0, the system transitioned from a point x to another point y is given by the product of the probability of being at x at time t = 0 and the conditional probability to go from x to y in time τ. Consequently, the density ρ_τ belonging to time τ, evaluated at y, is obtained from integrating this product over all possible states x:

    ρ_τ(y) = ∫_Ω dx p(x, y; τ) ρ_0(x).    (II.3)

Furthermore, we assume the process to be sufficiently ergodic, which means in essence that the system cannot be decomposed into two dynamically independent components, and every state of the system will be visited infinitely often over an infinite run of the process. In this case, there is a unique probability distribution µ: Ω → R which is invariant in time, i.e.

    ∫_Ω dx p(x, y; τ) µ(x) = µ(y),    (II.4)

regardless of τ, [Sarich, Noé, Schütte, 2010, ch. 2.2]. The function µ is called the invariant measure, the invariant density or the stationary distribution of the process.

Also, we assume the transition kernel to satisfy the detailed balance condition, i.e.

    µ(x) p(x, y; τ) = µ(y) p(y, x; τ)  ∀x, y ∈ Ω.    (II.5)

Detailed balance is an important assumption that is justified in many applications from physics. In a system not satisfying Equation II.5, there would be a pathway where one direction is preferred compared to the other. This would allow the process to produce work, that is, to convert thermal energy into work. This is impossible in physical systems due to the second law of thermodynamics, [Prinz et al, 2011, sec. 2.A].

We see that Equation II.3 defines the action of a lag time dependent integral operator P(τ) on suitable probability densities ρ. Let us call it the propagator of the system, which has the following important property:

Lemma II.1 (Chapman-Kolmogorov equation): For any τ_1, τ_2 > 0, we have

    P(τ_1 + τ_2) = P(τ_1) P(τ_2).    (II.6)

Proof. First, we observe that the Markov property and time-homogeneity imply, for x, z ∈ Ω,

    p(x, z; τ_1 + τ_2) = P(X_{τ_1+τ_2} = z | X_0 = x) = (1/P(X_0 = x)) P(X_{τ_1+τ_2} = z, X_0 = x)    (II.7)
                       = (1/P(X_0 = x)) ∫_Ω dy P(X_{τ_1+τ_2} = z | X_{τ_2} = y) P(X_{τ_2} = y, X_0 = x)    (II.8)
                       = ∫_Ω dy p(y, z; τ_1) P(X_{τ_2} = y, X_0 = x)/P(X_0 = x) = ∫_Ω dy p(y, z; τ_1) p(x, y; τ_2).    (II.9)

Using this, we show that for a function ρ,

    P(τ_1 + τ_2)ρ(z) = ∫_Ω dx p(x, z; τ_1 + τ_2) ρ(x) = ∫_Ω dx ∫_Ω dy p(y, z; τ_1) p(x, y; τ_2) ρ(x)    (II.10)
                     = ∫_Ω dy p(y, z; τ_1) P(τ_2)ρ(y) = P(τ_1)P(τ_2)ρ(z).    (II.11)
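To make Equations II.3 and II.6 concrete, here is a minimal numerical sketch on a finite state space (anticipating Example 1 below), written in Python with numpy; the three-state matrix is an arbitrary illustrative choice, not taken from the text.

```python
import numpy as np

# Row-stochastic transition matrix on three states; the numbers are an
# arbitrary illustrative choice, not taken from the text.
P_tau = np.array([[0.90, 0.08, 0.02],
                  [0.10, 0.85, 0.05],
                  [0.02, 0.08, 0.90]])

# Discrete analogue of Equation II.3: propagate a density by one lag time.
rho_0 = np.array([1.0, 0.0, 0.0])
rho_tau = rho_0 @ P_tau

# Chapman-Kolmogorov (Equation II.6): propagating twice with lag tau is the
# same as propagating once with lag 2*tau, i.e. P(2 tau) = P(tau) P(tau).
assert np.allclose((rho_0 @ P_tau) @ P_tau, rho_0 @ (P_tau @ P_tau))
```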


Let us additionally assume that P(τ)ρ → ρ as τ → 0 for all suitable functions ρ, and let us define P(0) := id. This means that Lemma II.1 is also valid for all τ_1, τ_2 ≥ 0. Before we go on to study the propagator, let us introduce a slight modification to it. Denoting by L²_µ(Ω) the space of all functions v: Ω → R which are square-integrable with respect to the weight function µ, i.e. for which ∫_Ω v²(x) µ(x) dx is finite, equipped with the weighted scalar product ⟨v | w⟩_µ = ∫_Ω dx v(x) w(x) µ(x), we make the following definition:

Definition II.2 (Transfer operator): The transfer operator T(τ) acting on a function u ∈ L²_µ(Ω) is defined by:

    T(τ)u(y) := (1/µ(y)) ∫_Ω dx p(x, y; τ) µ(x) u(x).    (II.12)

The domain of definition is chosen to have this operator act on a Hilbert space. We note the most important properties of T(τ):

Lemma II.3: If the transition kernel p(x, y; τ) is given by a smooth and bounded probability density, the transfer operator T(τ) is linear, bounded, compact and self-adjoint.

Proof. Linearity follows directly from the definition. In order to prove boundedness, let us first use detailed balance to find:

    ⟨T(τ)v | T(τ)v⟩_µ = ∫_Ω dy [T(τ)v(y)]² µ(y)    (II.13)
                      = ∫_Ω dy (1/µ(y)²) [∫_Ω dx p(x, y; τ) µ(x) v(x)]² µ(y)    (II.14)
                      = ∫_Ω dy µ(y) [∫_Ω dx p(y, x; τ) v(x)]².    (II.15)

Before we proceed, we make the following general observation. If π is some probability density on Ω and f: Ω → R a function, we have

    0 ≤ ω²[f]_π = E[f²]_π − (E[f]_π)²,    (II.16)

where we use ω²[·]_π to denote the variance of f with respect to the distribution π. Hence,

    (∫_Ω dx f(x) π(x))² = (E[f]_π)² ≤ E[f²]_π = ∫_Ω dx f²(x) π(x).    (II.17)

In Equation II.15, p(y, x; τ) is, by assumption, a smooth probability density in x. We can consequently apply the above estimate to the squared integral and obtain

    ⟨T(τ)v | T(τ)v⟩_µ ≤ ∫_Ω dy µ(y) ∫_Ω dx p(y, x; τ) v²(x)    (II.18)
                      = ∫_Ω dx v²(x) ∫_Ω dy µ(y) p(y, x; τ) = ∫_Ω dx v²(x) µ(x) = ‖v‖²_µ.    (II.19)

We have thus found that T(τ) is a well-defined and bounded operator on L²_µ(Ω) with operator norm less than or equal to one. By inserting the constant function 1, which satisfies 1(x) = 1 ∀x ∈ Ω, Equation II.4 immediately yields that T(τ)1 = 1 and consequently ‖T(τ)‖ = 1. As p(x, y; τ) is bounded for all x, y, we also have p(x, y; τ) ∈ L²_{µ×µ}(Ω × Ω), therefore T(τ) is a Hilbert-Schmidt operator and thus compact, [Werner, 2002, ch. 6, p. 272]. Finally, self-adjointness also follows from the detailed balance condition:

    ⟨T(τ)u | v⟩_µ = ∫_Ω dy [(1/µ(y)) ∫_Ω dx p(x, y; τ) µ(x) u(x)] µ(y) v(y)    (II.20)
                  = ∫_Ω dx ∫_Ω dy p(y, x; τ) µ(y) u(x) v(y)    (II.21)
                  = ∫_Ω dx [(1/µ(x)) ∫_Ω dy p(y, x; τ) µ(y) v(y)] µ(x) u(x) = ⟨u | T(τ)v⟩_µ.    (II.22)

The transfer operator defines a compact and self-adjoint map on the Hilbert space L²_µ(Ω). From functional analysis, we know that it has rich spectral properties, see [Werner, 2002, ch. 6, p. 241]: The spectrum is real-valued and, because of ‖T(τ)‖ = 1, contained within the interval [−1, 1]. All non-zero spectral values are discrete and are indeed eigenvalues with corresponding eigenfunctions. The eigenvalues can be ordered into a series λ_i with λ_i → 0 as i → ∞. The Hilbert space L²_µ(Ω) can be decomposed into the direct sum of the eigenfunctions' linear span and the operator's kernel. If one extends the linear span by an orthonormal basis of the kernel, we obtain an orthonormal basis ψ_i of the full Hilbert space such that the action of T(τ) completely decomposes into the action on each of these functions. If we write v ∈ L²_µ(Ω) as v = Σ_{i=1}^∞ ⟨v | ψ_i⟩_µ ψ_i, we then find that

    T(τ)v = Σ_{i=1}^∞ ⟨v | ψ_i⟩_µ T(τ)ψ_i = Σ_{i=1}^∞ λ_i ⟨v | ψ_i⟩_µ ψ_i,    (II.23)

where possibly λ_i = 0 holds from some index i on. As we have already noticed in the course of proving Lemma II.3, λ_1 = 1 is an eigenvalue with corresponding eigenfunction ψ_1 = 1. In many applications, it turns out that this eigenvalue is simple and dominant, meaning that its multiplicity is one and that −1 is not an eigenvalue of T(τ). Additionally, there usually is a number of further eigenvalues 1 > λ_2 > ... > λ_m which are strictly smaller but "close" to one, whereas the remaining spectrum is contained in a ball around zero which is essentially bounded away from these eigenvalues, see again [Sarich, Noé, Schütte, 2010, ch. 2.2].


Let us from now on assume this to be the case, and let us call these eigenvalues the dominant eigenvalues and their corresponding functions ψ_2, ..., ψ_m the dominant eigenfunctions.

Consider now the map M: L²_µ(Ω) → L²_{µ⁻¹}(Ω) defined by Mψ(x) := µ(x)ψ(x) for any function ψ ∈ L²_µ(Ω). Here, L²_{µ⁻¹}(Ω) is the L²-space with weight function µ⁻¹(x) = 1/µ(x), where we assumed the stationary density to be non-zero, having in mind that in our later contexts it will always be given by the Boltzmann distribution. M is linear and well-defined; indeed, if ψ ∈ L²_µ(Ω), then

    ⟨Mψ | Mψ⟩_{µ⁻¹} = ∫_Ω dx µ(x)ψ(x) µ(x)ψ(x) µ⁻¹(x) = ‖ψ‖²_µ.

Moreover, M is a bijective map, which can rapidly be checked, and because of

    ⟨Mψ | φ⟩_{µ⁻¹} = ∫_Ω dx µ(x)ψ(x) φ(x) µ⁻¹(x) = ⟨ψ | M⁻¹φ⟩_µ,    (II.24)

M is also unitary. Looking back at Equation II.3 and Definition II.2, we find that on L²_{µ⁻¹}(Ω), P(τ) = M T(τ) M⁻¹. The propagator differs from the transfer operator by no more than a unitary transformation, and it thus inherits the self-adjointness and compactness properties we have just derived for T(τ). In particular, if ψ_i ∈ L²_µ(Ω) is a transfer operator eigenfunction, then φ_i := Mψ_i = µψ_i is a propagator eigenfunction corresponding to the same eigenvalue, and vice versa. The Hilbert space L²_{µ⁻¹}(Ω) decomposes in the same way as L²_µ(Ω) does, only replacing the orthonormal basis {ψ_i} by {φ_i}.

Example 1: If Ω = {x_1, ..., x_N} is finite, it is enough to specify the conditional transition probabilities p_ij(τ) = P(X_τ = x_j | X_0 = x_i). A probability density simply becomes a vector p_τ = (p_1(τ), ..., p_N(τ)), assigning a probability p_i(τ) to each state x_i and satisfying Σ_{i=1}^N p_i(τ) = 1. Probability vectors are propagated in time by multiplication with the matrix P(τ) = (p_ij(τ)) from the left, since

    p_i(τ) = P(X_τ = x_i) = Σ_{j=1}^N P(X_τ = x_i | X_0 = x_j) P(X_0 = x_j) = Σ_{j=1}^N p_ji(τ) p_j(0).    (II.25)

Consequently, p_τ = p_0 P(τ). The linear propagator simply becomes the matrix P(τ)ᵀ. The eigenfunctions φ_i are the left-hand eigenvectors of the matrix P(τ); in particular, the stationary distribution is a vector π := φ_1, which satisfies πP(τ) = π. Since the transformation M acts on a function by pointwise multiplication with the stationary density, M can be written as a matrix Π, which is diagonal and carries the vector π on its diagonal. Since, with this notation, reversibility can be written as ΠP(τ) = P(τ)ᵀΠ, we find that

    T(τ) = M⁻¹ P(τ)ᵀ M = Π⁻¹ P(τ)ᵀ Π = Π⁻¹ Π P(τ) = P(τ).    (II.26)

We see that propagator eigenvectors, which are left-hand eigenvectors of the P-matrix, become transfer operator eigenvectors upon transposing and multiplication with Π⁻¹, and thus become right-hand eigenvectors of the P-matrix. In the finite-dimensional case, the difference between the propagator and the transfer operator is just a matter of left-hand and right-hand eigenvectors of the same matrix.
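Example 1 can be checked numerically. The following sketch is our own illustration (it borrows the four-state matrix of Example 2 below): it computes the stationary vector π and verifies that Πψ_i is a left eigenvector of P(τ) whenever ψ_i is a right eigenvector, which rests on the reversibility relation ΠP(τ) = P(τ)ᵀΠ.

```python
import numpy as np

# Four-state transition matrix from Example 2 below (row-stochastic).
P = np.array([[0.7571, 0.2429, 0.0,    0.0   ],
              [0.1609, 0.8306, 0.0085, 0.0   ],
              [0.0,    0.0084, 0.8294, 0.1622],
              [0.0,    0.0,    0.2413, 0.7587]])

# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi /= pi.sum()
Pi = np.diag(pi)

# Transfer operator eigenvectors = right eigenvectors of P; multiplying
# by Pi turns them into propagator (left) eigenvectors, phi_i = Pi psi_i.
w_r, Psi = np.linalg.eig(P)
Phi = Pi @ Psi
# Check phi_i^T P = lambda_i phi_i^T, which uses reversibility Pi P = P^T Pi.
assert np.allclose(Phi.T @ P, np.diag(w_r) @ Phi.T)
```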


The spectral decomposition, the Chapman-Kolmogorov equation (Lemma II.1) and the assumption P(τ)φ → φ as τ → 0 for all φ ∈ L²_{µ⁻¹}(Ω) show us that the i-th eigenvalue λ_i(τ) is a continuous function of τ and satisfies λ_i(τ_1 + τ_2) = λ_i(τ_1) λ_i(τ_2) as well as λ_i(0) = 1. Therefore, it can be written as λ_i(τ) = e^{−κ_i τ} for some κ_i ≥ 0, that is, it has an exponential decay rate. Clearly, κ_1 = 0, and κ_2, ..., κ_m are close to zero. Looking at the action of P(τ) on some function ρ once more shows that

    P(τ)ρ = Σ_{i=1}^∞ ⟨ρ | φ_i⟩_{µ⁻¹} P(τ)φ_i = Σ_{i=1}^∞ e^{−κ_i τ} ⟨ρ | φ_i⟩_{µ⁻¹} φ_i.    (II.27)

If the time lag τ is very close to zero, almost all terms in the above sum will contribute. If τ becomes larger, however, there is a range of time lags τ where, for i > m, all exponential decay terms e^{−κ_i τ} have essentially vanished, whereas all terms with i ≤ m still contribute. If τ increases even further, namely if τ ≫ 1/κ_2, there are no contributions except the very first term. For a probability distribution ρ, we have ⟨ρ | φ_1⟩_{µ⁻¹} = ⟨ρ | µ⟩_{µ⁻¹} = 1, so P(τ)ρ ≈ µ. The system is then distributed according to its equilibrium distribution and has basically "forgotten" the initial deviation from equilibrium expressed by ρ. Each dominant eigenvalue defines a time scale 1/κ_i and a corresponding slow process which equilibrates only if one waits much longer than this time scale. The slow processes are the link between eigenvalues and metastable states. A slow process usually corresponds to an equilibration process between two metastable regions, and the implied time scale 1/κ_i = −τ/log λ_i(τ) defines an average transition time. For this reason, the approximation of dominant eigenvalues is of such importance.

Example 2: Consider a four-state system with stochastic transition matrix

    P = ( 0.7571  0.2429  0       0
          0.1609  0.8306  0.0085  0
          0       0.0084  0.8294  0.1622
          0       0       0.2413  0.7587 ).    (II.28)

Clearly, most of the transitions will occur either between states x_1 and x_2 or between x_3 and x_4, whereas a transition from x_2 to x_3 will be a very rare event. Grouped together, x_1, x_2 and x_3, x_4 form two metastable states. The eigenvalues are λ_1 = 1, λ_2 = 0.9900, λ_3 = 0.5964 and λ_4 = 0.5894. Clearly, there is one dominant eigenvalue related to the slow transition process between the two metastable states.

Example 3: In chapter III, we will discuss the diffusion process of a one-dimensional particle in an energy landscape V. The particle is subject to a force F, which equals the negative derivative of the energy function, but is additionally perturbed by random fluctuations. For the potential function V shown in Figure II.1a, the particle will spend most of the time around one of the two minima of V, and will just occasionally cross the barrier separating them. Figure II.1b shows a trajectory of the process, where the metastable behaviour is clearly visible. This is another typical example of metastable states; we will investigate it in more detail in the next chapter.
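The numbers quoted in Example 2, and the implied time scales t_i = −τ/log λ_i(τ), can be reproduced with a few lines of numpy (taking τ as one simulation step):

```python
import numpy as np

P = np.array([[0.7571, 0.2429, 0.0,    0.0   ],
              [0.1609, 0.8306, 0.0085, 0.0   ],
              [0.0,    0.0084, 0.8294, 0.1622],
              [0.0,    0.0,    0.2413, 0.7587]])

# Eigenvalues in decreasing order; compare 1, 0.9900, 0.5964, 0.5894.
lam = np.sort(np.linalg.eigvals(P).real)[::-1]

# Implied time scales t_i = -tau / log(lambda_i(tau)), tau = 1 step.
tau = 1.0
t = -tau / np.log(lam[1:])
print(lam, t)  # t_2 belongs to the slow transition between the two metastable pairs
```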


Figure II.1.: The potential energy function V displaying two metastable states around x = −1 and x = 1 (a), and a sample trajectory of the corresponding diffusion process (b).

II.2. Markov state models

Markov state models (MSMs), [Prinz et al, 2011], are a widely used technique to approximate the eigenvalues of stochastic processes. The idea is to partition the state space into finitely many disjoint sets A_1, ..., A_s with ∪_{i=1}^s A_i = Ω, and only consider the transitions between any two of these sets. This means that instead of noticing every transition between any two points of the full continuous state space, we only take care of transitions between the sets A_i. Let us compute the conditional transition probability p_ij(τ) between two sets A_i and A_j, over time τ:

    p_ij(τ) = P(X_τ ∈ A_j | X_0 ∈ A_i) = P(X_τ ∈ A_j, X_0 ∈ A_i) / P(X_0 ∈ A_i)    (II.29)
            = [∫_{A_i} dx ∫_{A_j} dy µ(x) p(x, y; τ)] / [∫_{A_i} dx µ(x)].    (II.30)

The matrix P(τ) with entries p_ij(τ) is a row-stochastic matrix which defines a Markov process on the finite state space {A_1, ..., A_s}, and it is called the MSM transfer matrix. Probability densities on this space, i.e. probability vectors p = (p_1, ..., p_s) with Σ_{i=1}^s p_i = 1, are transported by multiplication with P(τ) from the left, i.e. p_τ = p_0 P(τ). As an approximation to the true eigenvalues of the continuous propagator, one simply computes the eigenvalues of P(τ). To this end, one has to find the matrix entries p_ij(τ). These are usually estimated from simulation trajectories: Let X_0, ..., X_N be a trajectory of length N, with a simulation time step ∆t = τ. One estimates µ_i := ∫_{A_i} dx µ(x) as the number of simulation steps X_k with X_k ∈ A_i, divided by N. The transition probability ∫_{A_i} ∫_{A_j} dx dy µ(x) p(x, y; τ) is estimated by the number of transitions from A_i to A_j, divided by the total number of transitions, in this case N − 1. In practice, it is often useful to choose τ as some general integer multiple of the simulation window ∆t, i.e. τ = n∆t, with n not necessarily equal to one. In this case, only transitions over n simulation time steps are taken into account, which usually leads to a higher approximation quality, as we shall see later.

It is very important to note that the Markov process defined by the MSM transfer matrix is an approximation of the continuous dynamics: The definition of the transfer matrix in Equation II.30 overlooks all transitions within one of the sets A_i. But these transitions are important, because the probability to cross over to some other set A_j very much depends on the current position within set A_i. Making use of the MSM therefore automatically results in a systematic error that affects the accuracy of the estimated eigenvalues. It is quite easy to show that this error decays exponentially proportional to the second eigenvalue. In their 2010 paper, Sarich, Noé and Schütte show that the pre-factor can be split up into two independent components. One of them depends on the lag time only, whereas the other one comes from the choice of discretization, [Sarich, Noé, Schütte, 2010, theorem 3.1]. The discretization is therefore of great importance, and a good choice can increase the approximation quality a lot.

Example 4: The four-state system of Equation II.28 is the MSM approximation of the diffusion process shown in Figure II.1. We chose the four sets to be the intervals [−2, −1], [−1, 0], [0, 1] and [1, 2]. The transition matrix P = P(τ) was estimated from a sample trajectory of 20 million steps. The lag time dependent improvement of the approximation quality can be visualized by looking at the second implied time scale t_2 = −τ/log(λ_2(τ)). The faster the estimated time scale converges, the higher the quality of the approximation. Figure II.2 shows how the estimated time scale improves upon increasing the time lag. The four-state discretization is a very simple one. We will achieve much faster convergence towards the correct time scale in the next chapter.

We would like to make use of two more aspects emphasized in this work. First, the construction of the MSM can be seen as the projection of the transfer operator T(τ) onto a finite subspace of the Hilbert space L²_µ(Ω), namely the linear span of indicator functions corresponding to the sets A_i. Second, as pointed out in chapter 3.4 of that article, it is not necessary to restrict this projection to indicator functions; the error estimate is still true if T(τ) is projected onto a much more general subspace of L²_µ(Ω). This is, in a sense, the starting point of our work. Knowing that the discretization is crucial to the approximation quality, we will try to use the linear span of smooth functions as the subspace referred to above, and hope that this might work equally well or even better in some cases. In order to formulate our method, we will need a variational principle, which will be derived in the next section.
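A minimal sketch of the estimation procedure described above, with our own function name and the plain row-normalization of the count matrix used in the text (no reversibility constraint is enforced):

```python
import numpy as np

def estimate_msm(dtraj, n_sets, lag):
    """Estimate the row-stochastic MSM transfer matrix P(tau), tau = lag
    simulation steps, from a trajectory of set indices in {0,...,n_sets-1}."""
    C = np.zeros((n_sets, n_sets))
    # Count transitions over exactly `lag` steps.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1.0
    # Row-normalize the count matrix (assumes every set was visited).
    return C / C.sum(axis=1, keepdims=True)

# Usage for Example 4: discretize a 1d trajectory `traj` into the four
# intervals [-2,-1], [-1,0], [0,1], [1,2] and estimate P(tau).
# dtraj = np.digitize(traj, [-1.0, 0.0, 1.0])
# P = estimate_msm(dtraj, n_sets=4, lag=10)
```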


Figure II.2.: Improvement of the estimated time scale t_2 upon increasing the lag time τ, displayed in multiples of the simulation time step.

II.3. Variational principle

We start with the following definition:

Definition II.4: Let f and g be observables of the state space, i.e. functions f, g: Ω → R. The autocorrelation function acf(f, g; τ) is defined by:

    acf(f, g; τ) := ∫_Ω ∫_Ω dx dy f(x) p(x, y; τ) µ(x) g(y).    (II.31)

The autocorrelation function of f and g is the expectation value of the product f(x)g(y) with respect to the transition density µ(x)p(x, y; τ). Furthermore, we see from Equation II.31 that it also equals the Rayleigh coefficient ⟨T(τ)f | g⟩_µ = ⟨f | T(τ)g⟩_µ =: ⟨f | T(τ) | g⟩, because of

    ⟨f | T(τ) | g⟩ = ⟨T(τ)f | g⟩_µ = ∫_Ω dy [(1/µ(y)) ∫_Ω dx p(x, y; τ) µ(x) f(x)] µ(y) g(y)    (II.32)
                  = ∫_Ω ∫_Ω dx dy f(x) p(x, y; τ) µ(x) g(y) = acf(f, g; τ).    (II.33)
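Since µ(x)p(x, y; τ) is the joint distribution of a pair (X_t, X_{t+τ}) in equilibrium, acf(f, g; τ) can be estimated from a sampled trajectory as a time-lagged average; a sketch, assuming the time series f(X_k) and g(X_k) are given as arrays:

```python
import numpy as np

def acf_estimate(f_traj, g_traj, lag):
    """Estimate acf(f, g; tau) from the time series f(X_k), g(X_k) of an
    equilibrium trajectory, with tau = lag simulation steps."""
    # Time-lagged average over all pairs (X_k, X_{k+lag}); by ergodicity
    # this converges to the double integral in Definition II.4.
    return np.mean(f_traj[:-lag] * g_traj[lag:])
```

For a reversible process, the estimate can additionally be symmetrized by averaging acf(f, g; τ) and acf(g, f; τ).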


For a symmetric operator like T(τ), the Rayleigh coefficient is also called matrix element or simply expectation value. The next two results will reveal to us the significance of autocorrelation functions when it comes to approximating eigenvalues and eigenfunctions, [Noé, 2011, sec. 2.3]:

Lemma II.5: The transfer operator's eigenfunctions ψ_i satisfy the relation:

    acf(ψ_i, ψ_i; τ) = λ_i.    (II.34)

Proof. All we have to do is recognize the action of T(τ) on its eigenfunctions and use the orthogonality relation between those:

    acf(ψ_i, ψ_i; τ) = ⟨T(τ)ψ_i | ψ_i⟩_µ = λ_i ⟨ψ_i | ψ_i⟩_µ = λ_i.    (II.35)

Theorem II.6 (Variational principle): For any normalized function ψ ∈ L²_µ(Ω) which is orthogonal to ψ_1, we have:

    acf(ψ, ψ; τ) ≤ λ_2.    (II.36)

Proof. First, expand ψ in terms of the orthonormal basis {ψ_i}. Note that there is no overlap with ψ_1:

    acf(ψ, ψ; τ) = ⟨T(τ)ψ | ψ⟩_µ = Σ_{i,j=2}^∞ c_i c_j ⟨T(τ)ψ_i | ψ_j⟩_µ,    (II.37)

where c_i = ⟨ψ | ψ_i⟩_µ. Like in the previous proof, we use the action of the transfer operator on its eigenfunctions:

    acf(ψ, ψ; τ) = Σ_{i,j=2}^∞ c_i c_j λ_i ⟨ψ_i | ψ_j⟩_µ = Σ_{i=2}^∞ c_i² λ_i.    (II.38)

Finally, we recall that the eigenvalues are sorted in decreasing order, and that in an orthonormal basis expansion of a normalized function, the squares of the coefficients sum up to unity:

    acf(ψ, ψ; τ) ≤ λ_2 Σ_{i=2}^∞ c_i² = λ_2.    (II.39)
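Theorem II.6 can be illustrated with the four-state chain of Example 2 (which is reversible, so T(τ) = P(τ) by Example 1): every normalized vector orthogonal to ψ_1 = 1 in the µ-weighted scalar product has a Rayleigh coefficient of at most λ_2. A small randomized check:

```python
import numpy as np

P = np.array([[0.7571, 0.2429, 0.0,    0.0   ],
              [0.1609, 0.8306, 0.0085, 0.0   ],
              [0.0,    0.0084, 0.8294, 0.1622],
              [0.0,    0.0,    0.2413, 0.7587]])

# Stationary distribution mu and second eigenvalue lambda_2.
w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmax(np.real(w))]); mu /= mu.sum()
lam2 = np.sort(np.linalg.eigvals(P).real)[-2]

rng = np.random.default_rng(0)
for _ in range(1000):
    psi = rng.standard_normal(4)
    psi = psi - mu @ psi               # orthogonal to psi_1 = 1 in <.,.>_mu
    psi /= np.sqrt(mu @ psi**2)        # normalize: <psi, psi>_mu = 1
    rayleigh = mu @ (psi * (P @ psi))  # <T(tau) psi | psi>_mu with T = P
    assert rayleigh <= lam2 + 1e-12
```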


Remark II.7: Likewise, we can prove that for any normalized function ψ which is orthogonal to the first k eigenfunctions, the expression acf(ψ, ψ; τ) is bounded from above by λ_{k+1}.

Among all candidate functions, namely those which are orthogonal to the first eigenfunction ψ_1, the true second eigenfunction maximizes the Rayleigh coefficient. For some function ψ, ⟨T(τ)ψ | ψ⟩_µ can be viewed as a measure of how well it approximates the second eigenfunction, and that is what we will be trying to do: maximize the Rayleigh coefficient among a number of ansatz functions, and compare different choices of candidate functions. Variational methods of that sort are used in a number of different fields as well, e.g. for the approximation of electronic wavefunctions in quantum mechanics. In the next section, we show how the Rayleigh coefficient can be maximized within the linear span of some given test functions.

II.4. The Ritz method and the Roothan-Hall method

For all of the upcoming results, we follow [Szabo, Ostlund, 1989, ch. 1.3].

Theorem II.8 (Ritz method): Let χ_1, ..., χ_m ∈ L²_µ(Ω) be mutually orthonormal. The Rayleigh coefficient is maximized among these functions by the eigenvector b_1 corresponding to the greatest eigenvalue ξ_1 of the matrix eigenvalue problem

    H b_1 = ξ_1 b_1,    (II.40)

where H is the density matrix with entries

    h_ij = acf(χ_i, χ_j; τ) = ∫_Ω ∫_Ω dx dy χ_i(x) p(x, y; τ) µ(x) χ_j(y).    (II.41)

More precisely, the maximal Rayleigh coefficient is found by taking the linear combination of the functions χ_i with coefficients taken from b_1.

Proof. Let us first, for a function ψ̂ = Σ_{i=1}^m b_i χ_i, express the Rayleigh coefficient in terms of the b_i:

    acf(ψ̂, ψ̂; τ) = ⟨ψ̂ | T(τ) | ψ̂⟩ = Σ_{i,j=1}^m b_i b_j ⟨χ_i | T(τ) | χ_j⟩ = Σ_{i,j=1}^m b_i b_j h_ij.    (II.42)


Since the matrix elements h_ij are fixed, acf(ψ̂, ψ̂; τ) can be seen as a differentiable function of the coefficients b_i. We now need to maximize this function with respect to a constraint, namely, that the solution has unit norm:

    1 = ‖ψ̂‖²_µ = ⟨ψ̂ | ψ̂⟩_µ = Σ_{i,j=1}^m b_i b_j ⟨χ_i | χ_j⟩_µ = Σ_{i=1}^m b_i²,    (II.43)

as the basis functions χ_i were assumed to be orthonormal. In summary, we need to maximize the function

    F(b_1, ..., b_m) = Σ_{i,j=1}^m b_i b_j h_ij − ξ (Σ_{i=1}^m b_i² − 1),    (II.44)

where ξ is a Lagrange multiplier, with respect to the coefficients b_i. Since the H-matrix is symmetric, this results in the equations:

    0 = ∂F/∂b_i = 2 Σ_{j=1}^m h_ij b_j − 2ξ b_i,  i = 1, ..., m,    (II.45)
    ⇒ ξ b_i = Σ_{j=1}^m h_ij b_j,  i = 1, ..., m.    (II.46)

We have thus arrived at the eigenvalue equation H b = ξ b. Since H is a symmetric matrix, we can find m solutions b_i corresponding to real eigenvalues ξ_i. If we now compute the Rayleigh coefficient of the function ψ̂_i = Σ_{j=1}^m b_{i,j} χ_j generated from such a solution, we find:

    ⟨ψ̂_i | T(τ) | ψ̂_i⟩ = Σ_{j,k=1}^m b_{i,j} b_{i,k} ⟨χ_j | T(τ) | χ_k⟩ = Σ_{j,k=1}^m b_{i,j} b_{i,k} h_jk    (II.47)
                       = Σ_{j=1}^m ξ_i b_{i,j}² = ξ_i,    (II.48)

where we used the eigenvalue relation Σ_{k=1}^m b_{i,k} h_jk = ξ_i b_{i,j}. Thus, the solution corresponding to the largest eigenvalue ξ_1 maximizes the Rayleigh coefficient among the basis functions χ_i, as we claimed.

Lemma II.9: The second largest eigenvalue ξ_2 of Equation II.40 satisfies ξ_2 ≤ λ_2.

Proof. Let ψ̂_1 and ψ̂_2 be the functions generated from the eigenvectors b_1, b_2 and their eigenvalues ξ_1, ξ_2, respectively. Consider a linear combination ψ̂ := x ψ̂_1 + y ψ̂_2, for x, y ∈ R. If ψ̂ is normalized, we find, by orthonormality:

    1 = ⟨ψ̂ | ψ̂⟩_µ = x² ⟨ψ̂_1 | ψ̂_1⟩_µ + y² ⟨ψ̂_2 | ψ̂_2⟩_µ + 2xy ⟨ψ̂_1 | ψ̂_2⟩_µ = x² + y².    (II.49)


Moreover, as ψ̂_1 and ψ̂_2 diagonalize the H-matrix:

    ⟨ψ̂_1 | T(τ) | ψ̂_2⟩ = Σ_{i,j=1}^m b_{1,i} b_{2,j} ⟨χ_i | T(τ) | χ_j⟩ = Σ_{i,j=1}^m b_{1,i} b_{2,j} h_ij    (II.50)
                        = Σ_{i=1}^m ξ_2 b_{1,i} b_{2,i} = 0.    (II.51)

Therefore, computing the Rayleigh coefficient for ψ̂ yields:

    ⟨ψ̂ | T(τ) | ψ̂⟩ = x² ⟨ψ̂_1 | T(τ) | ψ̂_1⟩ + y² ⟨ψ̂_2 | T(τ) | ψ̂_2⟩ + 2xy ⟨ψ̂_1 | T(τ) | ψ̂_2⟩    (II.52)
                   = x² ξ_1 + y² ξ_2 = (1 − y²) ξ_1 + y² ξ_2 = ξ_1 − y² (ξ_1 − ξ_2).    (II.53)

Depending on the choice of y, this value is between ξ_2 and ξ_1. If we now had λ_2 < ξ_2, we could choose x and y such that ψ̂ is orthogonal to the first eigenfunction ψ_1; by Theorem II.6, the Rayleigh coefficient of this normalized function would then be bounded by λ_2 < ξ_2, contradicting the fact that it lies between ξ_2 and ξ_1. Hence ξ_2 ≤ λ_2.


The only problem left is the requirement that the basis functions were so far assumed to be orthonormal. However, this can easily be circumvented. Again, let χ_1, ..., χ_m ∈ L²_µ(Ω) be normalized basis functions as before, except that they are no longer assumed to be orthogonal. We can still derive a similar result:

Theorem II.10 (Roothan-Hall method): For basis functions as above, the linear combination that maximizes the Rayleigh coefficient is given by the solution b_1 corresponding to the greatest eigenvalue ξ_1 of the generalized eigenvalue problem

    H b_1 = ξ_1 S b_1,    (II.55)

where H is as in Theorem II.8 and S is the overlap matrix with entries

    s_ij = ⟨χ_i | χ_j⟩_µ.    (II.56)

Proof. Starting out the same way as in Theorem II.8, we first have to reformulate the normalization constraint. Equation II.43 then becomes:

    1 = ‖ψ̂‖²_µ = Σ_{i,j=1}^m b_i b_j ⟨χ_i | χ_j⟩_µ = Σ_{i,j=1}^m b_i b_j s_ij.    (II.57)

This results in an effective functional of the form:

    F(b_1, ..., b_m) = Σ_{i,j=1}^m b_i b_j h_ij − ξ (Σ_{i,j=1}^m b_i b_j s_ij − 1).    (II.58)

Maximization requires the solution of:

    0 = 2 Σ_{j=1}^m h_ij b_j − 2ξ Σ_{j=1}^m s_ij b_j    (II.59)
    ⇒ ξ Σ_{j=1}^m s_ij b_j = Σ_{j=1}^m h_ij b_j    (II.60)

for every i ∈ {1, ..., m}. This is the generalized eigenvalue equation stated above. Since H is symmetric and S is positive definite, this problem has m real eigenvalues ξ_i and corresponding eigenvectors b_i which are orthonormal with respect to the S-weighted scalar product, i.e. we have b_iᵀ S b_j = δ_ij. Therefore, similar to Equations II.47-II.48, we find for the corresponding functions ψ̂_i := Σ_{j=1}^m b_{i,j} χ_j:

    ⟨ψ̂_i | T(τ) | ψ̂_i⟩ = Σ_{j,k=1}^m b_{i,j} b_{i,k} ⟨χ_j | T(τ) | χ_k⟩ = Σ_{j,k=1}^m b_{i,j} b_{i,k} h_jk    (II.61)
                       = ξ_i Σ_{j,k=1}^m b_{i,j} b_{i,k} s_jk = ξ_i b_iᵀ S b_i = ξ_i.    (II.62)


So again, by taking the solution b_1 with maximal eigenvalue ξ_1, we have found the linear combination that maximizes the Rayleigh coefficient.

Clearly, the Ritz method is just a special case of the Roothan-Hall method, where S is given by the m × m identity matrix. With the Roothan-Hall method, we can use any basis set, orthonormal or not. Like the density matrix H, the overlap matrix S can also be estimated from finite sampling. Since we are mostly interested in the propagator eigenfunctions, we will weight the solution with the stationary distribution afterwards. In most cases, we will have to estimate µ from a sample trajectory as well. Our method consequently consists of the following steps (a small numerical sketch follows the list below):

(a) Compute a sufficiently long or sufficiently many short sampling trajectories of the process and check convergence.
(b) Estimate the stationary distribution from the sample.
(c) Estimate the H- and S-matrix from the sample.
(d) Solve the generalized eigenvalue problem from Theorem II.10.
(e) Weight the solution with the estimated stationary density.

Lastly, we see that the Markov state model approximation is also obtained from using the Ritz method with indicator functions. So, like in section II.2, choose a partition of the state space into s mutually disjoint sets A_1, ..., A_s, and let the candidate basis functions χ_1, ..., χ_s be given by the indicator functions of these sets, normalized by a pre-factor 1/√π_i, with π_i = ∫_{A_i} dx µ(x). Computing the density matrix H yields:

    h_ij = (1/√(π_i π_j)) ∫_{A_i} dx ∫_{A_j} dy µ(x) p(x, y; τ)    (II.63)
         = (√π_i/√π_j) P(X_τ ∈ A_j, X_0 ∈ A_i)/P(X_0 ∈ A_i) = (√π_i/√π_j) p_ij(τ).    (II.64)

The density matrix is the same as the MSM transfer matrix up to a similarity transformation with transformation matrix Π^{1/2}, which is the diagonal matrix with the square roots of the local stationary probabilities as diagonal entries. The Ritz method therefore generates the eigenvalues of the MSM transfer matrix and is equivalent to the construction of a Markov state model.
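The following sketch implements steps (c) and (d) for a one-dimensional trajectory with Gaussian basis functions; function names and conventions are our own. It assumes the trajectory from step (a) samples the stationary distribution, which replaces the explicit estimate of µ in steps (b) and (e); the basis functions are left unnormalized, which the overlap matrix S accounts for.

```python
import numpy as np
from scipy.linalg import eigh

def gaussian_basis(x, centers, sigma2):
    """Evaluate Gaussian basis functions chi_i at the points x (1d array)."""
    return np.exp(-(x[:, None] - centers[None, :])**2 / (2.0 * sigma2))

def roothan_hall(traj, centers, sigma2, lag):
    """Estimate dominant eigenvalues/coefficient vectors from an equilibrium
    trajectory via the generalized eigenvalue problem H b = xi S b."""
    chi = gaussian_basis(traj, centers, sigma2)
    # (c) time-lagged estimate of H and instantaneous estimate of S; H is
    # symmetrized because a finite-sample estimate is only approximately so.
    H = chi[:-lag].T @ chi[lag:] / (len(traj) - lag)
    H = 0.5 * (H + H.T)
    S = chi.T @ chi / len(traj)        # must be positive definite
    # (d) symmetric generalized eigenvalue problem, sorted in decreasing order.
    xi, b = eigh(H, S)
    return xi[::-1], b[:, ::-1]

# Usage: xi, b = roothan_hall(traj, np.linspace(-2, 2, 13), 0.5, lag=10)
# Implied time scale: t2 = -lag * dt / np.log(xi[1])
```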


III. Diffusion processes and application in one dimension

We are now prepared to apply the variational methods we have just derived to a number of example systems. Our guiding example will be a diffusion process. This is a system where a number of classical particles move under the influence of a deterministic force field, generated by some potential energy function. If there were no further influence, this system could be modelled by classical equations of motion. However, the particles are also subject to random fluctuations, which interfere with the deterministic motion. The fundamental concepts of such a system will be introduced below. As a result, the system trajectory can no longer be uniquely predicted. Instead, we have a stochastic process to which our theoretical results can be applied.

III.1. Brownian motion and stochastic differential equations

The fundamental model for the description of random fluctuations is Brownian motion or the Wiener process. In 1826-27, R. Brown studied the irregular behaviour of pollen grains in water. He and many others observed that the pathways taken by such a particle were highly irregular. Moreover, whereas the average distance from the starting point seemed to vanish, the mean squared fluctuation from the average seemed to grow linearly in time. These observations motivated the following mathematical model, [Evans, ch. 3].

Definition III.1: A one-dimensional stochastic process W_t on a probability space Ω, defined for t ≥ 0, is called a Brownian motion or a Wiener process if it satisfies:

(a) W_0 = 0 a.e.
(b) For any times t_1, ..., t_k, the increments W_{t_k} − W_{t_{k−1}}, ..., W_{t_2} − W_{t_1} are independent random variables.
(c) The increments W_t − W_s are normally distributed according to W_t − W_s ∼ N(0, t − s), ∀t > s ≥ 0.

An Rⁿ-valued stochastic process where each component is a one-dimensional Wiener process is called an n-dimensional Wiener process. In the 1920s, N. Wiener constructed a Brownian motion and thereby mathematically proved the existence of such a process. In fact, there is a number of different ways how this can be achieved, but the explicit realization is of little importance. All we need to know are the three conditions from the above definition and the most important properties. It can be shown that a Brownian motion defines a Markov process, and that the sample paths, i.e. the functions t → W_t, are continuous, but almost everywhere in Ω nowhere differentiable with respect to time, see Figure III.1 and [Behrends, 2011, ch. 5.2]. This model has turned out to be very suitable for many different phenomena, reaching way beyond the simulation of particles in a fluid. For instance, Brownian motion has frequently been used to model financial markets.

Figure III.1.: A sample path of the Wiener process. Clearly, the path is continuous but not differentiable as a function of time.

Despite the irregularity of its sample paths, Brownian motion can be used to model phenomena involving differential calculus.
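Property (c) of Definition III.1 translates directly into a simulation recipe: accumulate independent Gaussian increments of variance ∆t. A sketch of how sample paths like the one in Figure III.1 can be generated:

```python
import numpy as np

def wiener_path(n_steps, dt, rng=None):
    """Sample W_0, W_dt, ..., W_{n_steps*dt} from independent Gaussian
    increments W_{t+dt} - W_t ~ N(0, dt), with W_0 = 0."""
    if rng is None:
        rng = np.random.default_rng()
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))
```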


Suppose that the change of a quantity X(t) over a short time ∆t is given by

    ∆X(t) = ∆t F(X(t), t) + σ ∆W_t,    (III.1)

meaning that the change is composed of a deterministic function F, depending on the quantity X(t) and the time, plus a random fluctuation, modelled by Brownian motion. Here, σ > 0 is simply a control parameter governing the strength of the perturbation. Upon dividing by ∆t and passing to the limit ∆t → 0, one is tempted to write down a differential equation

    dX(t)/dt = F(X(t), t) + σ Ẇ_t,    (III.2)

where we would like to have Ẇ_t denote the time derivative of Brownian motion. We have just seen that such a derivative does not exist. However, Equation III.2 does make sense if it is transformed into an integral form. This involves the definition of a stochastic integral and the Ito formula, [Behrends, 2011, ch. 6, 7]. In particular, Brownian motion itself can be shown to solve Equation III.2 in this sense, if F = 0.

III.2. Diffusion process

Consider a classical particle of mass m moving under the influence of friction γ > 0 and of a force field F, generated by a potential energy function V, i.e. F = −∇V. By Newton's equation, the position vector x(t) ∈ R³ satisfies the relation

    m d²x(t)/dt² = −mγ dx(t)/dt − ∇V(x(t)).    (III.3)

Let us now suppose that there are random perturbations, caused, for instance, by collisions with small particles, which continuously alter the particle's motion. We therefore make Equation III.3 a stochastic differential equation by adding σ dW_t,

    m d²x(t)/dt² = −mγ dx(t)/dt − ∇V(x(t)) + σ dW_t,    (III.4)

keeping in mind that it is meant to be solved in an integral sense. It is known that σ = √(2mγ k_B T), where k_B = 1.3806488 × 10⁻²³ J/K is Boltzmann's constant and T is the system temperature. This equation is typically called the Langevin equation of our system; the solution x(t) =: X_t is a Markov process, called a diffusion process. Equation III.4 holds in the same way if we study a system of n particles; in this case, X_t is a stochastic process in 3n dimensions. If the friction parameter γ is large enough, it is justified to drop the second derivative on the left-hand side of Equation III.4, [Zwanzig, 2001, sec. 2.2]. Rearranging the terms modifies the equation to

    dx(t)/dt = −(1/(mγ)) ∇V(x(t)) + √(2D) dW_t,    (III.5)

where we have defined D := k_B T/(mγ). This equation is the Langevin equation in the high friction limit; the underlying dynamics is called Smoluchowski dynamics or also Brownian dynamics. It will be the interest of our studies in what follows.

The Langevin equation itself can hardly be solved, except in a few rare cases. However, it is helpful to restrict oneself to the study of probability densities. Instead of trying to find an expression for the stochastic process X_t itself, we are only interested in computing the average probability ρ(x, t) to find the particle in a position x at time t. This probability density also satisfies a differential equation, called the Smoluchowski equation:

Theorem III.2 (Smoluchowski equation): The probability ρ(x, t) to find a system obeying the Langevin Equation III.5 at position x at time t satisfies the differential equation:

    ∂ρ(x, t)/∂t = (1/(mγ)) ∇·(∇V(x) ρ(x, t)) + D ∆ρ(x, t).    (III.6)

Here, ∇· denotes the divergence of a vector field and ∆ the Laplace operator of a scalar field.

Proof. A derivation can be found in [Zwanzig, 2001, ch. 2.2].

Equation III.6 is a special case of a Fokker-Planck equation. This result enables us to find the stationary density for the diffusion process:

Theorem III.3: The stationary solution of the Smoluchowski Equation III.6 is the Boltzmann distribution

    µ(x) = (1/Z) exp(−βV(x)),    (III.7)

where β := 1/(k_B T) is the inverse temperature and Z := ∫_Ω dx exp(−βV(x)) is a normalization constant, called the partition function.

Proof. If ∂ρ/∂t = 0 holds, we have to check that

    −(1/(mγ)) ∇·(∇V(x) µ(x)) = D ∆µ(x).    (III.8)


We find:

    D ∆µ(x) = (k_B T/(mγZ)) Σ_i ∂/∂x_i [−β (∂V(x)/∂x_i) exp(−βV(x))]    (III.9)
            = (k_B T/(mγZ)) [−β ∆V(x) exp(−βV(x)) + Σ_i β² (∂V(x)/∂x_i)² exp(−βV(x))]    (III.10)
            = −(1/(mγ)) [∆V(x) µ(x) + ∇V(x)·∇µ(x)]    (III.11)
            = −(1/(mγ)) ∇·(∇V(x) µ(x)).    (III.12)

We note that µ has to be well-defined for this result to make sense, i.e. the integral defining Z has to exist. This mainly requires that V(x) → ∞ fast enough as |x| → ∞. If the state space is dynamically connected, which we will always assume, this can also be shown to be sufficient for the process to be ergodic and meet the requirements from chapter II, [Sarich, Noé, Schütte, 2010, ch. 2.2]. Therefore, we have just shown that µ is the stationary distribution of the diffusion process, and we can now apply the foregoing theory.

III.3. Numerical considerations

Knowing the invariant distribution of a diffusion process helps us a lot, because our computational result for the first eigenfunction can be compared to the function exp(−βV(x)), and should be almost equal to it up to a pre-factor. However, little more can be computed in advance, except for a few simple systems. In particular, the transition kernel p(x, y; τ) is unknown to us. But in order to apply the variational methods, we need to perform a numerical simulation of the process, which means that if the current simulation step is x ∈ Ω, we need to draw the next step with time lag ∆t from the distribution p(x, ·; ∆t). We therefore approximate this distribution by a very common procedure, the so-called Euler-Maruyama method. It consists of combining a deterministic explicit Euler step with a normally distributed random step. If the current simulation position is x, the deterministic Euler step is

    x → x − ∆t ∇V(x).    (III.13)

This approximates the position that the process would attain if it was unperturbed. But since it is subject to normally distributed noise which, after time ∆t, has variance 2D∆t, we add a normally distributed random number with standard deviation √(2D∆t). Altogether, the simulation step becomes

    x → x − ∆t ∇V(x) + √(2D∆t) η,    (III.14)

where η ∼ N(0, 1).
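A sketch of this scheme; following Equation III.13, the drift is written without a 1/(mγ) pre-factor (i.e. in units where mγ = 1), and grad_V is a user-supplied gradient of the potential:

```python
import numpy as np

def euler_maruyama(grad_V, x0, dt, n_steps, D, rng=None):
    """Simulate 1d Smoluchowski dynamics with the Euler-Maruyama scheme
    x -> x - dt * grad_V(x) + sqrt(2 D dt) * eta, eta ~ N(0, 1)."""
    if rng is None:
        rng = np.random.default_rng()
    noise = np.sqrt(2.0 * D * dt) * rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = x[k] - dt * grad_V(x[k]) + noise[k]
    return x
```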


Clearly, this is just an approximation of the true dynamics, but a good one as long as ∆t is chosen small enough. However, having in mind the difficulties arising from applying the explicit Euler method to unperturbed dynamical systems, it can also lead to serious trouble, a problem we will encounter in the next chapter.

We are left with the question of how to choose an appropriate basis set. In the previous chapter, we have seen that the eigenfunctions φ_i of the propagator and the eigenfunctions ψ_i of the transfer operator differ only by a factor of the stationary density. An approximation scheme using the Roothan-Hall method results in an estimate of the functions ψ_i. But since it is our goal to determine the propagator eigenfunctions φ_i, we can either start with a set of functions suitable to approximate the functions ψ_i, and weight the result with a factor µ afterwards, or we can go the other way round. That means to start with a number of functions suitable to estimate the φ_i-functions and divide them by µ prior to the computation. After the computation, we simply take the resulting linear combination of our original basis functions to estimate the functions φ_i. There is also a third way: If f ∈ L²(Ω) is an element of the unweighted L²-space, then f/√µ is in L²_µ(Ω) on the one hand, and √µ f is in L²_{µ⁻¹}(Ω) on the other hand. These functions are somehow "in between" the two domains, and they can also be used with the Roothan-Hall method. To this end, they have to be divided by the square root of µ before the computation, and the resulting linear combination has to be re-weighted by √µ afterwards.

Our approach to all of the following problems will be to make use of the shape of the energy function. The function V will always display a number of minima, which correspond to metastable regions of the state space. We will try to use local functions, like Gaussian hats, which are essentially non-zero only in a neighbourhood of the metastable regions. Since functions in L²_{µ⁻¹}(Ω) and L²(Ω) at least have to decay to zero if |x| becomes large, whereas functions in L²_µ(Ω) can be constant or even unbounded, we will prefer one of the latter two ways. Additionally, using unbounded functions can lead to instabilities if rare states outside the metastable regions are visited by a sampling trajectory, because these functions can become very large for such states. In the experiments, both of the two preferred ways seemed to perform well. All of the computations shown below were run using Gaussian functions weighted with the square root of the stationary density.

III.4. Diffusion in a quadratic potential

A simple but important example is diffusion in a quadratic potential V(x) = (1/2) b x², for x ∈ R and b > 0. Inserting this into Equation III.7 shows that

    µ(x) = (1/√(2πα)) exp(−x²/(2α)),

i.e. the stationary distribution becomes a Gaussian distribution with zero mean and variance α = k_B T/b. The potential and its invariant measure are depicted in Figure III.2.

Figure III.2.: Quadratic potential V(x) = 0.5x² and its invariant distribution.


It turns out that this is one of the rare cases where all important quantities can be determined analytically. Using stochastic integrals and Ito's formula, we derive an explicit expression for this process in Lemma A.1. Using this expression, we then show that if the process starts with a distribution ρ_0, the expectation value at time t is given by E[ρ_t] = e^{−θt} E[ρ_0], where we have defined θ := b/(mγ). Moreover, the variance evolves according to ω²[ρ_t] = e^{−2θt} ω²[ρ_0] + (1 − e^{−2θt}) α. Regardless of the initial distribution, the process quickly approaches the stationary Gaussian distribution. Even the complete set of eigenfunctions can be found analytically:

Lemma III.4: The propagator eigenfunctions are given by the appropriately scaled Hermite polynomials, multiplied with the invariant measure:

    φ_i(x) = √(α^{i−1}/(i−1)!) H_{i−1}(x/√α) µ(x).    (III.15)

The corresponding eigenvalues are λ_i(τ) = e^{−θ(i−1)τ}.

Proof. We also show the proof of this Lemma in the appendix.

Since there is only one potential minimum, there are no metastable states. Unless b is very small or mass and friction are very large, the global relaxation time t_2 = −τ/log λ_2(τ) = 1/θ = mγ/b is short, and the process quickly equilibrates to its invariant distribution.

Since all these analytic quantities are at hand, we can compare them to the results obtained from the Roothan-Hall method. As shown in Figure III.3, a relatively small numerical effort leads to a good approximation of the eigenfunctions as well as the corresponding eigenvalues and implied time scales. Though this example may be a simple one, it is still important. In molecular simulations, a harmonic potential is often used to model bonds between atoms, as well as bond angles. The harmonic potential will therefore frequently show up in applications.
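Given Lemma III.4, the exact eigenvalues and implied time scales can be tabulated directly; a tiny sketch with illustrative parameter values, reproducing the exact value t_2 = 1 quoted in Figure III.3:

```python
import numpy as np

b, m, gamma = 1.0, 1.0, 1.0                # illustrative parameter values
theta = b / (m * gamma)
tau = 0.05
lam = np.exp(-theta * np.arange(5) * tau)  # lambda_i(tau), i = 1, ..., 5
t = -tau / np.log(lam[1:])                 # t_i = 1 / (theta * (i - 1))
print(t)                                   # t_2 = m * gamma / b = 1
```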


Figure III.3.: Results obtained from applying the Roothan-Hall method to diffusion in a harmonic potential. A simulation of 20 million time steps of length ∆t = 10⁻³ was used with thirteen Gaussian test functions centred at −3, −2, −1, −0.8, −0.4, −0.2, 0, 0.2, 0.4, 0.8, 1, 2, 3. The variances were set to 0.5 for all centres between −0.8 and 0.8, and to 1 for the remaining functions. (a) First eigenfunction φ₁. (b) Second eigenfunction φ₂. (c) Second eigenvalue λ₂. (d) Estimated second implied time scale t₂. Exact value is t₂ = 1.


In molecular simulations, a harmonic potential is often used to model bonds between atoms, as well as bond angles. The harmonic potential will therefore frequently show up in applications.

III.5. Two-well potential

Another important example is a two-well potential. In general, this is a potential function displaying two minimum positions which are separated by a significant energy barrier, and which rises to infinity both to the left and to the right of the minima. For our calculations, we choose V(x) = k(x⁴ − 2x² + 1) for some k > 0, which has two minima at x = −1 and x = 1. The energy function and the invariant distribution are displayed in Figure III.4a and Figure III.4b. The two-well potential is a good model for certain molecular interactions where a certain degree of freedom favours two characteristic positions. The regions around the potential minima correspond to metastable states; we therefore expect one dominant slow process apart from the stationary process. We apply the Roothan-Hall method using thirteen Gaussian functions centred around the two minima; the centres and variances are listed in the caption of Figure III.4. In order to evaluate the results, we also computed the eigenvalues of an MSM transition matrix. Here, we used a fine discretization of the state space into 100 sets, most of them situated close to the potential minima. A comparison of the results can be seen in Figure III.4. Most importantly, we see that the implied time scales are estimated very well: both methods converge quickly to about the same value. The stationary distribution is very well approximated, and we obtain a convincing result for the second eigenfunction, clearly displaying a characteristic sign change between the two metastable regions. The computational effort required by the Roothan-Hall method is much smaller, since we only need to compute a 13 × 13 matrix instead of a 100 × 100 matrix.
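For reference, the MSM side of this comparison follows the usual count-and-normalize construction. The sketch below uses uniform bins for simplicity, whereas the discretization described above concentrates the sets near the minima; the function name and tolerances are illustrative.

```python
import numpy as np

def msm_eigenvalues(traj, n_sets, lag, lo=-2.0, hi=2.0):
    """Estimate a row-stochastic MSM transition matrix on a uniform grid
    and return its eigenvalues, sorted by modulus."""
    edges = np.linspace(lo, hi, n_sets - 1)            # n_sets bins over [lo, hi]
    states = np.digitize(traj, edges)
    C = np.zeros((n_sets, n_sets))
    np.add.at(C, (states[:-lag], states[lag:]), 1.0)   # transition counts
    C = 0.5 * (C + C.T)                                # symmetrize the counts
    T = C / np.maximum(C.sum(axis=1, keepdims=True), 1e-12)
    return np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
```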


Figure III.4.: Application of the Roothan-Hall method to the one-dimensional two-well potential. A simulation of 20 million time steps with ∆t = 10⁻³ was used with thirteen Gaussian hats centred at −2, −1.5, −1.2, −1, −0.8, −0.5, 0, 0.5, 0.8, 1, 1.2, 1.5, 2. The variances were set to 1 for the centres at x = −2, −1.5, 0, 1.5, 2 and to 0.5 for all others. We compare the results with a 100 set MSM discretization. (a) Potential function V(x). (b) First eigenfunction φ₁. (c) Second eigenvalue λ₂. (d) Second implied time scale t₂. (e) Second eigenfunction φ₂.


IV. Application to molecules

In this chapter, we are going to apply the variational method to diffusion processes in a higher dimensional state space. For systems with many degrees of freedom, the potential energy function often possesses multiple minima, and at times contains highly complicated interactions between several of the coordinates. Evaluation of the partition function becomes very difficult, and requires specialized algorithms like Markov chain Monte Carlo methods. Computing eigenvalues and characteristic time scales can be done using Markov state models, but the choice of adequate sets requires suitable clustering methods. This also leads to a huge computational effort if the state space is very high-dimensional. We have the hope that the use of variational methods can help to reduce this effort.

IV.1. The example system

Let us consider a system which is like a very much simplified small molecule. Let it consist of N atoms, with N being small, either equal to 4 or to 5 in what follows. Denote their position vectors by r_i ∈ ℝ³, i ∈ {1, ..., N}. Neighbouring atoms are connected by a bond. Denote the distance vectors between those atoms by r_ij := r_j − r_i and the distances by r_ij := ‖r_ij‖. For three neighbouring atoms, we define the bond angle θ_ijk by

θ_ijk := cos⁻¹( −⟨r_ij | r_jk⟩ / (r_ij r_jk) ),   (IV.1)

which is the angle between the two bonds meeting at atom j. Additionally, for four atoms, we can define the dihedral angle ψ_ijkl as the angle between the plane spanned by r_ij, r_jk and the one spanned by r_jk, r_kl. Using the normal vectors n_ijk and n_jkl, given by

n_ijk := r_ij × r_jk,   (IV.2)

which are perpendicular to the respective planes, we find that the dihedral is the angle between the two normal vectors:

ψ_ijkl = cos⁻¹( ⟨n_ijk | n_jkl⟩ / (‖n_ijk‖ ‖n_jkl‖) ).   (IV.3)
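These definitions translate directly into a few vector operations. A minimal sketch (the function names are ours, not those of the simulation program):

```python
import numpy as np

def bond_angle(r_i, r_j, r_k):
    """Bond angle theta_ijk of Equation IV.1."""
    r_ij, r_jk = r_j - r_i, r_k - r_j
    c = -np.dot(r_ij, r_jk) / (np.linalg.norm(r_ij) * np.linalg.norm(r_jk))
    return np.arccos(np.clip(c, -1.0, 1.0))

def dihedral_angle(r_i, r_j, r_k, r_l):
    """Dihedral angle psi_ijkl of Equations IV.2 and IV.3."""
    r_ij, r_jk, r_kl = r_j - r_i, r_k - r_j, r_l - r_k
    n_ijk, n_jkl = np.cross(r_ij, r_jk), np.cross(r_jk, r_kl)
    c = np.dot(n_ijk, n_jkl) / (np.linalg.norm(n_ijk) * np.linalg.norm(n_jkl))
    return np.arccos(np.clip(c, -1.0, 1.0))
```

Note that the arccos form returns values in [0, π]; the signed convention ψ ∈ [−π, π) used below can be obtained by attaching the sign of ⟨n_ijk | r_kl⟩.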


Figure IV.1.: Schematic drawing of the four atom system, corresponding either to system A or B. It shows the dihedral angle in blue. Created with [VMD, 1996].

A sketch of such a system is shown in Figure IV.1.

Let us now define a potential function V, which generates a force field acting on the molecule. A bond between two atoms is typically modelled like a spring. Therefore, we define harmonic potentials V_ij, depending on the distance r_ij, for each pair of connected atoms:

V_ij := ½ k_ij (d_ij − r_ij)².   (IV.4)

Here, k_ij is a constant defining the strength of the potential and d_ij is the distance of minimal energy. The system will drive the bond lengths towards the minimum distances d_ij. Similarly, we set up harmonic potentials V_ijk for each bond angle:

V_ijk := ½ k_ijk (d_ijk − θ_ijk)².   (IV.5)

The dihedral angles usually have a number of favoured positions. In order to model this, we define the dihedral potential as

V_ijkl := k_ijkl (1 − cos(n ψ_ijkl)),   (IV.6)

for ψ_ijkl ∈ [−π, π). The position ψ_ijkl = π is identical to ψ_ijkl = −π; consequently, it is enough to have V_ijkl defined on the above interval. This potential has n minimum positions, as displayed in Figure IV.2.
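The corresponding energy terms are equally direct to write down. A sketch mirroring Equations IV.4 to IV.6 (the Coulomb term of Equation IV.7 below follows the same pattern; all names are illustrative):

```python
import math

def bond_energy(r, d, k):
    """Harmonic bond term V_ij = 0.5 * k * (d - r)**2 (Equation IV.4)."""
    return 0.5 * k * (d - r) ** 2

def angle_energy(theta, d, k):
    """Harmonic bond angle term V_ijk = 0.5 * k * (d - theta)**2 (Equation IV.5)."""
    return 0.5 * k * (d - theta) ** 2

def dihedral_energy(psi, k, n):
    """Dihedral term V_ijkl = k * (1 - cos(n * psi)) (Equation IV.6)."""
    return k * (1.0 - math.cos(n * psi))
```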


Figure IV.2.: The potential energy function V(ψ_ijkl) = k_ijkl (1 − cos(nψ_ijkl)) for the dihedral angle, with k_ijkl = 1 and n = 2, n = 4, respectively.

If n = 2, the two minima are situated at ψ_ijkl = 0 and ψ_ijkl = −π. Lastly, we would also like to include Coulomb potentials V_C defined by

V_C(r_ij) := (1/(4πε₀)) (e₁ e₂ / r_ij),   (IV.7)

which describe the electrostatic repulsion or attraction between two atoms. Here, ε₀ is the electric constant, and e₁, e₂ are the charges of the two atoms. In our simulation, the charges will be equal to one elementary charge e = 1.602 · 10⁻¹⁹ C in absolute value, but can have opposite signs. The total potential energy function V is then the sum of all individual energies.

For our computations, we will consider three small systems, which we call system A, system B and system C. The first is a four atom system which only includes the three harmonic bond interactions, two bond angle interactions, and a dihedral potential with n = 2. The second one also consists of four atoms, but we set n = 4 for the dihedral and add an attractive Coulomb interaction between atoms one and four. In the last case, we have N = 5, four bonds, three bond angles and two dihedrals, one with n = 4 and the other with n = 2.

If we now add random perturbations and Smoluchowski dynamics, we obtain a diffusion process with invariant distribution

µ(x) = (1/Z) exp(−βV(x)),   (IV.8)

where Z is the partition function, as before, and x ∈ ℝ^{3N} is the full position vector containing all coordinates of the atoms.

It is important to point out once more that the stationary distribution and all the other eigenfunctions are defined on the 3N-dimensional Euclidean space. As such, they probably take a highly complicated shape.


Figure IV.3.: Statistical distribution of the two bond angles of a four atom system with Coulomb 1-4 attraction. The position of minimum energy is θ = 1.9373 = 111.00 deg. We see that the interaction causes a slight shift of both distributions, but because of the strong bonds, the Gaussian type distribution is well conserved. Clearly, the changes of the distributions caused by the interactions can be much more drastic, but still we hope that they will not altogether destroy the original shape.

At least for our examples, however, the state of the system as well as its potential energy is completely determined by the bond lengths, the bond angles and the dihedral angles. These coordinates are called the internal coordinates. Projected onto these internal coordinates, the eigenfunctions will probably have a much simpler form. Clearly, these projections will in general not be equal to the eigenfunctions we would find if a one-dimensional particle was propagated under the influence of the energy depending on that coordinate only, since the internal coordinates are not independent of each other (see Figure IV.3). But still, we have the hope that the projections somehow resemble these functions. This is the central idea of the following calculations: if the internal coordinates are not too heavily dependent on one another, we can try to apply the variational method with local functions, each dependent on just one internal coordinate, and placed near the minimum positions of the individual potential energies. Choosing these functions does not require the use of clustering techniques, but can be based on knowledge of the molecular energy function, which is known as an expression of the internal coordinates. Since the number of internal coordinates grows at most like O(N²), the same would be true for the number of basis functions needed. As before, we will stick to Gaussian functions in the square root weighted space throughout the following examples, and we will only use functions which depend on one of the dihedral coordinates.
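To state the construction explicitly: each basis function depends on the full configuration x only through a single dihedral angle, and is divided by √µ as described in Chapter III. A sketch (names are illustrative; since µ is only known up to the partition function Z, the weighting is implemented up to a constant factor, which cancels in the generalized eigenvalue problem):

```python
import numpy as np

def make_basis_function(center, sigma2, beta, potential_energy, dihedral_of):
    """Gaussian in one dihedral coordinate, divided by sqrt(mu), where
    mu(x) is proportional to exp(-beta * V(x))."""
    def f(x):
        psi = dihedral_of(x)                       # one internal coordinate of x
        g = np.exp(-(psi - center) ** 2 / (2.0 * sigma2))
        return g * np.exp(0.5 * beta * potential_energy(x))  # divide by sqrt(mu)
    return f
```

The factor exp(+βV/2) grows quickly for rare high-energy configurations, which is the kind of instability mentioned in Chapter III; it stays harmless as long as the Gaussians are essentially supported inside the metastable regions.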


Quantity                    Symbol   Unit             Value
Boltzmann constant          k_B      kJ/(mol·K)       8.3144621 · 10⁻³
Avogadro constant           N_A      mol⁻¹            6.02214129 · 10²³
Electric constant           ε₀       F·m⁻¹            8.854187817 · 10⁻¹²
Elementary charge           e        C                1.602176565 · 10⁻¹⁹
Atomic mass unit            u        kg               1.660538912 · 10⁻²⁷
Mass of one carbon atom     m        u                12.001
Reference distance          d_b      nm               0.153
Reference bond angle        d_a      deg              111.00
Temperature                 T        K                400 (A, B) or 600 (C)
Friction                    γ        ps⁻¹             200
Bond force constant         k_b      kJ/(mol·nm²)     2 · 10⁵
Angle force constant        k_a      kJ/(mol·rad²)    2 · 10³
Dihedral force constant     k_d      kJ/mol           5.92

Table IV.1.: Overview of constants and parameters used in the simulation.

IV.2. Simulation

In order to produce simulation trajectories of the above system, we implement a small Matlab-based molecular dynamics program. It allows us to define all necessary parameters, such as temperature, friction, mass of the atoms, potentials to be used, simulation time step and number of time steps. After initializing all these, it propagates the system in its Euclidean coordinates; that is, for each time step, it computes the gradient of the potential in Euclidean coordinates and adds a random perturbation. After each step, it computes the internal coordinates and saves them to a binary file. Optionally, one can also store the Euclidean coordinates as a binary file or as an .xyz-file to be used with an MD-viewer.

As for the settings, we try to make a realistic simulation and use physical units, in particular nm for distances, ps for time and kJ/mol for energies. We also try to use realistic values for the reference distances and angles, as well as for the force constants. However, we also wish for a certain type of behaviour to show up. First, we choose the bonds and the angles to be very strong, meaning that they should be distributed closely around the reference positions, which requires k_ij and k_ijk to be chosen sufficiently large for all bonds and all bond angles. We encounter a number of difficulties here. If k_ij is large, the Euler discretization can quickly lead to a blow-up. Consequently, we either have to make the simulation time step very small or friction large enough. Additionally, we want the dihedral angles to display a metastable behaviour, like the type of behaviour we have studied in the previous examples. This requirement limits the temperature, because a high temperature will flatten the energy landscape, and also keeps friction from being chosen too large, since this would lead to very long waiting times until a transition occurs. Furthermore, it keeps the time step from being made too small, since this would make simulations covering sufficiently many transitions very expensive.


It turns out to be quite difficult to find settings matching these requirements. In particular, we have to turn away from values used in real world examples a number of times. A complete list of the parameters we used is shown in Table IV.1.

IV.3. Application of the Roothan-Hall method

With a sufficiently long simulation trajectory at hand, we can turn to the approximation of eigenvalues and time scales. For system A, we expect one dominant slow process to show up. We start with a small basis set, consisting of no more than two Gaussian functions centred at the two minimum positions, in order to get a rough approximation of the eigenvalues. Furthermore, we apply the method to a large basis set of twenty-seven functions, most of them centred closely around the two minima, with variance equal to 0.1. Details about the basis sets used in this chapter are given in Table IV.2. As a reference, we also compute the eigenvalues and eigenvectors of two Markov state model discretizations. One of them just splits up the state space into two sets along the dihedral coordinate. The other one is a fine discretization involving 100 sets along the same coordinate. Results showing the estimated implicit time scales, the dominant eigenvalues, as well as the first two eigenfunctions, are depicted in Figure IV.4. We see that in both cases, the slow process can be resolved, and the implicit time scales converge as the lag time τ increases. As expected, the fine discretizations converge much faster than the coarse ones. Most importantly, we see that the large basis set Roothan-Hall approximation is as good as the fine MSM for most lag times. Also, both methods yield a very good approximation of the stationary density, compared to the statistical estimate obtained from sampling, and a convincing estimate of the second eigenfunction, displaying the characteristic sign change between the minima. This is a good result: it shows us that the variational method yields an approximation which is comparable to a fine MSM.

We also learn from our attempts that the choice of basis functions is very influential to the approximation quality. If the functions are too strongly peaked, they are not sufficiently correlated and the eigenvalues go down. However, if they are too widely spread, they correlate too much and the H-matrix is just poorly estimated. As a consequence, the computation returns eigenvalues which are greater than one or have non-zero imaginary parts. But as we said in the beginning of the chapter, the shape of the energy function can be used to choose the right functions and move in the right direction.
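A practical symptom of the second failure mode is poor conditioning of the overlap matrix S: strongly overlapping basis functions make S nearly singular, so that statistical noise in H is amplified. Such a diagnostic is not part of the method presented here, but a simple check of this kind can be sketched as follows:

```python
import numpy as np

def overlap_condition(S, rel_tol=1e-8):
    """Spectrum check of the estimated overlap matrix S: eigenvalues close
    to zero signal (almost) linearly dependent basis functions."""
    w = np.linalg.eigvalsh(S)                  # ascending; S is symmetric
    cond = w[-1] / w[0] if w[0] > 0 else np.inf
    n_redundant = int(np.sum(w < rel_tol * w[-1]))
    return cond, n_redundant
```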
For system B, the process will display four metastable regions, but they will be populated with different frequencies: the minimum which is closest to atom number one will be visited much more often because of the electrostatic attraction, while the others will be less populated. We therefore expect a number of dominant slow processes to appear, describing the transitions between different minima. Again, we apply the Roothan-Hall method with two different basis sets, a small one consisting of four functions, and a large one consisting of 35 functions, which are distributed closely around the four minima. We find that three dominant slow processes are present, and compare the estimates for implicit time scales and


eigenvalues with a four set and a 100 set MSM discretization. The results are shown in Figure IV.5 and Figure IV.6. They are similar to the previous example: both eigenvalues and time scales as well as the eigenfunctions are approximated comparably well by the large set Roothan-Hall method and the fine MSM.

These examples had in common that the metastable behaviour depended on only one single coordinate. As a last step, we take a look at system C, where more than one coordinate is relevant for the description of slow processes. For the Roothan-Hall method, we combine the two basis sets used in the previous examples; that means we have 35 basis functions which depend on the first dihedral coordinate and 27 which cover the second. We compare the results to those stemming from a Markov state model discretization which uses all possible combinations of 15 sets along each coordinate, i.e. 225 sets altogether. The second, third and fourth eigenvalue and time scale, estimated with both methods, are shown in Figure IV.7. We see that the time scales estimated with the Roothan-Hall method converge earlier. This is a good sign, because we are able to estimate the eigenvalues and time scales of the larger system by simply adding up the basis functions used for the smaller systems, without having to use all possible combinations of these functions.
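The difference between the two constructions is easy to state in code: the variational basis for system C concatenates the two one-coordinate sets, whereas the MSM discretization takes all products of the per-coordinate sets. A schematic sketch, with uniformly spaced stand-ins for the centre lists of Table IV.2:

```python
import numpy as np

centers_psi1 = np.linspace(-3.0, 3.0, 35)   # stand-in for the 35 centres (set from B)
centers_psi2 = np.linspace(-3.1, 3.1, 27)   # stand-in for the 27 centres (set from A)

# Additive variational basis: each function depends on one dihedral only.
basis_C = [("psi1", c) for c in centers_psi1] + [("psi2", c) for c in centers_psi2]
assert len(basis_C) == 35 + 27              # grows like the sum of the set sizes

# Product (MSM-style) discretization: one set per combination of bins.
sets_C = [(b1, b2) for b1 in range(15) for b2 in range(15)]
assert len(sets_C) == 15 * 15               # grows like the product
```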


System A
  Potentials: Three bonds, two bond angles, one dihedral with two minima.
  Small basis: Two Gaussians, centred at −π/2 and π/2, both with variance 0.3. To simplify the computation, we shifted the minima to those two positions.
  Large basis: 27 Gaussians, centred at −3.1, −3.0, −2.8, −2.6, −2.4, −2.2, −2.0, −1.5, −1.0, −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.1, all of them with variance 0.1.

System B
  Potentials: Three bonds, two bond angles, one dihedral with four minima, electrostatic attraction between atoms one and four.
  Small basis: Four Gaussians, centred at −3π/4, −π/4, π/4, 3π/4, all of them with variance 0.1. To simplify the computation, we shifted the minima to those four positions.
  Large basis: 35 Gaussians, centred at −3.0, −2.9, −2.8, −2.6, −2.2, −2.0, −1.8, −1.7, −1.6, −1.5, −1.4, −1.2, −1.0, −0.8, −0.4, −0.2, −0.1, 0, 0.1, 0.2, 0.4, 0.8, 1.0, 1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 2.0, 2.2, 2.6, 2.8, 2.9, 3.0, all of them with variance 0.1.

System C
  Potentials: Four bonds, three bond angles, two dihedrals, with n = 4 for the first one and n = 2 for the second one.
  Basis: The large set from B for the first dihedral and the large set from A for the second.

Table IV.2.: Description of the example systems and the basis sets used for them.


Figure IV.4.: Approximation results for system A. We used every third step of a 30 million step trajectory corresponding to a sampling time step ∆t = 10⁻³ ps. We compare the results obtained with the small and the large basis set with a 2 set and a 100 set MSM. The functions displayed were computed from the large basis sets. (a) Potential energy for the dihedral coordinate. (b) Projection of first eigenfunction φ₁, compared to direct estimate from the sample. (c) Second eigenvalue λ₂. (d) Second implied time scale t₂. (e) Projection of second eigenfunction φ₂.


Figure IV.5.: Approximation results for system B. Every fifth step of a 30 million step trajectory corresponding to a sampling time step ∆t = 10⁻³ ps was used. We compare the results obtained with the small and the large basis set to a 4 set and a 100 set MSM discretization. The functions displayed were computed using the large basis sets. (a) Potential energy for the dihedral angle. (b) Projection of first eigenfunction φ₁, compared with direct estimate from sampling. (c) Second eigenvalue λ₂. (d) Second implied time scale t₂. (e) Projection of second eigenfunction φ₂.


Figure IV.6.: Continuation of Figure IV.5. (a) Third eigenvalue λ₃. (b) Third implied time scale t₃. (c) Projection of third eigenfunction φ₃. (d) Fourth eigenvalue λ₄. (e) Fourth implied time scale t₄. (f) Projection of fourth eigenfunction φ₄.


Figure IV.7.: Approximation results for system C. Every fifth step of a 10 million step sampling trajectory with time step ∆t = 10⁻³ ps was used. We compare the Roothan-Hall method with a combination of the large basis sets of the previous examples to a 15 × 15 MSM discretization. (a) Second eigenvalue λ₂. (b) Second implied time scale t₂. (c) Third eigenvalue λ₃. (d) Third implied time scale t₃. (e) Fourth eigenvalue λ₄. (f) Fourth implied time scale t₄.


V. Summary and outlook

We have presented a method to approximate the eigenvalues and time scales of slow processes in stochastic dynamical systems, based on the variational principle (Theorem II.6) and the Roothan-Hall method (Theorem II.10). We have applied this method with Gaussian functions to one-dimensional and to a few simple higher dimensional examples of diffusion processes governed by Smoluchowski dynamics. So far, we have seen that in all cases a convincing estimate of the involved time scales was achieved. Comparison with results obtained from Markov state model discretizations showed that the variational method can reproduce their quality.

V.1. Future work

Future work on the subject will have to deal with a number of questions:

V.1.1. Application to larger systems

Can we, in a chain with many internal coordinates, obtain a convincing result by simply adding up local basis functions, each dependent on one coordinate and chosen on the basis of the molecular energy function? This assumes that the internal coordinates are sufficiently independent to allow this approximation. If it works, it would avoid the use of clustering techniques and result in a computationally affordable method to compute the dominant eigenvalues and relevant time scales. As a first test, we will have to try a five atom system with additional Coulomb interactions, and see if it works.


V.1.2. Choice of basis functions

So far, we have exclusively used Gaussian functions, and more or less guessed their shape parameters from the energy function. Can we make a more precise statement about the relation between the shape of the basis functions and the approximation quality, and what other types of functions can be used?

V.1.3. Numerical stability

We will have to address sources of numerical instabilities involved in our computations. As mentioned before, choosing basis functions which are either too uncorrelated or too strongly correlated can lead to meaningless results. Can we find a way to quantify these instabilities and determine how they are related to the choice of functions?

V.1.4. Importance of sampling

The method does not work without the computation of simulation trajectories. In the examples we studied, it was easy to compute trajectories which visit all the metastable regions frequently. But what does this mean for larger systems? How can we determine the minimum length of a sufficiently long trajectory, and how much does the use of finite simulations affect all the quantities involved in the method, like the H- and S-matrix?


A. Appendix

A.1. Diffusion in a quadratic potential

Let us prove the properties of the diffusion in a quadratic potential as introduced in Section III.4. First, let us derive an analytic expression for the problem's solution. On stochastic integrals and Ito's formula, see [Evans, ch. 4].

Lemma A.1: If the process is initially distributed according to X₀, then the distribution at time t is given by the stochastic integral

X_t = e^{−θt} X₀ + σ ∫₀ᵗ e^{−θ(t−s)} dB_s.   (A.1)

Proof. The proof can also be found on [Wikipedia]. Let X_t be the solution of the corresponding stochastic differential equation, i.e.

dX_t = −θ X_t dt + σ dB_t.   (A.2)

Consider the function f(X_t, t) := e^{θt} X_t. According to Ito's formula, f(X_t, t) satisfies the SDE

d(f(X_t, t)) = (∂f/∂t) dt + (∂f/∂x) dX_t + ½ (∂²f/∂x²) σ² dt   (A.3)
 = θ e^{θt} X_t dt − θ e^{θt} X_t dt + e^{θt} σ dB_t = e^{θt} σ dB_t.   (A.4)

The process f(X_t, t) is therefore given by the stochastic integral

e^{θt} X_t = f(X_t, t) = X₀ + σ ∫₀ᵗ e^{θs} dB_s.   (A.5)


Multiplying the last expression by e^{−θt} gives the desired result.

Now we can confirm the claims about the expectation and variance. The properties of stochastic integrals we use can again be found in [Evans, ch. 4].

Lemma A.2: The solution X_t from Lemma A.1 satisfies

E[X_t] = e^{−θt} E[X₀],   (A.6)
ω²[X_t] = e^{−2θt} ω²[X₀] + (1 − e^{−2θt}) α,   (A.7)

with α = k_B T / b = σ²/(2θ).

Proof. Directly using Equation A.1, we find

E[X_t] = e^{−θt} E[X₀] + σ E[ ∫₀ᵗ e^{−θ(t−s)} dB_s ] = e^{−θt} E[X₀],   (A.8)

since the expectation of stochastic integrals over scalar functions is always zero. Similarly, using that the expectation of the square of a stochastic integral over a scalar function equals the ordinary integral over the square of that function, we compute

E[X_t²] = e^{−2θt} E[X₀²] + 2σ e^{−θt} E[X₀] E[ ∫₀ᵗ e^{−θ(t−s)} dB_s ] + σ² E[ ( ∫₀ᵗ e^{−θ(t−s)} dB_s )² ]   (A.9)
 = e^{−2θt} E[X₀²] + σ² ∫₀ᵗ e^{−2θ(t−s)} ds   (A.10)
 = e^{−2θt} E[X₀²] + (σ²/(2θ)) (1 − e^{−2θt}).   (A.11)

Therefore,

ω²[X_t] = E[X_t²] − (E[X_t])² = e^{−2θt} E[X₀²] + α (1 − e^{−2θt}) − e^{−2θt} (E[X₀])²   (A.12)
 = e^{−2θt} ω²[X₀] + α (1 − e^{−2θt}).   (A.13)

Next, let us determine the transition kernel p(x, y; τ). If the current position of the process is known to be x ∈ Ω, its current distribution is a delta function centred at x. Consequently, by the preceding Lemma, the distribution at time τ > 0 has expectation value e^{−θτ} x and variance α(1 − e^{−2θτ}). We can verify that this distribution is a Gaussian function with precisely these shape parameters:


Lemma A.3: With ν := 1 − e^{−2θτ}, the transition function p(x, y; τ) is given by a Gaussian,

p(x, y; τ) = (1/√(2παν)) exp( −(y − e^{−θτ} x)² / (2αν) ).   (A.14)

Proof. We show that p satisfies the Smoluchowski Equation III.6 with initial condition p(x, ·; 0) = δ_x, which for the quadratic potential reads

∂p/∂τ = (1/(mγ)) ∂/∂y (b y p) + D ∂²p/∂y².

Since the initial condition is clearly fulfilled, we check that the differential equation holds for all τ > 0. In order to do so, we note that θ = b/(mγ) and D = θα, and we abbreviate m(τ) := e^{−θτ} x, so that p is a Gaussian in y with mean m and variance αν. On the one hand, differentiating with respect to τ and using dν/dτ = 2θ e^{−2θτ} and dm/dτ = −θ m, we find

∂p/∂τ = θ p [ −e^{−2θτ}/ν − m (y − m)/(αν) + e^{−2θτ} (y − m)²/(αν²) ].   (A.15)

On the other hand, using ∂p/∂y = −((y − m)/(αν)) p, we have

(1/(mγ)) ∂/∂y (b y p) + D ∂²p/∂y² = θ p [ 1 − y(y − m)/(αν) ] + θα p [ (y − m)²/(αν)² − 1/(αν) ]   (A.16)
 = θ p [ 1 − 1/ν − y(y − m)/(αν) + (y − m)²/(αν²) ].   (A.17)

Expanding y(y − m) = (y − m)² + m(y − m) and using 1 − ν = e^{−2θτ}, this becomes

θ p [ −e^{−2θτ}/ν − m(y − m)/(αν) + e^{−2θτ} (y − m)²/(αν²) ],   (A.18)

which coincides with Equation A.15. Since all terms agree, the equation holds.

Knowing this, we can prove the statement about the eigenfunctions of the propagator:


Proof of Lemma III.4. For convenience, we restrict the proof to the case α = 1 and leave out all pre-factors. The case i = 1 was already treated when we discussed the stationary distribution, so let us check the statement for i = 2. By partial integration, we find that

∫ dx p(x, y; τ) x exp(−x²/2) = −∫ dx p(x, y; τ) (d/dx) exp(−x²/2)   (A.19)
 = ∫ dx (∂p/∂x)(x, y; τ) exp(−x²/2)   (A.20)
 = (e^{−θτ}/ν) ∫ dx (y − e^{−θτ} x) p(x, y; τ) exp(−x²/2)   (A.21)
 = (e^{−θτ}/ν) y ∫ dx p(x, y; τ) exp(−x²/2) − (e^{−2θτ}/ν) ∫ dx p(x, y; τ) x exp(−x²/2).   (A.22)

Upon multiplication by ν, and using that ν + e^{−2θτ} = 1, the last term can be absorbed into the left-hand side, and we are left with

∫ dx p(x, y; τ) x exp(−x²/2) = e^{−θτ} y ∫ dx p(x, y; τ) exp(−x²/2) = e^{−θτ} y exp(−y²/2).   (A.23)

Having established the first two cases, we can prove the assertion for general i by induction. We will need the recursion relation for Hermite polynomials, which reads

H_{i+1}(x) = x H_i(x) − i H_{i−1}(x),   (A.24)

and also holds for the eigenfunctions φ_i. Furthermore, similarly to the first case, we can use that φ_{i+1}(x) = (−1)^i (d^i/dx^i) exp(−x²/2), which is one way to define the Hermite functions. The course of the proof is then quite the same as for i = 2:

∫ dx p(x, y; τ) φ_{i+1}(x) = (−1)^i ∫ dx p(x, y; τ) (d^i/dx^i) exp(−x²/2)   (A.25)
 = (−1)^{i−1} ∫ dx (e^{−θτ}(y − e^{−θτ} x)/ν) p(x, y; τ) (d^{i−1}/dx^{i−1}) exp(−x²/2)   (A.26)
 = (e^{−θτ}/ν) y ∫ dx p(x, y; τ) φ_i(x) − (e^{−2θτ}/ν) ∫ dx p(x, y; τ) x φ_i(x)   (A.27)
 = (e^{−θτ}/ν) λ_i y φ_i(y) − (e^{−2θτ}/ν) ∫ dx p(x, y; τ) [φ_{i+1}(x) + i φ_{i−1}(x)].   (A.28)

Again, multiplying the equation by ν and using ν + e^{−2θτ} = 1 absorbs the term containing φ_{i+1} into the left-hand side. We are left with

∫ dx p(x, y; τ) φ_{i+1}(x) = e^{−θτ} λ_i y φ_i(y) − i e^{−2θτ} ∫ dx p(x, y; τ) φ_{i−1}(x)   (A.29)
 = e^{−θτ} λ_i y φ_i(y) − i e^{−2θτ} λ_{i−1} φ_{i−1}(y).   (A.30)


By assumption, we have that e^{−θτ} λ_{i−1} = λ_i. With one more application of the recursion relation, we arrive at the final result:

∫ dx p(x, y; τ) φ_{i+1}(x) = e^{−θτ} λ_i [ y φ_i(y) − i φ_{i−1}(y) ]   (A.31)
 = e^{−θτ} λ_i φ_{i+1}(y) = λ_{i+1} φ_{i+1}(y),   (A.32)

which proves both the assertion about the eigenfunctions and the one about the corresponding eigenvalues. The pre-factors are then chosen to ensure that the eigenfunctions are normalized.


Acknowledgements

I would like to thank all those who have supported me while I was working on this thesis. In particular, thank you to Frank Noé, for offering me the opportunity to work on this subject and for all the time you spent helping me and giving me advice throughout the last year. I am also very grateful to Guillermo Pérez Hernández, who spent long hours helping me with the programming and many other problems. Thank you very much, Phillip Berndt, for all of the great conversations we had in the past years, and for answering all my questions at any time. Thank you, Tamara Wallenhauer, for all the support and encouragement you had for me, as well as for your patience with me. I would also like to say thanks to my family and many other of my friends for helping me through this time.


Bibliography

[Behrends, 2011] Ehrhard Behrends, Markovprozesse und stochastische Differentialgleichungen, FU Berlin, 2011, http://page.mi.fu-berlin.de/bhrnds/stochdg2011/

[Evans] Lawrence C. Evans, An Introduction to Stochastic Differential Equations, University of California, http://math.berkeley.edu/~evans/SDE.course.pdf

[Noé, 2011] Frank Noé, A Variational Approach to Modelling Slow Processes in Stochastic Dynamical Systems, FU Berlin, 2011, http://compmolbio.biocomputing-berlin.de/index.php/publications

[Prinz et al, 2011] Jan-Hendrik Prinz, Hao Wu, Marco Sarich, Bettina Keller, Martin Senne, Martin Held, John D. Chodera, Christof Schütte and Frank Noé, Markov models of molecular kinetics: Generation and validation, The Journal of Chemical Physics, 134, 174105, 2011.

[Sarich, Noé, Schütte, 2010] Marco Sarich, Frank Noé and Christof Schütte, On the Approximation Quality of Markov State Models, SIAM Multiscale Model. Simul., 8: 1154-1177, 2010.

[Schütte, Huisinga, Deuflhard, Fischer, 1999] Christof Schütte, Alexander Fischer, Wilhelm Huisinga and Peter Deuflhard, A Direct Approach to Conformational Dynamics based on Hybrid Monte Carlo, J. Comput. Phys., 151: 146-168, 1999.

[Szabo, Ostlund, 1989] Attila Szabo and Neil S. Ostlund, Modern Quantum Chemistry, 1st edition, revised, McGraw-Hill Publishing, New York, 1989.

[VMD, 1996] W. Humphrey, A. Dalke and K. Schulten, VMD - Visual Molecular Dynamics, J. Molec. Graphics, 14: 33-38, 1996.

[Weber, 2010] Markus Weber,


A Subspace Approach to Molecular Markov State Models via an Infinitesimal Generator, ZIB report 09-27, 2010, http://www.zib.de/weber/

[Werner, 2002] Dirk Werner, Funktionalanalysis, 4th edition, Springer-Verlag, Berlin, 2002.

[Wikipedia] Ornstein-Uhlenbeck process, http://en.wikipedia.org/wiki/Ornstein-Uhlenbeck_process

[Zwanzig, 2001] Robert Zwanzig, Nonequilibrium Statistical Mechanics, 1st edition, Oxford University Press, Oxford, 2001.
