
Stochastic Analysis and Mathematical Finance
with applications of the Malliavin calculus to the calculation of risk numbers

Alexander Sokol

Speciale for cand.scient.-graden i matematik
Institut for Matematiske Fag
Københavns Universitet

Thesis for the master degree in mathematics
Department of Mathematical Sciences
University of Copenhagen

Date of Submission: September 8, 2008
Advisor: Rolf Poulsen
Co-advisors: Bo Markussen & Martin Jacobsen

Department of Mathematical Sciences
University of Copenhagen


Contents

1 Introduction 5

2 Preliminaries 11
  2.1 The Usual Conditions and Brownian motion 11
  2.2 Stochastic processes 19
  2.3 Stopping times 22
  2.4 Martingales 25
  2.5 Localisation 30

3 The Itô Integral 39
  3.1 The space of elementary processes 40
  3.2 Integration with respect to Brownian motion 51
  3.3 Integration with respect to integrators in $\mathcal{M}^L_W$ 56
  3.4 Integration with respect to standard processes 67
  3.5 The Quadratic Variation 76
  3.6 Itô's Formula 88
  3.7 Itô's Representation Theorem 92
  3.8 Girsanov's Theorem 102
  3.9 Why The Usual Conditions? 117
  3.10 Notes 121

4 The Malliavin Calculus 127
  4.1 The spaces $S$ and $L^p(\Pi)$ 129
  4.2 The Malliavin Derivative on $S$ and $D^{1,p}$ 134
  4.3 The Malliavin Derivative on $D^{1,2}$ 142
  4.4 Orthogonal subspaces of $L^2(\mathcal{F}_T)$ and $L^2(\Pi)$ 154
  4.5 Further theory 174
  4.6 Notes 175


Chapter 1

Introduction

This thesis is about stochastic integration, the Malliavin calculus and mathematical finance. The final form of the thesis has developed very gradually, and was certainly not planned in detail from the beginning.

A short history of the development of the thesis. I started out with a vague idea about developing the stochastic integral with respect to what in Øksendal (2005) is known as an Itô process, only to a higher level of rigour than seen in, say, Øksendal (2005) or Steele (2000), and in a self-contained manner, furthermore showing that this theory could rigorously be used to obtain the fundamental results of mathematical finance. The novelty of this is that ordinary accounts of this type almost always either have some heuristic points or apply more advanced results, such as the Lévy characterisation theorem, which are left unproved in the context of the account. I also wanted the thesis to contain some practical aspects, and I knew that the calculation of risk numbers is a very important issue for practitioners. I had heard about the results of Fournié et al. (1999), concisely introduced by Benhamou (2001), applying the Malliavin calculus to the problem of calculating risk numbers such as the delta or the gamma. When investigating this, I reached two conclusions. First, the literature pertaining to the Malliavin calculus is rather difficult, with proofs containing very little detail. Second, the numerical investigations of the applications of the Malliavin calculus to the calculation of risk numbers have mostly been constrained to the context of the Black-Scholes model. My idea was to attempt to amend these two issues, first


off by developing the fundamentals of the Malliavin calculus in greater detail, and secondly by applying the results to a more advanced model than the Black-Scholes model, namely the Heston model.

In the first few months of work, I developed the hope that I could find the time and ability not only to construct the stochastic integral for Brownian motion and continuous finite variation processes, but also to consider discontinuous finite variation processes. Furthermore, I expected to be able to develop the entire foundation of the Malliavin calculus, including the Skorohod integral, the result on the Malliavin derivative of the solution to a stochastic differential equation and detailed proofs of the results of Fournié et al. (1999). I also vainly expected to find time to consider PDE methods for pricing in finance and to investigate the calculation of risk numbers for path-dependent options. In retrospect, all this was very optimistic, and I have ended up restricting myself to stochastic integration for continuous processes, discussing only the fundamental results of the Malliavin derivative operator and not even going as far as to introduce the Skorohod integral. This means that the theory of the Malliavin calculus presented here is not even strong enough to be used to prove the results of Fournié et al. (1999). On the other hand, it is indisputably very rigorous and detailed, and thus constitutes a firm foundation for further theory. Finally, I also had to forget about the idea of PDE methods and path-dependent options in finance.

The final content of the thesis, then, is this: a development of the stochastic integral for Brownian motion and continuous finite variation processes, applied to set up some simple financial models and to produce sufficient conditions for the absence of arbitrage. The Malliavin derivative operator is also developed, but it is necessary to refer to the literature for the results applicable to finance. Some Monte Carlo methods for evaluating expectations and sensitivities are reviewed, and finally we apply all of this to pricing and calculating risk numbers in the Black-Scholes and Heston models.

It can sometimes be difficult to distinguish between well-known and original work. Many of the results in this thesis are well-known, but formulated in a considerably different manner, often opting for the most pedagogical approach available. There are also many minor original results. The highlights of the original contributions in the thesis are these:

• In Theorem 3.8.3, a proof of the Girsanov theorem not invoking the Lévy characterisation theorem, building on the ideas presented in Steele (2000).


• In Theorem 4.3.5, an extension of the chain rule of the Malliavin calculus, removing the condition of boundedness of the partial derivatives.

• In Section 5.6, identification of an erroneous expression for the Malliavin estimator in Benhamou (2001) for the Heston model, and a correction of this.

Other significant results are of course all of the numerical results for the Black-Scholes and Heston models in Section 5.5 and Section 5.6. Also, the proof of agreement of the integrals in Theorem 3.8.4 is new. Furthermore, Appendix C contains a new proof of the existence of the quadratic variation for continuous local martingales which does not use stochastic integration at all. Also, a version of Urysohn's Lemma is proved, yielding an expression for the Lipschitz constant of the bump function.

Structure of the thesis. Overall, the thesis first develops the stochastic integral, secondly develops some of the basic results of the Malliavin calculus, and finally applies all of this to mathematical finance.

In Chapter 2, we develop some preliminary results which will be essential to the rest of the thesis. We review some results on Brownian motion and the usual conditions, martingales and the like. Furthermore, we develop a general localisation concept which will be essential to the smooth development of the stochastic integral.

Chapter 3 develops the Itô integral. We begin with some results on the space of elementary processes, rigorously proving that the usual definition of the stochastic integral is independent of the representation of the elementary process. Furthermore, we prove a general density result for elementary processes. These preparations make the development of the integral for Brownian motion very easy. We then extend the integral to integration with respect to integral processes and continuous processes of finite variation. After having constructed the stochastic integral at the level of generality necessary, we define the quadratic variation and use it to prove Itô's formula, Itô's representation theorem and Girsanov's theorem. We also prove Kazamaki's and Novikov's conditions for an exponential martingale to be a true martingale. Finally, we discuss the pros and cons of the usual conditions.

We then proceed to Chapter 4, containing the basic results on the Malliavin calculus. We define the Malliavin derivative slightly differently than in Nualart (2006), initially using coordinates of the Brownian motion instead of stochastic integrals with deterministic integrands. We then extend the operator by the usual closedness argument.


Next, we prove the chain rule and the integration-by-parts formula. We use the generalized chain rule to show that our definition of the Malliavin derivative coincides with that in Nualart (2006). Finally, we develop the Hilbert space theory of the Malliavin derivative and use it to obtain a chain rule for Lipschitz transformations.

Chapter 5 contains the applications to mathematical finance. We first define a simple class of financial market models and develop sufficient criteria for the absence of arbitrage, and we discuss what a proper price for a contingent claim is. After this, we take a quick detour to introduce some fundamental concepts from the theory of SDEs, and we go through some Monte Carlo methods for the evaluation of expectations and sensitivities. Finally, we begin the numerical experiments. We first consider the Black-Scholes model, where we evaluate call prices, call deltas and digital deltas. The purpose of this is mainly a preliminary evaluation of the methods in a simple context, allowing us to gain an idea of their effectiveness and applicability. After this, we try our luck at the Heston model. We prove absence of arbitrage building on some ideas found in Cheridito et al. (2005). We then consider some discretisation schemes for simulating from the Heston model, and apply these to the calculation of the digital delta. We compare different methods and conclude that the localised Malliavin method is the superior one. Finally, we outline the difference between our Malliavin estimator and the one found in Benhamou (2001), illustrating the resulting discrepancy.

Finally, in Chapter 6, we discuss our results and opportunities for further work.

The appendices also contain several important results, with original proofs. Appendix A mostly contains well-known results, but the sections on Hermite polynomials and Hilbert spaces contain proofs of results which are hard to find in the literature. Appendix B contains basic results from measure theory, but also has in Section B.3 an original result on convergence of normal distributions which is essential to our development of the Hilbert space theory of the Malliavin calculus. Furthermore, in Appendix C, we prove some novel results which ultimately turned out not to be necessary for the main parts of the thesis: an existence proof for the quadratic variation and a Lipschitz version of Urysohn's lemma.

Prerequisites. The thesis is written with a reader in mind who has a good grasp of the fundamentals of real analysis, measure theory and probability theory, such as can be found in Carothers (2000), Hansen (2004a), Dudley (2002), Jacobsen (2003) and Rogers & Williams (2000a). Some Hilbert space theory and functional analysis is also applied; for introductions to this, see for example Hansen (2006) and Meise & Vogt


(1997).

Final words. I would like to thank my advisor Rolf Poulsen and my co-advisors Bo Markussen and Martin Jacobsen for their support in writing this thesis. Furthermore, I would like to give special warm thanks to Martin Jacobsen and Ernst Hansen for teaching me measure theory and probability theory and for spending so much time answering mails and discussing whatever mathematical problems I have had at hand.
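As a small concrete preview of the Monte Carlo computations described above for Chapter 5, the following sketch estimates the digital delta in the Black-Scholes model both by a finite difference and by weighting the payoff with $W_T/(S_0 \sigma T)$, the weight obtained in Fournié et al. (1999). This is an illustration added here, not the estimators implemented in the thesis; the parameter values, and the use of Python, are assumptions made purely for the example.

```python
import numpy as np

# Illustrative sketch only: Monte Carlo estimates of the digital delta
# d/dS0 E[exp(-rT) 1_(S_T > K)] in the Black-Scholes model.  The weighted
# estimator uses the weight W_T / (S0 * sigma * T) from Fournie et al. (1999);
# all parameter values below are made up for the example.

rng = np.random.default_rng(0)
S0, K, r, sigma, T = 100.0, 100.0, 0.03, 0.2, 1.0
n = 10**6
W_T = np.sqrt(T) * rng.standard_normal(n)
disc = np.exp(-r * T)

def terminal_price(s0):
    # Terminal stock price under the risk-neutral measure.
    return s0 * np.exp((r - 0.5 * sigma**2) * T + sigma * W_T)

# Central finite difference with common random numbers.
h = 0.5
delta_fd = np.mean(disc * ((terminal_price(S0 + h) > K).astype(float)
                           - (terminal_price(S0 - h) > K).astype(float))) / (2 * h)

# Weighted estimator: discounted payoff times the weight W_T / (S0 * sigma * T).
payoff = disc * (terminal_price(S0) > K)
delta_weight = np.mean(payoff * W_T / (S0 * sigma * T))

print(f"finite-difference digital delta: {delta_fd:.5f}")
print(f"weighted digital delta:          {delta_weight:.5f}")
```

The comparison illustrates why weighted estimators of this kind are attractive for discontinuous payoffs such as the digital option considered in Chapter 5: the weight moves the differentiation away from the non-smooth payoff.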




Chapter 2

Preliminaries

This chapter contains results which are fundamental to the theory that follows, in particular the stochastic calculus, which relies heavily on the machinery of stopping times, martingales and localisation. Many of the basic results in this chapter are assumed to be well known, and we therefore often refer to the literature for proofs of the results. Other results, such as the results on progressive measurability and the Brownian filtration, are more esoteric, and we give full proofs.

The first chapter of Karatzas & Shreve (1988) and the second chapter of Rogers & Williams (2000a) are excellent sources for most of the results in this section.

2.1 The Usual Conditions and Brownian motion

Let $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ be a filtered probability space. Recall that the usual conditions for a filtered probability space are the conditions:

1. The $\sigma$-algebra $\mathcal{F}$ is complete.
2. The filtration is right-continuous, $\mathcal{F}_t = \cap_{s>t} \mathcal{F}_s$.
3. For $t \ge 0$, $\mathcal{F}_t$ contains all null sets in $\mathcal{F}$.


In every section of this thesis except this one, we are going to assume the usual conditions. We are also going to work with Brownian motion throughout this thesis. We therefore need to spend some time making sure that our Brownian motion works together with the usual conditions. This is the subject matter of this section.

The reasons for assuming the usual conditions, and ways to avoid assuming them, will be discussed in Section 3.9. Many of our results also hold without the usual conditions, but to avoid clutter, we will consistently assume the usual conditions throughout the thesis.

Our first aim is to understand how to augment a filtered probability space so that it fulfills the usual conditions. In the following, $\mathcal{N}$ denotes the null sets of $\mathcal{F}$. Before we can construct the desired augmentation of a filtered probability space, we need a few lemmas.

Lemma 2.1.1. Letting $\mathcal{G} = \sigma(\mathcal{F}, \mathcal{N})$, it holds that
$$\mathcal{G} = \{F \cup N \mid F \in \mathcal{F}, N \in \mathcal{N}\},$$
and $P$ can be uniquely extended from $\mathcal{F}$ to a probability measure $P'$ on $\mathcal{G}$ by putting $P'(F \cup N) = P(F)$. The space $(\Omega, \mathcal{G}, P')$ is called the completion of $(\Omega, \mathcal{F}, P)$.

Proof. We first prove the equality for $\mathcal{G}$. Define $\mathcal{H} = \{F \cup N \mid F \in \mathcal{F}, N \in \mathcal{N}\}$. It is clear that $\mathcal{H} \subseteq \mathcal{G}$. We need to prove the opposite inclusion. To do so, we prove directly that $\mathcal{H}$ is a $\sigma$-algebra containing $\mathcal{F}$ and $\mathcal{N}$; the inclusion follows from this. It is clear that $\Omega \in \mathcal{H}$. If $H \in \mathcal{H}$ with $H = F \cup N$, we obtain, with $B \in \mathcal{F}$ such that $P(B) = 0$ and $N \subseteq B$,
$$H^c = (F \cup N)^c = (B^c \cap (F \cup N)^c) \cup (B \cap (F \cup N)^c) = (B \cup F \cup N)^c \cup (B \cap (F \cup N)^c) = (B \cup F)^c \cup (B \cap (F \cup N)^c),$$
so since $(B \cup F)^c \in \mathcal{F}$ and $B \cap (F \cup N)^c \in \mathcal{N}$, we find $H^c \in \mathcal{H}$. If $(H_n) \subseteq \mathcal{H}$ with $H_n = F_n \cup N_n$, we find
$$\bigcup_{n=1}^{\infty} H_n = \left(\bigcup_{n=1}^{\infty} F_n\right) \cup \left(\bigcup_{n=1}^{\infty} N_n\right),$$


showing $\cup_{n=1}^{\infty} H_n \in \mathcal{H}$. We have now proven that $\mathcal{H}$ is a $\sigma$-algebra. Since it contains $\mathcal{F}$ and $\mathcal{N}$, we conclude $\mathcal{G} \subseteq \mathcal{H}$.

It remains to show that $P$ can be uniquely extended from $\mathcal{F}$ to $\mathcal{G}$. We begin by proving that the proposed extension is well-defined. Let $G \in \mathcal{G}$ have two decompositions $G = F_1 \cup N_1$ and $G = F_2 \cup N_2$. We then have $G^c = F_2^c \cap N_2^c$. Thus, if $\omega \in F_2^c$ and $\omega \in G$, then $\omega \in N_2$, showing $F_2^c \subseteq G^c \cup N_2$ and therefore
$$F_1 \cap F_2^c \subseteq F_1 \cap (G^c \cup N_2) = F_1 \cap N_2 \subseteq N_2,$$
and analogously $F_2 \cap F_1^c \subseteq N_1$. We can therefore conclude $P(F_1 \cap F_2^c) = P(F_2 \cap F_1^c) = 0$ and, as a consequence,
$$P(F_1) = P(F_1 \cap F_2) + P(F_1 \cap F_2^c) = P(F_2 \cap F_1) + P(F_2 \cap F_1^c) = P(F_2).$$
This means that putting $P'(G) = P(F)$ for $G = F \cup N$ is a definition independent of the representation of $G$, and therefore well-defined. That $P'$ is a probability measure on $\mathcal{G}$ extending $P$ is obvious.

In the following, we let $(\Omega, \mathcal{G}, P)$ be the completion of $(\Omega, \mathcal{F}, P)$. In particular, we use the same symbol for the measure on $\mathcal{F}$ and for its completion. This will not cause any problems.

Lemma 2.1.2. Let $\mathcal{H}$ be a sub-$\sigma$-algebra of $\mathcal{G}$. Then
$$\sigma(\mathcal{H}, \mathcal{N}) = \{G \in \mathcal{G} \mid G \Delta H \in \mathcal{N} \text{ for some } H \in \mathcal{H}\},$$
where $\Delta$ is the symmetric difference, $G \Delta H = (G \setminus H) \cup (H \setminus G)$.

Proof. Define $\mathcal{K} = \{G \in \mathcal{G} \mid G \Delta H \in \mathcal{N} \text{ for some } H \in \mathcal{H}\}$. We begin by arguing that $\mathcal{K} \subseteq \sigma(\mathcal{H}, \mathcal{N})$. Let $G \in \mathcal{K}$ and let $H \in \mathcal{H}$ be such that $G \Delta H \in \mathcal{N}$. We then find
$$G = (H \cap G) \cup (G \setminus H) = (H \cap (H \cap G^c)^c) \cup (G \setminus H) = (H \cap (H \setminus G)^c) \cup (G \setminus H).$$
Since $P(G \Delta H) = 0$, $H \setminus G$ and $G \setminus H$ are both null sets, and we conclude that $G \in \sigma(\mathcal{H}, \mathcal{N})$. Thus $\mathcal{K} \subseteq \sigma(\mathcal{H}, \mathcal{N})$.

To show the other inclusion, we will show that $\mathcal{K}$ is a $\sigma$-algebra containing $\mathcal{H}$ and $\mathcal{N}$. If $G \in \mathcal{H}$, we have $G \Delta G = \emptyset \in \mathcal{N}$ and therefore $G \in \mathcal{K}$. If $N \in \mathcal{N}$, we have


18 PreliminariesThe conclusion follows.Theorem 2.1.8 is the result that will allow us to assume the usual conditions in whatfollows. Next, we prove Blumenthal’s zero-one law.Lemma 2.1.9. It holds that F t+ = σ(F t , N t+ ), where N t+ are the null sets of F t+ .Proof. It is clear that σ(F t , N t+ ) ⊆ F t+ , so it will suffice to show the other inclusion.Let A ∈ F t+ , <strong>and</strong> put ξ = 1 A − E(1 A |F t ). Our first step will be to prove that ξis almost surely zero, <strong>and</strong> to do so, we first prove E(ξ1 B ) = 0 <strong>for</strong> B ∈ F ∞ , whereF ∞ = σ(∪ t≥0 F t ).To this end, note that with U t = σ(W t+s − W t ) s≥0 , F ∞ = σ(F t , U t ). The sets of the<strong>for</strong>m C ∩ D, where C ∈ F t <strong>and</strong> D ∈ U t , are then a generating system <strong>for</strong> F ∞ , stableunder intersections. Using Lemma 2.1.6, we see that U t <strong>and</strong> F t+ are independent.Since ξ is F t+ measurable, we conclude, with C ∈ F t <strong>and</strong> D ∈ U t ,E(ξ1 C∩D ) = P (D)E(ξ1 C )= P (D)E((1 A − E(1 A |F t ))1 C ).Now, by the definition of conditional expectation, EE(1 A |F t )1 C ) = E1 A 1 C , <strong>and</strong> weconclude E(ξ1 C∩D ) = 0. Then, by the Dynkin Lemma, E(ξ1 B ) = 0 <strong>for</strong> all B ∈ F ∞ ,<strong>and</strong> there<strong>for</strong>e ξ is zero almost surely. Since ξ is F t+ measurable, this means that thereexists N ∈ F t+ with P (N) = 0 such that ξ = 0 on N c , <strong>and</strong> there<strong>for</strong>e1 A = 1 A 1 N + 1 A 1 N c= 1 A∩N + ξ1 N c + E(1 A |F t )1 N c= 1 A∩N + E(1 A |F t )1 N c.Because A ∩ N ∈ N t+ , the right-h<strong>and</strong> side is measurable with respect to σ(F t , N t+ )<strong>and</strong> there<strong>for</strong>e A ∈ σ(F t , N t+ ).Lemma 2.1.10 (Blumenthal’s 0-1 law). For any A ∈ G 0 , P (A) is either zero orone.Proof. Let H be the sets of G 0 where the lemma holds. It is clear that H is a Dynkinsystem. Using Lemma 2.1.9, we obtainG 0 = σ(F 0+ , N ) = σ(F 0 , N 0+ , N ) = σ(F 0 , N ).




Lemma 2.2.2. Let $\Sigma_\pi$ be the family of sets $A \in \mathcal{B}[0,\infty) \otimes \mathcal{F}$ such that $A$ satisfies $A \cap [0,t] \times \Omega \in \mathcal{B}[0,t] \otimes \mathcal{F}_t$ for all $t \in [0,\infty)$. Then $\Sigma_\pi$ is a $\sigma$-algebra, and a process $X$ is progressively measurable if and only if it is $\Sigma_\pi$-measurable.

Proof. That $\Sigma_\pi$ is a $\sigma$-algebra is clear. The equality
$$(X \in A) \cap [0,t] \times \Omega = (X_{|[0,t] \times \Omega} \in A)$$
shows that progressive measurability is equivalent to $\Sigma_\pi$ measurability.

Lemma 2.2.1 and Lemma 2.2.2 show that the concepts of being measurable and adapted and of being progressive correspond to conventional measurability properties with respect to the $\sigma$-algebras $\Sigma_A$ and $\Sigma_\pi$, respectively. These results will enable us to apply the usual machinery of measure theory when dealing with these measurability concepts. The next results demonstrate some of the interplay between the regularity properties of stochastic processes. In particular, Lemma 2.2.3 shows that $\Sigma_\pi \subseteq \Sigma_A$.

Lemma 2.2.3. Let $X$ be progressive. Then $X$ is measurable and adapted.

Proof. That $X$ is measurable follows from Lemma 2.2.2. To show that $X$ is adapted, note that when $X$ is progressive, $X_{|[0,t] \times \Omega}$ is $\mathcal{B}[0,t] \otimes \mathcal{F}_t$-measurable, and therefore $\omega \mapsto X(t,\omega)$ is $\mathcal{F}_t$-measurable.

Lemma 2.2.4. Let $X$ be adapted and continuous. Then $X$ is progressively measurable.

Proof. Define $X_n(t) = X(\frac{1}{2^n}[2^n t])$. We can then write
$$X_n(t) = \sum_{k=0}^{\infty} X\!\left(\frac{k}{2^n}\right) 1_{[\frac{k}{2^n}, \frac{k+1}{2^n})}(t).$$
Since $X$ is adapted, $X(\frac{k}{2^n})$ is $\mathcal{F}_{k/2^n}$ measurable. Therefore, each term in the sum is progressive. Since $X_n$ converges pointwise to $X$ by the continuity of $X$, we conclude that $X$ is progressive as the limit of progressive processes.

Comment 2.2.5 In the proof of the lemma, we actually implicitly employed the result from Lemma 2.2.2 that progressive measurability is equivalent to measurability with respect to a $\sigma$-algebra $\Sigma_\pi$, in the sense that we know that ordinary measurability


is preserved by pointwise limits. Had we not known that progressive measurability is measurability with respect to $\Sigma_\pi$, we would have had to prove manually that it is preserved by pointwise convergence. Note that the result would also hold if continuity were weakened to left-continuity or right-continuity. ◦

Lemma 2.2.6. Let $X$ and $Y$ be measurable processes. If $X$ and $Y$ are versions of each other, the set $\{(t,\omega) \in [0,\infty) \times \Omega \mid X_t(\omega) \ne Y_t(\omega)\}$ is a $\lambda \otimes P$ null set.

Proof. First note that $A = \{(t,\omega) \in [0,\infty) \times \Omega \mid X_t(\omega) \ne Y_t(\omega)\}$ is $\mathcal{B}[0,\infty) \otimes \mathcal{F}$ measurable, since $X$ and $Y$ are both measurable. By the Tonelli Theorem, then
$$(\lambda \otimes P)(A) = \int_0^\infty P(X_t \ne Y_t)\,dt = 0,$$
showing the claim.

Comment 2.2.7 Note that the other implication does not hold: we can only conclude that $P(X_t \ne Y_t) = 0$ for almost all $t$ if $\{(t,\omega) \in [0,\infty) \times \Omega \mid X_t \ne Y_t\}$ is a $\lambda \otimes P$ null set. Note also that it is crucial that both $X$ and $Y$ are measurable in the above; otherwise the set $\{(t,\omega) \in [0,\infty) \times \Omega \mid X_t(\omega) \ne Y_t(\omega)\}$ would not be $\mathcal{B}[0,\infty) \otimes \mathcal{F}$ measurable. ◦

2.3 Stopping times

This section contains some basic results on stopping times. A stopping time is a stochastic variable $\tau : \Omega \to [0,\infty]$ such that $(\tau \le t) \in \mathcal{F}_t$ for any $t \ge 0$. We say that $\tau$ is finite if $\tau$ maps into $[0,\infty)$. We say that $\tau$ is bounded if $\tau$ maps into a bounded subset of $[0,\infty)$. If $X$ is a stochastic process and $\tau$ is a stopping time, we denote by $X^\tau$ the process $X^\tau_t = X_{\tau \wedge t}$ and call $X^\tau$ the process stopped at $\tau$. We denote by $X[0,\tau]$ the process $X[0,\tau]_t = X_t 1_{[0,\tau]}(t)$ and call $X[0,\tau]$ the process zero-stopped at $\tau$. Furthermore, we define the stopping-time $\sigma$-algebra $\mathcal{F}_\tau$ of events determined at $\tau$ by $\mathcal{F}_\tau = \{A \in \mathcal{F} \mid A \cap (\tau \le t) \in \mathcal{F}_t \text{ for all } t \ge 0\}$. Clearly, $\mathcal{F}_\tau$ is a $\sigma$-algebra. Our first goal is to develop some basic results on stopping times and their interplay with stopping-time $\sigma$-algebras. By $\mathcal{B}_0$, we denote the elements $A \in \mathcal{B}$ with $0 \notin A$.

Lemma 2.3.1. If $\tau$ and $\sigma$ are stopping times, then so are $\tau \wedge \sigma$, $\tau \vee \sigma$ and $\tau + \sigma$. Furthermore, if $\sigma \le \tau$, then $\mathcal{F}_\sigma \subseteq \mathcal{F}_\tau$.


2.3 Stopping times 23Proof. See Karatzas & Shreve (1988), Lemma 1.2.9 <strong>and</strong> Lemma 1.2.15.Lemma 2.3.2. Let τ be a stopping time. Then (τ < t) ∈ F t <strong>for</strong> all t ≥ 0.Proof. We have (τ < t) = ∪ n≥1(τ ≤ t −1n)∈ Ft , since (τ ≤ t − 1 n ) ∈ F t− 1 n ⊆ F t.Lemma 2.3.3. Let X be progressively measurable, <strong>and</strong> let τ be a stopping time. ThenX τ is F τ measurable <strong>and</strong> X τ is progressively measurable.Proof. See Karatzas & Shreve (1988), Proposition 1.2.18.Lemma 2.3.4. For any stopping time τ, the process (t, ω) ↦→ 1 [0,τ(ω)] (t) is progressive.In particular, if X is Σ π measurable, then X[0, τ] is Σ π measurable as well.Proof. Let t ≥ 0. Our fundamental observation is{(s, ω) ∈ [0, t] × Ω|1 [0,τ(ω)] (s) = 0} = {(s, ω) ∈ [0, t] × Ω|τ(ω) < s}.We need to prove that the latter is B[0, t] ⊗ F t measurable. Since 1 [0,τ] only takesthe values 0 <strong>and</strong> 1, this is sufficient to prove progressive measurability. To this end,define the mapping f : [0, t] × (τ ≤ t) → [0, t] 2 by f(s, ω) = (s, τ(ω)). Let τ t be therestriction of τ to (τ ≤ t). We then have (τ t ≤ s) ∈ F s ⊆ F t <strong>for</strong> any s ≤ t. Sincethe sets of the <strong>for</strong>m [0, s], s ≤ t, <strong>for</strong>m a generating family <strong>for</strong> B[0, t], we conclude thatτ t is F t − B[0, t] measurable. There<strong>for</strong>e, f is B[0, t] ⊗ F t − B[0, t] 2 measurable. WithA = {(x, y) ∈ [0, t] 2 |x > y}, A ∈ B[0, t] 2 <strong>and</strong> we obtain{(s, ω) ∈ [0, t] × Ω|τ(ω) < s} = {(s, ω) ∈ [0, t] × Ω|f(s, ω) ∈ A}= f −1 (A),which is B[0, t]⊗F t measurable, as desired. The remaining claims of the lemma followsimmediately.Lemma 2.3.5. Let X be a continuous <strong>and</strong> adapted process, <strong>and</strong> let F be closed. Thenthe first entrance time τ = inf{t ≥ 0|X t ∈ F } is a stopping time.Proof. See Rogers & Williams (2000a), Lemma 74.2.Lemma 2.3.6. Let τ k , k ≤ n be a family of stopping times. Then the i’th orderedvalue of (τ k ) k≤n is also a stopping time.


24 PreliminariesComment 2.3.7 Note that it does not matter how we order the stopping times incase of ties, since we are only interested in the ordered values of the stopping times,not the actual ordering.◦Proof. Let t ≥ 0, <strong>and</strong> let τ (i) be the i’th ordered value of (τ k ) k≤n . Then τ (i) ≤ tprecisely if i or more of the stopping times are less than t. That is,⋃(τ (i) ≤ t) =(τ k ≤ t, k ∈ I) ∩ (τ k > t, k /∈ I) ∈ F t .I⊆{1,...,n},|I|≥iLemma 2.3.8. Let τ <strong>and</strong> σ be stopping times. Assume that Z ∈ F σ . Then it holdsthat both Z1 (σ


Since $Z \in \mathcal{F}_\sigma$, we find $(Z \in B) \cap (\sigma \le t) \in \mathcal{F}_t$. And since we know $(\tau < \sigma) \in \mathcal{F}_{\tau \wedge \sigma}$, $(\sigma \le \tau) = (\tau < \sigma)^c \in \mathcal{F}_{\sigma \wedge \tau}$, so $(\sigma \le \tau) \cap (\sigma \wedge \tau \le t) \in \mathcal{F}_t$. This demonstrates $(Z 1_{(\sigma \le \tau)} \in B) \cap (\sigma \wedge \tau \le t) \in \mathcal{F}_t$, as desired.

2.4 Martingales

In this section, we review the basic theory of martingales. As before, we work in a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ satisfying the usual conditions. If $M$ is some process, we put $M^*_t = \sup_{s \le t} |M_s|$ and $M^* = M^*_\infty$. A process $M$ is said to be a martingale if $M$ is adapted and $E(M_t \mid \mathcal{F}_s) = M_s$ whenever $s \le t$. We say that $M$ is square-integrable if $M$ is bounded in $L^2$. As described in Appendix B.5, we say that $M$ is uniformly integrable if
$$\lim_{x \to \infty} \sup_{t \ge 0} E|M_t| 1_{(|M_t| > x)} = 0.$$
In this case, $M$ is also bounded in $L^1$ by Lemma B.5.2. By Lemma B.5.1, if $M_t$ is bounded in $L^p$ for some $p > 1$, it is uniformly integrable. We denote the space of martingales by $\mathcal{M}$. The space of martingales starting at zero is denoted $\mathcal{M}_0$.

We will mostly be concerned with continuous martingales. However, at some points we will be interested in cadlag martingales as well. The following result shows that we can always find a cadlag version of any martingale.

Theorem 2.4.1. Let $M$ be a martingale. Then $M$ has a cadlag version.

Proof. This follows from Theorem II.67.7 of Rogers & Williams (2000a).

Next, we go on to state some of the classic results of martingale theory.

Theorem 2.4.2 (Doob's maximal inequality). Let $M$ be a nonnegative cadlag submartingale. For $c > 0$ and $t \ge 0$, $cP(M^*_t \ge c) \le E M_t 1_{(M^*_t \ge c)}$.

Proof. See Rogers & Williams (2000a), Theorem 70.1.

Theorem 2.4.3 (Martingale Convergence Theorem). Let $M$ be a continuous submartingale. If $M^+$ is bounded in $L^1$, then $M$ is almost surely convergent to an


integrable variable. If $M$ is also uniformly integrable, the limit $M_\infty$ exists in $L^1$ as well, and $E(M_\infty \mid \mathcal{F}_t) = M_t$. In this case, we say that $M_\infty$ closes the martingale.

Proof. The first part follows from Theorem 1.3.15 in Karatzas & Shreve (1988), and the rest follows from Theorem II.69.2 in Rogers & Williams (2000a).

Corollary 2.4.4. Let $M$ be a continuous martingale. $M$ is closed by a variable $M_\infty$ if and only if $M$ is uniformly integrable, and in this case $M_t$ converges to $M_\infty$ almost surely and in $L^1$.

Proof. From Theorem 2.4.3, we know that if $M$ is uniformly integrable, $M$ is closed by some variable and we have convergence almost surely and in $L^1$. It will therefore suffice to show the other implication. Assume that $M$ is closed by a variable $M_\infty$; we need to show that $M$ is uniformly integrable. By Jensen's inequality, we know that $|M_t| = |E(M_\infty \mid \mathcal{F}_t)| \le E(|M_\infty| \mid \mathcal{F}_t)$. In particular, $M$ is bounded in $L^1$, and therefore $M_t$ is convergent almost surely. Therefore, $M^*$ is almost surely finite. We obtain
$$E|M_t| 1_{(|M_t| > x)} \le E E(|M_\infty| \mid \mathcal{F}_t) 1_{(|M_t| > x)} = E|M_\infty| 1_{(|M_t| > x)} \le E|M_\infty| 1_{(M^* > x)},$$
and since $M^*$ is almost surely finite, we conclude
$$\limsup_{x \to \infty} \sup_{t \ge 0} E|M_t| 1_{(|M_t| > x)} \le \limsup_{x \to \infty} E|M_\infty| 1_{(M^* > x)} = 0,$$
so $M$ is uniformly integrable.

Theorem 2.4.5 (Doob's $L^p$-inequality). Let $M$ be a continuous martingale. For any $t \ge 0$ and $p > 1$, $\|M^*_t\|_p \le q \|M_t\|_p$, where $q$ is the dual exponent to $p$.

Proof. See Rogers & Williams (2000a), Theorem 70.2.

Lemma 2.4.6. If $M$ is a martingale bounded in $L^2$, then $M$ is almost surely convergent and the limit $M_\infty$ is square integrable. Furthermore, $M_\infty$ closes the martingale.

Proof. See Rogers & Williams (2000a), Theorem 70.2.
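The following small simulation (an added illustration, not part of the thesis; all parameters are arbitrary) checks Doob's $L^2$-inequality numerically for the martingale $M = W$, a Brownian motion on $[0,T]$. For $p = 2$ the dual exponent is $q = 2$, so the inequality reads $E[(W^*_T)^2] \le 4\,E[W_T^2] = 4T$. The maximum over a discrete grid slightly underestimates the true supremum, so the check is only approximate.

```python
import numpy as np

# Approximate numerical check of Doob's L^2-inequality for Brownian motion:
# E[(sup_{s<=T} |W_s|)^2] <= 4 * E[W_T^2] = 4T.  Grid-based paths only give a
# lower bound for the supremum; all parameters are arbitrary example values.

rng = np.random.default_rng(1)
T, n_steps, n_paths = 1.0, 500, 10_000
dt = T / n_steps

increments = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.cumsum(increments, axis=1)            # W at the grid points t_1, ..., t_n
running_max = np.max(np.abs(W), axis=1)      # approximation of sup_{s<=T} |W_s|

lhs = np.mean(running_max**2)                # estimate of E[(W_T^*)^2]
rhs = 4.0 * np.mean(W[:, -1]**2)             # estimate of 4 * E[W_T^2], close to 4T
print(f"E[(W*_T)^2] ~ {lhs:.3f}  <=  4 E[W_T^2] ~ {rhs:.3f}")
```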


2.4 Martingales 27Theorem 2.4.7 (Optional Sampling). Let M be a continuous martingale. Letσ ≤ τ be stopping times. Then E(M τ |F σ ) = M σ if either M is uni<strong>for</strong>mly integrableor σ <strong>and</strong> τ are bounded.Proof. Assume first that M is uni<strong>for</strong>mly integrable. By 2.4.3, M is then convergentalmost surely <strong>and</strong> in L 1 with limit M ∞ , <strong>and</strong> M ∞ closes the martingale. Then Theorem1.3.22 of Karatzas & Shreve (1988) yields the result.Next assume that σ <strong>and</strong> τ are bounded, say by T . Then M τ = M T τ <strong>and</strong> M σ = M T σ .M T is a uni<strong>for</strong>mly integrable martingale by Corollary 2.4.4, so we obtainE(M τ |F σ ) = E(M T τ |F σ ) = M T σ = M σ .Since any square-integrable martingale is uni<strong>for</strong>mly integrable, optional sampling holds<strong>for</strong> square-integrable martingales <strong>for</strong> all stopping times.Definition 2.4.8. By cM 2 0, we denote the space of continuous martingales boundedin L 2 . We endow cM 2 0 with the the seminorm given by ‖M‖ cM 20= √ EM 2 ∞.It is clear that ‖ · ‖ cM 20is a well-defined seminorm.Lemma 2.4.9. Convergence in cM 2 0 is equivalent to uni<strong>for</strong>m L 2 -convergence <strong>and</strong>implies almost sure uni<strong>for</strong>m convergence along a subsequence. Also, ‖M − N‖ cM 20= 0if <strong>and</strong> only if M <strong>and</strong> N are indistinguishable.Proof. From Doob’s L 2 -inequality, Lemma 2.4.5, we have <strong>for</strong> any M, N ∈ cM 2 0 that‖M − N‖ cM 20≤ ‖(M − N) ∗ ‖ 2 ≤ 2‖M ∞ − N ∞ ‖ 2 = 2‖M − N‖ cM 20.There<strong>for</strong>e, it is immediate that convergence in cM 2 0 is equivalent to uni<strong>for</strong>m L 2 -convergence. Now assume that M n converges to M in cM 2 0. In particular, then(M n − M) ∗ converges to 0 in L 2 . There<strong>for</strong>e, there is a subsequence converging almostsurely, which means that (M nk −M) ∗ converges almost surely to 0, <strong>and</strong> this is preciselythe almost sure uni<strong>for</strong>m convergence proposed.To se that ‖M−N‖ cM 20= 0 if <strong>and</strong> only if M <strong>and</strong> N are indistinguishable, first note thatif M <strong>and</strong> N are indistinguishable, then M ∞ = lim M t = lim N t = N ∞ almost surely,


28 Preliminaries<strong>and</strong> there<strong>for</strong>e ‖M − N‖ cM 20= ‖M ∞ − N ∞ ‖ 2 = 0. Conversely, if ‖M − N‖ cM 20= 0then by Doob’s L 2 -inequality, ‖(M − N) ∗ ‖ 2 = 0, showing that (M − N) ∗ is almostsurely zero, <strong>and</strong> this implies indistinguishability.Lemma 2.4.10. The space cM 2 0 is complete.Proof. Let (M n ) be a cauchy sequence in cM 2 0. Then (M n ∞) is a cauchy sequencein L 2 (F ∞ ), there<strong>for</strong>e convergent to some M ∞ ∈ L 2 (F ∞ ). Now define M by puttingM t = E(M ∞ |F t ), it is clear from Jensen’s inequality that M is a martingale boundedin L 2 . Since‖(M n − M) ∗ ∞‖ 2 ≤ 2‖M n ∞ − M ∞ ‖ 2 ,we find that M n converges uni<strong>for</strong>mly in L 2 to M. As in the proof of Lemma 2.4.9, wethen find that there is a subsequence such that M n kconverges almost surely uni<strong>for</strong>mlyto M. In particular, since M n kis continuous, M is almost surely continuous. Sincewe have assumed the usual conditions, we can pick be a modification N of M whichis adapted <strong>and</strong> continuous <strong>for</strong> all sample paths. We then find that N ∈ cM 2 0 <strong>and</strong> M nconverges to N in cM 2 0, as desired.Lemma 2.4.11. Let M be a continuous martingale <strong>and</strong> let τ be a stopping time. Thenthe stopped process M τ is also a continuous martingale.Proof. Let 0 ≤ s ≤ t. We writeE(M τ t |F s ) = 1 (τ≤s) E(M τ t |F s ) + 1 (τ>s) E(M τ t |F s )<strong>and</strong> consider the two terms separately. Since 1 (τ≤s) is F s measurable, we find1 (τ≤s) E(M τ t |F s ) = E(M τ t 1 (τ≤s) |F s )= E(M τ s 1 (τ≤s) |F s )= M τ s 1 (τ≤s) ,since M τ t 1 (τ≤s) = M τ s 1 (τ≤s) , which is F s measurable. For the second term, we find1 (τ>s) E(M τ t |F s ) = E(1 (τ>s) M τ t |F s ), <strong>and</strong> <strong>for</strong> F ∈ F s we have F ∩ (τ > s) ∈ F τ∧s byLemma 2.3.8, yieldingE(1 F E(1 (τ>s) M τ t |F s )) = E(1 F 1 (τ>s) M τ t )= E(1 F 1 (τ>s) E(M τ t |F τ∧s ))= E(1 F 1 (τ>s) M τ s ),


2.4 Martingales 29by optional sampling, so that 1 (τ>s) E(M τ t |F s ) = 1 (τ>s) M τ s . All in all, we concludeE(M τ t |F s ) = M τ s .The following lemma will be useful in several of the calculations we are to make whendeveloping the stochastic integral.Lemma 2.4.12. Let M ∈ cM 2 0, <strong>and</strong> let (t 0 , . . . , t n ) be a partition of [s, t]. Then( n∣ )∑∣∣∣∣E (M tk − M tk−1 ) 2 F s = E((M t − M s ) 2 |F s )k=1Proof. We see thatE((M t − M s ) 2 |F s )⎛( n) ∣ ⎞ 2∑∣∣∣∣∣= E ⎝ M tk − M tk−1 F s⎠k=1n∑n∑= E((M tk − M tk−1 ) 2 |F s ) + E((M tk − M tk−1 )(M ti − M ti−1 )|F s ),k=1k≠iwhere, <strong>for</strong> k < i,E((M tk − M tk−1 )(M ti − M ti−1 )|F s )= E((M tk − M tk−1 )E(M ti − M ti−1 |F ti−1 )|F s )= 0.This shows the lemma.We end the section with a very useful criterion <strong>for</strong> determining when a process is amartingale or a uni<strong>for</strong>mly integrable martingale.Lemma 2.4.13. Let M be a progressive process such that the limit exists almost surely.If EM τ = 0 <strong>for</strong> any bounded stopping time τ, M is a martingale. If EM τ = 0 <strong>for</strong> anystopping time τ, M is a uni<strong>for</strong>mly integrable martingale.Proof. The statement that M is a uni<strong>for</strong>mly integrable martingale if EM τ = 0 <strong>for</strong> anystopping time τ is Theorem II.77.6 of Rogers & Williams (2000a). If we only havethat EM τ = 0 <strong>for</strong> any bounded stopping time, M t is a uni<strong>for</strong>mly integrable martingale<strong>for</strong> any t ≥ 0 <strong>and</strong> there<strong>for</strong>e, M is a martingale.


2.5 Localisation

In our construction of the stochastic integral, we will employ the technique of localisation, where we extend properties and definitions to processes for which a given property only holds locally, in the sense that if we somehow cut off the process from a certain point onwards, then the property holds.

We will mainly be using two localisation concepts: with $\mathcal{H}$ a family of stochastic processes, we say that a process is locally in $\mathcal{H}$ if there exists a sequence of stopping times increasing to infinity such that $X^{\tau_n}$ is in $\mathcal{H}$, and we say that a process is zero-locally in $\mathcal{H}$ if there exists a sequence of stopping times increasing to infinity such that $X[0,\tau_n]$ is in $\mathcal{H}$ for $n \ge 1$. We also write that $X$ is L-locally in $\mathcal{H}$ and that $X$ is $\mathrm{L}_0$-locally in $\mathcal{H}$, respectively.

We will need a variety of results about these localisation procedures. To avoid having to prove the same results twice, we define an abstract concept of localisation and prove the results for this abstract type of localisation. This also has the convenient side-effect of demonstrating what the essential properties of localisation are: the ability to determine a process from its local properties, and the ability to paste together a new process from localised processes. Our definition of localisation is sufficiently strong to have these properties, but also sufficiently broad to encompass the two concepts of localisation that we will need.

In the following, we denote by SP the set of real stochastic processes on a filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ satisfying the usual conditions.

Definition 2.5.1. A localisation concept L is a family of mappings $f_\tau \colon \mathrm{SP} \to \mathrm{SP}$, indexed by the set of stopping times, such that:

1. For any stopping times $\tau$ and $\sigma$, $f_\tau \circ f_\sigma = f_{\tau \wedge \sigma}$.
2. If $f_\tau(X) = f_\tau(Y)$, then $X_t(\omega) = Y_t(\omega)$ whenever $t \le \tau(\omega)$.

The interpretation of the above is the following. The mappings $f_\tau$ correspond to cutoff functions, such that $f_\tau(X)$ is the restriction of $X$ to $\{(t,\omega) \mid t \le \tau(\omega)\}$ in some sense. The first property ensures that the order of cutting off is irrelevant. The second property clarifies that $f_\tau(X)$ corresponds to cutting off $X$ to $\{(t,\omega) \mid t \le \tau(\omega)\}$. The two localisation concepts mentioned earlier are covered by our definition. Specifically, define


2.5 Localisation 31the localisation concept of stopping L as the family of mappings f τ (X) = X τ <strong>and</strong> thelocalisation concept of zero-stopping L 0 as the family of mappings f τ (X) = X[0, τ].Then L <strong>and</strong> L 0 are both localisation concepts according to Definition 2.5.1.Our abstract localisation concept does not cover more esoteric localisation conceptssuch as the concept of pre-stopping as found in Protter (2005), Chapter IV. If, however,we exchanged {(t, ω)|t ≤ τ(ω)} with {(t, ω)|t < τ(ω)}, it would cover pre-stoppingas well. This is unnecessary <strong>for</strong> our purposes, however, <strong>and</strong> we there<strong>for</strong>e omit thisextension.In the following, let L = (f τ ) be a localisation concept. Let H be a subset of SP.Definition 2.5.2. We say that a sequence τ n of stopping times is determining if τ nincreases to infinity. Let X be a stochastic process. We say that X is L-locally in Hif there is a determining sequence τ n such that f τn (X) ∈ H <strong>for</strong> all n ≥ 1. In this case,we say that τ n is a localising sequence <strong>for</strong> X. The set of elements L-locally in H isdenoted L(H).Note that in the above, we defined two different labels <strong>for</strong> sequences of stopping times.A sequence of stopping times (τ n ) is determining if it increases to infinity. The samesequence is localising <strong>for</strong> X to H if f τn (X) ∈ H <strong>for</strong> n ≥ 1. The interpretation of Definition2.5.2 is that when a process X is locally in H, we can cut off with the localisingsequence τ n <strong>and</strong> then the result f τn (X) will be in H. The reason <strong>for</strong> using determiningsequences is to ensure that the “localised” versions f τn (X) actually determine theelement, as the following result shows.Lemma 2.5.3. If τ n is a determining sequence <strong>and</strong> f τn (X) = f τn (Y ) <strong>for</strong> all n ≥ 1,then X <strong>and</strong> Y are equal.Proof. It follows that X t (ω) <strong>and</strong> Y t (ω) are equal whenever t ≤ τ n (ω). Since τ n tendsto infinity, we obtain X = Y .We will now introduce the concept of a stable subset <strong>and</strong> develop some basic resultsabout localisation <strong>and</strong> stability. When we have done this, we continue to the mainresults about localisation: The ability to paste localised processes together <strong>and</strong> theability to extend mappings H 1 → H 2 to mappings between localised versions of H 1<strong>and</strong> H 2 . In order to make the contents of the following easier to underst<strong>and</strong>, we willcomment on the meaning of the results in the context of our staple example, that of


34 PreliminariesTheorem 2.5.11. Let L 1 = (f τ ) <strong>and</strong> L 2 = (g τ ) be two localisation concepts. Let H 1<strong>and</strong> H 2 be spaces of stochastic processes such that H 1 is L 1 -stable <strong>and</strong> H 2 is L 2 -stable.Let α : H 1 → H 2 be a mapping with g τ (α(X)) = α(f τ (X)). There exists an extensionα : L 1 (H 1 ) → L 2 (H 2 ), uniquely defined by the criterion that g τ (α(X)) = α(f τ (X)).Proof. We proceed by first constructing a c<strong>and</strong>idate mapping α 0 from L 1 (H 1 ) toL 2 (H 2 ) <strong>and</strong> then show that it is an extension of α with the desired properties.Step 1: Construction of the mapping. Let X ∈ L 1 (H 1 ) with localising sequenceτ n . Theng τn (α(f τn+1 (X))) = α((f τn ◦ f τn+1 )(X)) = α(f τn (X)).The Pasting Lemma 2.5.10 yields the existence of some process Y with the propertythat g τn (Y ) = α(f τn (X)). We want to define α 0 (X) as this process Y . For this to bemeaningful, we need to check that Y does not depend on the localising sequence used.There<strong>for</strong>e, let τ ∗ n be another localising sequence <strong>for</strong> X <strong>and</strong> let Y ∗ be the element suchthat g τ ∗ n(Y ∗ ) = α(f τ ∗ n(X)). We want to show that Y = Y ∗ . Since τ n <strong>and</strong> τ ∗ n both arelocalising sequences <strong>for</strong> X, by Lemma 2.5.8, so is τ ∗ n ∧ τ n , <strong>and</strong>g τ ∗n ∧τ n(Y ) = (g τ ∗n◦ g τn )(Y )= g τ ∗n(α(f τn (X)))= α((f τ ∗n◦ f τn )(X))= α((f τn ◦ f τ ∗n)(X))= g τn (α(f τ ∗n(X)))= (g τn ◦ g τ ∗n)(Y ∗ )= g τ ∗n ∧τ n(Y ∗ )Since τ ∗ n ∧ τ n is determining, Y = Y ∗ . There<strong>for</strong>e, we can define α 0 : L 1 (H 1 ) → L 2 (H 2 )by putting α 0 (X) = Y . We need to show that α 0 is an extension of α <strong>and</strong> has thelocalisation property mentioned in the theorem.Step 2: α 0 is an extension of α. To show that α 0 is an extension, let an elementX ∈ H 1 be given, <strong>and</strong> let τ n be a determining sequence. Then τ n is localising <strong>for</strong> X,<strong>and</strong> there<strong>for</strong>e g τn (α 0 (X)) = α(f τn (X)) = g τn (α(X)). Thus α 0 (X) = α(X).Step 3: α 0 has the localisation property. We show g σ (α 0 (X)) = α 0 (f σ (X)) <strong>for</strong>any σ. Still letting τ n be localising <strong>for</strong> X, we know by Lemma 2.5.7 that τ n is also


2.5 Localisation 35localising <strong>for</strong> any f σ (X), <strong>and</strong> there<strong>for</strong>eg τn (g σ (α 0 (X))) = g σ (g τn (α 0 (X)))= g σ (α(f τn (X)))= α((f σ ◦ f τn )(X))= α((f τn ◦ f σ )(X))= g τn (α 0 (f σ (X))).Since τ n is determining, we conclude g σ (α 0 (X)) = α 0 (f σ (X)).Step 4: Uniqueness. Finally, assume that α 1 <strong>and</strong> α 2 are two extensions of αsatisfying the criterion in the theorem. We want to show that α 1 = α 2 . Let anelement X ∈ L 1 (H 1 ) be given <strong>and</strong> let τ n be a localising sequence. We obtaing τn (α 1 (X)) = α 1 (f τn (X))= α(f τn (X))= α 2 (f τn (X))= g τn (α 2 (X)).Since τ n is determining, α 1 (X) = α 2 (X), as desired.Corollary 2.5.12. Let H 1 <strong>and</strong> H 2 be linear spaces. If the mapping α : H 1 → H 2 inTheorem 2.5.11 is linear <strong>and</strong> each element of the localisation concepts L 1 <strong>and</strong> L 2 arelinear, then the extension α : L 1 (H 1 ) → L 2 (H 2 ) is linear as well.Proof. By Lemma 2.5.9, L 1 (H 1 ) <strong>and</strong> L 2 (H 2 ) are linear spaces, so the conclusion ofthe corollary is well-defined. The extension satisfies g τ (α(X)) = α(f τ (x)). LetX, Y ∈ L 1 (H 1 ) <strong>and</strong> let λ, µ ∈ R. Let τ n be a localising sequence <strong>for</strong> X <strong>and</strong> Y ,then f τn (X), f τn (Y ) ∈ H 1 . There<strong>for</strong>e, by the linearity of α on H 1 ,g τn (α(λX + µY )) = α(f τn (λX + µY ))= α(λf τn (X) + µf τn (Y ))= λα(f τn (X)) + µα(f τn (Y ))= λg τn (α(X)) + µg τn (α(Y ))= g τn (λα(X) + µα(Y )).Since τ n is determining, we may conclude that α is linear.


36 PreliminariesLemma 2.5.13. Assume that H is linear <strong>and</strong> that f τ∨σ + f τ∧σ = f τ + f σ . If H isL-stable, then L(L(H)) = L(H).Proof. Since H is L-stable, L(H) is L-stable as well, <strong>and</strong> there<strong>for</strong>e L(H) ⊆ L(L(H)).We need to show the other inclusion. Be<strong>for</strong>e doing so, we make an observation. Let τ<strong>and</strong> σ be stopping times, <strong>and</strong> assume that f τ (X), f σ (X) ∈ H. Since H is stable <strong>and</strong>linear, we find f τ∨σ (X) = f τ (X) + f σ (X) − f τ∧σ (X) ∈ H.Now let X ∈ L(L(H)) <strong>and</strong> let τ n be a localising sequence such that f τn (X) ∈ L(H).For any n, let σm n be a localising sequence such that f σ n m(f τn (X) ∈ H. Since σm n tendsto infinity almost surely, lim m P (σm n ≤ n) = 0 <strong>for</strong> all n ≥ 1. For any n ≥ 1, pickm(n) such that P (σm(n) n ≤ n) ≤ 12. Obviously, then ∑ ∞n n=1 P (σn m(n)≤ n) < ∞, sothe Borel-Cantelli Lemma yields σm(n)n a.s.a.s.−→ ∞. Since also τ n −→ ∞, we may concludeσm(n) n ∧ τ a.s.n −→ ∞. Now putρ n = maxk≤n σk m(k) ∧ τ k.Then ρ is a sequence of stopping times tending almost surely to infinity, <strong>and</strong> by ourearlier observation, f ρn (X) ∈ H. There<strong>for</strong>e X ∈ L(H).The criterion in Lemma 2.5.13 is satisfied both <strong>for</strong> stopping <strong>and</strong> zero-stopping. Weare now done with general results on localisation. We proceed to give a few resultsregarding more specific matters of localisation.Definition 2.5.14. Let H be a class of stochastic proceses. We denote the continuouselements of H by cH.Lemma 2.5.15. It always holds that cL(H) = L(cH).Proof. Let X ∈ cL(H), <strong>and</strong> let τ n be a localizing sequence <strong>for</strong> X. Since we haveX τ nt = X τn∧t <strong>and</strong> X is continuous, clearly X τn is continuous as well <strong>and</strong> there<strong>for</strong>eX ∈ L(cH). Contrarily, assume that X ∈ L(cH), <strong>and</strong> let τ n be a localizing sequence<strong>for</strong> X. Since X τ nis continuous, X is continuous on [0, τ n ]. Because τ n tends to infinity,it follows that X is continuous <strong>and</strong> there<strong>for</strong>e X ∈ cL(H).Lemma 2.5.16. Let M be a local martingale. Then M is locally a uni<strong>for</strong>mly integrablemartingale.


38 PreliminariesProof. Let τ n be a localising sequence <strong>for</strong> M <strong>and</strong> let τ be any bounded stoppingtime. Then, τ n ∧ τ is a bounded stopping time <strong>for</strong> each n, <strong>and</strong> M τn ∧τ is bounded inL p , there<strong>for</strong>e uni<strong>for</strong>mly integrable. Since M is continuous, M τn ∧τ converges to M τpointwise. In particular, M τn ∧τ converges in probability to M τ . By Lemma B.5.3,M τn∧τ converges in L 1 to M τ . There<strong>for</strong>e, EM τ = lim EM τn∧τ = 0. By Lemma2.4.13, M is a martingale.
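To tie the localisation machinery of this chapter to a concrete case (an example added for illustration, not taken from the thesis): let $W$ be a Brownian motion and put
$$\tau_n = \inf\{t \ge 0 : |W_t| \ge n\}.$$
Each $\tau_n$ is the first entrance time of a continuous adapted process into a closed set and hence a stopping time by Lemma 2.3.5, and $\tau_n$ increases to infinity since every path of $W$ is continuous and therefore bounded on compact intervals. The stopped process $W^{\tau_n}$ is a continuous martingale by Lemma 2.4.11 and is bounded by $n$, so it is bounded in every $L^p$ and in particular uniformly integrable. Thus $(\tau_n)$ is a determining sequence which is localising for $W$, showing that $W$ is L-locally a bounded martingale in the sense of Definition 2.5.2.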


Chapter 3

The Itô Integral

In this chapter, we will develop the theory of integration with respect to a simple class of stochastic processes, which will nevertheless be rich enough to cover all of our purposes. Our goal will be to create a relatively self-contained and thorough presentation of the theory. We will not merely be concerned with results for use in the later development of the Malliavin calculus and the applications in finance, but will seek to convey a general understanding of the properties of the integral. Our construction will mainly follow the methods of Steele (2000), Karatzas & Shreve (1988), Øksendal (2005) and Rogers & Williams (2000b).

We will work on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ satisfying the usual conditions, with an $\mathcal{F}_t$ Brownian motion $W$. Such a probability space exists according to our results from Section 2.1. Our first task will be to define a meaningful stochastic integral with respect to $W$. The construction will be done in several steps, first defining the integral for a simple class of integrands, and then extending the space of integrands in several steps.

After establishing the basic theory of integration with respect to one-dimensional Brownian motion, we will consider an $n$-dimensional Brownian motion and extend the integral to integrators of the form $M_t = \sum_{k=1}^n \int_0^t Y^k_s \, dW^k_s$. The development of the integral for this type of process will mainly be done by reusing parts of the theory for the one-dimensional Brownian case. We will later see that in a special case, this class of integrators is actually equal to the class of local martingales. After defining the integral


for these integrators, we extend the theory to integrators which also have a component of continuous paths with finite variation. The stochastic integral of a process $Y$ with respect to a process $X$ will be denoted $I_X(Y)$ or $Y \cdot X$. We also use the notation $(Y \cdot X)_t = \int_0^t Y_s \, dX_s$.

After the construction of the integral, we will prove some of the fundamental theorems of the theory: the Itô formula, the Martingale Representation Theorem and Girsanov's Theorem.

Before beginning work on the stochastic integral with respect to Brownian motion, we develop some general results on the space of elementary processes that will prove very useful. This allows us a separation of concerns: we will be able to see which elements of the theory essentially only concern the nature of the space of elementary processes, and which elements concern the Brownian motion specifically. We will also avoid letting the development of the integral be disturbed by the somewhat tedious results for the space of elementary processes.

The plan for the chapter is thus as follows.

1. Develop some basic results for elementary processes.
2. Construct the integral with respect to Brownian motion.
3. Construct the integral with respect to integrators of the type $M_t = \sum_{k=1}^n \int_0^t Y^k_s \, dW^k_s$.
4. Construct the integral with respect to integrators with finite variation.
5. Establish the basic results of the theory.

3.1 The space of elementary processes

The purpose of this section is twofold: first, we will show how to define the integral rigorously on the set $b\mathcal{E}$ of elementary processes. Second, we will prove a density result for $b\mathcal{E}$ and use this result to demonstrate how to extend a class of mappings from $b\mathcal{E}$ to certain $L^2$ spaces.

Definition 3.1.1. Let $Z$ be a stochastic variable and let $\sigma$ and $\tau$ be stopping times. We define the process $Z(\sigma,\tau]$ by $Z(\sigma,\tau]_t = Z 1_{(\sigma,\tau]}(t)$. We define the class of elementary


3.1 The space of elementary processes 41processes bE by{ n }∑bE = Z k (σ k , τ k ]∣ σ k ≤ τ k are bounded stopping times <strong>and</strong> Z k ∈ bF σk .k=1Clearly, bE is a linear space. Let M be any stochastic process, <strong>and</strong> let H ∈ bE be givenwith H = ∑ nk=1 Z k(σ k , τ k ]. Intuitively, we would like to define the integral I M (H) t ofH with respect to M over [0, t] by the Riemann-sum-like expressionI M (H) t =n∑Z k (M τk ∧t − M σk ∧t).k=1In order <strong>for</strong> this definition to work, however, we must make sure that this definitiondoes not depend on the representation of H. It is clear that if H = ∑ nk=1 Z k(σ k , τ k ], wealso find H = ∑ n−1k=1 Z k(σ k , τ k ] + 1 2 Z k(σ k , τ k ] + 1 2 Z k(σ k , τ k ], or, with bounded stoppingtimes ρ k such that σ k ≤ ρ k ≤ τ k , H = ∑ n−1k=1 Z k(σ k , ρ k ] + Z k (ρ k , τ k ]. In other words,H can be represented on the <strong>for</strong>m ∑ nk=1 Z k(σ k , τ k ] in many equivalent ways. Wewill now check that the definition of the stochastic integral does not depend on therepresentation.Lemma 3.1.2. Let H, K ∈ bE with H = ∑ nk=1 Z k(σ k , τ k ] <strong>and</strong> K = ∑ mk=1 Z′ k (σ′ k , τ k ′ ].There exists representations of H <strong>and</strong> K based on the same stopping times. Oneparticular such representation can be obtained by puttingH =2n+2m−1∑i=1Y i (U i , U i+1 ] <strong>and</strong> K =2n+2m−1∑i=1Y ′i (U i , U i+1 ],where U i is the i’th ordered value amongst the 2n + 2m stopping times in the family{σ 1 , . . . , σ n , τ 1 , . . . , τ n , σ ′ 1, . . . , σ ′ m, τ ′ 1, . . . , τ ′ m} <strong>and</strong>Y i =n∑k=1Z k 1 (σk ≤U i


42 The Itô Integralsame lemma that Z k 1 (σk ≤U i


3.1 The space of elementary processes 43representations yield the same result. Assume that we have two different representationsof H, H = ∑ nk=1 Z k(σ k , τ k ] = ∑ mk=1 Z′ k (σ′ k , τ k ′ ]. We want to show thatn∑m∑Z k (M τk ∧t − M σk ∧t) = Z k(M ′ τ ′k ∧t − M σ ′k∧t).k=1By Lemma 3.1.2, we have H = ∑ 2n+2m−1i=1Y i (U i , U i+1 ] = ∑ 2n+2m−1i=1Y ′i (U i, U i+1 ],where U i is the i’the ordered value amongst the 2n + 2m stopping times in the family{σ 1 , . . . , σ n , τ 1 , . . . , τ n , σ ′ 1, . . . , σ ′ m, τ ′ 1, . . . , τ ′ m} <strong>and</strong>Y i =n∑k=1We will show the two equalitiesk=1Z k 1 (σk ≤U i


Theorem 3.1.3 ensures the existence of the stochastic integral $I_M$ on bE, taking the values we would expect no matter what the representation of the bE-element. In the following, we will interchangeably use the notation $I_M(H)$ and $H \cdot M$ for the integral of $H$ with respect to $M$ over $[0,t]$. We will also write $I_M(H)_t = \int_0^t H_s \, dM_s$. Note that, using the notation from Chapter 2 for stopped processes, we can write the process $I_M(H)$ in compact form, with $H = \sum_{k=1}^n Z_k(\sigma_k, \tau_k]$, by
$$I_M(H) = \sum_{k=1}^n Z_k(M^{\tau_k} - M^{\sigma_k}).$$
We also want to give meaning to the symbol $\int_s^t H(u) \, dM(u)$. The following lemmas show what the proper interpretation is.

Lemma 3.1.4. Let $0 \le x \le y$ and $0 \le s \le t$. It holds that
$$(x, y] \cap (s, t] = (x \wedge t \vee s, \, y \wedge t \vee s],$$
understanding that $x \wedge t \vee s = (x \wedge t) \vee s$ and $y \wedge t \vee s = (y \wedge t) \vee s$.

Proof. We consider three representative cases separately; the remaining configurations are handled in the same manner.

The case $x \le y \le s \le t$. We find
$$(x, y] \cap (s, t] = \emptyset, \qquad (x \wedge t \vee s, \, y \wedge t \vee s] = (x \vee s, y \vee s] = (s, s] = \emptyset,$$
so the proposition holds in this case.

The case $x \le s \le y \le t$. In this case, we see that
$$(x, y] \cap (s, t] = (s, y], \qquad (x \wedge t \vee s, \, y \wedge t \vee s] = (x \vee s, y \vee s] = (s, y],$$
which verifies the statement in this case also.

The case $s \le t \le x \le y$. In the final case, we obtain
$$(x, y] \cap (s, t] = \emptyset, \qquad (x \wedge t \vee s, \, y \wedge t \vee s] = (t \vee s, t \vee s] = (t, t] = \emptyset,$$
and this concludes the proof of the lemma.
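The set identity in Lemma 3.1.4 is elementary but easy to get wrong. As a purely illustrative aside (not part of the development), the following Python snippet checks the identity on randomly sampled configurations by comparing membership in both sides; the sampling scheme and names are ad hoc choices.

```python
import random

def in_interval(a, b, u):
    """Membership of u in the half-open interval (a, b]."""
    return a < u <= b

random.seed(0)
for _ in range(10_000):
    x, y = sorted(random.uniform(0, 1) for _ in range(2))   # x <= y
    s, t = sorted(random.uniform(0, 1) for _ in range(2))   # s <= t
    u = random.uniform(0, 1)
    lhs = in_interval(x, y, u) and in_interval(s, t, u)      # u in (x, y] ∩ (s, t]
    a = max(min(x, t), s)                                    # (x ∧ t) ∨ s
    b = max(min(y, t), s)                                    # (y ∧ t) ∨ s
    assert lhs == in_interval(a, b, u)
print("Lemma 3.1.4 identity holds on all sampled configurations.")
```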


Lemma 3.1.5. Let $H \in bE$ with $H = \sum_{k=1}^n Z_k(\sigma_k, \tau_k]$, and let $0 \le s \le t$. It holds that
$$\sum_{k=1}^n Z_k(M_{\tau_k \wedge t \vee s} - M_{\sigma_k \wedge t \vee s}) = I_M(H)_t - I_M(H)_s.$$

Proof. It will suffice to consider the term $Z_k(M_{\tau_k \wedge t \vee s} - M_{\sigma_k \wedge t \vee s})$ for each configuration of the intervals $(\sigma_k, \tau_k]$ and $(s, t]$, and prove it equal to the corresponding term of $I_M(H)_t - I_M(H)_s$. We treat three representative configurations; the remaining ones are handled analogously.

The case $s \le t \le \sigma_k \le \tau_k$. We get
$$Z_k(M_{\tau_k \wedge t} - M_{\sigma_k \wedge t}) - Z_k(M_{\tau_k \wedge s} - M_{\sigma_k \wedge s}) = Z_k(M_t - M_t) - Z_k(M_s - M_s) = Z_k(M_{\tau_k \wedge t \vee s} - M_{\sigma_k \wedge t \vee s}),$$
showing the statement in this case.

The case $s \le \sigma_k \le t \le \tau_k$. Here, we find
$$Z_k(M_{\tau_k \wedge t} - M_{\sigma_k \wedge t}) - Z_k(M_{\tau_k \wedge s} - M_{\sigma_k \wedge s}) = Z_k(M_t - M_{\sigma_k}) - Z_k(M_s - M_s) = Z_k(M_{\tau_k \wedge t \vee s} - M_{\sigma_k \wedge t \vee s}).$$

The case $\sigma_k \le \tau_k \le s \le t$. In the final case, we obtain
$$Z_k(M_{\tau_k \wedge t} - M_{\sigma_k \wedge t}) - Z_k(M_{\tau_k \wedge s} - M_{\sigma_k \wedge s}) = Z_k(M_{\tau_k} - M_{\sigma_k}) - Z_k(M_{\tau_k} - M_{\sigma_k}) = Z_k(M_{\tau_k \wedge t \vee s} - M_{\sigma_k \wedge t \vee s}),$$
as desired.

Conclusion. All in all, we conclude that
$$\sum_{k=1}^n Z_k(M_{\tau_k \wedge t \vee s} - M_{\sigma_k \wedge t \vee s}) = \sum_{k=1}^n Z_k(M_{\tau_k \wedge t} - M_{\sigma_k \wedge t}) - Z_k(M_{\tau_k \wedge s} - M_{\sigma_k \wedge s}) = I_M(H)_t - I_M(H)_s,$$
and the lemma is proved.
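To see the Riemann-sum definition and Lemma 3.1.5 in action, the following Python sketch evaluates $I_M(H)_t = \sum_k Z_k(M_{\tau_k \wedge t} - M_{\sigma_k \wedge t})$ with a discretised Brownian path used as the integrator $M$. The grid size, the choice of blocks and all variable names are illustrative assumptions rather than part of the theory.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T = 1e-3, 1.0
n_steps = int(T / dt)
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))])

def M(time):
    """The integrator evaluated at a grid time; here M is a simulated Brownian path."""
    return W[int(round(time / dt))]

def I(blocks, t):
    """I_M(H)_t = sum_k Z_k (M_{tau_k ^ t} - M_{sigma_k ^ t}) for H = sum_k Z_k (sigma_k, tau_k]."""
    return sum(Z * (M(min(tau, t)) - M(min(sigma, t))) for Z, sigma, tau in blocks)

# An elementary process with two blocks; in the theory Z_k would be F_{sigma_k} measurable.
H = [(0.5, 0.1, 0.4), (-1.0, 0.4, 0.8)]
s, t = 0.3, 0.9
print("I_M(H)_t            =", I(H, t))
print("I_M(H)_t - I_M(H)_s =", I(H, t) - I(H, s))   # the integral over (s, t], cf. Lemma 3.1.5
```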


Lemma 3.1.5 shows that the natural definition of the integral $\int_s^t H(u)\,dM(u)$ coincides with $I_M(H)_t - I_M(H)_s$, which is of course not particularly surprising. In the following, we will by $\int_s^t H(u)\,dM(u)$ denote the variable $I_M(H)_t - I_M(H)_s$.

Lemma 3.1.6. The integral mapping $I_M$ on bE is linear.

Proof. Let $H$ and $K$ in bE be given with representations
$$H = \sum_{k=1}^n Z_k(\sigma_k, \tau_k] \quad \text{and} \quad K = \sum_{i=1}^m Y_i(V_i, W_i],$$
and let $\lambda, \mu \in \mathbb{R}$. Then
$$I_M(\lambda H + \mu K)_t = I_M\Big(\sum_{k=1}^n \lambda Z_k(\sigma_k, \tau_k] + \sum_{i=1}^m \mu Y_i(V_i, W_i]\Big)_t = \sum_{k=1}^n \lambda Z_k(M^{\tau_k}_t - M^{\sigma_k}_t) + \sum_{i=1}^m \mu Y_i(M^{W_i}_t - M^{V_i}_t) = \lambda I_M(H)_t + \mu I_M(K)_t,$$
which was to be shown.

We have now shown how to define the integral on bE for any process $M$ in a sensible way, and we have checked linearity of the integral. In order to extend the integral to larger spaces of integrands, we will need to require more structure on $M$. We will see how to do this in the following sections.

Next, we will show an extension result which will be crucial to the later development of the integral. First, we show that there is a large class of $L^2$ spaces on $[0,\infty) \times \Omega$ which contain bE as a dense subspace. Recall that $\Sigma_\pi$ denotes the progressive $\sigma$-algebra on $[0,\infty) \times \Omega$.

Theorem 3.1.7. Let $\mu$ be a measure on $\Sigma_\pi$ which is absolutely continuous with respect to $\lambda \otimes P$, such that $\mu([0,T] \times \Omega) < \infty$ for all $T > 0$. Then bE is a dense subspace of $L^2(\mu)$.

Proof. By $\|\cdot\|_\mu$, we denote the standard seminorm of $L^2(\mu)$. We first show that bE is a subspace of $L^2(\mu)$. Let $H \in bE$ be given with $H = \sum_{k=1}^n Z_k(\sigma_k, \tau_k]$. Using Lemma


3.1 The space of elementary processes 472.3.4, we see that H is progressive. We find‖H‖ 2 µ =≤=n∑∫Z k (ω)1 (σk (ω),τ k (ω)](t) dµ(t, ω)k=1n∑∫‖Z k ‖ ∞ 1 [0,‖σk ‖ ∞ +‖τ k ‖ ∞ ]×Ω dµ(t, ω)k=1n∑‖Z k ‖ ∞ µ([0, ‖σ k ‖ ∞ + ‖τ k ‖ ∞ ] × Ω) < ∞,k=1as desired. The proof of the density of bE proceeds in four steps, each consisting ofshowing that a particular subset of L 2 (µ) is dense in L 2 (µ). The four subsets are:1. Processes which are zero from a point onwards.2. Processes which are zero from a point onwards <strong>and</strong> bounded.3. Processes which are zero from a point onwards, bounded <strong>and</strong> continuous.4. The set bE.Step 1: Approximation by processes which are zero from a point onwards.Let X ∈ L 2 (µ) <strong>and</strong> define X n = X1 [0,T ]×Ω . Since [0, T ] × Ω ∈ Σ π , X n is progressive.Clearly ‖X n ‖ µ ≤ ‖X‖ µ < ∞, so X n ∈ L 2 (µ). And X n is zero from T onwards. Finally,∫lim ‖X − X n ‖ 2 µ = lim 1 (T,∞)×Ω X 2 dµ = 0by dominated convergence. This means that in the following, in our work to showdensity of bE in L 2 (µ), we can assume that the process to be approximated is zerofrom a point onwards.Step 2: Approximation by bounded processes. We next show that each elementof L 2 (µ), zero from a point onwards, can be approximated by bounded processes zerofrom a point onwards. There<strong>for</strong>e, let X ∈ L 2 (µ) be zero from T onwards <strong>and</strong> defineX n (t) = X(t)1 [−n,n] (X t ). Then X n is progressive <strong>and</strong> bounded. And we clearly have‖X n ‖ µ ≤ ‖X‖ µ < ∞, so X n ∈ L 2 (µ), <strong>and</strong> X n is zero from T onwards. Furthermore,∫lim ‖X − X n ‖ 2 µ = lim 1 (|X|>n) X 2 dµ = 0nby dominated convergence.


48 The Itô IntegralStep 3: Approximation by continuous processes. We now show that eachelement of L 2 (µ), bounded <strong>and</strong> zero from a point onwards, can be approximated bycontinuous processes of the same kind. Let X ∈ L 2 (µ) be bounded <strong>and</strong> zero from Tonwards. PutX n (t) = 1 1n∫ t(t− 1 n )+ X(s) ds,where x + = max{0, x}. Clearly, X n is continuous <strong>and</strong> ‖X n ‖ ∞ ≤ ‖X‖ ∞ < ∞, so X nis bounded. For t ≥ T + 1 n , X n(t) = 0. We will show that the sequence X n is in L 2 (µ)<strong>and</strong> approximates X.Since X is progressive, X |[0,t]×Ω is B[0, t] ⊗ F t measurable. There<strong>for</strong>e, X n (t) is F tmeasurable. Since X n is also continuous, X n is progressive by Lemma 2.2.4. Since X nis bounded <strong>and</strong> zero from a certain point onwards, we conclude X n ∈ L 2 (µ).It remains to show convergence in ‖ · ‖ µ of X n to X. By Theorem A.2.4, X n (ω)converges Lebesgue almost surely to X(ω) <strong>for</strong> all ω ∈ Ω. This means that we haveconvergence λ⊗P everywhere. Absolute continuity yields that X n converges µ-almostsurely to X. Since X n <strong>and</strong> X have common support [0, T + 1] × Ω of finite measure<strong>and</strong> are bounded by the common constant ‖X‖ ∞ , by dominated convergence we obtainlim ‖X − X n ‖ µ = 0.Step 4: Approximation by elementary processes. Finally, we show that <strong>for</strong> eachX ∈ L 2 (µ) which is continuous, bounded <strong>and</strong> zero from some point T <strong>and</strong> onwards,we can approximate X with elements of bE.To do this, we defineX n (t) =∞∑k=0( ) kX2 n 1 [k2 n , k+1)(t).2 nSince X n is zero from T <strong>and</strong> onwards, the sum defining X n is zero from the termnumber [2 n T ] + 2 <strong>and</strong> onwards. We there<strong>for</strong>e conclude X n ∈ bE. Since k2= [2n t]n 2 nwhen k2≤ t < k+1n 2, we can also write X n n (t) = X( [2n t]2). In particular, X n n is zerofrom T + 1 <strong>and</strong> onwards. Since lim [2n t]2= t <strong>for</strong> all t ≥ 0, it follows from the continuitynof X that X n converges pointwise to X. As in the previous step, since X n <strong>and</strong> Xhas common support of finite measure <strong>and</strong> are bounded by the common constant‖X‖ ∞ , dominated convergence yields lim ‖X − X n ‖ µ = 0. This shows the claim of thetheorem.Note that in the proof of Theorem 3.1.7, we actually showed a stronger result than


just that bE is dense in $L^2(\mu)$. Our final stage of approximations was with processes in the set
$$bS = \Big\{ \sum_{k=1}^n Z_k(s_k, t_k] \,\Big|\, s_k \le t_k \text{ are deterministic times and } Z_k \in b\mathcal{F}_{s_k} \Big\},$$
so we have in fact demonstrated that bS is dense in $L^2(\mu)$, even though we will not need this stronger fact.

In our coming definition of the integral, we will have at our disposal an integral mapping $I : bE \to cM^2_0$ which is linear and isometric, when bE is considered as a subspace of a suitable $L^2$ space on $\Sigma_\pi$. The following theorem shows that such mappings can always be extended from bE to the $L^2$ space.

Theorem 3.1.8. Let $\mu$ be a measure on $\Sigma_\pi$, absolutely continuous with respect to $\lambda \otimes P$ and such that $\mu([0,T] \times \Omega) < \infty$ for all $T > 0$. Let $I : bE \to cM^2_0$ be linear and isometric, considering bE as a subspace of $L^2(\mu)$. There exists a linear and isometric extension of $I$ to $L^2(\mu)$. This mapping is unique up to indistinguishability.

Comment 3.1.9 The linearity stated in the theorem is to be understood as linearity up to indistinguishability. This kind of "up to indistinguishability" qualifier will be implicit in much of the following. ◦

Proof. First, we construct the mapping. Let $X \in L^2(\mu)$. By Theorem 3.1.7, bE is dense in $L^2(\mu)$, so there exists a sequence $H_n$ in bE converging to $X$ in $L^2(\mu)$. In particular, $H_n$ is a Cauchy sequence in $L^2(\mu)$. By the isometry property of $I$ on bE, $I(H_n)$ is then a Cauchy sequence in $cM^2_0$. Since $cM^2_0$ is complete by Lemma 2.4.9, there exists $Y \in cM^2_0$ such that $I(H_n)$ converges to $Y$, and $Y$ is unique up to indistinguishability. We now have a candidate for $I(X)$. However, we need to make sure that this candidate does not depend on the sequence in bE approximating $X$. Assume therefore that $K_n$ is another such sequence, and let $Z$ be a limit of $I(K_n)$. Then, we have
$$\|Y - Z\|_{cM^2_0} \le \limsup \|Y - I(H_n)\|_{cM^2_0} + \|I(H_n) - I(K_n)\|_{cM^2_0} + \|I(K_n) - Z\|_{cM^2_0} = \limsup \|I(H_n - K_n)\|_{cM^2_0} = \limsup \|H_n - K_n\|_\mu,$$
and since $\limsup \|H_n - K_n\|_\mu \le \limsup \|H_n - X\|_\mu + \|X - K_n\|_\mu = 0$, we conclude that for each $X \in L^2(\mu)$, there is an element $Y$ of $cM^2_0$, unique up to indistinguishability, such that for any sequence $(H_n)$ in bE converging to $X$ in $L^2(\mu)$, it holds that $\lim I(H_n) = Y$. We define $I(X)$ as some element in the equivalence class of $Y$.

Having constructed the mapping $I$, we need to check the properties of the mapping given in the theorem. To prove the isometry property, let $X \in L^2(\mu)$ and let $(H_n) \subseteq bE$ converge to $X$ in $L^2(\mu)$. Then $I(X) = \lim I(H_n)$ in $cM^2_0$, and by the isometry property on bE,
$$\|I(X)\|_{cM^2_0} = \lim \|I(H_n)\|_{cM^2_0} = \lim \|H_n\|_\mu = \|X\|_\mu.$$
Next, we show linearity. We wish to prove that for $X, Y \in L^2(\mu)$ and $a, b \in \mathbb{R}$, it holds that $I(aX + bY) = aI(X) + bI(Y)$ up to indistinguishability. We know that the claim holds for elements of bE. Letting $(H_n)$ and $(K_n)$ be sequences in bE approximating $X$ and $Y$, respectively, we then find
$$I(aX + bY) = I(a \lim H_n + b \lim K_n) = \lim I(aH_n + bK_n) = \lim aI(H_n) + bI(K_n) = aI(X) + bI(Y),$$
up to indistinguishability, where the limits are in $L^2(\mu)$ and $cM^2_0$.

This proves the statements about $I$. It remains to prove uniqueness of $I$. Assume therefore that we have two mappings, $I$ and $J$, with the properties stated in the theorem. Obviously, $I(H) = J(H)$ for $H \in bE$. Let $X \in L^2(\mu)$ and let $(H_n)$ be a sequence in bE approximating $X$. Then,
$$\|I(X) - J(X)\|_{cM^2_0} = \|\lim I(H_n) - \lim J(H_n)\|_{cM^2_0} = \lim \|I(H_n) - J(H_n)\|_{cM^2_0} = 0,$$
showing that $I(X)$ and $J(X)$ are indistinguishable.

Comment 3.1.10 The proof of the existence of the extension in Theorem 3.1.8 was made without reference to standard theorems. Another, less direct, method of proof would be to define the integral modulo equivalence classes, making the pseudometric spaces involved into metric spaces. In this case, results for continuous extensions from the theory of metric spaces can be applied directly to obtain the extension, see Theorem 8.16 of Carothers (2000). ◦

We are now ready to begin the development of the stochastic integral proper.
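As an aside, the approximation used in Step 4 of Theorem 3.1.7 is easy to visualise numerically. The sketch below, under the illustrative assumptions that the process is $X_t = \sin(W_t)$ on $[0,1]$ and that $\mu$ is $\lambda \otimes P$ restricted to $[0,1] \times \Omega$, estimates by Monte Carlo the $L^2(\mu)$ distance between $X$ and its dyadic step approximations $X_n(t) = X([2^n t]/2^n)$; the distance should shrink as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_paths, n_grid = 1.0, 2000, 1024
dt = T / n_grid
times = np.linspace(0.0, T, n_grid + 1)

# Simulated paths of a continuous, bounded, adapted process: X_t = sin(W_t).
W = np.hstack([np.zeros((n_paths, 1)),
               np.cumsum(rng.normal(0, np.sqrt(dt), (n_paths, n_grid)), axis=1)])
X = np.sin(W)

for n in [1, 2, 4, 6, 8]:
    # Dyadic step approximation X_n(t) = X([2^n t] / 2^n), as in Step 4 of Theorem 3.1.7.
    idx = np.minimum((np.floor(times * 2**n) / 2**n / dt).round().astype(int), n_grid)
    X_n = X[:, idx]
    # Monte Carlo estimate of E ∫_0^T (X_t - X_n(t))^2 dt, the squared L^2(λ⊗P) distance.
    dist2 = np.mean(np.sum((X - X_n) ** 2, axis=1) * dt)
    print(f"n = {n}:  ||X - X_n||^2 ≈ {dist2:.5f}")
```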


3.2 Integration with respect to Brownian motion

In this section, we define the stochastic integral with respect to Brownian motion. The construction takes place in three phases. Our results from Section 2.5 and Section 3.1 have paved much of the way for our work, so we will be able to concentrate on the important parts of the process, not having to deal with technical details. The three phases are:

1. Definition of the integral on bE by a Riemann-sum definition.
2. Definition of the integral on the space $L^2(W)$ by a density argument.
3. Extension of the integral to processes locally in $L^2(W)$ by a localisation argument.

Let $W$ be a one-dimensional $\mathcal{F}_t$ Brownian motion. We begin by defining and investigating the stochastic integral on bE. From Theorem 3.1.3 and Lemma 3.1.6, we know that the mapping $I_W$ from bE to the space of stochastic processes given by, for $H = \sum_{k=1}^n Z_k(\sigma_k, \tau_k]$,
$$I_W(H) = \sum_{k=1}^n Z_k(W^{\tau_k} - W^{\sigma_k})$$
is well-defined and linear. We will now show that $I_W$ maps into the space $cM^2_0$ of continuous square-integrable martingales which are zero at zero.

Lemma 3.2.1. $W_t$ and $W_t^2 - t$ are $\mathcal{F}_t$ martingales.

Comment 3.2.2 It is well known that $W_t$ and $W_t^2 - t$ are martingales with respect to the filtration generated by the Brownian motion $W$ itself. The important statement of the lemma is that in our case, where we have an $\mathcal{F}_t$ Brownian motion, the martingales can be taken with respect to $\mathcal{F}_t$. ◦

Proof. We first show that $W$ is an $\mathcal{F}_t$ martingale. Let $s \le t$ be given; it then holds that $W_t - W_s = W_{s+(t-s)} - W_s$ is independent of $\mathcal{F}_s$ and normally distributed. In particular, $E(W_t - W_s \mid \mathcal{F}_s) = E(W_t - W_s) = 0$, showing the martingale property.

To show that $W_t^2 - t$ is an $\mathcal{F}_t$ martingale, we note that for $s \le t$, we have
$$W_t^2 - W_s^2 = (W_t - W_s)^2 + 2(W_t - W_s)W_s,$$


and therefore, since $W_t - W_s$ is independent of $\mathcal{F}_s$,
$$E(W_t^2 - W_s^2 \mid \mathcal{F}_s) = E((W_t - W_s)^2 \mid \mathcal{F}_s) + 2W_s E(W_t - W_s \mid \mathcal{F}_s) = t - s,$$
showing $E(W_t^2 - t \mid \mathcal{F}_s) = W_s^2 - s$.

Theorem 3.2.3. For any $H \in bE$, $I_W(H) \in cM^2_0$ and $E I_W(H)_\infty^2 = E \int_0^\infty H_t^2 \, dt$.

Proof. The process $I_W(H)$ is clearly continuous, since $W$ is continuous.

Step 1: The martingale property. We use Lemma 2.4.13. Let $H \in bE$ with $H = \sum_{k=1}^n Z_k(\sigma_k, \tau_k]$; we then have $\lim_{t \to \infty} I_W(H)_t = \sum_{k=1}^n Z_k(W_{\tau_k} - W_{\sigma_k})$, so $I_W(H)$ has a well-defined almost sure limit. Now let $\tau$ be any stopping time. Since $\tau_k$ and $\sigma_k$ are bounded stopping times, optional sampling and Lemma 2.4.11 yield
$$E(H \cdot W)_\tau = E \sum_{k=1}^n Z_k(W^{\tau_k}_\tau - W^{\sigma_k}_\tau) = \sum_{k=1}^n E\big(Z_k E(W^{\tau_k}_\tau - W^{\sigma_k}_\tau \mid \mathcal{F}_{\sigma_k})\big) = 0.$$
By Lemma 2.4.13, then, $H \cdot W$ is a martingale.

Step 2: Square-integrability. To show that $I_W(H)$ is bounded in $L^2$, it will suffice to show the moment equality for $I_W(H)_\infty$, since $I_W(H)^2$ is a submartingale and its second moments are therefore increasing. As before, let $H \in bE$ with $H = \sum_{k=1}^n Z_k(\sigma_k, \tau_k]$. By Lemma 3.1.2, we can assume that the stopping times are ordered such that $\tau_k \le \sigma_{k+1}$ for all $k < n$.

We find
$$E I_W(H)_\infty^2 = E\Big(\sum_{k=1}^n Z_k(W_{\tau_k} - W_{\sigma_k})\Big)^2 = E \sum_{k=1}^n \sum_{i=1}^n Z_k Z_i (W_{\tau_k} - W_{\sigma_k})(W_{\tau_i} - W_{\sigma_i}) = \sum_{k=1}^n E Z_k^2 (W_{\tau_k} - W_{\sigma_k})^2 + \sum_{k \ne i} E Z_k Z_i (W_{\tau_k} - W_{\sigma_k})(W_{\tau_i} - W_{\sigma_i}).$$


3.2 Integration with respect to Brownian motion 53Here we have, applying optional sampling <strong>for</strong> bounded stopping times <strong>and</strong> using thatW 2 t − t by Lemma 3.2.1 is a F t -martingale,EZ 2 k(W τk − W σk ) 2= EZ 2 kE((W τk − W σk ) 2 |F σk )= EZ 2 kE(W 2 τ k− τ k + W 2 σ k+ σ k − 2W τk W σk |F σk ) + EZ 2 kE(τ k − σ k |F σk )= EZ 2 kE(W 2 τ k− τ k − (W 2 σ k− σ k )|F σk ) + EZ 2 k(τ k − σ k )= EZ 2 k(τ k − σ k ).Furthermore, <strong>for</strong> i < k we find that, again by optional sampling, using the ordering ofthe stopping times,Finally, we concludeas desired.EZ k Z i (W τk − W σk )(W τi − W σi )= EZ i (W τi − W σi )Z k E((W τk − W σk )|F σk )= 0.EI W (H) 2 ∞ =n∑EZk(τ 2 k − σ k ) = Ek=1∫ ∞0H 2 u du,We have now proven that I W maps bE into cM 2 0. If we can embed bE into a L 2 -spacesuch that I W becomes an isometry, we can use Theorem 3.1.8 to extend I W to thisspace.Definition 3.2.4. By L 2 (W ) we denote the set of adapted <strong>and</strong> measurable processesX such that E ∫ ∞0X 2 t dt < ∞. L 2 (W ) is then equal to the L 2 space of square integrablefunctions on [0, ∞) × Ω with the progressive σ-algebra Σ π under the measure λ ⊗ P .We denote the L 2 seminorm on L 2 (W ) by ‖ · ‖ W .Theorem 3.2.5. There exists a linear isometric mapping I W : L 2 (W ) → cM 2 0 suchthat <strong>for</strong> H ∈ bE, with H = ∑ nk=1 Z k[σ k , τ k ), I W (H) = ∑ nk=1 Z k(W τ k− W σ k). Thismapping is unique up to indistinguishability.Proof. Considering bE as a subspace of L 2 (W ), we have by Theorem 3.2.3 <strong>for</strong> H ∈ bEthat ‖I W (H)‖ 2 = ‖IcM 2 W (H) ∞ ‖ 2 2 = E ∫ ∞H(t) 2 dt = ‖H‖ 200 W , so I W : bE → cM 2 0 isisometric. Since it is also linear, the conditions of Theorem 3.1.8 are satisfied, <strong>and</strong> theconclusion follows.
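The Itô isometry of Theorem 3.2.3 and Theorem 3.2.5 lends itself to a simple Monte Carlo check. The sketch below simulates many Brownian paths, builds an elementary integrand whose coefficients are measurable at the left endpoints of the blocks, and compares $E[I_W(H)_\infty^2]$ with $E\int_0^\infty H_t^2\,dt$; the specific coefficients, times and sample sizes are arbitrary illustrative choices, not part of the theory.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, dt, T = 50_000, 1e-2, 1.0
n_steps = int(T / dt)
W = np.hstack([np.zeros((n_paths, 1)),
               np.cumsum(rng.normal(0, np.sqrt(dt), (n_paths, n_steps)), axis=1)])

def W_at(t):
    return W[:, int(round(t / dt))]

# Elementary integrand H = Z_1 (0.1, 0.4] + Z_2 (0.4, 0.9], with Z_k known at time sigma_k.
blocks = [(np.tanh(W_at(0.1)), 0.1, 0.4),
          (np.sign(W_at(0.4)), 0.4, 0.9)]

I_inf = sum(Z * (W_at(tau) - W_at(sigma)) for Z, sigma, tau in blocks)       # I_W(H)_infinity
lhs = np.mean(I_inf ** 2)                                                    # E[ I_W(H)^2 ]
rhs = np.mean(sum(Z**2 * (tau - sigma) for Z, sigma, tau in blocks))         # E ∫ H_t^2 dt
print(f"E[I_W(H)^2] ≈ {lhs:.4f}    E∫H^2 dt ≈ {rhs:.4f}")
```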


54 The Itô IntegralComment 3.2.6 The isometry property of I W is known as Itô’s isometry.◦We still need to make a final extension of the stochastic integral with respect toBrownian motion. This extension is done by the technique of localisation. It is duringthis step that the results of Section 2.5 will prove worth their while.The basic idea of the localisation is that if a process is in L 2 (W ) when we cut it off<strong>and</strong> put it to zero from a certain point onwards, then we can define the integral up tothis time. If the cutoff times tend to infinity, we can paste the integrals together <strong>and</strong>obtain a reasonable stochastic integral.The cutoff procedure will, in order to obtain an appropriate level of generality, bedone at a stopping time. We will need two kinds of localization, described in Section2.5, namely stopping <strong>and</strong> zero-stopping. We review the localisation concepts here<strong>for</strong> completeness. We say that a process M is locally, or L-locally, in cM 2 0 if thereis a sequence of stopping times τ n increasing to infinity such that M τ n, defined byM τ nt = M τn ∧t, is in cM 2 0. We say that a process X is zero-locally, or L 0 -locally,in L 2 (W ) if there is a sequence of stopping times τ n increasing to infinity such thatX[0, τ n ], defined by X[0, τ n ] t = X t 1 [0,τ] (t), is in L 2 (W ).We denote the processes L-locally in cM 2 0 by cM L 0 , <strong>and</strong> we denote the processes L 0 -locally in L 2 (W ) by L 2 (W ). We call cM L 0 the space of continuous local martingales.We are going to use Theorem 2.5.11 to prove that there is a unique extension of I Wfrom L 2 (W ) → cM 2 0 to L 2 (W ) → cM L 0 such that I W (X) τ = I W (X[0, τ]) <strong>for</strong> anystopping time τ. We will there<strong>for</strong>e need to check the hypotheses of this theorem.Lemma 3.2.7. The spaces L 2 (W ) <strong>and</strong> L 2 (W ) are linear <strong>and</strong> L 0 -stable. The spacescM 2 0 <strong>and</strong> cM L 0 are linear <strong>and</strong> L-stable.Proof. By Lemma 2.5.18, L 2 (W ) is L 0 -stable, <strong>and</strong> it is clearly linear. By Lemma 2.5.9,L 2 (W ) is linear <strong>and</strong> L 0 -stable. By Lemma 2.4.11, martingales are L-stable. Then cM 2 0is also L-stable. Since cM 2 0 is also linear, Lemma 2.5.9 yields that cM L 0 is linear.Theorem 3.2.8. Let X ∈ L 2 (W ) <strong>and</strong> let τ be a stopping time. Then it holds thatI(X) τ = I(X[0, τ]).Proof. We first show that the result holds <strong>for</strong> bE, <strong>and</strong> then extend by a density argument.


3.2 Integration with respect to Brownian motion 55Step 1: The elementary case. Let H ∈ bE with H = ∑ nk=1 Z k(σ k , τ k ]. We havethatn∑n∑H[0, τ](t) = Z k 1 (σk ,τ k ](t)1 [0,τ] (t) = Z k 1 (σk ≤τ)1 (σk ∧τ,τ k ∧τ](t),k=1<strong>and</strong> by Lemma 2.3.8, Z k 1 (σk ≤τ) ∈ F σk ∧τ . There<strong>for</strong>e, H[0, τ] ∈ bE, <strong>and</strong> we haveI(H[0, τ]) t ==k=1n∑Z k 1 (σk ≤τ)(W τk ∧τ∧t − W σk ∧τ∧t)k=1n∑Z k (W τk ∧τ∧t − W σk ∧τ∧t)k=1= I(H) τ t .Step 2: The general case. Now let X ∈ L 2 (W ) be arbitrary <strong>and</strong> let X n be asequence in bE converging to X. We trivially have ‖X n [0, τ]−X[0, τ]‖ W ≤ ‖X n −X‖ W ,so X n [0, τ] converges to X[0, τ] in L 2 (W ). Since I(X n [0, τ]) = I(X n ) τ , the isometryproperty of the integral yields‖I(X[0, τ]) − I(X) τ ‖ cM 20≤ ‖I(X[0, τ]) − I(X n [0, τ])‖ cM 20+ ‖I(X n ) τ − I(X) τ ‖ cM 20≤ ‖I(X[0, τ]) − I(X n [0, τ])‖ cM 20+ ‖I(X n ) − I(X)‖ cM 20= ‖X[0, τ] − X n [0, τ]‖ W + ‖X n − X‖ W≤ 2‖X n − X‖ W .Thus, I(X τ ) = I(X) τ up to indistinguishability, as desired.With Theorem 3.2.8 in h<strong>and</strong>, we are ready to extend the integral.Theorem 3.2.9. There exists an extension of I W : L 2 (W ) → cM 2 0 to a mappingfrom L 2 (W ) to cM L 0 . The extension is uniquely determined by the criterion that(X · W ) τ = X[0, τ] · W <strong>for</strong> any X ∈ L 2 (W ). The extension is linear.Proof. We apply Theorem 2.5.11 with the localisation concepts L 0 of zero-stopping<strong>and</strong> L of stopping. From Lemma 3.2.7, we know that L 2 (W ) is L 0 -stable <strong>and</strong> cM 2 0is L-stable. And by Theorem 3.2.8, the localisation hypothesis of Theorem 2.5.11 issatisfied. Theorem 2.5.11 there<strong>for</strong>e immediately yields the desired extension, <strong>and</strong> byCorollary 2.5.12, the extension is linear.
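The defining criterion $(X \cdot W)^\tau = X[0,\tau] \cdot W$ of the extension in Theorem 3.2.9 can also be checked on a discretised path. In the sketch below, the stopping time is taken to be the first grid time at which $|W|$ exceeds a level; this choice, like the integrand $\cos(W)$ and the grid, is only an illustrative assumption made for the simulation.

```python
import numpy as np

rng = np.random.default_rng(4)
dt, T = 1e-3, 1.0
n_steps = int(T / dt)
dW = rng.normal(0, np.sqrt(dt), n_steps)
W = np.concatenate([[0.0], np.cumsum(dW)])

X = np.cos(W[:-1])                                   # adapted integrand at left endpoints
I = np.concatenate([[0.0], np.cumsum(X * dW)])       # discretised (X · W)_t on the grid

# A stopping time: first grid time with |W| > 0.5, or T if the level is never reached.
hits = np.flatnonzero(np.abs(W) > 0.5)
tau_idx = hits[0] if hits.size > 0 else n_steps

I_stopped = I[np.minimum(np.arange(n_steps + 1), tau_idx)]       # (X · W)^tau
# X[0, tau]: increments from tau onwards are dropped, matching the stopped integral on the grid.
X_cut = np.where(np.arange(n_steps) < tau_idx, X, 0.0)
I_cut = np.concatenate([[0.0], np.cumsum(X_cut * dW)])           # X[0, tau] · W

print("max |(X·W)^tau - X[0,tau]·W| =", np.max(np.abs(I_stopped - I_cut)))
```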


56 The Itô IntegralTheorem 3.2.9 yields the “pasting” definition of the integral of processes in L 2 (W )discussed earlier. For general X ∈ L 2 (W ) with localising sequence τ n , the propertyI W (X) τ n= I W (X[0, τ n ]) relates the values of the integral <strong>for</strong> general X to the valuesof the integral on L 2 (W ).We have now defined the stochastic integral with respect to Brownian motion <strong>for</strong> asmany processes as we will need. Our next goal is to extend the integral to other integratorsthan Brownian motion. Be<strong>for</strong>e this, we summarize our results: Our stochasticintegral I W with respect to Brownian motion is a linear mapping from L 2 (W ) to cM L 0 .It has the following properties:1. For X ∈ L 2 (W ), X · W is a square-integrable martingale.2. For H ∈ bE with H = ∑ nk=1 Z k(σ k , τ k ], H · W = ∑ nk=1 Z k(W τ k− W σ k).3. For any X ∈ L 2 (W ) <strong>and</strong> any stopping time τ, (X · W ) τ = X[0, τ] · W .3.3 Integration with respect to integrators in M L WWe now use the results from Section 3.2 on integration with respect to Brownianmotion to define the stochastic integral <strong>for</strong> larger classes of integrators <strong>and</strong> integr<strong>and</strong>s.We begin with a lemma to set the mood. We still denote by W a one-dimensional F tBrownian motion.Lemma 3.3.1. Let H ∈ bE with H = ∑ ∞k=1 Z k(σ k , τ k ] <strong>and</strong> let Y ∈ L 2 (W ). ThenHY ∈ L 2 (W ) <strong>and</strong> ∑ nk=1 Z k((Y · W ) τ k− (Y · W ) σ k) = HY · W .Proof. Since H is bounded <strong>and</strong> Σ π -measurable, it is clear that HY ∈ L 2 (W ). To provethe other statement, note that by linearity, it will suffice to consider H of the <strong>for</strong>mH = Z(σ, τ].Step 1: The case Y ∈ bE. Assume Y = ∑ nk=1 X k(σ k , τ k ], we then findZ(σ, τ]Y ==n∑Z(σ, τ]X k (σ k , τ k ]k=1n∑ZX k 1 (σk ≤τ)(σ k ∧ τ ∨ σ, τ k ∧ τ ∨ σ],k=1


3.3 Integration with respect to integrators in M L W 57by Lemma 3.1.4, where X k 1 (σk ≤τ) is F σk ∧τ measurable by Lemma 2.3.8, so ZX k 1 (σk ≤τ)is F σk ∧τ∨σ measurable. This shows Z(σ, τ]Y ∈ bE, <strong>and</strong> we may then concludeZ((Y · W ) τ − (Y · W ) σ )( n)∑n∑= Z X k (W τk∧τ − W σk∧τ ) − X k (W τk∧σ − W σk∧σ )===k=1k=1n∑ZX k (W τk∧τ − W σk∧τ − W τk∧σ + W σk∧σ )k=1n∑ZX k (W τk∧τ∨σ − W σk∧τ∨σ )k=1n∑ZX k 1 (σk ≤τ) (W τk∧τ∨σ − W σk∧τ∨σ )k=1= HY · W.Step 2: The case Y ∈ L 2 (W ). Assuming Y ∈ L 2 (W ), let H n be a sequence inbE converging towards Y . Since H is bounded, lim ‖HY − HH n ‖ W = 0. From thecontinuity of the integral, H n ·W converges to Y ·W . There<strong>for</strong>e, (H n ·W ) τ −(H n ·W ) σconverges to (Y · W ) τ − (Y · W ) σ . Because Z is bounded, we obtainZ((Y · W ) τ − (Y · W ) σ ) = limnZ((H n · W ) τ − (H n · W ) σ )where the limits are in L 2 (W ).= limnHH n · W= limnHY · W,Step 3: The case Y ∈ L 2 (W ). Let τ n be a localising sequence <strong>for</strong> Y . Then(Z(Y · W ) τ − (Y · W ) σ ) τ = Z((Y [0, τ n ] · W ) τ − (Y [0, τ n ] · W ) σ )= HY [0, τ n ] · W= (HY · W ) τ n,<strong>and</strong> letting τ n tend to infinity, we obtain the result.What Lemma 3.3.1 shows is that if we consider Y ·W as an integrator, then the integralof an element H ∈ bE based on the Riemann-sum definition of Section 3.1 satisfies


58 The Itô IntegralH · (Y · W ) = HY · W . We call this property the associativity of the integral. We willuse this as a criterion to generalize the stochastic integrals to integrators of the <strong>for</strong>mY · W where Y ∈ L 2 (W ). The most obvious thing to do would be to define, givenM = Y · W with Y ∈ L 2 (W ), the integral of a process X with XY ∈ L 2 (W ) withrespect to M by X ·M = XY ·W . This is what we will do, but with a small twist. Untilnow, in this section <strong>and</strong> the preceeding, we have only considered a one-dimensionalF t Brownian motion. In our financial applications, we are going to need to work withmultidimensional Brownian motion, <strong>and</strong> we are going to need integrators that dependon several coordinates of such a multidimensional Brownian motion at once. There<strong>for</strong>e,we will now consider an n-dimensional F t Brownian motion W = (W 1 , . . . , W n ), asdescribed in Definition 2.1.5. Our goal will be to define the stochastic integral withrespect to integrators of the <strong>for</strong>mM t =n∑k=1∫ t0Y ks dW k sby associativity, as described above. We maintain the definitions of Section 3.2 byletting L 2 (W ) denote the Σ π measurable elements of Y ∈ L 2 (λ ⊗ P ) <strong>and</strong> lettingL 2 (W ) denote the processes L 0 -locally in L 2 (W ). The next lemma shows that each ofthe coordinate processes fall under the category we worked with in the last section.Lemma 3.3.2. If W = (W 1 , . . . , W n ) is an n-dimensional F t Brownian motion, then<strong>for</strong> each k ≤ n, W k is a one-dimensional F t Brownian motion.Proof. Let t ≥ 0. We know that s ↦→ W t+s − W t is a n-dimensional Brownian motionwhich is independent of F t . There<strong>for</strong>e s ↦→ Wt+s k − Wtk is a one-dimensional Brownianmotion independent of F t .In the following, whenever we write Y ∈ L 2 (W ) n , we mean that Y is an n-dimensionalproces, Y = (Y 1 , . . . , Y n ), such that Y k ∈ L 2 (W ) <strong>for</strong> each k. This notation is ofcourse also extended to other spaces than L 2 (W ).Definition 3.3.3. Let cM L W denote the class of processes M such that there existsY ∈ L 2 (W ) n with M t = ∑ n∫ tk=1 0 Y s k dWs k . In this case, we also write M = Y · W .In order to define the integral with respect to processes in cM L W by associativity, weneed to check that the integr<strong>and</strong> processes Y k in the representation of elements ofare unique. This is our next order of business.cM L W
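Before turning to the formal development, a small simulation may help fix ideas about integrators of the form $M_t = \sum_k \int_0^t Y^k_s\,dW^k_s$. The sketch below, assuming two Brownian coordinates and two ad hoc integrands, checks numerically the isometry $E[M_T^2] \approx E\int_0^T \sum_k (Y^k_s)^2\,ds$ established in Lemma 3.3.6 below; the vanishing of the cross terms reflects Lemma 3.3.5.

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, dt, T = 20_000, 1e-2, 1.0
n_steps = int(T / dt)
dW1 = rng.normal(0, np.sqrt(dt), (n_paths, n_steps))
dW2 = rng.normal(0, np.sqrt(dt), (n_paths, n_steps))
W1 = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW1, axis=1)])
W2 = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW2, axis=1)])

# Two adapted integrands, evaluated at the left endpoint of each increment.
Y1 = np.cos(W1[:, :-1])
Y2 = W2[:, :-1] ** 2

# M_T = ∫_0^T Y^1 dW^1 + ∫_0^T Y^2 dW^2, discretised with left-point sums.
M_T = np.sum(Y1 * dW1 + Y2 * dW2, axis=1)

lhs = np.mean(M_T ** 2)
rhs = np.mean(np.sum((Y1**2 + Y2**2) * dt, axis=1))
print(f"E[M_T^2] ≈ {lhs:.4f}    E∫((Y^1)^2 + (Y^2)^2) dt ≈ {rhs:.4f}")
```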


3.3 Integration with respect to integrators in M L W 59Lemma 3.3.4. For any i, j ≤ n with i ≠ j, W i W j is a martingale.Proof. Since the conditional distribution of (Wt i − Ws, i W j t − Ws j ) given F s is twodimensionalnormal with mean zero <strong>and</strong> zero covariation, we haveE(W i t W j t + W i sW j s |F s )= E((W i t − W i s)(W j t − W j s )|F s ) + E(W i t W j s |F s ) + E(W i sW j t |F s )= W j s E(W i t |F s ) + W i sE(W j t |F s )= 2W j s W i s,so E(W i t W j t |F s ) = W i sW j s , <strong>and</strong> W i W j is a martingale.Lemma 3.3.5. Let i, j ≤ n with i ≠ j. For any Y i , Y j ∈ L 2 (W ), (Y i · W i )(Y j · W j )is a uni<strong>for</strong>mly integrable martingale.Proof. We use Lemma 2.4.13. Clearly, (Y i · W i )(Y j · W j ) has an almost sure limit. Itwill there<strong>for</strong>e suffice to show that <strong>for</strong> any stopping time τ,E∫ τ0Y is dW i s∫ τ0Y js dW j s = 0.Step 1: The elementary case. First consider the case where Y i , Y j ∈ bE withWe then find= E==∫ τE0( n∑n∑Y i =k=1k=1 m=1n∑n∑Zk(σ i k , τ k ] <strong>and</strong> Y j =k=1∫ τYs i dWsin∑n∑k=1 m=10Z i k((W i ) τ k τY js dW j sEZ i kZ j m((W i ) τ k τEZ i kZ j m((W i ) τ k τ− (W i ) σ kτ )) ( n∑k=1n∑Z j k (σ k, τ k ].k=1Z j k ((W j ) τ k τ− (W j ) σ kτ )− (W i ) σ kτ )((W j ) τ m τ − (W j ) σ m τ )− (W i ) σ kτ )((W j ) τmτ − (W j ) σmτ ).Now, by Lemma 3.3.4, W i W j is a martingale, <strong>and</strong> there<strong>for</strong>e (W i W j ) τ is also a martingale.Using the optional stopping theorem, we there<strong>for</strong>e find that the above is zero.We may then conclude E ∫ τ0 Y ∫s i dWsi τ0 Y s j dWs j = 0.)


60 The Itô IntegralStep 1: The general case. In the case of general Y i , Y j ∈ L 2 (W ), there existssequences (H n ) <strong>and</strong> (K n ) in bE converging to Y i <strong>and</strong> Y j , respectively. By Lemma3.1.2, we may assume that H n <strong>and</strong> K n are based on the same stopping times, <strong>and</strong>there<strong>for</strong>e, by what we have just shown,E∫ τ0H n (s) dW i s∫ τ0K n (s) dW j s = 0.Now, since ∫ τ0 H n(s) dWs i converges to ∫ τ0 Y s i dWsi <strong>and</strong> ∫ t0 K n(s) dWs j converges to∫ τ0 Y s j dWsj in L 2 (W ) by the Itô isometry, Lemma B.2.1 yields that the product convergesin L 1 <strong>and</strong> there<strong>for</strong>e∫ τ ∫ τ∫ τ∫ τE Ys i dWsi Ys j dWs j = 0 = lim E H n (s) dWsi K n (s) dWs j = 0,00n00as desired.Lemma 3.3.6. If Y ∈ L 2 (W ) n , then E(Y · W ) 2 ∞ = ∑ nk=1 ‖Y k ‖ 2 W .Proof. By Lemma 3.3.5, the Itô isometry <strong>and</strong> optional sampling <strong>for</strong> uni<strong>for</strong>mly integrablemartingales,E( n∑k=1∫ ∞0Y ks dW k s) 2===n∑n∑Ek=1 i=1∫ ∞0n∑(∫ ∞Ek=10n∑‖Y k ‖ 2 W .k=1Y ks dW k s) 2Ys k dWsk∫ ∞0Y is dW i sLemma 3.3.7. Assume M ∈ cM L W with M = ∑ nk=1 Y ks dW k s <strong>and</strong> M = ∑ nk=1 Zk s dW k s .Then Y k <strong>and</strong> Z k are equal λ ⊗ P almost surely <strong>for</strong> k ≤ n.Proof. First consider the case where Y k , Z k ∈ L 2 (W ) <strong>for</strong> k ≤ n. We then find, byLemma 3.3.6,E( n∑k=1∫ ∞0Y ks dW k s −n∑k=1∫ ∞0Z k s dW k s) 2= E=( n∑n∑k=1k=1∫ ∞0Y ks‖Y ks − Z k s ‖ 2 W .− Z k s dW k s) 2


62 The Itô Integralprocesses, that it preserves stopping in an appropriate manner, <strong>and</strong> associativity. Afterthis, we will work to underst<strong>and</strong> under which conditions the integral is a squareintegrablemartingale. We will also develop some approximation results that will beuseful in our work with the integral. In the following, let M ∈ cM L W with M = Y · W ,Y ∈ L 2 (W ) n , unless anything else is implied.It is clear that cM L W <strong>and</strong> L2 (M) are linear spaces, <strong>and</strong> by I M (X) = ∑ nk=1 I W k(XY k ),it is immediate that I M is linear <strong>for</strong> any M ∈ cM L W .Lemma 3.3.10 (Riemann representation). If H ∈ bE with H = ∑ mk=1 Z k(σ k , τ k ],H ∈ L 2 (M) <strong>and</strong> H · M = ∑ mk=1 Z k(M τ k− M σ k).Proof. By Lemma 3.3.1, HY k ∈ L 2 (W k ) <strong>for</strong> all k ≤ n, so H ∈ L 2 (M), <strong>and</strong>(H · M) t ====n∑i=1n∑∫ t0i=1 k=1m∑ ∑nZ kk=1H s Y is dW i sm∑Z k ((Y i · W i ) τ k− (Y i · W i ) σ k)i=1(Y i · W i ) τ k− (Y i · W i ) σ km∑Z k (M τ k− M σ k),k=1as desired.For Y ∈ L 2 (W ) n , we write Y [0, τ] = (Y 1 [0, τ], . . . , Y n [0, τ]). From Theorem 3.2.9, itis clear that (Y · W ) τ = Y [0, τ] · W .Lemma 3.3.11 (Localisation properties). The spaces cM L W <strong>and</strong> L2 (M) are stableunder stopping <strong>and</strong> zero-stopping, respectively, <strong>and</strong> L 2 (M) ⊆ L 2 (M τ ). It holds that(X · M) τ = X[0, τ] · M = X · M τ .Proof. cM L W is L-stable since (Y ·W )τ = Y [0, τ]·W <strong>and</strong> Y [0, τ] ∈ L 2 (W ) n by Lemma3.2.7. L 2 (M) is L 0 -stable because, if X ∈ L 2 (M), X[0, τ]Y k = (XY k )[0, τ], which isin L 2 (W ) by Lemma 3.2.7. To show that L 2 (M) ⊆ L 2 (M τ ), consider X ∈ L 2 (M).Then XY ∈ L 2 (W ) n , so XY [0, τ] ∈ L 2 (W ) n . Because M τ = Y [0, τ] · W , X ∈ L 2 (M τ )


3.3 Integration with respect to integrators in M L W 63follows. We then obtain (X · M) τ = X[0, τ] · M from(X · M) τ = (XY · W ) τ= (XY )[0, τ] · W= X[0, τ]Y · W= X[0, τ] · M.And the equality (X · M) τ = X · M τ follows since(X · M) τ = (XY · W ) τ= XY [0, τ] · W= X · M τ .Lemma 3.3.12 (Associativity). If Z ∈ L 2 (X · M), then ZX ∈ L 2 (M) <strong>and</strong> it holdsthat Z · (X · M) = ZX · M.Proof. We know that X · M = XY · W , so Z ∈ L 2 (X · M) means ZXY ∈ L 2 (W ) n ,which is equivalent to ZX ∈ L 2 (M). We then obtainas desired.Z · (X · M) = Z · (XY · W )= ZXY · W= ZX · MThe linearity, the localisation properties of Lemma 3.3.11 <strong>and</strong> the associativity ofLemma 3.3.12 are the basic rules <strong>for</strong> manipulating the integral. Our next goal isto identify classes of integr<strong>and</strong>s such that the integral becomes a square-integrablemartingale. For M ∈ cM L W with M = Y · W , Y ∈ L2 (W ) n , let L 2 (M) be the spaceof Σ π measurable processes X such that XY ∈ L 2 (W ) n .Lemma 3.3.13. L 2 (M) is L 0 -stable, <strong>and</strong> L 0 (L 2 (M)) = L 2 (M).Proof. We first show that L 2 (M) is L 0 -stable. Assume that X ∈ L 2 (M), such thatXY ∈ L 2 (W ) n , <strong>and</strong> let τ be a stopping time. Then X[0, τ]Y = (XY )[0, τ] ∈ L 2 (W ),so X[0, τ] ∈ L 2 (M). There<strong>for</strong>e, L 2 (M) is L 0 -stable.


64 The Itô IntegralTo prove the other statement of the lemma, assume X ∈ L 0 (L 2 (M)) <strong>and</strong> let τ n bea localising sequence, X[0, τ n ] ∈ L 2 (M). Since X[0, τ n ] is Σ π measurable, so is X.By definition, X[0, τ n ]Y ∈ L 2 (W ) n , so (XY )[0, τ n ] ∈ L 2 (W ) n , <strong>and</strong> it follows thatXY ∈ L 2 (W ) n , showing X ∈ L 2 (M) <strong>and</strong> there<strong>for</strong>e L 0 (L 2 (M)) ⊆ L 2 (M). Conversely,assume X ∈ L 2 (M). Then XY ∈ L 2 (W ) n , so there is a localising sequence τ n yielding(XY )[0, τ n ] ∈ L 2 (W ) n . Then X[0, τ n ]Y ∈ L 2 (W ) n <strong>and</strong> X[0, τ n ] ∈ L 2 (M), showingX ∈ L 0 (L 2 (M)).Lemma 3.3.14. L 2 (M) is equal to L 2 ([0, ∞) × Ω, Σ π , µ M ), where µ M is the measurehaving density ∑ nk=1 (Y k ) 2 with respect to λ ⊗ P . We endow L 2 (M) with the L 2 -normcorresponding to µ M <strong>and</strong> denote this norm by ‖ · ‖ M .Proof. Let X ∈ L 2 (M). Then X is Σ π measurable <strong>and</strong> XY ∈ L 2 (W ) n , so X 2 (Y k ) 2is integrable with respect to λ ⊗ P . Thus L 2 (M) ⊆ L 2 ([0, ∞) × Ω, Σ π , µ M ). On theother h<strong>and</strong>, let X ∈ L 2 ([0, ∞) × Ω, Σ π , µ M ). Then X is obviously Σ π measurable <strong>and</strong>∫‖XY k ‖ 2 W =∫X 2 (Y k ) 2 d(λ ⊗ P ) ≤X 2n ∑k=1∫(Y k ) 2 d(λ ⊗ P ) =X 2 dµ M < ∞,so X ∈ L 2 (M) <strong>and</strong> there<strong>for</strong>e L 2 ([0, ∞) × Ω, Σ π , µ M ) ⊆ L 2 (M).Comment 3.3.15 Note that since we only require Y ∈ L 2 (W ) n <strong>and</strong> not Y ∈ L 2 (W ) n ,the measure µ may easily turn out to be unbounded.◦Lemma 3.3.16. Let M ∈ cM L W <strong>and</strong> X ∈ L2 (M). X · M is a square-integrablemartingale if <strong>and</strong> only if X ∈ L 2 (M).Proof. If X ∈ L 2 (M), XY ∈ L 2 (W ) n <strong>and</strong> there<strong>for</strong>e, by linearity of cM 2 0,X · M = XY · W =n∑XY k · W k ∈ cM 2 0.k=1To prove the other implication, assume that X · M ∈ cM 2 0. Because X ∈ L 2 (M), weknow that XY ∈ L 2 (W ) n . Let τ m be a localising sequence with XY k [0, τ m ] ∈ L 2 (W )<strong>for</strong> k ≤ n. Since XY · W = X · M ∈ cM 2 0, (XY · W ) τ m∈ cM 2 0 as well. The Itô


3.3 Integration with respect to integrators in M L W 65isometry from Lemma 3.3.6 yields‖(XY k )[0, τ m ]‖ 2 W≤≤n∑‖XY k [0, τ m ]‖ 2 Wk=1n∑‖XY k ‖ 2 Wk=1= ‖XY · W ‖ 2 cM 2 0= ‖X · M‖ 2 cM 2 0,where the last expression is finite by assumption. Now, monotone convergence yields‖XY k ‖ W = lim sup ‖(XY k )[0, τ m ]‖ W ≤ ‖X · X‖ cM 20< ∞,mso XY k ∈ L 2 (W ) <strong>and</strong> thus X ∈ L 2 (M).Lemma 3.3.17. I M : L 2 (M) → cM 2 0 is isometric <strong>for</strong> any M ∈ cM L W .Proof. The conclusion is well-defined since I M (X) ∈ cM 2 0 <strong>for</strong> X ∈ L 2 (M) by Lemma3.3.16. To prove the result, we merely note <strong>for</strong> X ∈ L 2 (M) that by Lemma 3.3.6‖X · M‖ 2 cM 2 0= ‖XY · W ‖ 2 cM 2 0n∑= ‖XY k ‖ 2 W==k=1n∑∫X 2 (Y k ) 2 d(λ ⊗ P )k=1∫X 2 dµ M= ‖X‖ 2 M .Comment 3.3.18 The isometry property of I M is, as the isometry property of I Wgiven in Theorem 3.2.5, also known as Itô’s isometry.◦Lemma 3.3.16 solves the problem of determining when the stochastic integral is asquare-integrable martingale, <strong>and</strong> Lemma 3.3.17 tells us that <strong>for</strong> general integrators,we still have the isometry property that we proved in the case of Brownian motion.We have yet one more property of the integral that we wish to investigate. We are


66 The Itô Integralinterested in underst<strong>and</strong>ing when the integral process X · M can be approximated byintegrals of the <strong>for</strong>m H n · M <strong>for</strong> H n ∈ bE. To underst<strong>and</strong> this, we define cM 2 W as thesubclass of processes in cM L W of M = Y · W where Y ∈ L2 (W ) n .Lemma 3.3.19. cM 2 W is L-stable, <strong>and</strong> L(cM2 W ) = cML W .Proof. That cM 2 Wis L-stable follows since L2 (W ) is L 0 -stable. To show the otherstatement of the lemma, assume M ∈ cM L W , M = Y · W with Y ∈ L2 (W ) n . Let τ mbe a localising sequence such that Y [0, τ m ] ∈ L 2 (W ) n . Then M τm = Y [0, τ m ] · W , soM is locally in cM 2 W , showing cML W ⊆ L(cM2 W ). Conversely, assume M ∈ L(cM2 W )<strong>and</strong> let τ n be a localising sequence. Then we have M τ n∈ cM 2 W , <strong>and</strong> there existsY n ∈ L 2 (W ) such that M τ n= Y n · W . Clearly, thenY n+1 [0, τ n ] · W = (Y n+1 · W ) τn= (M τn+1 ) τn= M τn= Y n · W.In detail, this means ∑ nk=1 Y n+1[0, τ n ] · W k = ∑ nk=1 Y n k · W k . By Lemma 3.3.7,Y n+1 [0, τ n ] <strong>and</strong> Y n are then λ ⊗ P almost surely equal. Then the Pasting Lemma2.5.10 yields processes Y k such that Y k [0, τ n ] <strong>and</strong> Yn k [0, τ n ] are λ ⊗ P almost surelyequal <strong>for</strong> all n. In particular, Y ∈ L 2 (W ) n , <strong>and</strong> clearlyM τ n= Y n [0, τ n ] · W = Y [0, τ n ] · W = (Y · W ) τ n,so M = Y · W <strong>and</strong> L(cM 2 W ) ⊆ cML W .Lemma 3.3.20. If M ∈ cM 2 W , then bE is dense in L2 (M). In particular, <strong>for</strong> anyX ∈ L 2 (M), X · M can be approximated by elements H n · M with H n ∈ bE.Proof. Since M ∈ cM 2 W , M = Y · W <strong>for</strong> some Y ∈ L2 (W ) n . In particular, µ M is abounded measure, <strong>and</strong> by Theorem 3.1.7, bE is dense in L 2 (M). By Lemma 3.3.17,when X ∈ L 2 (M) <strong>and</strong> (H n ) ⊆ bE converges to X, we then obtain that H n · M alsoconverges to X · M.We are now done with our preliminary investigation of the stochastic integral <strong>for</strong>integrators in cM L W . Our next task will be to extend the integral to integrators witha component of finite variation. Be<strong>for</strong>e doing so, let is review our results so far. We


have defined the stochastic integral $X \cdot M$ for integrators $M \in cM^L_W$ and integrands $X \in L^2(M)$. We have proven that

• The integral is always a continuous local martingale.
• The integral is in $cM^2_0$ if and only if $X \in L^2(M)$.
• The integral is linear in the integrand.
• $I_M$ is isometric from $L^2(M)$ to $cM^2_0$.
• $(X \cdot M)^\tau = X[0,\tau] \cdot M = X \cdot M^\tau$.
• $bE \subseteq L^2(M)$, and the integral takes its natural values on bE.
• If $M \in cM^2_W$, $\mu_M$ is bounded and bE is dense in $L^2(M)$.

Note that nowhere in the above properties is the Brownian motion $W$, which our whole construction rests upon, mentioned. This fact will enable us to manipulate the stochastic integral without having to keep in mind the underlying definition in terms of the Brownian motion.

3.4 Integration with respect to standard processes

We now come to the final extension of the stochastic integral, where we extend the integral to cover integrators of the form $A + M$, where $M \in cM^L_W$ and $A$ is adapted with continuous paths of finite variation. Since we have already defined the integral with respect to $M \in cM^L_W$, we only have to define the integral with respect to $A$ and make sure that the integrals with respect to $A$ and $M$ fit properly together.

Because functions of finite variation correspond to measures, as can be seen from Appendix A.1, we can use ordinary measure theory to define the integral with respect to $A$ in a pathwise manner. In Appendix A.1 we have listed some fundamental results regarding functions of finite variation. For any mapping $F : [0,\infty) \to \mathbb{R}$, we have the variation function
$$V_F(t) = \sup \sum_{k=1}^n |F(t_k) - F(t_{k-1})|,$$


68 The Itô Integralwhere the supremum is over all partitions of [0, t]. V F is finite <strong>for</strong> all t if <strong>and</strong> onlyif F is of finite variation. By Lemma A.1.1, if F is continuous, then so is V F . ByTheorem A.1.2, we can associate to F a measure ν F such that ν F (a, b] = F (b) − F (a)<strong>for</strong> 0 ≤ a ≤ b, <strong>and</strong> as in Definition A.1.4, we can then define L 1 (F ) = L 1 (ν F ) <strong>and</strong>consider the integral with respect to F by putting ∫ ∞x(s) dF0 s = ∫ ∞x(s) dν0 F (s).We now transfer these ideas to stochastic processes. In order to make sure that ourstochastic integral with respect to processes of finite variation has sufficient measurabilityproperties, we will need to make the appropriate measurability assumptions onour integr<strong>and</strong>s. By cFV, we denote the class of adapted stochastic processes withcontinuous paths of finite variation. If A ∈ cFV, we denote by L 1 (A) the space ofprogressively measurable processes such that X(ω) ∈ L 1 (A(ω)) <strong>for</strong> all ω. By L 1 (A),we denote the space of processes locally in L 1 (A).We begin with a result which shows that we do not need to consider integrators whichare only locally of finite variation, since this would add nothing new to the theory.Lemma 3.4.1. cFV is a linear space, stable under stopping, <strong>and</strong> L(cFV) = cFV.Proof. It is clear that cFV is a linear space. We first show that cFV is stable understopping. Let A ∈ cFV be given, <strong>and</strong> let τ be any stopping time. Clearly, A τ hascontinuous paths. By Lemma 2.3.3, A τ is progressive, there<strong>for</strong>e adapted. Fix ω. Then,<strong>for</strong> any partition (t 0 , . . . , t n ) of [0, t],n∑|A τ t k(ω) − A τ t k−1(ω)| =k=1n∑|A tk ∧τ (ω) − A tk−1 ∧τ (ω)| ≤ V A(ω) (t),k=1so taking supremum over all partitions, we conclude V A τ (ω)(t) ≤ V A(ω) , <strong>and</strong> there<strong>for</strong>eA τ (ω) has finite variation as well, thus A τ ∈ cFV <strong>and</strong> cFV is stable under stopping.In particular, cFV ⊆ L(cFV), so to show equality it will suffice to show the otherinclusion. To this end, let A ∈ L(cFV), <strong>and</strong> let τ n be a localising sequence. Fix ω, lett > 0 <strong>and</strong> let n be so large that τ n (ω) ≥ t. Then V A(ω) (t) ≤ V A(ω) (τ n ) = V A τn (ω)(τ n ),which is finite, <strong>and</strong> we conclude that A(ω) has finite variation, showing A ∈ cFV.Next, we define the stochastic integral with respect to integrators A ∈ cFV <strong>and</strong>integr<strong>and</strong>s in L 1 (A). To show the adaptedness of the integral, we will need the conceptof a Markov kernel <strong>and</strong> the associated results, see Chapter 20 of Hansen (2004a) <strong>for</strong>this.
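Since the integral with respect to $A$ is a pathwise Lebesgue-Stieltjes integral, it can be approximated by ordinary Riemann-Stieltjes sums along a fine grid. The following sketch does this for a smooth (hence finite variation) deterministic path, for which $dA_s = A'(s)\,ds$, so the result can be compared with an ordinary integral; both the path and the integrand are arbitrary illustrative choices.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100_001)
A = np.sin(3.0 * t) + 0.5 * t          # a continuous path of finite variation (smooth here)
X = np.exp(-t)                          # a bounded, continuous integrand

# Pathwise Stieltjes integral ∫_0^1 X_s dA_s, approximated by left-point sums X(t_{k-1}) ΔA.
stieltjes_sum = np.sum(X[:-1] * np.diff(A))

# For smooth A we have dA_s = A'(s) ds, so the same integral is ∫_0^1 X(s) A'(s) ds.
A_prime = 3.0 * np.cos(3.0 * t) + 0.5
riemann_sum = np.sum(X[:-1] * A_prime[:-1] * np.diff(t))

print(f"∫ X dA ≈ {stieltjes_sum:.6f}    ∫ X A' dt ≈ {riemann_sum:.6f}")
```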


3.4 Integration with respect to st<strong>and</strong>ard processes 69Lemma 3.4.2. If X ∈ L 1 (A), the integral ∫ t0 X s(ω) dA s (ω) is well-defined as a pathwiseintegral. We write (X · A) t (ω) = ∫ t0 X s(ω) dA s (ω).Proof. Let τ n be a localising sequence <strong>for</strong> X <strong>and</strong> let ω ∈ Ω. Let n be so large thatτ n (ω) ≥ t, then∫ t0|X s (ω)| dA s (ω) ≤∫ t0|X[0, τ n ] s (ω)| dA s (ω) ≤∫ ∞0|X[0, τ n ] s (ω)| dA s (ω),which is finite since X[0, τ n ](ω) ∈ L 1 (A(ω)). Thus, the integral ∫ t0 X s(ω) dA s (ω) iswell-defined <strong>for</strong> all ω ∈ Ω.Lemma 3.4.3. Let X ∈ L 1 (A). The stochastic integral X · A is in cFV.Proof. First assume that X ∈ L 1 (A). We need to show that X · A is adapted, continuous<strong>and</strong> of finite variation. By Lemma A.1.3, the measure corresponding to A(ω)has no atoms, <strong>and</strong> there<strong>for</strong>e X · A is continuous. By Lemma A.1.6, X · A has paths offinite variation.To show that X ·A is adapted, let t > 0 <strong>and</strong> let µ(t, ω) be the measure on [0, t] inducedby A(ω) on [0, t]. We wish to show that (µ(t, ω)) ω∈Ω is a (Ω, F t ) Markov kernel on([0, t], B[0, t]). We there<strong>for</strong>e need to show that ω ↦→ µ(t, ω)(B) is F t measurable <strong>for</strong>any B ∈ B[0, t]. The family of elements of B[0, t] <strong>for</strong> which this holds is a Dynkin class,<strong>and</strong> it will there<strong>for</strong>e suffice to show the claim <strong>for</strong> intervals in [0, t]. Let 0 ≤ a ≤ b ≤ t.We then findµ(t, ω)[a, b] = A(b, ω) − A(a, ω),<strong>and</strong> by the adaptedness of A, this is F t measurable. Since X is progressively measurable,the extended Fubini Theorem then yields that(X · A) t (ω) =∫ t0X s (ω) dµ(t, ω)(s)is F t measurable as a function of ω, so X · A is adapted.To extend this to the local case, assume X ∈ L 1 (A) <strong>and</strong> let τ n be a localising sequence.Then, by the ordinary properties of the integral, (X · A) τn = (X[0, τ n ] · A). There<strong>for</strong>e,(X · A) τn is adapted, continuous <strong>and</strong> of finite variation. Clearly, then X · A is adaptedas well, <strong>and</strong> by Lemma 3.4.1 it is also continuous <strong>and</strong> of finite variation.Lemma 3.4.4. L 1 (A) is a linear space, <strong>and</strong> the integral I A : L 1 (A) → cFV is linear.Furthermore, if τ is any stopping time, X[0, τ] ∈ L 1 (A) <strong>and</strong> (X · A) τ = X[0, τ] · A.


70 The Itô IntegralProof. Since I A is defined as a pathwise Lebesgue integral, this follows from the resultsof ordinary integration theory.We now want to link the definition of the integral with respect to integrators in cFVtogether with our construction of the stochastic integral <strong>for</strong> integrators in cM L W . By S,we denote the space of processes of the <strong>for</strong>m A + M, where A ∈ cFV <strong>and</strong> M ∈ cM L W .We call processes in S st<strong>and</strong>ard processes. Be<strong>for</strong>e making the obvious definition ofintegrals with respect to processes in S, we need to make sure that the decompositionof such processes into finite variation part <strong>and</strong> local martingale part is unique.Lemma 3.4.5. If F has finite variation <strong>and</strong> is continuous, then V F can be writtenas V F (t) = sup ∑ nk=1 |F (t k) − F (t k−1 )|, where the supremum is taken over partitionswith rational timepoints.Proof. It will suffice to prove that <strong>for</strong> any ε > 0 <strong>and</strong> any partition (t 0 , . . . , t n ), thereexists another partition (q 0 , . . . , q n ) such thatn∑n∑|F (t k ) − F (t k−1 )| − |F (q k ) − F (q k−1 )|∣∣ < ε.k=1ε2nk=1To this end, choose δ parrying <strong>for</strong> the continuity of F in t 0, . . . , t n , <strong>and</strong> let, <strong>for</strong>k ≤ n, q k be some rational with |q k − t k | ≤ δ. Then|(F (t k ) − F (t k−1 )) − (F (q k ) − F (q k−1 ))| ≤ ε n ,<strong>and</strong> since | · | is a contraction, this implies||F (t k ) − F (t k−1 )| − |F (q k ) − F (q k−1 )|| ≤ ε n ,finally yielding≤n∑|F (t k ) − F (t k−1 )| −∣k=1n∑|F (q k ) − F (q k−1 )|∣k=1n∑||F (t k ) − F (t k−1 )| − |F (q k ) − F (q k−1 )||k=1≤ ε.Lemma 3.4.6. Let A ∈ cFV. Then V A is continuous <strong>and</strong> adapted.


Proof. From Lemma A.1.1, we know that $V_A$ is continuous. To show that it is adapted, note that for any partition of $[0,t]$ we have $\sum_{k=1}^n |A_{t_k} - A_{t_{k-1}}| \in \mathcal{F}_t$. By Lemma 3.4.5, $V_A(t)$ can be written as the supremum of the above sums for partitions of $[0,t]$ with rational timepoints. Since there are only countably many of these, it follows that $V_A(t) \in \mathcal{F}_t$.

Lemma 3.4.7. Let $M \in cM^L_0$. If $M$ has paths of finite variation, then $M$ is evanescent.

Proof. We first prove the lemma in a particularly nice setting and then use localisation to get the general result.

Step 1: Bounded variation and square-integrability. First assume that $V_M$ is bounded and that $M \in cM^2_0$. Then, for any partition $(t_0, \ldots, t_n)$ of $[0,t]$ we obtain, by Lemma 2.4.12,
$$E M_t^2 = E \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^2 \le E\Big( \sup_{k \le n} |M_{t_k} - M_{t_{k-1}}| \sum_{k=1}^n |M_{t_k} - M_{t_{k-1}}| \Big) \le \|V_M(t)\|_\infty \, E \sup_{k \le n} |M_{t_k} - M_{t_{k-1}}|.$$
Now, by the continuity of $M$, $\sup_{k \le n} |M_{t_k} - M_{t_{k-1}}|$ converges almost surely to zero as the partition grows finer. Noting that $\sup_{k \le n} |M_{t_k} - M_{t_{k-1}}|$ is bounded by $\|V_M(t)\|_\infty$, dominated convergence yields $E M_t^2 = 0$, so $M_t$ is almost surely zero. By continuity, $M$ is evanescent.

Step 2: Bounded variation. Now assume only that $V_M(t)$ is bounded. Since we have $M \in cM^L_0$, $M$ is locally in $cM^2_0$. Let $\tau_n$ be a localising sequence. Then $V_{M^{\tau_n}}(t) \le V_M(t)$, so the previous step yields that $M^{\tau_n}$ is evanescent. Letting $n$ tend to infinity, we obtain that $M$ is evanescent.

Step 3: The general case. Finally, merely assume that $M \in cM^L_0$ with paths of finite variation. By Lemma 3.4.6, $V_M$ is continuous and adapted. Therefore, by Lemma 2.3.5, $\tau_n$ defined by $\tau_n = \inf\{t \ge 0 \mid V_M(t) \ge n\}$ is a stopping time. Since $V_M$ is bounded on compact intervals, $\tau_n$ tends to infinity. $M^{\tau_n} \in cM^L_0$ and $V_{M^{\tau_n}}(t)$ is bounded for any $t \ge 0$. By what was already shown, we find that $M^{\tau_n}$ is evanescent. We conclude that $M$ is evanescent.
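Lemma 3.4.7 is, in essence, the reason Brownian motion itself cannot serve as an integrator for pathwise Stieltjes integration: a non-trivial continuous local martingale cannot have paths of finite variation. The sketch below illustrates this on a single simulated path on nested dyadic grids (an illustrative setup only): the sums of absolute increments blow up as the mesh shrinks, while the sums of squared increments stabilise near $t$.

```python
import numpy as np

rng = np.random.default_rng(7)
T, N_max = 1.0, 2**18
dW = rng.normal(0.0, np.sqrt(T / N_max), N_max)
W = np.concatenate([[0.0], np.cumsum(dW)])

print(" n        sum |ΔW|     sum (ΔW)^2")
for n in [2**8, 2**10, 2**12, 2**14, 2**16, 2**18]:
    step = N_max // n
    incr = np.diff(W[::step])                      # increments over a grid with n steps
    print(f"{n:8d}   {np.sum(np.abs(incr)):9.2f}   {np.sum(incr**2):10.4f}")
```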


72 The Itô IntegralLemma 3.4.8. Assume X = A+M, where A has finite variation <strong>and</strong> M ∈ cM L 0 . Thedecomposition of X into a finite variation process <strong>and</strong> a continuous local martingaleis unique up to indistinguishability.Proof. Assume that we have another decomposition X = B + N. Then it holds thatA − B = N − M. There<strong>for</strong>e, the process N − M is a continuous local martingale withpaths of finite variation. By Lemma 3.4.7, N − M is evanescent. There<strong>for</strong>e, A − Bis also evanescent. In other words, A <strong>and</strong> B are indistinguishable, <strong>and</strong> so are M <strong>and</strong>N.We are now finally ready to define the integral with respect to integrators in S. LetX ∈ S with decomposition X = A + M. We put L(X) = L 1 (A) ∩ L 2 (M). We thensimply define , <strong>for</strong> Y ∈ L(X),Y · X = Y · A + Y · M.This is well-defined up to indistinguishability, since if X = B + N is another decomposition,then Y · A <strong>and</strong> Y · B are indistinguishable since A <strong>and</strong> B are indistinguishable,<strong>and</strong> Y · M <strong>and</strong> Y · N are indistinguishable since M <strong>and</strong> N are indistinguishable.Lemma 3.4.9. Let X ∈ S with X = A + M. If Y ∈ L(X), then Y · X ∈ S <strong>and</strong> hascanonical decomposition given by Y · X = Y · A + Y · M.Proof. This follows immediately from the fact that Y · A ∈ cFV by Lemma 3.4.3 <strong>and</strong>the fact that Y · M ∈ cM L 0 by Definition 3.3.8.Lemma 3.4.10. L(X) <strong>and</strong> S are linear spaces. L(X) is L 0 -stable <strong>and</strong> S is L-stable.Proof. By Lemma 3.4.1 <strong>and</strong> Lemma 3.3.11, cFV <strong>and</strong> cM L W are both L-stable linearspaces. There<strong>for</strong>e, S is also a L-stable linear space. Likewise, by Lemma 3.4.4 <strong>and</strong>Lemma 3.3.11, L 1 (A) <strong>and</strong> L 2 (M) are both L 0 -stable linear spaces, <strong>and</strong> there<strong>for</strong>e L(X)is a L 0 -stable linear space.Lemma 3.4.11. The stochastic integral I X : L(X) → S is linear. If τ is a stoppingtime, (Y · X) τ = Y [0, τ] · X = Y · X τ .Proof. The conclusion is well-defined by Lemma 3.4.10. Since we have definedY · X = Y · A + Y · M,


3.4 Integration with respect to st<strong>and</strong>ard processes 73the linearity follows from the linearity of the integration with respect to A <strong>and</strong> M.Likewise, the stopping properties follow from the stopping properties with respect toA <strong>and</strong> M, shown in Lemma 3.4.4 <strong>and</strong> Lemma 3.3.11.Finally, we show that the integral defined <strong>for</strong> integrators X ∈ S <strong>and</strong> integrals in L(X)cannot be extended further by localisation, in other words, we show that we do notobtain larger classes of integrators <strong>and</strong> integr<strong>and</strong>s by localising once more.Lemma 3.4.12. Let X ∈ S with decomposition X = A + M. Then, <strong>for</strong> the spacesof integrators it holds that L(cM L W ) = cML W , L(cFV) = cFV <strong>and</strong> L(S) = S. Also,<strong>for</strong> the spaces of integr<strong>and</strong>s, we have L 0 (L(X)) = L(X), L 0 (L 2 (M)) = L 2 (M) <strong>and</strong>L 0 (L 1 (A)) = L 1 (A).Proof. This follows from Lemma 2.5.13.This concludes our construction of the stochastic integral. We have defined the stochasticintegral <strong>for</strong> any X ∈ S <strong>and</strong> Y ∈ L(X) by putting Y · X = Y · A + Y · M, whereX = A + M is the canonical decomposition. We have shown that the integral is linear<strong>and</strong> interacts properly with stopping times. Further properties of the integral follow byconsidering the components Y ·A <strong>and</strong> Y ·M <strong>and</strong> using respectively ordinary integrationtheory <strong>and</strong> the results from Section 3.3.Be<strong>for</strong>e proceeding to the next section, we prove a few useful properties of the stochasticintegral.Lemma 3.4.13. Let X ∈ S. If Y if progressively measurable <strong>and</strong> locally bounded,then Y ∈ L(X).Proof. Let X = A + M be the canonical decomposition, <strong>and</strong> let τ n be a localisingsequence such that M τ n∈ cM 2 W . Let σ n be a localising sequence <strong>for</strong> Y . By ordinaryintegration theory, Y [0, σ n ] ∈ L 1 (A). Since µ M τ k is bounded, Y [0, σ n ] ∈ L 2 (M τ k).There<strong>for</strong>e, Y [0, σ n ][0, τ k ] ∈ L 2 (M), so Y [0, σ n ] ∈ L 2 (M). In other words, Y is locallyin L 2 (M), so by Lemma 3.4.12, Y ∈ L 2 (M). We conclude Y ∈ L(X).Next, we prove a result that shows that in a weak sense, the integral with respect to ast<strong>and</strong>ard process is consistent with the conventional concept of integration as a limitof Riemann sums. First, we need a lemma.


Lemma 3.4.14. Let $(X^n)$ be a sequence of processes and let $(\tau_k)$ be a sequence of stopping times tending to infinity. Let $X$ be another process, and let $t \ge 0$ be given. Assume that $X^n$ and $X$ are constant from $t$ and onwards. If, for any $k \ge 1$, $X^n(\tau_k)$ converges to $X(\tau_k)$ as $n \to \infty$, either almost surely or in $L^2$, then $X^n(t)$ converges to $X(t)$ in probability.

Proof. Whether the convergence of $X^n(\tau_k)$ is almost sure or in $L^2$, we have convergence in probability. To show convergence in probability of $X^n(t)$ to $X(t)$, let $\varepsilon > 0$ and $\eta > 0$ be given. Choose $N$ so large that $P(\tau_k \le t) \le \eta$ for $k \ge N$. For such $k$, since $X^n$ and $X$ are constant from $t$ onwards, we find
\[ P(|X^n(t) - X(t)| > \varepsilon) \le P(|X^n(t) - X(t)| > \varepsilon, \tau_k > t) + P(\tau_k \le t) \le P(|X^n(\tau_k) - X(\tau_k)| > \varepsilon) + \eta. \]
Therefore,
\[ \limsup_n P(|X^n(t) - X(t)| > \varepsilon) \le \eta, \]
and since $\eta$ was arbitrary, this shows the result.

The usefulness of Lemma 3.4.14 may not be clear at present, but we will use it several times in the following, including in the very next result to be proven. To set the scene, let $0 \le s \le t$. We say that $\pi = (t_0, \ldots, t_n)$ is a partition of $[s,t]$ if $s = t_0 < \cdots < t_n = t$, and we define the norm $\|\pi\|$ of the partition by $\|\pi\| = \max_{k \le n} |t_k - t_{k-1}|$. Let $(X^\pi)$ be a family of variables indexed by the set of partitions of $[0,t]$, and let $X$ be another variable. We say that $X^\pi$ converges to $X$ in $d$ as the norm of the partitions tends to zero, or as the mesh tends to zero, if for any sequence of partitions $\pi_n$ with norm tending to zero, $X^{\pi_n}$ tends to $X$ in $d$. Here, $d$ represents some convergence concept, usually $L^2$ convergence or convergence in probability. In general, we will attempt to formulate our results in terms of convergence as the mesh tends to zero instead of using particular sequences of partitions, as the notation for such partitions easily becomes cumbersome. An alternative solution would be to consider nets instead of sequences; see Appendix C.1 for more on this approach.

Lemma 3.4.15. Let $X \in \mathcal{S}$ and let $Y$ be a bounded, adapted and continuous process. Then $Y \in L(X)$, and the Riemann sums
\[ \sum_{k=1}^n Y_{t_{k-1}} (X_{t_k} - X_{t_{k-1}}) \]
converge in probability to $\int_0^t Y_s \, dX_s$ as the mesh of the partition tends to zero.
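Before giving the proof, a small simulation may make the statement concrete. The snippet below is only a sketch with ad hoc names and is not part of the formal development; it takes $X = Y = W$, a one-dimensional Brownian motion, so that the limiting integral is $\int_0^t W_s \, dW_s = \tfrac12(W_t^2 - t)$, and compares the Riemann sums on ever finer partitions of $[0,1]$ with this value along the same path.

```python
import numpy as np

# Sketch: Riemann sums sum Y_{t_{k-1}} (X_{t_k} - X_{t_{k-1}}) for X = Y = W.
# The limit should be int_0^t W_s dW_s = (W_t^2 - t) / 2.
rng = np.random.default_rng(0)
t, n_fine = 1.0, 2**16                      # horizon and finest grid
dW = rng.normal(0.0, np.sqrt(t / n_fine), n_fine)
W = np.concatenate(([0.0], np.cumsum(dW)))  # Brownian path on the fine grid

exact = 0.5 * (W[-1]**2 - t)
for step in (2**12, 2**8, 2**4, 1):         # coarser to finer partitions
    idx = np.arange(0, n_fine + 1, step)    # partition as indices into the fine grid
    riemann = np.sum(W[idx[:-1]] * np.diff(W[idx]))
    print(f"mesh {step * t / n_fine:.1e}: Riemann sum {riemann:+.4f}, "
          f"(W_t^2 - t)/2 = {exact:+.4f}")
```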


Proof. Let $X = A + M$ be the canonical decomposition. By Lemma 3.4.13, $Y \in L(X)$; in particular $Y \in L^1(A)$ and $Y \in L^2(M)$. We need to show that for any sequence of partitions with norm tending to zero, the Riemann sums converge to the stochastic integral. Clearly, for any partition,
\[ \sum_{k=1}^n Y_{t_{k-1}} (X_{t_k} - X_{t_{k-1}}) = \sum_{k=1}^n Y_{t_{k-1}} (A_{t_k} - A_{t_{k-1}}) + \sum_{k=1}^n Y_{t_{k-1}} (M_{t_k} - M_{t_{k-1}}). \]
We will show that the first term converges to $\int_0^t Y_s \, dA_s$ and that the second term converges to $\int_0^t Y_s \, dM_s$. By Lemma A.1.5, the first term converges almost surely, and therefore also in probability, to $\int_0^t Y_s \, dA_s$ when the norm of the partitions tends to zero. Next, consider the martingale part. First assume that $M \in c\mathcal{M}^2_W$. Let $\pi_n = (t^n_0, \ldots, t^n_{m_n})$ be given such that $\|\pi_n\|$ tends to zero and put
\[ Y^n_s = \sum_{k=1}^{m_n} Y_{t^n_{k-1}} 1_{(t^n_{k-1}, t^n_k]}(s). \]
Then $Y^n \in b\mathcal{E}$, and by continuity, $Y^n$ converges pointwise to $Y[0,t]$. Since $Y$ is bounded and $\mu_M$ is bounded, dominated convergence yields $\lim \|Y^n - Y[0,t]\|_M = 0$. By the Itô isometry, we conclude
\[ \sum_{k=1}^{m_n} Y_{t^n_{k-1}} (M_{t^n_k} - M_{t^n_{k-1}}) = \int_0^t Y^n_s \, dM_s \xrightarrow{L^2} \int_0^t Y[0,t]_s \, dM_s = \int_0^t Y_s \, dM_s. \]
This shows the claim in the case where $M \in c\mathcal{M}^2_W$. Now, if $M \in c\mathcal{M}^L_W$, by Lemma 3.3.19 there exists a localising sequence $\tau_n$ such that $M^{\tau_n} \in c\mathcal{M}^2_W$. We then have
\[ \sum_{k=1}^n Y_{t_{k-1}} (M^{\tau_n}_{t_k} - M^{\tau_n}_{t_{k-1}}) \xrightarrow{L^2} \int_0^t Y_s \, dM^{\tau_n}_s \]
as the mesh tends to zero. Lemma 3.4.14 then allows us to conclude
\[ \sum_{k=1}^n Y_{t_{k-1}} (M_{t_k} - M_{t_{k-1}}) \xrightarrow{P} \int_0^t Y_s \, dM_s, \]
as desired.

Comment 3.4.16. Let us take a moment to see how exactly the somewhat abstract Lemma 3.4.14 was used in the above proof, as we shall often use it again in the same manner without further details. Our situation is that we know that $M \in c\mathcal{M}^L_W$ with $M^{\tau_n} \in c\mathcal{M}^2_W$, and
\[ \sum_{k=1}^n Y_{t_{k-1}} (M^{\tau_n}_{t_k} - M^{\tau_n}_{t_{k-1}}) \xrightarrow{L^2} \int_0^t Y_s \, dM^{\tau_n}_s \]


as the mesh tends to zero. Let $\pi_n = (t^n_0, \ldots, t^n_{m_n})$ be some sequence of partitions with norm tending to zero. We want to prove
\[ \sum_{k=1}^{m_n} Y_{t^n_{k-1}} (M_{t^n_k} - M_{t^n_{k-1}}) \xrightarrow{P} \int_0^t Y_s \, dM_s. \]
Define $X^n(u) = \sum_{k=1}^{m_n} Y_{t^n_{k-1}} (M^u_{t^n_k} - M^u_{t^n_{k-1}})$ and $X(u) = \int_0^t Y_s \, dM^u_s$. Since all partitions are over $[0,t]$, $X^n$ and $X$ are constant from $t$ onwards. And by what we have already shown, $X^n(\tau_k)$ converges in $L^2$ to $X(\tau_k)$ for any $k \ge 1$. Lemma 3.4.14 may now be invoked to conclude
\[ \sum_{k=1}^{m_n} Y_{t^n_{k-1}} (M_{t^n_k} - M_{t^n_{k-1}}) = X^n(t) \xrightarrow{P} X(t) = \int_0^t Y_s \, dM_s, \]
as desired. Since the sequence of partitions was arbitrary, we conclude that we have convergence as the mesh tends to zero.

Obviously, then, the use of Lemma 3.4.14 in the proof of Lemma 3.4.15 included a good deal of implicit calculations, simple as they may be. If we had to go through all the details of taking out subsequences, applying Lemma 3.4.14 and going back to general partitions, our proofs would become very cluttered. We will therefore not go through all these diversions when using the lemma. All of the implicit details in such cases, however, are completely analogous to the ones that we have gone through above. ◦

Now that we have the definition of the integral in place, we begin the task of developing the basic results of the stochastic integral. We will prove Itô's formula, the martingale representation theorem and Girsanov's theorem. In order to do all this, we need to develop one of the basic tools of stochastic calculus, the quadratic variation process. This is the topic of the next section.

3.5 The Quadratic Variation

If $F$ and $G$ are two mappings from $[0,\infty)$ to $\mathbb{R}$, we say that $F$ and $G$ have quadratic covariation $Q : [0,\infty) \to \mathbb{R}$ if
\[ \lim \sum_{k=1}^n (F(t_k) - F(t_{k-1}))(G(t_k) - G(t_{k-1})) = Q(t) \]


for all $t \ge 0$ as the mesh of the partitions tends to zero. The quadratic covariation of $F$ with itself is called the quadratic variation of $F$. We will now investigate a concept much like this, but instead of real mappings, we will use standard processes. Thus, the basic point of this section is to understand the nature of sums of the form
\[ \sum_{k=1}^n (X_{t_k} - X_{t_{k-1}})(X'_{t_k} - X'_{t_{k-1}}), \]
where $X$ and $X'$ are standard processes, as the mesh of the partition tends to zero. Apart from being precisely the kind of variables that we need to consider in the proof of Itô's formula, it also turns out that the resulting quadratic covariation process is a tool of general applicability in the theory of stochastic calculus. We will show that if $X, X' \in \mathcal{S}$ with decompositions $X = A + M$ and $X' = A' + M'$, where $M = Y \cdot W$ and $M' = Y' \cdot W$ with $Y, Y' \in L^2(W)^n$, then
\[ \sum_{k=1}^n (X_{t_k} - X_{t_{k-1}})(X'_{t_k} - X'_{t_{k-1}}) \xrightarrow{P} \sum_{k=1}^n \int_0^t Y^k_s (Y'_s)^k \, ds \]
as the mesh becomes finer and finer. For this reason, we will already from now on call $\sum_{k=1}^n \int_0^t Y^k_s (Y'_s)^k \, ds$ the quadratic covariation of $X$ and $X'$, and we will use the notation $[X, X']_t = \sum_{k=1}^n \int_0^t Y^k_s (Y'_s)^k \, ds$. Note that the quadratic covariation does not depend on the finite variation components of $X$ and $X'$ at all; in particular, $[X, X'] = [M, M']$. Also note that we always have $[X, X'] \in c\mathcal{FV}_0$.

We write $[X] = [X, X]$ and call $[X]$ the quadratic variation of $X$. Our strategy for developing the results of this section is first to develop all the necessary results for the quadratic variation process, and then use a concept from Hilbert space theory, polarization, to obtain the corresponding results for the quadratic covariation.

Thus, at first, we will forget all about the quadratic covariation and consider only the quadratic variation. We begin by demonstrating some fundamental properties of the quadratic variation process. Our first lemma shows how to calculate the quadratic variation of a stochastic integral with respect to a martingale integrator. Note that we are here using that the quadratic variation is a standard process and can therefore be used as an integrator.

Lemma 3.5.1. Let $M \in c\mathcal{M}^L_W$. Then $X \in L^2(M)$ if and only if $X \in L^2([M])$. Likewise, $X$ is locally in $L^2(M)$ if and only if $X$ is locally in $L^2([M])$. In the affirmative case, the equality $[X \cdot M] = X^2 \cdot [M]$ holds.


Proof. Let $M = Y \cdot W$, where $Y \in L^2(W)^n$. We then have
\[ \sum_{k=1}^n E \int_0^\infty X_s^2 (Y^k_s)^2 \, ds = E \int_0^\infty X_s^2 \sum_{k=1}^n (Y^k_s)^2 \, ds = E \int_0^\infty X_s^2 \, d[M]_s. \]
The first expression is finite if and only if $X \in L^2(M)$, and the final expression is finite if and only if $X \in L^2([M])$. Therefore, $X \in L^2(M)$ if and only if $X \in L^2([M])$. By localisation, it follows that $X$ is locally in $L^2(M)$ if and only if $X$ is locally in $L^2([M])$.

In order to show the integral equality, assume that $X \in L^2(M)$. Then $X \cdot M = XY \cdot W$, and therefore the definition of the quadratic variation yields
\[ [X \cdot M]_t = \sum_{k=1}^n \int_0^t (X_s Y^k_s)^2 \, ds = \int_0^t X_s^2 \sum_{k=1}^n (Y^k_s)^2 \, ds = \int_0^t X_s^2 \, d[M]_s = (X^2 \cdot [M])_t. \]
This shows the lemma.

Our next two lemmas give a martingale characterisation of the quadratic variation and show that the quadratic variation interacts properly with stopping.

Lemma 3.5.2. Let $X$ be a standard process, and let $\tau$ be a stopping time. Then $[X]^\tau = [X^\tau]$.

Proof. Let $X = A + M$ with $M = Y \cdot W$, $Y \in L^2(W)^n$. Then $X^\tau$ has decomposition $X^\tau = B + N$, where $B = A^\tau$ and $N = Y[0,\tau] \cdot W$, and
\[ [X]^\tau_t = \sum_{k=1}^n \int_0^{t \wedge \tau} (Y^k_s)^2 \, ds = \sum_{k=1}^n \int_0^t (Y^k[0,\tau]_s)^2 \, ds = [X^\tau]_t, \]
as desired.

Lemma 3.5.3. If $M \in c\mathcal{M}^L_W$, then $[M]$ is the unique process in $c\mathcal{FV}_0$, up to indistinguishability, that makes $M_t^2 - [M]_t$ a local martingale. Also, if $M \in c\mathcal{M}^2_W$, then $M_t^2 - [M]_t$ is a uniformly integrable martingale.
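Before the proof, here is a quick numerical sanity check of the martingale property asserted in Lemma 3.5.3. The snippet is only a sketch with ad hoc names: it simulates $M_t = \int_0^t W_s \, dW_s$, for which $[M]_t = \int_0^t W_s^2 \, ds$, and verifies that $M_t^2 - [M]_t$ has Monte Carlo mean close to zero, while $E[M]_t$ is close to $t^2/2$.

```python
import numpy as np

# Sketch: for M_t = int_0^t W_s dW_s we have [M]_t = int_0^t W_s^2 ds, and
# Lemma 3.5.3 says M_t^2 - [M]_t is a martingale; in particular its mean is zero.
rng = np.random.default_rng(1)
n_paths, n_steps, t = 20000, 1000, 1.0
dt = t / n_steps
dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
W_right = np.cumsum(dW, axis=1)                                 # W at t_1, ..., t_n
W_left = np.hstack([np.zeros((n_paths, 1)), W_right[:, :-1]])   # W at t_0, ..., t_{n-1}

M_t = np.sum(W_left * dW, axis=1)        # Euler approximation of int W dW
QV_t = np.sum(W_left**2 * dt, axis=1)    # approximation of [M]_t = int W^2 ds

print("sample mean of M_t^2 - [M]_t:", np.mean(M_t**2 - QV_t))   # close to 0
print("sample mean of [M]_t:", np.mean(QV_t), "(theory: t^2/2 =", t**2 / 2, ")")
```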


Proof. We first prove that $M_t^2 - [M]_t$ is a uniformly integrable martingale when $M \in c\mathcal{M}^2_W$. Assume $M = Y \cdot W$, $Y \in L^2(W)^n$. First note that $M$ has an almost sure limit. Letting $\tau$ be any stopping time, we obtain
\[ E\big( M_\tau^2 - [M]_\tau \big) = E\left( \sum_{k=1}^n \int_0^\infty Y^k[0,\tau]_s \, dW^k_s \right)^2 - \sum_{k=1}^n E \int_0^\infty (Y^k[0,\tau]_s)^2 \, ds = 0 \]
by Itô's isometry, Lemma 3.3.6. Lemma 2.4.13 now yields that $M_t^2 - [M]_t$ is a uniformly integrable martingale. In the case $M \in c\mathcal{M}^L_W$, letting $\tau_n$ be a localising sequence, Lemma 3.5.2 gives $[M]^{\tau_n} = [M^{\tau_n}]$, and therefore $(M^2 - [M])^{\tau_n}_t = (M^{\tau_n})^2_t - [M^{\tau_n}]_t$ is a martingale, showing that $M_t^2 - [M]_t$ is a local martingale.

It remains to show uniqueness. Let $A$ and $B$ be two processes in $c\mathcal{FV}_0$ such that $N^A_t = M_t^2 - A_t$ and $N^B_t = M_t^2 - B_t$ are both local martingales. Then $N^A - N^B = B - A$, so $B - A$ is a continuous local martingale of finite variation. By Lemma 3.4.7, $A$ and $B$ are indistinguishable.

As we will experience in the following, Lemma 3.5.3 is a very important result, in the sense that the fact that $M_t^2 - [M]_t$ is a uniformly integrable martingale whenever $M \in c\mathcal{M}^2_W$ is the only fact we will need to identify $[M]_t$ as the quadratic variation. In particular, we can in the following completely forget about the specific form of $M$ as an integral with respect to $W$. Our next goal is to show that $[X]$ really can be interpreted as a quadratic variation; we will show that
\[ \sum_{k=1}^n (X_{t_k} - X_{t_{k-1}})^2 \xrightarrow{P} [X]_t. \]
For any partition $\pi$ and stochastic processes $X$ and $X'$, we define the quadratic covariation of $X$ and $X'$ over $\pi$ by $Q^\pi(X, X') = \sum_{k=1}^n (X_{t_k} - X_{t_{k-1}})(X'_{t_k} - X'_{t_{k-1}})$. We write $Q^\pi(X) = Q^\pi(X, X)$. The next two lemmas are technical results that pave the road for the result on the convergence of the squared increments $Q^\pi(X)$ to the quadratic variation.

Lemma 3.5.4. If $M \in c\mathcal{M}^2_W$ and $Y$ is a bounded and adapted process, it holds that
\[ E\left( \sum_{k=1}^n Y_{t_{k-1}} \big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big) \right)^2 = \sum_{k=1}^n E\, Y_{t_{k-1}}^2 \big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big)^2; \]
in other words, the cross-terms equal zero.


Proof. Consider $i < j$ with $i \le n$, $j \le n$. Let $X_i = (M_{t_i} - M_{t_{i-1}})^2 - ([M]_{t_i} - [M]_{t_{i-1}})$. Then $X_i \in \mathcal{F}_{t_i} \subseteq \mathcal{F}_{t_{j-1}}$, so we obtain
\[ E\, Y_{t_{i-1}} X_i Y_{t_{j-1}} X_j = E\big( Y_{t_{i-1}} X_i Y_{t_{j-1}} E(X_j \mid \mathcal{F}_{t_{j-1}}) \big). \]
Now note
\begin{align*}
E(X_j \mid \mathcal{F}_{t_{j-1}}) &= E\big( (M_{t_j} - M_{t_{j-1}})^2 - ([M]_{t_j} - [M]_{t_{j-1}}) \mid \mathcal{F}_{t_{j-1}} \big) \\
&= E\big( M_{t_j}^2 - [M]_{t_j} + M_{t_{j-1}}^2 + [M]_{t_{j-1}} - 2 M_{t_j} M_{t_{j-1}} \mid \mathcal{F}_{t_{j-1}} \big) \\
&= E\big( M_{t_j}^2 - [M]_{t_j} - (M_{t_{j-1}}^2 - [M]_{t_{j-1}}) \mid \mathcal{F}_{t_{j-1}} \big) \\
&= 0,
\end{align*}
by Lemma 3.5.3. Therefore, we find
\begin{align*}
E\left( \sum_{k=1}^n Y_{t_{k-1}} \big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big) \right)^2
&= E\left( \sum_{k=1}^n Y_{t_{k-1}} X_k \right)^2 = \sum_{k=1}^n \sum_{i=1}^n E\, Y_{t_{k-1}} X_k Y_{t_{i-1}} X_i \\
&= \sum_{k=1}^n E\, Y_{t_{k-1}}^2 X_k^2 = \sum_{k=1}^n E\, Y_{t_{k-1}}^2 \big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big)^2,
\end{align*}
as desired.

Lemma 3.5.5. Let $M$ be a continuous bounded martingale, zero at zero. Then
\[ \sum_{k=1}^n E (M_{t_k} - M_{t_{k-1}})^4 \]
tends to zero as the norm of the partition tends to zero.

Proof. We first employ the Cauchy-Schwarz inequality to obtain
\[ \sum_{k=1}^n E(M_{t_k} - M_{t_{k-1}})^4 \le E \sup_{k \le n} (M_{t_k} - M_{t_{k-1}})^2 \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^2 \le \left( E \sup_{k \le n} (M_{t_k} - M_{t_{k-1}})^4 \right)^{\frac12} \left( E\, Q^\pi(M)^2 \right)^{\frac12}. \]


The first factor converges to zero by dominated convergence as $\|\pi\|$ tends to zero, since $M$ is continuous and bounded. It is therefore sufficient for our needs to show that the second factor is bounded. We rewrite
\[ E\, Q^\pi(M)^2 = E\left( \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^2 \right)^2 = \sum_{k=1}^n E(M_{t_k} - M_{t_{k-1}})^4 + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^n E(M_{t_i} - M_{t_{i-1}})^2 (M_{t_j} - M_{t_{j-1}})^2. \]
The last term can be computed using Lemma 2.4.12 twice,
\begin{align*}
2 \sum_{i=1}^{n-1} \sum_{j=i+1}^n E(M_{t_i} - M_{t_{i-1}})^2 (M_{t_j} - M_{t_{j-1}})^2
&= 2 \sum_{i=1}^{n-1} E\left( (M_{t_i} - M_{t_{i-1}})^2 E\left( \sum_{j=i+1}^n (M_{t_j} - M_{t_{j-1}})^2 \,\Big|\, \mathcal{F}_{t_i} \right) \right) \\
&= 2 \sum_{i=1}^{n-1} E\left( (M_{t_i} - M_{t_{i-1}})^2 E\big( (M_t - M_{t_i})^2 \mid \mathcal{F}_{t_i} \big) \right) \\
&\le 8 \|M^*\|_\infty^2 \sum_{i=1}^{n-1} E(M_{t_i} - M_{t_{i-1}})^2 \\
&\le 8 \|M^*\|_\infty^2\, E M_t^2.
\end{align*}
We then find, again using Lemma 2.4.12,
\begin{align*}
E\, Q^\pi(M)^2 &\le \sum_{k=1}^n E(M_{t_k} - M_{t_{k-1}})^4 + 8\|M^*\|_\infty^2\, E M_t^2 \\
&\le 4\|M^*\|_\infty^2 \sum_{k=1}^n E(M_{t_k} - M_{t_{k-1}})^2 + 8\|M^*\|_\infty^2\, E M_t^2 \\
&\le 4\|M^*\|_\infty^2\, E M_t^2 + 8\|M^*\|_\infty^2\, E M_t^2 = 12 \|M^*\|_\infty^2\, E M_t^2,
\end{align*}
showing the desired boundedness.

Finally, we are ready to show that the quadratic variation can actually be interpreted as a quadratic variation. We begin with the martingale case.

Theorem 3.5.6. Let $M \in c\mathcal{M}^2_W$. Assume that $M$ and $[M]$ are bounded. With $\pi$ a partition of $[0,t]$, $Q^\pi(M)$ tends to $[M]_t$ in $L^2$ as $\|\pi\|$ tends to zero.


Proof. By Lemma 3.5.4, using $(x - y)^2 \le 2x^2 + 2y^2$,
\begin{align*}
E\big( Q^\pi(M) - [M]_t \big)^2 &= E\left( \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \right)^2 \\
&= \sum_{k=1}^n E\big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big)^2 \\
&\le 2 \sum_{k=1}^n E(M_{t_k} - M_{t_{k-1}})^4 + 2 \sum_{k=1}^n E([M]_{t_k} - [M]_{t_{k-1}})^2.
\end{align*}
To show the result of the theorem, it therefore suffices to show that each of the two sums above converges to zero as $\|\pi\|$ tends to zero. The first sum converges to zero by Lemma 3.5.5. Concerning the second sum, we have, since $[M]$ is increasing,
\[ \sum_{k=1}^n E\big( [M]_{t_k} - [M]_{t_{k-1}} \big)^2 \le E \sup_{k \le n} \big| [M]_{t_k} - [M]_{t_{k-1}} \big| \sum_{k=1}^n \big( [M]_{t_k} - [M]_{t_{k-1}} \big) = E\, [M]_t \sup_{k \le n} \big| [M]_{t_k} - [M]_{t_{k-1}} \big|, \]
which tends to zero by dominated convergence as $\|\pi\|$ tends to zero, since $[M]$ is continuous and bounded.

Corollary 3.5.7. Let $M \in c\mathcal{M}^L_0$. Then $Q^\pi(M)$ converges in probability to $[M]_t$ as $\|\pi\|$ tends to zero.

Proof. Let $\tau_n = \inf\{ t \ge 0 \mid |M_t| > n \text{ or } [M]_t > n \}$. Then $\tau_n$ is a stopping time, and $\tau_n$ tends to infinity because $M$ and $[M]$ are continuous. Furthermore, $M^{\tau_n}$ and $[M^{\tau_n}]$ are bounded. Thus, the hypotheses of Theorem 3.5.6 are satisfied, and $Q^\pi(M^{\tau_n})$ converges in $L^2$ to $[M^{\tau_n}]_t$ as $\|\pi\|$ tends to zero. Since we know that $Q^\pi(M^{\tau_n}) = Q^\pi(M)^{\tau_n}$ and $[M]^{\tau_n} = [M^{\tau_n}]$, the result now follows from Lemma 3.4.14.

Theorem 3.5.8. Let $X \in \mathcal{S}$. With $\pi$ a partition of $[0,t]$, $Q^\pi(X)$ tends to $[X]_t$ in probability as $\|\pi\|$ tends to zero.
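Before the proof, Theorem 3.5.8 can be previewed in the simplest case $X = W$ (one-dimensional), where $[W]_t = t$. The sketch below uses ad hoc names and is not part of the development; it computes $Q^\pi(W)$ on successively finer partitions of $[0,1]$ along a single simulated path and watches it settle near $1$.

```python
import numpy as np

# Sketch: Q^pi(W) = sum (W_{t_k} - W_{t_{k-1}})^2 over a partition of [0, 1]
# should approach [W]_1 = 1 as the mesh shrinks (Theorem 3.5.8 with X = W).
rng = np.random.default_rng(2)
t, n_fine = 1.0, 2**18
dW = rng.normal(0.0, np.sqrt(t / n_fine), n_fine)
W = np.concatenate(([0.0], np.cumsum(dW)))

for step in (2**14, 2**10, 2**6, 2**2):
    idx = np.arange(0, n_fine + 1, step)
    q = np.sum(np.diff(W[idx])**2)
    print(f"mesh {step * t / n_fine:.1e}: Q^pi(W) = {q:.4f}  (target [W]_1 = {t})")
```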


Proof. Letting $X = A + M$, we first note
\[ \sum_{k=1}^n (X_{t_k} - X_{t_{k-1}})^2 = \sum_{k=1}^n (A_{t_k} - A_{t_{k-1}})^2 + 2 \sum_{k=1}^n (A_{t_k} - A_{t_{k-1}})(M_{t_k} - M_{t_{k-1}}) + \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^2. \]
For the first term, we obtain
\[ \left| \sum_{k=1}^n (A_{t_k} - A_{t_{k-1}})^2 \right| \le \max_{k \le n} |A_{t_k} - A_{t_{k-1}}| \, V_A(t), \]
and since $A$ is continuous, this tends to zero almost surely. For the second term, we find
\[ \left| \sum_{k=1}^n (A_{t_k} - A_{t_{k-1}})(M_{t_k} - M_{t_{k-1}}) \right| \le \max_{k \le n} |M_{t_k} - M_{t_{k-1}}| \, V_A(t), \]
and again conclude that this tends to zero almost surely, by the continuity of $M$. And from Corollary 3.5.7, we know that the last term tends in probability to $[M]_t$, which is equal to $[X]_t$.

Theorem 3.5.8 shows that the process $[X]$ really is a quadratic variation in the sense that it is a limit of sums of squared increments. It remains to show that $[X, X']$ has the analogous interpretation as a quadratic covariation. Before this, we show a simple generalisation of Theorem 3.5.8 which will be useful in the proof of Itô's formula. It is the analogue of Lemma 3.4.15, using quadratic Riemann sums instead of ordinary Riemann sums.

Corollary 3.5.9. Let $X \in \mathcal{S}$ and let $Y$ be a bounded, adapted and continuous process. Then the quadratic Riemann sums
\[ \sum_{k=1}^n Y_{t_{k-1}} (X_{t_k} - X_{t_{k-1}})^2 \]
converge in probability to $\int_0^t Y_s \, d[X]_s$ as the mesh of the partition tends to zero.
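A small variation of the previous simulation illustrates Corollary 3.5.9 before we turn to its proof. Again this is only a sketch with ad hoc choices: with $X = W$ and the bounded continuous integrand $Y_s = \cos(W_s)$, the weighted quadratic sums should approach $\int_0^1 \cos(W_s) \, ds$, since $d[W]_s = ds$.

```python
import numpy as np

# Sketch: sum Y_{t_{k-1}} (W_{t_k} - W_{t_{k-1}})^2 with Y_s = cos(W_s)
# should approach int_0^1 cos(W_s) ds, because [W]_s = s (Corollary 3.5.9).
rng = np.random.default_rng(3)
t, n_fine = 1.0, 2**18
dW = rng.normal(0.0, np.sqrt(t / n_fine), n_fine)
W = np.concatenate(([0.0], np.cumsum(dW)))
target = np.sum(np.cos(W[:-1])) * (t / n_fine)   # fine-grid value of int cos(W_s) ds

for step in (2**14, 2**10, 2**6, 2**2):
    idx = np.arange(0, n_fine + 1, step)
    weighted = np.sum(np.cos(W[idx[:-1]]) * np.diff(W[idx])**2)
    print(f"mesh {step * t / n_fine:.1e}: weighted sum {weighted:.4f}  (target {target:.4f})")
```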


Proof. Letting $X = A + M$, we have
\[ \sum_{k=1}^n Y_{t_{k-1}} (X_{t_k} - X_{t_{k-1}})^2 = \sum_{k=1}^n Y_{t_{k-1}} (A_{t_k} - A_{t_{k-1}})^2 + 2 \sum_{k=1}^n Y_{t_{k-1}} (A_{t_k} - A_{t_{k-1}})(M_{t_k} - M_{t_{k-1}}) + \sum_{k=1}^n Y_{t_{k-1}} (M_{t_k} - M_{t_{k-1}})^2. \]
For the first term, we obtain
\[ \left| \sum_{k=1}^n Y_{t_{k-1}} (A_{t_k} - A_{t_{k-1}})^2 \right| \le \|Y\|_\infty \max_{k \le n} |A_{t_k} - A_{t_{k-1}}| \, V_A(t), \]
and since $A$ is continuous, this tends to zero almost surely. For the second term, we find
\[ \left| \sum_{k=1}^n Y_{t_{k-1}} (A_{t_k} - A_{t_{k-1}})(M_{t_k} - M_{t_{k-1}}) \right| \le \|Y\|_\infty \max_{k \le n} |M_{t_k} - M_{t_{k-1}}| \, V_A(t), \]
and again conclude that this tends to zero almost surely, by the continuity of $M$. Therefore, it will suffice to show that the final term converges in probability to $\int_0^t Y_s \, d[X]_s$. Recall here that $[X] = [M]$, so we really have to show convergence to $\int_0^t Y_s \, d[M]_s$.

We first consider the case where $M$ and $[M]$ are bounded. By Lemma 3.4.15,
\[ \sum_{k=1}^n Y_{t_{k-1}} ([M]_{t_k} - [M]_{t_{k-1}}) \xrightarrow{P} \int_0^t Y_s \, d[M]_s, \]
so it will suffice to show that $\sum_{k=1}^n Y_{t_{k-1}} \big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big)$ tends to


zero in probability. But we have, by Lemma 3.5.4,
\begin{align*}
E\left( \sum_{k=1}^n Y_{t_{k-1}} \big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big) \right)^2
&= \sum_{k=1}^n E\, Y_{t_{k-1}}^2 \big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big)^2 \\
&\le \|Y^*\|_\infty^2\, E \sum_{k=1}^n \big( (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \big)^2 \\
&= \|Y^*\|_\infty^2\, E\left( \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^2 - ([M]_{t_k} - [M]_{t_{k-1}}) \right)^2 \\
&= \|Y^*\|_\infty^2\, E\big( Q^\pi(M) - [M]_t \big)^2,
\end{align*}
which converges to zero by Theorem 3.5.6. Now consider the general case where $M \in c\mathcal{M}^L_W$. Let $\tau_n$ be a localising sequence such that $M^{\tau_n}$ and $[M]^{\tau_n}$ are bounded; this can be done since $M$ and $[M]$ are both continuous. By Lemma 3.5.2, $[M]^{\tau_n} = [M^{\tau_n}]$, and we then obtain
\[ \sum_{k=1}^n Y_{t_{k-1}} (M^{\tau_n}_{t_k} - M^{\tau_n}_{t_{k-1}})^2 \xrightarrow{P} \int_0^{t \wedge \tau_n} Y_s \, d[X]_s. \]
The result then follows from Lemma 3.4.14.

We now transfer the results on the quadratic variation to the setting of the quadratic covariation. Recall that we defined $[X, X']_t = \sum_{k=1}^n \int_0^t Y^k_s (Y'_s)^k \, ds$. As mentioned earlier, the method for doing so is called polarization. The name originates from the polarization identity for inner product spaces,
\[ \langle x, y \rangle = \tfrac14 \left( \|x + y\|^2 - \|x - y\|^2 \right), \]
see Lemma 11.2 of Meise & Vogt (1997). The polarization identity allows one to recover the inner product from the norm. Thinking of the quadratic variation as the squared norm and of the quadratic covariation as the inner product, we will show that the two satisfy a relation analogous to the above, and we will use this relation to obtain results on the quadratic covariation from those on the quadratic variation.

Lemma 3.5.10. Let $X, X' \in \mathcal{S}$. Then
\[ [X, X'] = \tfrac14 \big( [X + X'] - [X - X'] \big). \]
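The polarization identity is easy to check numerically for concrete standard processes; the sketch below, with ad hoc names and choices, does so before we give the proof. It takes $X = \int W \, dW$ and $X' = \int \cos(W) \, dW$ driven by the same one-dimensional Brownian motion, so that $[X, X']_t = \int_0^t W_s \cos(W_s) \, ds$, and compares the product-of-increments sum with the polarized combination of squared increments and with the theoretical value.

```python
import numpy as np

# Sketch: X = int W dW and X' = int cos(W) dW share the driving Brownian motion,
# so [X, X']_t = int_0^t W_s cos(W_s) ds.  We compare the product-of-increments
# sum with the polarization (Q(X + X') - Q(X - X')) / 4 on a fine partition.
rng = np.random.default_rng(4)
t, n = 1.0, 2**16
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))

dX = W[:-1] * dW                 # increments of X  = int W dW (Euler)
dXp = np.cos(W[:-1]) * dW        # increments of X' = int cos(W) dW (Euler)
X = np.concatenate(([0.0], np.cumsum(dX)))
Xp = np.concatenate(([0.0], np.cumsum(dXp)))

cov_direct = np.sum(np.diff(X) * np.diff(Xp))
cov_polar = 0.25 * (np.sum(np.diff(X + Xp)**2) - np.sum(np.diff(X - Xp)**2))
cov_theory = np.sum(W[:-1] * np.cos(W[:-1])) * dt

print(f"direct {cov_direct:.4f}, polarized {cov_polar:.4f}, int W cos(W) ds {cov_theory:.4f}")
```

The direct and polarized sums agree exactly, since polarization is an algebraic identity on the increments; the interesting comparison is with the integral $\int_0^t W_s \cos(W_s) \, ds$.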


Proof. Assume that $X = A + M$ and $X' = A' + M'$, where $M = Y \cdot W$ and $M' = Y' \cdot W$. Therefore,
\[ \tfrac14 \big( [X + X']_t - [X - X']_t \big) = \tfrac14 \left( \int_0^t (Y_s + Y'_s)^2 \, ds - \int_0^t (Y_s - Y'_s)^2 \, ds \right) = \int_0^t Y_s Y'_s \, ds = [X, X']_t. \]

Lemma 3.5.11. Let $M, N \in c\mathcal{M}^L_W$. Then $[M, N]$ is the unique process in $c\mathcal{FV}_0$ such that $MN - [M, N]$ is a local martingale.

Proof. That $MN - [M, N]$ is a local martingale follows from Lemma 3.5.3, since
\begin{align*}
MN - [M, N] &= \tfrac14 \big( (M + N)^2 - (M - N)^2 \big) - \tfrac14 \big( [M + N] - [M - N] \big) \\
&= \tfrac14 \big( (M + N)^2 - [M + N] \big) - \tfrac14 \big( (M - N)^2 - [M - N] \big).
\end{align*}
Uniqueness follows from Lemma 3.4.7, as in Lemma 3.5.3.

Theorem 3.5.12. Let $X, X' \in \mathcal{S}$ and let $Y$ be a bounded, adapted and continuous process. The quadratic Riemann sums
\[ \sum_{k=1}^n Y_{t_{k-1}} (X_{t_k} - X_{t_{k-1}})(X'_{t_k} - X'_{t_{k-1}}) \]
tend in probability to $\int_0^t Y_s \, d[X, X']_s$ as the mesh of the partitions tends to zero.

Proof. We find that
\[ \sum_{k=1}^n Y_{t_{k-1}} (X_{t_k} - X_{t_{k-1}})(X'_{t_k} - X'_{t_{k-1}}) = \tfrac14 \sum_{k=1}^n Y_{t_{k-1}} \big( X_{t_k} + X'_{t_k} - (X_{t_{k-1}} + X'_{t_{k-1}}) \big)^2 - \tfrac14 \sum_{k=1}^n Y_{t_{k-1}} \big( X_{t_k} - X'_{t_k} - (X_{t_{k-1}} - X'_{t_{k-1}}) \big)^2. \]
By Corollary 3.5.9, the first term converges in probability to $\tfrac14 \int_0^t Y_s \, d[X + X']_s$ and the second term converges in probability to $\tfrac14 \int_0^t Y_s \, d[X - X']_s$ as the mesh of the


partitions tends to zero. Therefore, $\sum_{k=1}^n Y_{t_{k-1}} (X_{t_k} - X_{t_{k-1}})(X'_{t_k} - X'_{t_{k-1}})$ converges in probability to
\[ \tfrac14 \int_0^t Y_s \, d[X + X']_s - \tfrac14 \int_0^t Y_s \, d[X - X']_s = \int_0^t Y_s \, d\left( \tfrac14 [X + X']_s - \tfrac14 [X - X']_s \right) = \int_0^t Y_s \, d[X, X']_s \]
by Lemma 3.5.10.

The main results of this section on the quadratic covariation are Lemma 3.5.11 and Theorem 3.5.12. Putting $Y = 1$ in Theorem 3.5.12 shows that our definition of the quadratic covariation of two standard processes really can be interpreted as a quadratic covariation, although the convergence is only in probability.

Before concluding, we note an important consequence of the definition of the quadratic variation. Since the quadratic variation is a continuous and adapted process of finite variation, it is a standard process, and we can therefore integrate with respect to it. The following lemma exploits this feature.

Lemma 3.5.13. Let $X, X' \in \mathcal{S}$ and assume that $Y \in L(X)$ and $Y' \in L(X')$. Then $Y Y' \in L^1([X, X'])$ and
\[ [Y \cdot X, Y' \cdot X'] = Y Y' \cdot [X, X']. \]

Proof. Let the canonical decompositions be $X = A + M$ and $X' = A' + M'$. Then $[Y \cdot X, Y' \cdot X'] = [Y \cdot M, Y' \cdot M']$ and $Y Y' \cdot [X, X'] = Y Y' \cdot [M, M']$, so it will suffice to show the result in the case where the finite variation components are zero. The integrability statement follows from the definition of the quadratic covariation and results from measure theory. For the second statement of the lemma, writing $M = Z \cdot W$ and $M' = Z' \cdot W$, we calculate
\[ [Y \cdot M, Y' \cdot M']_t = \sum_{k=1}^n \int_0^t Y_s Z^k_s Y'_s (Z'_s)^k \, ds = \int_0^t Y_s Y'_s \, d[M, M']_s = (Y Y' \cdot [M, M'])_t. \]

Lemma 3.5.11 provides a characterisation of the quadratic covariation purely in terms of martingales. This result shows that the quadratic covariation is essentially a martingale concept, even though we have developed it here in the context of stochastic integration. This observation is the key to extending the stochastic integral to using


general continuous local martingales as integrators. The price of this generality is that in a purely martingale setting, we do not have any obvious candidate for the quadratic covariation. In our setting, the existence of the quadratic variation was a triviality once the correct idea had been identified. In general, the proof of the existence of the quadratic covariation is quite difficult. In Rogers & Williams (2000b), Theorem 30.1, a direct construction of the quadratic variation of a continuous square-integrable martingale is carried out. Alternatively, one can obtain the existence in a more abstract manner by invoking the Doob-Meyer decomposition theorem, as is done in Karatzas & Shreve (1988). Another alternative is presented in Appendix C.1, where we give a general existence result for the quadratic variation in the continuous case, based only on relatively elementary martingale theory.

3.6 Itô's Formula

In this section, we will prove the multidimensional Itô formula. We say that $X$ is an $n$-dimensional standard process if each of the coordinates of $X$ is a standard process. The Itô formula states that if $X$ is an $n$-dimensional standard process and $f \in C^2(\mathbb{R}^n)$, then
\[ f(X_t) = f(X_0) + \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s. \]
This important result in particular shows that a $C^2$ transformation of a standard process is again a standard process, and it allows us to identify the canonical decomposition. To prove this result, we will first localise to a particularly nice setting, use a Taylor expansion of $f$ and take limits. Before starting the proof, we prepare ourselves with the right version of the Taylor expansion theorem and a result that will help us when localising in the proof.

Theorem 3.6.1. Let $f \in C^2(\mathbb{R}^n)$, and let $x, y \in \mathbb{R}^n$. There exists a point $\xi \in \mathbb{R}^n$ on the line segment between $x$ and $y$ such that
\[ f(y) = f(x) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)(y_i - x_i) + \frac12 \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\xi)(y_i - x_i)(y_j - x_j). \]

Proof. Define $g : \mathbb{R} \to \mathbb{R}$ by $g(t) = f(x + t(y - x))$. Note that $g(1) = f(y)$ and $g(0) = f(x)$. We will prove the theorem by applying the one-dimensional Taylor formula, see Apostol (1964), Theorem 7.6, to $g$. Clearly, $g \in C^2(\mathbb{R})$, and we obtain


$g(1) = g(0) + g'(0) + \frac12 g''(s)$, where $0 \le s \le 1$. Applying the chain rule, we find
\[ g'(t) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x + t(y - x))(y_i - x_i) \]
and
\[ g''(t) = \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(x + t(y - x))(y_i - x_i)(y_j - x_j). \]
Substituting and writing $\xi = x + s(y - x)$, we may conclude
\[ f(y) = f(x) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)(y_i - x_i) + \frac12 \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\xi)(y_i - x_i)(y_j - x_j). \]

Lemma 3.6.2. Let $f \in C^2(\mathbb{R}^n)$. For any compact set $K$, there exists $g \in C^2_c(\mathbb{R}^n)$ such that $f$ and $g$ are equal on $K$.

Proof. By Urysohn's Lemma, Theorem A.3.4, there exists $\psi \in C^\infty_c(\mathbb{R}^n)$ with $K \prec \psi$. Defining $g(x) = \psi(x) f(x)$, we have $g \in C^2_c(\mathbb{R}^n)$, and $f$ and $g$ are clearly equal on $K$.

Theorem 3.6.3. Let $X$ be an $n$-dimensional standard process, and let $f \in C^2(\mathbb{R}^n)$. Then
\[ f(X_t) = f(X_0) + \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s, \]
up to indistinguishability.

Proof. First note that all of the integrands are continuous, therefore locally bounded, and therefore, by Lemma 3.4.13, the stochastic integrals are well-defined.

Our plan is first to reduce by localisation to a particularly nice setting, apply the Taylor formula to obtain an expression for $f(X_t) - f(X_0)$ in terms of Riemann-like sums, and then use Lemma 3.4.15 and Theorem 3.5.12 on convergence of Riemann and quadratic Riemann sums.

Step 1: The localised case. We first consider the case where $X$ is bounded. The variables $f(X_t)$, $\frac{\partial f}{\partial x_i}(X_t)$ and $\frac{\partial^2 f}{\partial x_i \partial x_j}(X_t)$ do not depend on the values of $f$ outside the range of $X$. Therefore, we can by Lemma 3.6.2 assume that $f$ has compact support. In


that case, all of $f$, $\frac{\partial f}{\partial x_i}$ and $\frac{\partial^2 f}{\partial x_i \partial x_j}$ are bounded. Consider a partition $\pi = (t_0, \ldots, t_n)$ of $[0,t]$. The Taylor formula, Theorem 3.6.1, with Lagrange's remainder term then yields
\begin{align*}
f(X_t) - f(X_0) &= \sum_{k=1}^n f(X_{t_k}) - f(X_{t_{k-1}}) \\
&= \sum_{i=1}^n \sum_{k=1}^n \frac{\partial f}{\partial x_i}(X_{t_{k-1}})(X^i_{t_k} - X^i_{t_{k-1}}) + \frac12 \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\xi_k)(X^i_{t_k} - X^i_{t_{k-1}})(X^j_{t_k} - X^j_{t_{k-1}}).
\end{align*}
We will investigate the limit of the sums as the mesh of the partition tends to zero. By Lemma 3.4.15, since $\frac{\partial f}{\partial x_i}$ is bounded, $\sum_{k=1}^n \frac{\partial f}{\partial x_i}(X_{t_{k-1}})(X^i_{t_k} - X^i_{t_{k-1}})$ converges in probability to $\int_0^t \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s$ as the mesh of the partitions tends to zero. Therefore,
\[ \sum_{i=1}^n \sum_{k=1}^n \frac{\partial f}{\partial x_i}(X_{t_{k-1}})(X^i_{t_k} - X^i_{t_{k-1}}) \xrightarrow{P} \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s. \]
It remains to consider the second-order terms. Fix $i, j \le n$. We first note that the difference between the sums $\sum_{k=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\xi_k)(X^i_{t_k} - X^i_{t_{k-1}})(X^j_{t_k} - X^j_{t_{k-1}})$ and $\sum_{k=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(X_{t_{k-1}})(X^i_{t_k} - X^i_{t_{k-1}})(X^j_{t_k} - X^j_{t_{k-1}})$ is bounded by
\[ \max_{k \le n} \left| \frac{\partial^2 f}{\partial x_i \partial x_j}(\xi_k) - \frac{\partial^2 f}{\partial x_i \partial x_j}(X_{t_{k-1}}) \right| \sum_{k=1}^n \big| X^i_{t_k} - X^i_{t_{k-1}} \big| \big| X^j_{t_k} - X^j_{t_{k-1}} \big|. \]
By continuity of $X$ and of the second derivatives of $f$, the maximum tends to zero almost surely. Moreover, the last sum is dominated by $\frac12 \sum_{k=1}^n (X^i_{t_k} - X^i_{t_{k-1}})^2 + (X^j_{t_k} - X^j_{t_{k-1}})^2$, which converges in probability by Theorem 3.5.8. Therefore, the difference above tends to zero in probability as the mesh tends to zero. It will therefore suffice to prove that
\[ \sum_{k=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(X_{t_{k-1}})(X^i_{t_k} - X^i_{t_{k-1}})(X^j_{t_k} - X^j_{t_{k-1}}) \xrightarrow{P} \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s \]
as the mesh tends to zero, but this follows immediately from Theorem 3.5.12. We may now conclude that for each $t \ge 0$,
\[ f(X_t) = f(X_0) + \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s \]
almost surely. Since both sides are continuous, we have equality up to indistinguishability.


Step 2: The general case. Since each $X^i$ is continuous, there exists a sequence of stopping times $\tau_n$ tending to infinity such that $X^{\tau_n}$ is bounded. By what we have already shown, we then have
\begin{align*}
f(X_t)^{\tau_n} = f(X^{\tau_n}_t) &= f(X_0) + \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X^{\tau_n}_s) \, d(X^{\tau_n})^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X^{\tau_n}_s) \, d[(X^{\tau_n})^i, (X^{\tau_n})^j]_s \\
&= f(X_0) + \sum_{i=1}^n \int_0^{t \wedge \tau_n} \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^{t \wedge \tau_n} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s.
\end{align*}
The Itô formula therefore holds whenever $t \le \tau_n$. Since $\tau_n$ tends to infinity, the theorem is proven.

We have now proven Itô's formula. This result is more or less the centre of the theory of stochastic integration. We will sometimes need to apply Itô's formula to mappings which are not defined on all of $\mathbb{R}^n$. The following corollary shows that this is possible.

Corollary 3.6.4. Let $X$ be an $n$-dimensional standard process taking its values in an open set $U$. Let $f \in C^2(U)$. Then
\[ f(X_t) = f(X_0) + \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s, \]
up to indistinguishability.

Proof. Define $F_n = \{ x \in \mathbb{R}^n \mid d(x, U^c) \ge \frac1n \}$. Our plan is, in some sense, to localise to $F_n$ and prove the result there. $F_n$ is a closed set with $F_n \subseteq U$. Let $g_n \in C^\infty(\mathbb{R}^n)$ be such that $g_n = 1$ on $F_n$ and $g_n = 0$ on $F_{n+1}^c$; such a mapping exists by Lemma A.3.5. Define $f_n(x) = g_n(x) f(x)$ when $x \in U$ and zero otherwise. Clearly, $f_n$ and $f$ agree on $F_n$. We will argue that $f_n \in C^2(\mathbb{R}^n)$. To this end, note that since $f$ and $g_n$ are both $C^2$ on $U$, it is clear that $f_n$ is $C^2$ on $U$. Since $f_n$ is zero on the open set $F_{n+1}^c$, $f_n$ is $C^2$ on $F_{n+1}^c$. Because $\mathbb{R}^n = U^c \cup U \subseteq F_{n+1}^c \cup U$, this shows $f_n \in C^2(\mathbb{R}^n)$. Itô's formula then yields
\[ f_n(X_t) = f_n(X_0) + \sum_{i=1}^n \int_0^t \frac{\partial f_n}{\partial x_i}(X_s) \, dX^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^t \frac{\partial^2 f_n}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s. \]


Now define $\tau_n = \inf\{ t \ge 0 \mid d(X_t, U^c) \le \frac1n \}$. Then $\tau_n$ is a stopping time, and since $d(X_t, U^c)$ is a positive continuous process, $\tau_n$ tends to infinity. When $t \le \tau_n$, $d(X_t, U^c) \ge \frac1n$, so $X_t \in F_n$. Since $f_n$ and $f$ agree on $F_n$, we conclude
\begin{align*}
f(X_t)^{\tau_n} &= f_n(X_0) + \sum_{i=1}^n \int_0^{t \wedge \tau_n} \frac{\partial f_n}{\partial x_i}(X_s) \, dX^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^{t \wedge \tau_n} \frac{\partial^2 f_n}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s \\
&= f(X_0) + \sum_{i=1}^n \int_0^{t \wedge \tau_n} \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac12 \sum_{i=1}^n \sum_{j=1}^n \int_0^{t \wedge \tau_n} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d[X^i, X^j]_s.
\end{align*}
Letting $\tau_n$ tend to infinity, we obtain the result.

3.7 Itô's Representation Theorem

In this section and the following, we develop some results specifically for the context where we have an $n$-dimensional Brownian motion $W$ on $(\Omega, \mathcal{F}, P)$ and let the filtration $\mathcal{F}_t$ be the one induced by $W$, augmented in the usual manner as described in Section 2.1. We will need to work with processes on $[0,T]$ and integration on $[0,T]$, and we therefore introduce some notation for this situation. By $L^2_T(W)$, we denote the space $L^2([0,T] \times \Omega, \Sigma_\pi[0,T], \lambda \otimes P)$, understanding that $\Sigma_\pi[0,T]$ is the restriction of $\Sigma_\pi$ to $[0,T] \times \Omega$. The norm on $L^2_T(W)$ is denoted $\|\cdot\|_{W,T}$. The integral of a process $Y \in L^2_T(W)^n$ is then considered a process on $[0,T]$, defined as the integral of the process $Y[0,T]$.

The goal of this section is to prove Itô's representation theorem, which yields a representation of all cadlag $\mathcal{F}_t$ local martingales. To prove the result, we will need some preparation. We begin with a few lemmas.

Lemma 3.7.1. Assume that $X \in \mathcal{S}$ with $X_0 = 0$ and let $Y_0$ be a constant. There exists precisely one solution $Y$ to the equation $Y_t = Y_0 + \int_0^t Y_s \, dX_s$, unique up to indistinguishability, and the solution is given by $Y_t = Y_0 \exp(X_t - \frac12 [X]_t)$.

Proof. We first check that $Y$ as given in the lemma in fact solves the equation. The case $Y_0 = 0$ is trivial, so we assume $Y_0 \neq 0$. We note that $Y_t = f(X_t, [X]_t)$ with $f(x,y) = Y_0 \exp(x - \frac12 y)$. Itô's formula then yields, using that $X_0 = 0$ and therefore


$\exp(X_0 - \frac12 [X]_0) = 1$,
\[ Y_t = Y_0 + \int_0^t Y_s \, dX_s - \frac12 \int_0^t Y_s \, d[X]_s + \frac12 \int_0^t Y_s \, d[X]_s = Y_0 + \int_0^t Y_s \, dX_s, \]
as desired. Now consider uniqueness. Let $Y$ be any solution of the equation, and define $Y'$ by $Y'_t = \exp(-X_t + \frac12 [X]_t)$, so that $Y'_0 = 1$. Itô's formula yields
\[ Y'_t = 1 - \int_0^t Y'_s \, dX_s + \frac12 \int_0^t Y'_s \, d[X]_s + \frac12 \int_0^t Y'_s \, d[X]_s = 1 - \int_0^t Y'_s \, dX_s + \int_0^t Y'_s \, d[X]_s. \]
Therefore, again by Itô's formula, using Lemma 3.5.13,
\begin{align*}
Y_t Y'_t &= Y_0 + \int_0^t Y_s \, dY'_s + \int_0^t Y'_s \, dY_s + [Y, Y']_t \\
&= Y_0 - \int_0^t Y_s Y'_s \, dX_s + \int_0^t Y_s Y'_s \, d[X]_s + \int_0^t Y'_s Y_s \, dX_s - \int_0^t Y_s Y'_s \, d[X]_s \\
&= Y_0,
\end{align*}
up to indistinguishability. Inserting the expression for $Y'$, we conclude
\[ Y_t = Y_0 \exp\left( X_t - \tfrac12 [X]_t \right), \]
demonstrating uniqueness.

The equation $Y_t = Y_0 + \int_0^t Y_s \, dX_s$ of Lemma 3.7.1 is called a stochastic differential equation, or an SDE. This equation is one of the few SDEs for which an explicit solution is available. This type of equation will play an important role in the following, and we therefore introduce some special notation for the solution. For any $X \in \mathcal{S}$ with $X_0 = 0$, we define the stochastic exponential of $X$ by $\mathcal{E}(X)_t = \exp(X_t - \frac12 [X]_t)$. The stochastic exponential is also known as the Doléans-Dade exponential. If $M$ is a local martingale with $M_0 = 0$, we see from $\mathcal{E}(M)_t = 1 + \int_0^t \mathcal{E}(M)_s \, dM_s$ that $\mathcal{E}(M)$ is also a local martingale. We will now give a criterion for checking when $\mathcal{E}(M)$ is a square-integrable martingale.

Lemma 3.7.2. Let $M$ be a nonnegative continuous local martingale. Then $M$ is a supermartingale.
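Before proving Lemma 3.7.2, a short numerical aside may help fix ideas about the stochastic exponential. The sketch below, with ad hoc names and not part of the formal development, checks Lemma 3.7.1 for $X = W$ by solving the SDE $dY_t = Y_t \, dW_t$, $Y_0 = 1$, with an Euler scheme and comparing the result with the closed form $\mathcal{E}(W)_t = \exp(W_t - t/2)$ along the same path.

```python
import numpy as np

# Sketch: Euler scheme for Y_t = 1 + int_0^t Y_s dW_s versus the closed form
# E(W)_t = exp(W_t - t/2) from Lemma 3.7.1 (here X = W, so [X]_t = t).
rng = np.random.default_rng(5)
t, n = 1.0, 2**14
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W_t = np.sum(dW)

Y = 1.0
for dw in dW:                 # Euler step: Y_{k+1} = Y_k + Y_k * dW_k
    Y += Y * dw

print("Euler solution:        ", Y)
print("exp(W_t - t/2) (exact):", np.exp(W_t - 0.5 * t))
```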


Proof. Let $0 \le s \le t$, and let $\tau_n$ be a localising sequence. By Fatou's Lemma for conditional expectations and continuity of $M$,
\[ E(M_t \mid \mathcal{F}_s) = E\big( \liminf_n M^{\tau_n}_t \mid \mathcal{F}_s \big) \le \liminf_n E\big( M^{\tau_n}_t \mid \mathcal{F}_s \big) = \liminf_n M^{\tau_n}_s = M_s. \]

Lemma 3.7.3. Assume that $Y \in L^2(W)^n$ and that there exist $f_k \in L^2[0,\infty)$ such that $|Y^k_t| \le f_k(t)$ for $k \le n$. Put $M = Y \cdot W$. Then $\mathcal{E}(M) \in c\mathcal{M}^2_0$.

Proof. From Lemma 3.7.1, we have $\mathcal{E}(M)_t = 1 + \int_0^t \mathcal{E}(M)_s \, dM_s$. Therefore, Itô's formula yields
\begin{align*}
\mathcal{E}(M)_t^2 &= 1 + 2 \int_0^t \mathcal{E}(M)_s \, d\mathcal{E}(M)_s + [\mathcal{E}(M)]_t \\
&= 1 + 2 \int_0^t \mathcal{E}(M)_s^2 \, dM_s + \int_0^t \mathcal{E}(M)_s^2 \, d[M]_s \\
&= 1 + 2 \int_0^t \mathcal{E}(M)_s^2 \, dM_s + \sum_{k=1}^n \int_0^t \mathcal{E}(M)_s^2 (Y^k_s)^2 \, ds \\
&\le 1 + 2 \int_0^t \mathcal{E}(M)_s^2 \, dM_s + \int_0^t \mathcal{E}(M)_s^2 \sum_{k=1}^n f_k(s)^2 \, ds.
\end{align*}
Define $b(s) = \sum_{k=1}^n f_k(s)^2$. Since $\int_0^t \mathcal{E}(M)_s^2 \, dM_s$ is a continuous local martingale, there exists a localising sequence $\tau_m$ such that $\mathcal{E}(M)_{t \wedge \tau_m}^2$ is bounded over $t$ and $\int_0^{t \wedge \tau_m} \mathcal{E}(M)_s^2 \, dM_s$ is a bounded martingale in $t$. We then obtain
\begin{align*}
E\, \mathcal{E}(M)_{t \wedge \tau_m}^2 &\le 1 + E \int_0^{t \wedge \tau_m} \mathcal{E}(M)_s^2 b(s) \, ds \\
&= 1 + E \int_0^t \mathcal{E}(M)_s^2 b(s) 1_{[0,\tau_m]}(s) \, ds \\
&= 1 + E \int_0^t \mathcal{E}(M)_{s \wedge \tau_m}^2 b(s) 1_{[0,\tau_m]}(s) \, ds \\
&\le 1 + E \int_0^t \mathcal{E}(M)_{s \wedge \tau_m}^2 b(s) \, ds \\
&= 1 + \int_0^t E\big( \mathcal{E}(M)_{s \wedge \tau_m}^2 \big) b(s) \, ds,
\end{align*}


by the Tonelli theorem. Now consider $m$ fixed. Gronwall's Lemma A.2.1 then yields
\[ E\, \mathcal{E}(M)_{t \wedge \tau_m}^2 \le \exp\left( \int_0^t b(s) \, ds \right). \]
Using Fatou's Lemma, we then obtain
\[ E\, \mathcal{E}(M)_t^2 = E \liminf_m \mathcal{E}(M)_{t \wedge \tau_m}^2 \le \liminf_m E\, \mathcal{E}(M)_{t \wedge \tau_m}^2 \le \exp\left( \int_0^t b(s) \, ds \right). \]
Since $\mathcal{E}(M)$ is a nonnegative local martingale, it is a supermartingale by Lemma 3.7.2. Therefore, $\mathcal{E}(M)$ is almost surely convergent by the Martingale Convergence Theorem 2.4.3. For the limit we then obtain, again using Fatou's Lemma,
\[ E\, \mathcal{E}(M)_\infty^2 = E \liminf_t \mathcal{E}(M)_t^2 \le \liminf_t E\, \mathcal{E}(M)_t^2 \le \exp\left( \int_0^\infty b(s) \, ds \right) < \infty. \]
Therefore, $\mathcal{E}(M)$ is bounded in $L^2$. This shows the lemma.

Our next lemma will require some basic results from the theory of weak convergence. An excellent standard reference for this is Billingsley (1999).

Lemma 3.7.4. Let $h_1, \ldots, h_n \in L^2[0,T]$, and let $W$ denote a one-dimensional Brownian motion. Then the vector $(\int_0^T h_1(s) \, dW_s, \ldots, \int_0^T h_n(s) \, dW_s)$ follows an $n$-dimensional normal distribution with mean zero and covariance matrix $\Sigma$ given by $\Sigma_{ij} = \int_0^T h_i(s) h_j(s) \, ds$.

Proof. First consider the case where $h_1, \ldots, h_n$ are continuous. Let $\pi = (t_0, \ldots, t_m)$ be a partition of $[0,T]$ and define $X^\pi$ by putting $X^\pi_i = \sum_{k=1}^m h_i(t_{k-1})(W_{t_k} - W_{t_{k-1}})$. Then $X^\pi$ is a stochastic variable with values in $\mathbb{R}^n$. Since $X^\pi$ is a linear transformation of normal variables, $X^\pi$ is normally distributed. It is clear that $X^\pi$ has mean zero, and the covariance matrix $\Sigma^\pi$ of $X^\pi$ is
\[ \Sigma^\pi_{ij} = E\left( \sum_{k=1}^m h_i(t_{k-1})(W_{t_k} - W_{t_{k-1}}) \right) \left( \sum_{k=1}^m h_j(t_{k-1})(W_{t_k} - W_{t_{k-1}}) \right) = \sum_{k=1}^m h_i(t_{k-1}) h_j(t_{k-1})(t_k - t_{k-1}). \]
Now, as the mesh tends to zero, $\Sigma^\pi_{ij}$ tends to $\int_0^T h_i(s) h_j(s) \, ds$. Therefore, $X^\pi$ tends weakly to the normal distribution with mean zero and the covariance matrix $\Sigma$ given in the statement of the lemma.


On the other hand, for any $a \in \mathbb{R}^n$, we have by Lemma 3.4.15,
\[ \sum_{i=1}^n a_i X^\pi_i \xrightarrow{P} \sum_{i=1}^n a_i \int_0^T h_i(s) \, dW_s \]
as the mesh tends to zero. Since convergence in probability implies weak convergence, the Cramér-Wold device yields that $X^\pi$ tends weakly to $(\int_0^T h_1(s) \, dW_s, \ldots, \int_0^T h_n(s) \, dW_s)$. By uniqueness of limits, we conclude that the lemma holds in the case of continuous $h_1, \ldots, h_n$.

Now consider the general case, where $h_1, \ldots, h_n \in L^2[0,T]$. For $i \le n$, there exists a sequence $(h_{i,k})_k$ of continuous maps converging in $L^2[0,T]$ to $h_i$. By what we have already shown, the corresponding vectors of integrals are normally distributed with zero mean and covariance matrix $\Sigma^k_{ij} = \int_0^T h_{i,k}(s) h_{j,k}(s) \, ds$. Since the integral is continuous, the Cramér-Wold device and uniqueness of limits again yield the conclusion.

Lemma 3.7.5. The span of the variables
\[ \exp\left( \sum_{k=1}^n \int_0^T h_k(t) \, dW^k_t - \frac12 \sum_{k=1}^n \int_0^T h_k^2(t) \, dt \right), \]
where $h_1, \ldots, h_n \in L^2[0,T]$, is dense in $L^2(\mathcal{F}_T)$.

Proof. Let $(\mathcal{G}_t)$ denote the filtration generated by $W$. From the construction of the usual augmentation in Theorem 2.1.4, we know that $\mathcal{F}_T = \sigma(\mathcal{G}_{T+}, \mathcal{N})$, where $\mathcal{N}$ denotes the null sets of $\mathcal{F}$. By Lemma 2.1.9, $\mathcal{G}_{T+} \subseteq \sigma(\mathcal{G}_T, \mathcal{N})$. We therefore conclude $\mathcal{F}_T \subseteq \sigma(\mathcal{G}_T, \mathcal{N})$, so it will suffice to show that the span of the variables given in the statement of the lemma is dense in $L^2(\sigma(\mathcal{G}_T, \mathcal{N}))$. By Lemma B.6.9, it will then suffice to show that the variables are dense in $L^2(\mathcal{G}_T)$.

To this end, first note that $\int_0^T 1_{[0,s]}(t) \, dW^k_t = W^k_s$, so $\mathcal{G}_T$ is generated by the variables of the form $\sum_{k=1}^n \int_0^T h_k(t) \, dW^k_t$. Also, by Lemma 3.7.4, variables of the form $\sum_{k=1}^n \int_0^T h_k(t) \, dW^k_t$ are normally distributed; in particular, they have exponential moments of all orders. It then follows from linearity of the integral and Theorem B.6.2 that the variables $\exp(\sum_{k=1}^n \int_0^T h_k(t) \, dW^k_t)$ are dense in $L^2(\mathcal{G}_T)$. Since $\frac12 \sum_{k=1}^n \int_0^T h_k^2(t) \, dt$ is constant, the conclusion of the lemma follows.

We are now more or less ready to begin the proof of the martingale representation theorem. We start out with a version of the theorem for finite time intervals.
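Before moving on, Lemma 3.7.4 is easy to probe by simulation. The following sketch uses ad hoc names and a one-dimensional $W$: it approximates $\int_0^T h(s) \, dW_s$ for $h(s) = e^{-s}$ by a left-point sum over many paths and compares the sample mean and variance with $0$ and $\int_0^T h(s)^2 \, ds$.

```python
import numpy as np

# Sketch: int_0^T h(s) dW_s with h(s) = exp(-s) should be N(0, int_0^T h(s)^2 ds)
# (Lemma 3.7.4 with n = 1).  We check mean and variance by Monte Carlo.
rng = np.random.default_rng(6)
T, n_steps, n_paths = 1.0, 2000, 50000
dt = T / n_steps
s = np.arange(n_steps) * dt                  # left endpoints of the partition
h = np.exp(-s)

dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
I = dW @ h                                   # sum h(t_{k-1}) (W_{t_k} - W_{t_{k-1}})

var_theory = np.sum(h**2) * dt               # approximates (1 - exp(-2T)) / 2
print("sample mean:", I.mean(), " sample variance:", I.var(), " theory:", var_theory)
```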


Theorem 3.7.6. Let $X \in L^2(\mathcal{F}_T)$. There exists $Y \in L^2_T(W)^n$ such that
\[ X = EX + \sum_{k=1}^n \int_0^T Y^k_s \, dW^k_s \]
almost surely. $Y^k$ is unique $\lambda \otimes P$ almost surely.

Proof. We first show the theorem for variables of the form given in Lemma 3.7.5 and then extend the result by a density argument.

Step 1: A special case. Let $h_1, \ldots, h_n \in L^2[0,T]$ be given, and put
\[ X_s = \exp\left( \sum_{k=1}^n \int_0^s h_k(t) \, dW^k_t - \frac12 \sum_{k=1}^n \int_0^s h_k^2(t) \, dt \right) \]
for $s \le T$. Our goal is to prove the theorem for $X_T$. Note that defining $M$ by putting $M_t = \sum_{k=1}^n \int_0^t h_k(s) \, dW^k_s$, we have $X_t = \mathcal{E}(M)_t$ and so, by Lemma 3.7.1, $X_T = 1 + \int_0^T X_s \, dM_s$. According to Lemma 3.7.3, $X$ is a square-integrable martingale, and therefore we may in particular conclude $EX_T = 1$, showing
\[ X_T = EX_T + \sum_{k=1}^n \int_0^T X_s h_k(s) \, dW^k_s. \]
It remains to argue that $s \mapsto X_s h_k(s)$ is in $L^2_T(W)$. Since $X$ is a square-integrable martingale, the second moment $EX_s^2$ is increasing in $s$. We therefore obtain
\[ E \int_0^T (X_s h_k(s))^2 \, ds = \int_0^T h_k(s)^2\, E X_s^2 \, ds \le E X_T^2 \int_0^T h_k(s)^2 \, ds, \]
which is finite. Thus, the process $s \mapsto X_s h_k(s)$ is in $L^2_T(W)$, and the theorem is proven for $X_T$.

Step 2: The general case. In order to extend the result to the general case, let $H$ be the class of variables in $L^2(\mathcal{F}_T)$ for which the theorem holds. Clearly, $H$ is a linear space. By the first step and Lemma 3.7.5, we have therefore shown the result for a dense subspace of $L^2(\mathcal{F}_T)$. The general case will therefore follow if we can show that $H$ is closed.

To do so, assume that $X_n$ is a sequence in $H$ converging towards $X \in L^2(\mathcal{F}_T)$. Then $X_n = EX_n + \sum_{k=1}^n \int_0^T Y^k_n(s) \, dW^k_s$ almost surely, for some processes $Y_n \in L^2_T(W)^n$.


Since $X_n$ converges in $L^2$, the norms and therefore the second moments also converge. Therefore, using Itô's isometry, Lemma 3.3.6,
\[ \sum_{k=1}^n \| Y^k_n - Y^k_m \|_{W,T}^2 = E\left( \sum_{k=1}^n \int_0^T Y^k_n(s) - Y^k_m(s) \, dW^k_s \right)^2 = E\big( X_n - EX_n - (X_m - EX_m) \big)^2 = \| (X_n - X_m) - (EX_n - EX_m) \|_2^2. \]
Since both $\|X_n - X_m\|_2$ and $|EX_n - EX_m|$ tend to zero as $n$ and $m$ tend to infinity, we conclude that for each $k \le n$, $(Y^k_n)$ is a Cauchy sequence in $L^2_T(W)$. By completeness, there exists $Y^k$ such that $\|Y^k_n - Y^k\|_{W,T}$ tends to zero, and we then obtain
\[ X = \lim_m X_m = \lim_m EX_m + \sum_{k=1}^n \int_0^T Y^k_m(s) \, dW^k_s = EX + \sum_{k=1}^n \int_0^T Y^k(s) \, dW^k_s \]
almost surely, by the continuity of the integral, where the limits are in $L^2$. The existence part of the theorem is now proven.

Step 3: Uniqueness. To show uniqueness, assume that we have two representations
\[ X = EX + \sum_{k=1}^n \int_0^T Y^k_s \, dW^k_s = EX + \sum_{k=1}^n \int_0^T Z^k_s \, dW^k_s. \]
The Itô isometry yields
\[ \sum_{k=1}^n \| Y^k - Z^k \|_{W,T}^2 = E\left( \sum_{k=1}^n \int_0^T Y^k(s) - Z^k(s) \, dW^k_s \right)^2 = 0, \]
so $Y^k$ and $Z^k$ are equal $\lambda \otimes P$ almost surely, as desired.

Theorem 3.7.7 (The Martingale Representation Theorem). Assume that $M$ is an $\mathcal{F}_t$ cadlag local martingale. Then there exists a process $Y$, locally in $L^2(W)^n$, such that
\[ M_t = M_0 + \sum_{k=1}^n \int_0^t Y^k_s \, dW^k_s \]
almost surely. If $M$ is square-integrable, $Y$ can be taken to be in $L^2(W)^n$. $Y^k$ is unique $\lambda \otimes P$ almost surely.
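Before the proof, here is a concrete instance of the representation in Theorem 3.7.6, given as a sketch with ad hoc names and a one-dimensional $W$: for $X = W_T^2$ one has $EX = T$ and, by Itô's formula, $W_T^2 = T + \int_0^T 2 W_s \, dW_s$, so the representing integrand is $Y_s = 2W_s$. The snippet checks this identity path by path up to discretisation error.

```python
import numpy as np

# Sketch: Itô representation of X = W_T^2 (one-dimensional W):
#   W_T^2 = E[W_T^2] + int_0^T 2 W_s dW_s = T + int_0^T 2 W_s dW_s.
rng = np.random.default_rng(7)
T, n_steps, n_paths = 1.0, 4000, 10
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
W_right = np.cumsum(dW, axis=1)
W_left = np.hstack([np.zeros((n_paths, 1)), W_right[:, :-1]])

X = W_right[:, -1] ** 2                       # W_T^2 on each path
representation = T + np.sum(2 * W_left * dW, axis=1)

for x, r in zip(X, representation):
    print(f"W_T^2 = {x:8.4f}   T + int 2W dW = {r:8.4f}")
```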


Proof. Uniqueness follows immediately from Lemma 3.3.7. We therefore only need to consider existence. Obviously, it will suffice to consider the case where $M_0 = 0$. The existence is proven in several steps:

1. Square-integrable martingales.
2. Continuous martingales.
3. Cadlag uniformly integrable martingales.
4. Cadlag local martingales.

Step 1: The square-integrable case. Assume that $M$ is a square-integrable martingale starting at zero. In particular, $EM_t = 0$. For $m \ge 1$, let $Y_m \in L^2_m(W)^n$ be the process, which exists by Theorem 3.7.6, such that
\[ M_m = \sum_{k=1}^n \int_0^m Y^k_m(s) \, dW^k_s \]
almost surely. We then have, by the martingale property of the stochastic integral,
\[ \sum_{k=1}^n \int_0^m Y^k_{m+1}(s) \, dW^k_s = E\left( \sum_{k=1}^n \int_0^{m+1} Y^k_{m+1}(s) \, dW^k_s \,\Big|\, \mathcal{F}_m \right) = E(M_{m+1} \mid \mathcal{F}_m) = M_m = \sum_{k=1}^n \int_0^m Y^k_m(s) \, dW^k_s \]
almost surely. Then, by the uniqueness part of Theorem 3.7.6, $Y^k_{m+1}$ and $Y^k_m$ must be $\lambda \otimes P$ almost surely equal on $[0,m] \times \Omega$. By the Pasting Lemma 2.5.10, there exists $Y^k$ such that $Y^k$ is almost surely equal to $Y^k_m$ on $[0,m] \times \Omega$. In particular, $Y[0,t] \in L^2(W)^n$ for all $t \ge 0$. This implies that for $t \ge 0$,
\[ M_t = E(M_{[t]+1} \mid \mathcal{F}_t) = E\left( \sum_{k=1}^n \int_0^{[t]+1} Y^k_{[t]+1}(s) \, dW^k_s \,\Big|\, \mathcal{F}_t \right) = E\left( \sum_{k=1}^n \int_0^{[t]+1} Y^k_s \, dW^k_s \,\Big|\, \mathcal{F}_t \right) = \sum_{k=1}^n \int_0^t Y^k_s \, dW^k_s \]


almost surely, as desired. By Lemma 3.3.16, $Y \in L^2(W)^n$.

Step 2: The continuous case. Now let $M$ be a continuous martingale starting at zero. Then $M$ is locally a square-integrable martingale. Let $\tau_m$ be a localising sequence. By what we have already shown, there exists $Y_m \in L^2(W)^n$ such that $M^{\tau_m}_t = \sum_{k=1}^n \int_0^t Y^k_m(s) \, dW^k_s$ almost surely. We then obtain
\[ \sum_{k=1}^n \int_0^t Y^k_{m+1}[0,\tau_m](s) \, dW^k_s = \left( \sum_{k=1}^n \int_0^t Y^k_{m+1}(s) \, dW^k_s \right)^{\tau_m} = (M^{\tau_{m+1}})^{\tau_m}_t = M^{\tau_m}_t = \sum_{k=1}^n \int_0^t Y^k_m(s) \, dW^k_s. \]
Since both integrands are in $L^2_t(W)^n$, we conclude that $Y^k_{m+1}[0,\tau_m] = Y^k_m$, $\lambda \otimes P$ almost surely on $[0,t]$. Letting $t$ tend to infinity, we may then conclude that $Y^k_{m+1}[0,\tau_m] = Y^k_m$, $\lambda \otimes P$ almost surely on all of $[0,\infty)$. By the Pasting Lemma, there exists $Y^k$ such that $Y^k[0,\tau_m] = Y^k_m[0,\tau_m]$, $\lambda \otimes P$ almost surely. In particular, $Y^k[0,\tau_m] \in L^2(W)$, so $Y$ is locally in $L^2(W)^n$, and
\[ M^{\tau_m}_t = \sum_{k=1}^n \int_0^t Y^k_m[0,\tau_m]_s \, dW^k_s = \sum_{k=1}^n \int_0^t Y^k[0,\tau_m]_s \, dW^k_s = \left( \sum_{k=1}^n \int_0^t Y^k_s \, dW^k_s \right)^{\tau_m} \]
almost surely. Letting $m$ tend to infinity, we get $M_t = \sum_{k=1}^n \int_0^t Y^k_s \, dW^k_s$ almost surely. This shows the theorem in the case where $M$ is a continuous martingale.

Step 3: The cadlag uniformly integrable case. Next, let $M$ be a cadlag uniformly integrable martingale starting at zero. By Theorem 2.4.3, $M$ is convergent almost surely and is closed by its limit $M_\infty$. Let $M^m_\infty$ be a sequence of bounded variables converging towards $M_\infty$ in $L^1$; assume in particular that $\|M_\infty - M^m_\infty\|_1 \le 3^{-m}$. Each $M^m_\infty$ is square-integrable, so putting $M^m_t = E(M^m_\infty \mid \mathcal{F}_t)$, $M^m$ is a square-integrable martingale. By what we have already proven, there are $Y_m \in L^2(W)^n$ such that $M^m_t = M^m_0 + \sum_{k=1}^n \int_0^t Y^k_m(s) \, dW^k_s$ almost surely. In particular, there exists a version $N^m$ of $M^m$ in $c\mathcal{M}^2_0$.


Next, note that by Doob's maximal inequality, Theorem 2.4.2,
\[ P\left( (N^m - M)^*_t > \frac{1}{2^m} \right) \le 2^m E|N^m_t - M_t| = 2^m E|M^m_t - M_t|. \]
Letting $t$ tend to infinity, we obtain by $L^1$-convergence
\[ P\left( (N^m - M)^* > \frac{1}{2^m} \right) \le 2^m E|M^m_\infty - M_\infty| \le \frac{2^m}{3^m}. \]
By the Borel-Cantelli Lemma, $(N^m - M)^*$ then tends to zero almost surely, so $N^m$ converges almost surely uniformly to $M$. In particular, $M$ must have a continuous version, so the conclusion in this case follows from what we proved in the previous step.

Step 4: The cadlag local martingale case. Finally, assume that $M$ is a cadlag local martingale starting at zero. By Lemma 2.5.16, $M$ is locally a cadlag uniformly integrable martingale. Letting $\tau_m$ be a localising sequence, we know that the theorem holds for $M^{\tau_m}$. We can then use the pasting technique from Step 2 to obtain the desired representation for $M$.

The martingale representation theorem has the following important corollary.

Corollary 3.7.8. Let $M$ be an $\mathcal{F}_t$ local martingale. Then there is a process $Y$, locally in $L^2(W)^n$, such that $M$ and $Y \cdot W$ are versions of each other. In particular, any $\mathcal{F}_t$ local martingale has a continuous version.

Proof. By Theorem 2.4.1, there is a cadlag version of $M$. We can then apply Theorem 3.7.7 to obtain the desired result.

Corollary 3.7.8 essentially characterises all martingales with respect to the augmented Brownian filtration. It provides considerable insight into the structure of $\mathcal{F}_t$ martingales. In particular, it shows that when we are considering the filtration $\mathcal{F}_t$, our definition of the stochastic integral encompasses all local martingales as integrators. Therefore, in the following, whenever we are considering the augmented Brownian filtration, we can assume whenever convenient that we are dealing with continuous local martingales instead of the class $c\mathcal{M}^L_W$ of stochastic integrals with respect to Brownian motion.


3.8 Girsanov's Theorem

In this section, we will prove Girsanov's Theorem. As in the previous section, we let $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ be a probability space with an $n$-dimensional Brownian motion $W$, where $(\Omega, \mathcal{F}, P)$ is complete and $\mathcal{F}_t$ is the usual augmentation of the filtration induced by the Brownian motion. We let $\mathcal{F}_\infty = \sigma(\cup_{t \ge 0} \mathcal{F}_t)$. Girsanov's Theorem characterises the measures which are equivalent to $P$ on $\mathcal{F}_\infty$ and shows how the distribution of $W$ varies when changing to an equivalent measure. We will see that changing the measure corresponds to changing the drift of the Brownian motion.

The theorem thus has two parts, describing:

1. The form of the measures equivalent to $P$ on $\mathcal{F}_\infty$.
2. The distribution of $W$ under $Q$, when $Q$ is equivalent to $P$ on $\mathcal{F}_\infty$.

After proving both results, we will prove Kazamaki's and Novikov's conditions for the existence of measure changes corresponding to specific changes of drift.

The first result is relatively straightforward and only requires a small lemma.

Lemma 3.8.1. Let $Q$ be a measure on $\mathcal{F}$ such that $Q$ is absolutely continuous with respect to $P$ on $\mathcal{F}_\infty$. Let $Q_t$ denote the restriction of $Q$ to $\mathcal{F}_t$, and let $P_t$ denote the restriction of $P$ to $\mathcal{F}_t$. Define the likelihood process $L_t$ by $L_t = \frac{dQ_t}{dP_t}$. Then $L$ is a uniformly integrable martingale, and it is closed by $\frac{dQ_\infty}{dP_\infty}$.

Proof. All the claims of the lemma will follow if we can show $E(\frac{dQ_\infty}{dP_\infty} \mid \mathcal{F}_t) = L_t$ for any $t \ge 0$. To do so, let $t \ge 0$ and let $A \in \mathcal{F}_t$. Using Lemma B.6.8 twice yields
\[ \int_A \frac{dQ_\infty}{dP_\infty} \, dP = \int_A \frac{dQ_\infty}{dP_\infty} \, dP_\infty = \int_A dQ_\infty = \int_A dQ_t. \]
Making the same calculations in the reverse direction then yields
\[ \int_A dQ_t = \int_A \frac{dQ_t}{dP_t} \, dP_t = \int_A \frac{dQ_t}{dP_t} \, dP = \int_A L_t \, dP. \]
Thus, $E(1_A \frac{dQ_\infty}{dP_\infty}) = E(1_A L_t)$ for any $A \in \mathcal{F}_t$, and therefore $E(\frac{dQ_\infty}{dP_\infty} \mid \mathcal{F}_t) = L_t$. This shows that $L$ is a martingale closed by $\frac{dQ_\infty}{dP_\infty}$. In particular, $L$ is uniformly integrable and has the limit $\frac{dQ_\infty}{dP_\infty}$ almost surely and in $L^1$.
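The effect described in part two of the theorem can be previewed with a small importance-sampling experiment. The sketch below uses ad hoc names and is not part of the development: simulating $W_T$ under $P$, weighting each draw by the likelihood $\mathcal{E}(\theta W)_T = \exp(\theta W_T - \tfrac12 \theta^2 T)$ and computing weighted averages should reproduce the $Q$-law of $W_T$, namely a normal distribution with mean $\theta T$ and variance $T$.

```python
import numpy as np

# Sketch: under dQ/dP = exp(theta * W_T - theta^2 T / 2), W acquires drift theta,
# i.e. W_T ~ N(theta * T, T) under Q.  We check this by reweighting P-samples.
rng = np.random.default_rng(8)
theta, T, n_paths = 0.7, 2.0, 10**6

W_T = rng.normal(0.0, np.sqrt(T), n_paths)              # W_T under P
L = np.exp(theta * W_T - 0.5 * theta**2 * T)            # likelihood E(theta W)_T

print("E_P[L] (should be 1):   ", L.mean())
print("E_Q[W_T] = E_P[L W_T]:  ", np.mean(L * W_T), " (theory:", theta * T, ")")
print("Var_Q(W_T):             ", np.mean(L * W_T**2) - np.mean(L * W_T)**2,
      " (theory:", T, ")")
```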


Theorem 3.8.2 (Girsanov's Theorem, part one). Let $Q$ be a measure on $\mathcal{F}$ equivalent to $P$ on $\mathcal{F}_\infty$. There exists a process $M \in c\mathcal{M}^L_W$ such that $\mathcal{E}(M)$ is a uniformly integrable martingale with the property $\frac{dQ_\infty}{dP_\infty} = \mathcal{E}(M)_\infty$.

Proof. Let $L_t$ be the likelihood process for $\frac{dQ}{dP}$. By Lemma 3.8.1, $L$ is a uniformly integrable martingale. Since the filtration is assumed to satisfy the usual conditions, there exists a cadlag version of $L$. Since $L_t$ is almost surely nonnegative for any $t \ge 0$, we can take the cadlag version to be nonnegative almost surely. By Theorem 3.7.7, there exists a process $Y$, locally in $L^2(W)^n$, such that $L_t = 1 + \sum_{k=1}^n \int_0^t Y^k_s \, dW^k_s$. In particular, $L$ can be taken to be continuous.

Our plan is to show that $L$ satisfies an exponential SDE; this will yield the desired result. If $L$ almost surely has positive paths, we can do this simply by writing
\[ L_t = 1 + \sum_{k=1}^n \int_0^t \frac{Y^k_s}{L_s} L_s \, dW^k_s, \]
and if each $\frac{Y^k}{L}$ is locally in $L^2(W)$, we have the result. Thus, our proof runs in three parts. First, we show that $L$ almost surely has positive paths. Then we argue that $\frac{Y^k}{L}$ is locally in $L^2(W)$. Finally, we prove the theorem using these facts.

Step 1: $L$ has positive paths almost surely. To show that $L$ almost surely has positive paths, define $\tau = \inf\{ t \ge 0 \mid L_t = 0 \}$ and let $t \ge 0$. Then $(\tau \le t) \in \mathcal{F}_t$. Using that $\frac{dQ_t}{dP_t} = L_t$, we obtain
\[ Q(\tau \le t) = Q_t(\tau \le t) = \int_{(\tau \le t)} L_t \, dP_t = \int_{(\tau \le t)} L_t \, dP. \]
Next, using that $(\tau \le t) \in \mathcal{F}_{\tau \wedge t}$ and that $L$ is a $P$-martingale, we find
\[ \int_{(\tau \le t)} L_t \, dP = \int_{(\tau \le t)} E(L_t \mid \mathcal{F}_{\tau \wedge t}) \, dP = \int_{(\tau \le t)} L_{\tau \wedge t} \, dP = \int_{(\tau \le t)} L_\tau \, dP = 0, \]
where we have used that whenever $\tau$ is finite, $L_\tau = 0$. This shows $Q(\tau \le t) = 0$. Since $P$ and $Q$ are equivalent, we conclude $P(\tau \le t) = 0$. Letting $t$ tend to infinity, we obtain $P(\tau < \infty) = 0$. Therefore, $P$-almost all paths of $L$ are positive. By changing $L$ on a set of measure zero, we can assume that all the paths of $L$ are positive.

Step 2: Integrability. Next, we show that $\frac{Y^k}{L}$ is locally in $L^2(W)$. Let $\tau_n$ be a localising sequence for $Y^k$. Define the stopping time $\sigma_n$ by $\sigma_n = \inf\{ t \ge 0 \mid L_t < \frac1n \}$. Since the


paths of $L$ are positive, $\sigma_n$ tends to infinity. Since $L_s \ge \frac1n$ for $s \le \sigma_n$, we obtain
\[ \left( \left| \frac{Y^k_s}{L_s} \right| \right)^{\tau_n \wedge \sigma_n} \le n \, \big( |Y^k_s| \big)^{\tau_n \wedge \sigma_n} \]
almost surely, so $\left( \frac{Y^k}{L} \right)^{\tau_n \wedge \sigma_n}$ is in $L^2(W)$, showing that $\frac{Y^k}{L}$ is locally in $L^2(W)$. Therefore, the process $M_t = \sum_{k=1}^n \int_0^t \frac{Y^k_s}{L_s} \, dW^k_s$ is well-defined, and $L_t = 1 + \sum_{k=1}^n \int_0^t L_s \frac{Y^k_s}{L_s} \, dW^k_s = 1 + \int_0^t L_s \, dM_s$, as desired.

Step 3: Conclusion. Finally, by Lemma 3.7.1, we have $L = \mathcal{E}(M)$. Since $L$ is uniformly integrable by Lemma 3.8.1 with limit $\frac{dQ_\infty}{dP_\infty}$, we conclude $\frac{dQ_\infty}{dP_\infty} = \mathcal{E}(M)_\infty$.

We have now proved the first part of Girsanov's Theorem. Theorem 3.8.2 yields a concrete form for any measure $Q$ which is equivalent to $P$ on $\mathcal{F}_\infty$. The other half of Girsanov's Theorem uses this concrete form to describe the $Q$-distribution of $W$.

Theorem 3.8.3 (Girsanov's Theorem, part two). Let $Q$ be a measure on $\mathcal{F}$ equivalent to $P$ on $\mathcal{F}_\infty$ with Radon-Nikodym derivative $\frac{dQ_\infty}{dP_\infty} = \mathcal{E}(M)_\infty$, where $M$ is in $c\mathcal{M}^L_W$ and $\mathcal{E}(M)$ is a uniformly integrable martingale. Let $M = Y \cdot W$, where $Y$ is locally in $L^2(W)^n$. The process $X$ in $\mathbb{R}^n$ given by $X^k_t = W^k_t - \int_0^t Y^k_s \, ds$ is an $n$-dimensional $\mathcal{F}_t$-Brownian motion under $Q$.

Proof. We first argue that the conclusion is well-defined. Let $\tau_m$ be a localising sequence such that $Y[0,\tau_m] \in L^2(W)^n$. Then $\int_0^t (Y^k[0,\tau_m]_s)^2 \, ds$ is almost surely finite, and by Lemma B.2.4, $\int_0^t |Y^k[0,\tau_m]_s| \, ds$ is almost surely finite. Thus, $Y^k \in L^1(t)$, and $\int_0^t Y^k_s \, ds$ is well-defined. Therefore, $X^k$ is well-defined and the conclusion is well-defined. Also note that since $Y^k$ is progressive, $X^k$ is adapted to $\mathcal{F}_t$. Therefore, to prove the theorem, it will suffice to show that $s \mapsto X_{t+s} - X_t$ is a Brownian motion for any $t \ge 0$.

To prove the result, we first consider the case where $Y$ is bounded and zero from a deterministic point onwards. Afterwards, we will obtain the general case by approximation and localisation arguments.

Step 1: The dominated case, zero from a point onwards. First assume that there exists $T > 0$ such that $Y_t$ is zero for $t \ge T$, and assume that $Y$ is bounded. This condition will allow us to apply Lemma 3.7.3 in our calculations. To show that


3.8 Girsanov’s Theorem 105X is an F t -Brownian motion under Q, it will suffice to show that <strong>for</strong> any 0 ≤ s ≤ t,( n)∣ ) (∑∣∣∣∣E(expQ θ k (Xt k − Xs k 1) F s = exp2k=1)n∑θk(t 2 − s) .k=1In order to show this, let f k ∈ L 2 [0, ∞) <strong>and</strong> A ∈ F. Put Z k u = Y k u + f k (u) <strong>and</strong>N = Z·W , by Lemma 3.7.3 that the exponential martingale E(N) is a square-integrablemartingale. In particular E(N) has an almost sure limit. We first proveE Q 1 A exp( n∑k=1∫ ∞0f k (u) dX k u)= exp(12n∑k=1∫ ∞0f k (u) 2 du)E P 1 A E(N) ∞ ,<strong>and</strong> use this <strong>for</strong>mula to show the conditional expectation <strong>for</strong>mula above. Note thatthe stochastic integral above is well-defined in the sense that X k ∈ S under P . Weconsider the left-h<strong>and</strong> side of the above. Note that from the condition on Y , we obtainY ∈ L 2 (W ) n <strong>and</strong> there<strong>for</strong>e ∫ ∞Y k 0 u dWu k <strong>and</strong> ∫ ∞0 (Y u k ) 2 du are well-defined. We canthere<strong>for</strong>e writeE Q 1 A exp= E P 1 A exp= E P 1 A exp( n∑∫ ∞k=10∫ ∞( n∑k=10∫ ∞( n∑k=10f k (u) dX k uf k (u) dX k u))f k (u) dX k u +E(M) ∞n∑k=1∫ ∞0Y k u dW k u − 1 2n∑k=1∫ ∞0(Y k u ) 2 du).We note that==∫ ∞0∫ ∞f k (u) dX k +f k (u) dW k u −∫ ∞0∫ ∞00∫ ∞0f k (u) + Y k u dW k u + 1 2Y k u dW k u − 1 2f k (u)Y k u du +∫ t(Y k u ) 2 du0∫ ∞0∫ ∞0f k (u) 2 du − 1 2Y k u dW k u − 1 2∫ ∞0∫ t0(Y k u ) 2 du(f k (u) + Y k u ) 2 du.


106 The Itô IntegralWe then obtainE P 1 A exp= E P 1 A exp= E P 1 A exp= exp(12n∑k=1( n∑∫ ∞k=10∫ ∞( n∑k=1(12n∑k=1∫ ∞00∫ ∞0f k (u) dX k u +n∑k=1∫ ∞f k (u) + Yu k dWu k + 1 2)f k (u) 2 duf k (u) 2 du)0E(N) ∞E P 1 A E(N) ∞ .Y k u dW k u − 1 2∫ ∞0n∑k=1f k (u) 2 du − 1 2We have now proved that <strong>for</strong> any f k ∈ L 2 [0, ∞) <strong>and</strong> A ∈ F,E Q 1 A exp( n∑k=1∫ ∞0f k (u) dX k u)= exp(12n∑k=1∫ ∞0f k (u) 2 du∫ ∞0∫ ∞)0(Y k u ) 2 du)(f k (u) + Y k u ) 2 duE P 1 A E(N) ∞ .We are ready to demonstrate that X is an F t -Brownian motion under Q. Let 0 ≤ s ≤ t,A ∈ F s <strong>and</strong> θ ∈ R n . We can then define f k by f k (u) = θ k 1 (s,t] (u). Since f k is zero on[0, s], M s = N s <strong>and</strong> there<strong>for</strong>e, using that E(N) is a uni<strong>for</strong>mly integrable martingale,E P 1 A E(N) ∞ = E P 1 A E(N) s = E P 1 A E(M) s = Q(A).)We then obtain( n)∑E Q 1 A exp θ k (Xt k − Xs k )k=1= E Q 1 A exp= exp(12(1n∑( n∑∫ ∞k=10∫ ∞k=10∫ ∞f k (u) dX k uf k (u) 2 ds)n∑= expf k (u) 2 ds Q(A)2k=10()1n∑= exp θ 22k(t − s) Q(A).k=1))E P 1 A E(N) ∞This shows the desired <strong>for</strong>mula,( n)∣ ) (∑∣∣∣∣E(expQ θ k (Xt k − Xs k 1) F s = exp2k=1)n∑θk(t 2 − s) .Finally, we may conclude that X is an F t -Brownian motion under Q.k=1


3.8 Girsanov’s Theorem 107Step 2: The L 2 (W ) case, deterministic from a point onwards. Now onlyassume that Y ∈ L 2 (W ) n <strong>and</strong> that Y is zero from T <strong>and</strong> onwards. In the previousstep, we obtained our results by calculating the conditional Laplace trans<strong>for</strong>m. In thecurrent case, we will instead calculate the conditional characteristic function. Our aimis to show that <strong>for</strong> any 0 ≤ s ≤ t,()∣ ) ()n∑∣∣∣∣E(expQ i θ k (Xt k − Xs k ) F s = exp − 1 n∑θ 22k(t − s) .k=1We will use an approximation argument. To this end, define Yn k = Y k ∧ n ∨ −n.Y n then satisfies the condition of the first step, <strong>and</strong> we may conclude that puttingM n (t) = Y n · W <strong>and</strong> defining Q n by dQ ndP n= E(M n ) ∞ , Q n is a probability measure <strong>and</strong>X n given by Xn(t) k = Wtk − ∫ t0 Y n k (s) ds is a Brownian motion under Q n .Our plan is to show thatk=11. There is a subsequence of E(M m ) ∞ converging to E(M) ∞ in L 1 .2. For t ≥ 0, there is a subsequence of X k m(t) converging to X k (t) almost surely.We will be able to combine these facts to obtain our desired result.To prove the first of the two convergence results above, note that Ynk tends to Y k inL 2 (W ) by dominated convergence. There<strong>for</strong>e, ∫ ∞Y k 0 n (s) dWs k tends to ∫ ∞Y k (s) dW k 0 sin L 2 . By taking out a subsequence, we can assume that the convergence is almostsure. Furthermore, from Lemma B.6.5 we find that by taking a subsueqence, we canassume that ∫ ∞0Y k n (s) 2 ds tends to ∫ ∞0Y k (s) 2 ds almost surely. We have now ensuredthat almost surely, ∫ ∞0Y k n (s) dW k s tends to ∫ ∞0Y k (s) dW k s , <strong>and</strong> ∫ ∞0Y k n (s) 2 ds tendsto ∫ ∞0Y k (s) 2 ds. There<strong>for</strong>e, E(M n ) ∞ converges almost surely to E(M) ∞ . We knowthat E P E(M n ) ∞ = 1, <strong>and</strong> have assumed E P E(M) ∞ = 1. By nonnegativity, Scheffé’sLemma B.2.3 yields that E(M n ) ∞ converges in L 1 to E(M) ∞ , as desired.To prove the second convergence result, let t ≥ 0 be given. Note that since Ynk convergesin L 2 (W ) to Y k <strong>and</strong> the processes are zero from a deterministic point onwards,we obtain from Lemma B.2.4 that E ∫ ∞|Y k 0 n (s)−Y k (s)| ds tends to zero. There<strong>for</strong>e, byLemma B.6.5, we can by taking out a subsequence assume that ∫ t0 Y n k (s) ds convergesalmost surely to ∫ t0 Y k (s) ds. We then obtainlimnX k n(t) = limnW k (t) −∫ t0Y k n (s) ds = X k (t)


108 The Itô Integralas desired.Now let 0 ≤ s ≤ t. We select a subsequence such that1. E(M m ) ∞ converges to E(M) ∞ in L 1 .2. X k m(s) converges almost surely to X k (s).3. X k m(t) converges almost surely to X k (t).Since X m is F t -Brownian motion under Q m , we know that()∣ ) (n∑∣∣∣∣E(expQm i θ k (Xm(t) k − Xm(s))k F s = exp − 1 2k=1We will identify the limit of the left-h<strong>and</strong> side. We know that( ()∣ )n∑∣∣∣∣E Q mexp i θ k (Xm(t) k − Xm(s))k F s= E P (exp(ik=1)n∑θk(t 2 − s) .k=1) ∣ )n∑∣∣∣∣θ k (Xm(t) k − Xm(s))k E(M m ) ∞ F s .k=1Using our convergence results with Lemma B.2.2, we find that()()n∑n∑exp i θ k (Xm(t) k − Xm(s))k LE(M m ) 1∞ −→ exp i θ k (X k (t) − X k (s)) E(M) ∞ ,k=1<strong>and</strong> there<strong>for</strong>e, by the L 1 continuity of conditional expectations,()∣ ) ()n∑∣∣∣∣E(expQ i θ k (Xt k − Xs k ) F s = exp − 1 n∑θ 22k(t − s) ,as desired.k=1k=1k=1Step 3: The L 2 (W ) case. Finally, merely assume Y ∈ L 2 (W ) n . Let σ n be a localisingsequence such that Y k [0, σ n ] ∈ L 2 (W ). Put τ n = σ n ∧ n <strong>and</strong> define Y k n = Y k [0, τ n ].Then Y k is in L 2 (W ) <strong>and</strong> is zero from a deterministic point onwards. There<strong>for</strong>e,we can use the result from the previous step on Y n . As in the preceeding step ofthe proof, we define M m = Y m · W <strong>and</strong> define Q m by dQ mdP m= E(M m ) ∞ . PuttingX k m(t) = W k (t) − ∫ t0 Y k m(s) ds, X m is then a F t Brownian motion under Q m . We willshow


3.8 Girsanov’s Theorem 1091. E(M m ) ∞ converges to E(M) ∞ in L 1 .2. X k m(t) converges to X k (t) almost surely.Having these two facts, the conclusion will follow by precisely the same method as inthe previous step. To prove the first convergence result, note that E(M) τ m= E(M m ).Since E(M) is assumed to be a uni<strong>for</strong>mly integrable martingale, it has an almost surelylimit <strong>and</strong> we may conclude E(M) τm = E(M m ) ∞ . Now, (E(M) τm ) m≥1 is a discretetimemartingale, <strong>and</strong> it is closed by E(M) ∞ . There<strong>for</strong>e, it is convergent in L 1 . ThusLE(M m ) 1∞ −→ E(M) ∞ .The second result follows directly fromlimmX k m(t) = limmW k t −∫ t0Y k m(s) ds∫ t= Wt k − lim Y k (s)1 [0,τm m](s) ds= W k t −= X k t ,∫ t00Y k (s) dsalmost surely. As in the previous step, we conclude that X is an F t Brownian motionunder Q.We are now done with the proof of Girsanov’s Theorem. In the first part of thetheorem, Theorem 3.8.2, we have seen that any probability measure Q equivalent to Pon F ∞ has the property that dQ ∞dP ∞= E(M) ∞ <strong>for</strong> some M such that E(M) is uni<strong>for</strong>mlyintegrable. In the second part, Theorem 3.8.3, we have seen now the distribution ofW changes under this measure change.When we have two equivalent probability measures P <strong>and</strong> Q, we will usually be interestedin stochastic integration under both of these probability measures, in the sensethat a given process can be a st<strong>and</strong>ard process under P with respect to the originalBrownian motion, <strong>and</strong> it can be a st<strong>and</strong>ard process under Q with respect to the newQ Brownian motion. We have defined a stochastic integral <strong>for</strong> each of these cases. Wewill now check that whenever applicable, the two stochastic integrals agree. We alsoshow that being a st<strong>and</strong>ard process under P is the same as being a st<strong>and</strong>ard processunder Q.
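As a purely illustrative aside, not part of the formal development, the following small simulation sketch makes the measure change of Theorems 3.8.2 and 3.8.3 concrete in the simplest possible case: a constant drift $\theta$ on a bounded horizon $[0,T]$, so that the integrand is bounded, deterministic and zero after the deterministic time $T$, and $\mathcal{E}(\theta \cdot W)$ is a uniformly integrable martingale as in Step 1 of the proof above. The constants theta, T and n_paths below are arbitrary choices made for the example. The sketch reweights simulated outcomes of $W_T$ by $\mathcal{E}(\theta \cdot W)_T$ and checks numerically that under the reweighted measure, $W_T$ has mean $\theta T$ and variance $T$, so that $X_T = W_T - \theta T$ behaves like the terminal value of a Brownian motion.

import numpy as np

# Illustrative constants, chosen for the example only.
theta, T, n_paths = 0.5, 1.0, 200_000
rng = np.random.default_rng(seed=0)

# Terminal value of a standard Brownian motion under P.
W_T = rng.normal(loc=0.0, scale=np.sqrt(T), size=n_paths)

# Density dQ/dP on the horizon: the exponential martingale of M = theta * W
# evaluated at T, using [M]_T = theta**2 * T.
L = np.exp(theta * W_T - 0.5 * theta**2 * T)

print("E_P[L]      ~", L.mean())                                    # close to 1
print("E_Q[W_T]    ~", (W_T * L).mean())                            # close to theta * T
print("Var_Q[W_T]  ~", (W_T**2 * L).mean() - (W_T * L).mean() ** 2) # close to T

In the applications to mathematical finance, a measure change of this kind is typically used to pass to a risk-neutral measure; the sketch above is only meant to make the reweighting mechanism behind the theorem visible.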


Theorem 3.8.4. Let $P$ and $Q$ be equivalent on $\mathcal{F}_\infty$. If $X$ is a standard process under $P$, it is a standard process under $Q$ as well. Let $L^P(X)$ denote the integrands for $X$ under $P$, and let $L^Q(X)$ denote the integrands for $X$ under $Q$. If $Y \in L^P(X) \cap L^Q(X)$, then the integrals under $P$ and $Q$ agree up to indistinguishability.

Comment 3.8.5. The meaning of the lemma is a bit more convoluted than it might seem at first. To be a standard process, it is necessary to work in the context of a filtered space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ with an $\mathcal{F}_t$-Brownian motion. That $X$ is a standard process under $Q$ thus assumes that we have some particular process in mind which is an $\mathcal{F}_t$-Brownian motion under $Q$. In our case, the implicitly understood Brownian motion is the one given by Girsanov's theorem. ◦

Proof. We need to prove two things: that $X$ is a standard process under $Q$, and that if $Y \in L^P(X) \cap L^Q(X)$, the integrals agree.

Step 1: $X$ is a standard process under $Q$. Let the canonical decomposition of $X$ under $P$ be given by $X = A + M$ with $M = Y \cdot W$. By Girsanov's Theorem, there is a continuous local martingale $N = Z \cdot W$ such that $\mathcal{E}(N)$ is a uniformly integrable martingale and $\frac{dQ_\infty}{dP_\infty} = \mathcal{E}(N)_\infty$, and then the process $W^Q$ given by
\[
  W^Q_k(t) = W_k(t) - \int_0^t Z_k(s)\,ds
\]
is an $\mathcal{F}_t$-Brownian motion under $Q$. Note that since the null sets of $P$ and $Q$ are the same, the space $(\Omega, \mathcal{F}, Q, \mathcal{F}_t)$ satisfies the usual conditions. Thus, with $W^Q$ as our Brownian motion, we have a setup precisely as described throughout the chapter. And by the properties of the stochastic integral under $P$,
\[
  X_t = A_t + M_t
      = A_t + \sum_{k=1}^n \int_0^t Y^k_s\,dW^k_s
      = A_t + \sum_{k=1}^n \int_0^t Y^k_s Z_k(s)\,ds + \sum_{k=1}^n \int_0^t Y^k_s\,dW^Q_k(s).
\]
Under the setup $(\Omega, \mathcal{F}, Q, \mathcal{F}_t)$, the process $A_t + \sum_{k=1}^n \int_0^t Y^k_s Z_k(s)\,ds$ is of finite variation and the process $\sum_{k=1}^n \int_0^t Y^k_s\,dW^Q_k(s)$ is in $c\mathcal{M}^L_W$. Therefore, $X$ is a standard process under $Q$.

Step 2: Agreement of the integrals, part 1. Now assume that $Y$ is some process which is in $L^P(X) \cap L^Q(X)$. Let $I^P_X(Y)$ be the integral of $Y$ with respect to $X$ under $P$, and let $I^Q_X(Y)$ be the integral of $Y$ with respect to $X$ under $Q$. We want to show that $I^P_X(Y)$ and $I^Q_X(Y)$ are indistinguishable.

We first consider the case where $Y$ is bounded and continuous. Let $t \ge 0$. We will show that $I^P_X(Y)_t = I^Q_X(Y)_t$ almost surely. By Lemma 3.4.15, the Riemann sums $\sum_{k=1}^n Y_{t_{k-1}}(X_{t_k} - X_{t_{k-1}})$ converge in probability under $P$ to $I^P_X(Y)_t$ and converge in probability under $Q$ to $I^Q_X(Y)_t$ for some sequence of partitions of $[0,t]$ with meshes tending to zero. By taking out a subsequence, we can assume that the convergence is almost sure. Since $P$ and $Q$ are equivalent, we find that both under $P$ and under $Q$, the Riemann sums tend to both $I^P_X(Y)_t$ and $I^Q_X(Y)_t$. Therefore, $I^P_X(Y)_t$ and $I^Q_X(Y)_t$ are almost surely equal. Since both processes are continuous, we conclude that $I^P_X(Y)$ and $I^Q_X(Y)$ are indistinguishable.

Step 3: Agreement of the integrals, part 2. Now, as before, let $X = A + M$ be the canonical decomposition of $X$ under $P$, and let $X = B + N$ be the canonical decomposition of $X$ under $Q$. We consider the case where $V_A$ and $V_B$ are bounded, where $M, N \in c\mathcal{M}^2_W$, and where $Y$ is bounded and zero from a deterministic point onwards. As in the proof of Theorem 3.1.7, define
\[
  Y_n(t) = \frac{1}{1/n} \int_{(t - 1/n)^+}^{t} Y(s)\,ds.
\]
Then $Y_n$ is bounded and continuous with $\|Y_n\|_\infty \le \|Y\|_\infty$, and $Y_n$ converges to $Y$ $\lambda \otimes P$-almost everywhere. Since $\mu_M$ and $\mu_N$ are bounded, this implies that $Y_n$ converges to $Y$ in $L^2(M)$ and $L^2(N)$, and therefore $I^P_M(Y_n)$ converges to $I^P_M(Y)$ in $c\mathcal{M}^2_0$ under $P$. Likewise, $I^Q_N(Y_n)$ converges to $I^Q_N(Y)$ under $Q$. Taking out subsequences, we can assume that the convergence is almost surely uniform, showing in particular that $I^P_M(Y_n)_t$ converges to $I^P_M(Y)_t$ almost surely and $I^Q_N(Y_n)_t$ converges to $I^Q_N(Y)_t$ almost surely for any $t \ge 0$.

Next, consider the finite variation components. Since $V_A$ and $V_B$ are bounded, we find that for almost all $t \ge 0$, $I^P_A(Y_n)_t$ converges almost surely to $I^P_A(Y)_t$. Likewise, for almost all $t \ge 0$, $I^Q_B(Y_n)_t$ converges almost surely to $I^Q_B(Y)_t$.

All in all, we may conclude that for almost all $t \ge 0$, $I^P_X(Y_n)_t$ converges to $I^P_X(Y)_t$ and $I^Q_X(Y_n)_t$ converges to $I^Q_X(Y)_t$ almost surely. Therefore, by what we have already shown,
\[
  I^P_X(Y)_t = \lim_n I^P_X(Y_n)_t = \lim_n I^Q_X(Y_n)_t = I^Q_X(Y)_t.
\]
Since any Lebesgue almost sure set is dense and the processes are continuous, we conclude that $I^P_X(Y)$ and $I^Q_X(Y)$ are indistinguishable.

Step 4: Agreement of the integrals, part 3. We now still assume that $V_A$ and $V_B$ are bounded and that $M, N \in c\mathcal{M}^2_W$, but we reduce our requirement on $Y$ to be that $Y$ is in $L^1(A) \cap L^1(B) \cap L^2(M) \cap L^2(N)$. We will use the results of the previous step to show that the result holds in this case. Define
\[
  Y_n(t) = Y(t) 1_{(|Y_t| \le n)} 1_{[0,n]}(t).
\]
Then $Y_n$ is bounded and zero from a deterministic point onwards. Furthermore, $|Y_n(t)| \le |Y(t)|$, and $Y_n$ converges pointwise to $Y$. By dominated convergence, we therefore conclude that $Y_n$ converges to $Y$ in $L^2(M)$ and $L^2(N)$. This implies that $I^P_M(Y_n)$ tends to $I^P_M(Y)$ in $c\mathcal{M}^2_0$ under $P$, and $I^Q_N(Y_n)$ tends to $I^Q_N(Y)$ in $c\mathcal{M}^2_0$ under $Q$. For the finite variation components, we find that $I^P_A(Y_n)$ converges pointwise to $I^P_A(Y)$ and $I^Q_B(Y_n)$ converges pointwise to $I^Q_B(Y)$.

As in the previous step, this implies
\[
  I^P_X(Y)_t = \lim_n I^P_X(Y_n)_t = \lim_n I^Q_X(Y_n)_t = I^Q_X(Y)_t,
\]
showing the result in this case.

Step 5: Agreement of the integrals, part 4. Finally, we consider the general case. Define the stopping times
\[
  \tau^A_n = \inf\{t \ge 0 \mid V_A(t) \ge n\}, \qquad
  \tau^B_n = \inf\{t \ge 0 \mid V_B(t) \ge n\},
\]
and let $\tau^M_n$ and $\tau^N_n$ be sequences localising $M$ and $N$ to $c\mathcal{M}^2_W$, respectively. Let $\sigma_n$ be such that $Y^{\sigma_n}$ is in $L^1(A) \cap L^1(B) \cap L^2(M) \cap L^2(N)$. Define the stopping time $\tau_n$ by $\tau_n = \tau^A_n \wedge \tau^B_n \wedge \tau^M_n \wedge \tau^N_n \wedge \sigma_n$. Then $V_A^{\tau_n}$ and $V_B^{\tau_n}$ are bounded, $M^{\tau_n}$ and $N^{\tau_n}$ are in $c\mathcal{M}^2_W$ and $Y^{\tau_n}$ is in $L^1(A^{\tau_n}) \cap L^1(B^{\tau_n}) \cap L^2(M^{\tau_n}) \cap L^2(N^{\tau_n})$. Furthermore, $\tau_n$ tends to infinity almost surely. By Lemma 3.4.11,
\[
  I^P_X(Y)^{\tau_n} = I^P_{X^{\tau_n}}(Y), \qquad
  I^Q_X(Y)^{\tau_n} = I^Q_{X^{\tau_n}}(Y).
\]
Since $P$ and $Q$ are equivalent, $\tau_n$ converges to infinity both $Q$ and $P$ almost surely. Therefore, since we have shown the result in the localised case, the result follows for the general case as well.
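As an informal illustration of the two theorems above (an example of our own, anticipating the financial applications later in the thesis; the constants $\mu$, $r$, $\sigma$ and the horizon $T$ are chosen freely), suppose a price process $S$ satisfies
\[
  S_t = S_0 + \int_0^t \mu S_s\,ds + \int_0^t \sigma S_s\,dW_s
\]
under $P$, with $\sigma > 0$. Take $Z_s = \frac{r-\mu}{\sigma} 1_{[0,T]}(s)$, which is bounded, deterministic and zero after the deterministic time $T$, so that $\mathcal{E}(Z \cdot W)$ is a uniformly integrable martingale, for instance by Lemma 3.7.3, and let $Q$ be given by $\frac{dQ_\infty}{dP_\infty} = \mathcal{E}(Z \cdot W)_\infty$. Theorem 3.8.3 then yields that $W^Q_t = W_t - \int_0^t Z_s\,ds$ is a Brownian motion under $Q$, and on $[0,T]$ the same process $S$ has the decomposition
\[
  S_t = S_0 + \int_0^t r S_s\,ds + \int_0^t \sigma S_s\,dW^Q_s.
\]
By Theorem 3.8.4, an integral $\int_0^t Y_s\,dS_s$, for example the gains process of a trading strategy, has the same meaning whether it is computed under $P$ or under $Q$.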


Next, we consider necessary requirements for the practical application of the Girsanov theorem. In practice, we are rarely given a probability measure $Q$ and asked to identify the distribution of $W$ under $Q$. Rather, we desire to change the distribution of $W$ in a certain manner, and wish to find an equivalent probability measure $Q$ realising this change. If we plan to use Theorem 3.8.3 to facilitate this change, we need to identify a local martingale $M$ zero at zero yielding the desired distribution and ensure that $\mathcal{E}(M)$ is a uniformly integrable martingale. This latter part is the most difficult part of the plan. Note that the condition $M_0 = 0$ is necessary for $\mathcal{E}(M)$ to be defined at all.

In Lemma 3.7.3, we obtained a sufficient condition ensuring that $\mathcal{E}(M)$ is uniformly integrable. This condition is too strong to be useful in practice, however. Our next objective is to find weaker conditions that ensure the uniformly integrable martingale property of $\mathcal{E}(M)$. These conditions take the form of Kazamaki's and Novikov's criteria. We are led by Protter (2005), Section III.8.

Lemma 3.8.6. Let $M \in c\mathcal{M}^L_0$. Then $\mathcal{E}(M)$ is a supermartingale with $E\,\mathcal{E}(M)_t \le 1$ for all $t \ge 0$. $\mathcal{E}(M)$ is a martingale if and only if $E\,\mathcal{E}(M)_t = 1$ for all $t \ge 0$, and $\mathcal{E}(M)$ is a uniformly integrable martingale if and only if $E\,\mathcal{E}(M)_\infty = 1$.

Proof. Since $\mathcal{E}(M)$ is a continuous nonnegative local martingale, it is a supermartingale. In particular, $E\,\mathcal{E}(M)_t \le E\,\mathcal{E}(M)_0 = 1$. Furthermore, $\mathcal{E}(M)_\infty$ always exists as an almost sure limit by the Martingale Convergence Theorem.

The martingale criterion. We now prove that $\mathcal{E}(M)$ is a martingale if and only if $E\,\mathcal{E}(M)_t = 1$ for all $t \ge 0$. Clearly, if $\mathcal{E}(M)$ is a martingale, $E\,\mathcal{E}(M)_t = E\,\mathcal{E}(M)_0 = 1$ for all $t \ge 0$. Conversely, assume that $E\,\mathcal{E}(M)_t = 1$ for all $t \ge 0$. Consider $0 \le s \le t$. By the supermartingale property, we obtain $\mathcal{E}(M)_s \ge E(\mathcal{E}(M)_t \mid \mathcal{F}_s)$. This shows that
\[
  0 \le E\bigl( \mathcal{E}(M)_s - E(\mathcal{E}(M)_t \mid \mathcal{F}_s) \bigr)
    = E\,\mathcal{E}(M)_s - E\,\mathcal{E}(M)_t = 0,
\]
so $\mathcal{E}(M)_s - E(\mathcal{E}(M)_t \mid \mathcal{F}_s)$ is a nonnegative variable with zero mean. Thus, it is zero almost surely and $\mathcal{E}(M)_s = E(\mathcal{E}(M)_t \mid \mathcal{F}_s)$, showing the martingale property.

The UI martingale criterion. Next, we prove that $\mathcal{E}(M)$ is a uniformly integrable martingale if and only if $E\,\mathcal{E}(M)_\infty = 1$. If $E\,\mathcal{E}(M)_\infty = 1$, we obtain
\[
  1 = E\,\mathcal{E}(M)_\infty
    = E \liminf_t \mathcal{E}(M)_t
    \le \liminf_t E\,\mathcal{E}(M)_t.
\]
Now, $E\,\mathcal{E}(M)_t$ is decreasing and bounded by one, so we conclude $\lim_t E\,\mathcal{E}(M)_t = 1$ and therefore $E\,\mathcal{E}(M)_t = 1$ for all $t$. Therefore, by what we have already shown, $\mathcal{E}(M)$ is a martingale. By Lemma B.5.4, it is uniformly integrable.

On the other hand, if $\mathcal{E}(M)$ is a uniformly integrable martingale, it is convergent in $L^1$, and in particular $E\,\mathcal{E}(M)_\infty = \lim_t E\,\mathcal{E}(M)_t = 1$.

Lemma 3.8.7. Let $M$ be a continuous local martingale zero at zero, and let $\varepsilon > 0$. If it holds that $\sup_\tau E \exp\bigl( (\frac12 + \varepsilon) M_\tau \bigr) < \infty$, where the supremum is over all bounded stopping times, then $\mathcal{E}(M)$ is a uniformly integrable martingale.

Proof. Let $\varepsilon > 0$ be given. Our plan is to identify an $a > 1$ such that $\sup_\tau E\,\mathcal{E}(M)_\tau^a$ is finite, where the supremum is over all bounded stopping times. If we can do this, Lemma 2.5.19 shows that $\mathcal{E}(M)$ is a martingale bounded in $L^a$; in particular, $\mathcal{E}(M)$ is a uniformly integrable martingale. To this end, let $a, r > 1$ be given. We then have
\[
  \mathcal{E}(M)_\tau^a
  = \exp\!\left( a M_\tau - \frac{a}{2}[M]_\tau \right)
  = \exp\!\left( \sqrt{\tfrac{a}{r}}\, M_\tau - \frac{a}{2}[M]_\tau \right)
    \exp\!\left( \Bigl( a - \sqrt{\tfrac{a}{r}} \Bigr) M_\tau \right).
\]
The point of this observation is that when we raise the first factor on the right-hand side above to the $r$'th power, we obtain an exponential martingale. Therefore, letting $s$ be the dual exponent of $r$, Hölder's inequality and Lemma 3.8.6 yield
\[
  E\,\mathcal{E}(M)_\tau^a
  \le \left( E \exp\!\left( \sqrt{ar}\, M_\tau - \frac{ar}{2}[M]_\tau \right) \right)^{1/r}
      \left( E \exp\!\left( s \Bigl( a - \sqrt{\tfrac{a}{r}} \Bigr) M_\tau \right) \right)^{1/s}
  = \bigl( E\,\mathcal{E}(\sqrt{ar}\, M)_\tau \bigr)^{1/r}
      \left( E \exp\!\left( s \Bigl( a - \sqrt{\tfrac{a}{r}} \Bigr) M_\tau \right) \right)^{1/s}
  \le \left( E \exp\!\left( s \Bigl( a - \sqrt{\tfrac{a}{r}} \Bigr) M_\tau \right) \right)^{1/s}.
\]
Recalling that $s = \frac{r}{r-1}$, we then find
\[
  \sup_\tau E\,\mathcal{E}(M)_\tau^a
  \le \left( \sup_\tau E \exp\!\left( \frac{r}{r-1}\Bigl( a - \sqrt{\tfrac{a}{r}} \Bigr) M_\tau \right) \right)^{1/s}
  = \left( \sup_\tau E \exp\!\left( \frac{ar - \sqrt{ar}}{r-1}\, M_\tau \right) \right)^{1/s}.
\]
Thus, if we can identify $a, r > 1$ such that $\frac{ar - \sqrt{ar}}{r-1} \le \frac12 + \varepsilon$, the result will follow from our assumption. To this end, define $f(y,r) = \frac{y - \sqrt{y}}{r-1}$. Finding $a, r > 1$ such that $\frac{ar - \sqrt{ar}}{r-1} \le \frac12 + \varepsilon$ is equivalent to finding $y > r > 1$ such that $f(y,r) \le \frac12 + \varepsilon$. To this end, note that $\frac{d}{dy}(y - \sqrt{y}) = 1 - \frac{1}{2\sqrt{y}}$, which is positive whenever $y > \frac14$. We therefore obtain
\[
  \inf_{y > r > 1} f(y,r)
  = \inf_{r>1} \inf_{y>r} f(y,r)
  = \inf_{r>1} \frac{r - \sqrt{r}}{r-1}
  = \inf_{r>1} \frac{\sqrt{r}}{\sqrt{r}+1}
  = \inf_{r>1} \frac{1}{1 + \frac{1}{\sqrt{r}}}
  = \frac12.
\]
We may now conclude that for any $\varepsilon > 0$, there are $y > r > 1$ such that $f(y,r) \le \frac12 + \varepsilon$. The proof of the lemma is then complete.

Comment 3.8.8. This lemma parallels the lemma preceding Theorem 44 of Section III.8 in Protter (2005). However, Protter avoids the analysis of the mapping $f$ used in the proof above by formulating his lemma differently and making some very inspired choices of exponents. He shows that if $\sup_\tau E \exp\bigl( \frac{\sqrt{p}}{2(\sqrt{p}-1)} M_\tau \bigr)$ is finite, then $\mathcal{E}(M)$ is a martingale bounded in $L^q$, where $q$ is the dual exponent to $p$. His method is based on Hölder's inequality just as our proof, and he uses $r = \frac{\sqrt{p}+1}{\sqrt{p}-1}$. We have opted for a different proof, involving analysis of the mapping $f$, in order to avoid having to use the very original choice $r = \frac{\sqrt{p}+1}{\sqrt{p}-1}$ out of the blue. ◦

Theorem 3.8.9 (Kazamaki's Criterion). Let $M$ be a continuous local martingale zero at zero. If $\sup_\tau E \exp\bigl( \frac12 M_\tau \bigr) < \infty$, where the supremum is over all bounded stopping times, then $\mathcal{E}(M)$ is a uniformly integrable martingale.

Proof. Our plan is to show that $\mathcal{E}(aM)$ is a uniformly integrable martingale for any $0 < a < 1$ and let $a$ tend to one in a suitable context. Let $0 < a < 1$. Then there is $\varepsilon_a > 0$ such that $a(\frac12 + \varepsilon_a) \le \frac12$. Therefore, with the supremum over bounded stopping times,
\[
  \sup_\tau E \exp\!\left( \Bigl(\frac12 + \varepsilon_a\Bigr) a M_\tau \right)
  \le \sup_\tau E \Bigl( 1_{(M_\tau < 0)} + 1_{(M_\tau \ge 0)} \exp\!\left( \tfrac12 M_\tau \right) \Bigr)
  \le 1 + \sup_\tau E \exp\!\left( \tfrac12 M_\tau \right)
  < \infty,
\]
so that by Lemma 3.8.7, $\mathcal{E}(aM)$ is a uniformly integrable martingale. Note that there is $C' > 0$ such that $x^+ \le C' + \exp(\frac12 x)$ for all $x$. Therefore, by our assumptions, $M_t^+$ is bounded in $L^1$, and therefore $M_t$ converges almost surely to its limit $M_\infty$. Since the limit $[M]_\infty$ always exists in $[0,\infty]$, we can then write
\[
  \mathcal{E}(aM)_\infty
  = \exp\!\left( a M_\infty - \frac{a^2}{2}[M]_\infty \right)
  = \exp\!\left( a^2 M_\infty - \frac{a^2}{2}[M]_\infty \right) \exp\bigl( (a - a^2) M_\infty \bigr)
  = \mathcal{E}(M)_\infty^{a^2} \exp\bigl( (a - a^2) M_\infty \bigr).
\]
Using that $\mathcal{E}(aM)$ is a uniformly integrable martingale and applying Hölder's inequality with $\frac{1}{a^2}$ and its dual exponent $\frac{1}{1-a^2}$, we obtain by Lemma 3.8.6 that
\[
  1 = E\,\mathcal{E}(aM)_\infty
    \le \bigl( E\,\mathcal{E}(M)_\infty \bigr)^{a^2}
        \left( E \exp\!\left( \frac{a - a^2}{1 - a^2} M_\infty \right) \right)^{1-a^2}.
\]
Now, if the last factor on the right-hand side is bounded over $0 < a < 1$, we can let $a$ tend to one from below and obtain $1 \le E\,\mathcal{E}(M)_\infty$, which would give us the desired conclusion. Since $\frac{a-a^2}{1-a^2} = \frac{a}{1+a} = \frac{1}{\frac{1}{a}+1} < \frac12$, it will suffice to prove that $E \exp\bigl( \frac{a}{2} M_\infty \bigr)$ is bounded over $0 < a < 1$. This is what we now set out to do.

Define $C = \sup_\tau E \exp\bigl( \frac12 M_\tau \bigr)$; $C$ is finite by assumption. Note that since $a < 1$, there is $p > 1$ such that $ap \le 1$ and therefore
\[
  \sup_{t \ge 0} E \exp\!\left( p \cdot \frac{a}{2} M_t \right)
  = \sup_{t \ge 0} E \exp\!\left( \frac{ap}{2} M_t \right)
  \le C.
\]
This shows that $\exp\bigl( \frac{a}{2} M_t \bigr)$ is bounded in $L^p$ for $t \ge 0$. In particular, the family is uniformly integrable. Since $M_t$ converges almost surely to $M_\infty$, we have convergence in probability as well. Therefore, $\exp(\frac{a}{2} M_t)$ converges in $L^1$ to $\exp(\frac{a}{2} M_\infty)$, and we may conclude
\[
  E \exp\!\left( \frac{a}{2} M_\infty \right)
  = \lim_t E \exp\!\left( \frac{a}{2} M_t \right)
  \le C.
\]
In particular, then, we finally get $1 \le \bigl( E\,\mathcal{E}(M)_\infty \bigr)^{a^2} C^{1-a^2}$, and letting $a$ tend to one yields $1 \le E\,\mathcal{E}(M)_\infty$. By Lemma 3.8.6, the proof is complete.

Theorem 3.8.10 (Novikov's Criterion). Let $M$ be a continuous local martingale zero at zero. Assume that $E \exp\bigl( \frac12 [M]_\infty \bigr)$ is finite. Then $\mathcal{E}(M)$ is a uniformly integrable martingale.

Proof. We will show that Novikov's criterion implies Kazamaki's criterion. Let $\tau$ be any bounded stopping time. Then
\[
  \mathcal{E}(M)_\tau^{1/2}
  = \exp\!\left( \frac12 M_\tau \right)
    \left( \exp\!\left( -\frac12 [M]_\tau \right) \right)^{1/2},
\]
which, using the Cauchy-Schwarz inequality, shows that
\[
  E \exp\!\left( \frac12 M_\tau \right)
  = E\, \mathcal{E}(M)_\tau^{1/2} \left( \exp\!\left( \frac12 [M]_\tau \right) \right)^{1/2}
  \le \bigl( E\,\mathcal{E}(M)_\tau \bigr)^{1/2} \left( E \exp\!\left( \frac12 [M]_\tau \right) \right)^{1/2}
  \le \left( E \exp\!\left( \frac12 [M]_\infty \right) \right)^{1/2},
\]
where we have used that $[M]$ is increasing. Since the final right-hand side is independent of $\tau$ and finite by assumption, Kazamaki's criterion is satisfied and the conclusion follows from Theorem 3.8.9.

This concludes our work on the stochastic calculus. In this section, we have proven Girsanov's theorem, characterising the possible measure changes on $\mathcal{F}_\infty$ and giving the change in drift of the Brownian motion. The Kazamaki and Novikov criteria can be used to obtain the existence of measure changes corresponding to specific drifts. Of these two, the Kazamaki criterion is the stronger, but the Novikov criterion is often easier to apply.

Before proceeding to the next chapter, we will in the next section discuss our choice to assume the usual conditions.

3.9 Why The Usual Conditions?

The question of whether to assume the usual conditions is a vexed one. Most books either do not comment on the problem or choose to assume the usual conditions without stating why. We have assumed the usual conditions in this text. We will now explain this choice. We will also describe how we could have avoided having to make this assumption, and we will discuss general pros and cons of the usual conditions.

A review of the usual conditions. We begin by reiterating the meaning of the usual conditions and reviewing some opinions expressed about them in the literature. Recall that for a filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$, the usual conditions state that:


1. $(\Omega, \mathcal{F}, P)$ is a complete measure space.
2. The filtration is right-continuous, $\mathcal{F}_t = \cap_{s > t} \mathcal{F}_s$.
3. For $t \ge 0$, $\mathcal{F}_t$ contains all $P$-null sets in $\mathcal{F}$.

The books Rogers & Williams (2000a) and Protter (2005) both have sections discussing the effects and the reasonableness of assuming the usual conditions.

In Rogers & Williams (2000a), Section II.76, the right-continuity of the filtration is taken as a natural choice. The authors focus on why the completion of the filtration is to be introduced. The main argument is that this is necessary for the Début Theorem and the Section Theorem of Section II.77 to hold. Of these two, the Début Theorem is the easier to understand, simply stating that for any progressive process $X$ and Borel set $B$, the variable $D_B = \inf\{t \ge 0 \mid X_t \in B\}$ is a stopping time. This is not necessarily true if the filtration is not completed. The reason for this is illustrated in the proof of Lemma 75.1, where it is explained how the process can approach $B$ in uncountably many ways, which causes problems with measurability.

In Protter (2005), Section I.5, the completion of the filtration is instead seen as a natural choice, and the author focuses on why right-continuity should be included. He does this by showing that the completed filtration of any cadlag Feller process is automatically right-continuous, so when assuming completion, right-continuity comes naturally.

While enlightening, none of these discussions provide reasons for us to assume the usual conditions. Our reasons for assuming the usual conditions are much more down-to-earth and motivated by mostly practical concerns. Before discussing upsides and downsides of assuming the usual conditions in our case, we make some general observations regarding their effects.

Regularity of paths and the usual conditions. We take as our starting point the topic of cadlag versions of martingales. We claim that:

1. Without any assumptions on the filtration, no cadlag versions are guaranteed.
2. With right-continuity, almost surely cadlag paths can be obtained.
3. With right-continuity and completeness, cadlag paths for all $\omega$ can be obtained.

The first claim is basically unfounded, as we have no counterexample showing that there are martingales for which no cadlag version exists. However, there are no theorems to be found in the literature giving results for general filtrations, and from the structure of the proofs for right-continuous filtrations, it seems highly unlikely that cadlag versions should exist in the general case.

To understand what is going on in the two other cases, we follow Rogers & Williams (2000a), Sections II.62 through II.67. We work in a probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ where we assume that the filtration is right-continuous. Let $x : [0,\infty) \to \mathbb{R}$ be some mapping. We define $\lim_{q \downarrow\downarrow t} x_q$ as the limit of $x_q$ as $q$ converges downwards to $t$ through the rationals. Likewise, we define $\lim_{q \uparrow\uparrow t} x_q$ as the limit of $x_q$ as $q$ converges upwards to $t$ through the rationals. We say that $x$ is regularisable if $\lim_{q \downarrow\downarrow t} x_q$ exists for all $t \ge 0$ and $\lim_{q \uparrow\uparrow t} x_q$ exists for all $t > 0$. As described in Rogers & Williams (2000a), Theorem II.62.13, if $x$ is regularisable, there exists another function $y$ such that $y$ is cadlag and $y$ agrees with $x$ on $[0,\infty) \cap \mathbb{Q}$. As shown in Sokol (2007), Lemma 2.73, being regularisable is actually both a sufficient and a necessary condition for this to hold. $y$ is explicitly given by $y_t = \lim_{q \downarrow\downarrow t} x_q$.

Now let $M$ be a martingale and define
\[
  G = \{ \omega \in \Omega \mid M(\omega) \text{ is regularisable} \}.
\]
By Rogers & Williams (2000a), Theorem II.65.1, $G$ is then $\mathcal{F}$-measurable and it holds that $P(G) = 1$. We define
\[
  Y_t(\omega) =
  \begin{cases}
    \lim_{q \downarrow\downarrow t} M_q(\omega) & \text{if the limit exists,} \\
    0 & \text{otherwise.}
  \end{cases}
\]
By a modification of the argument in Rogers & Williams (2000a), Theorem II.67.7, if the filtration is right-continuous, $Y$ is then a version of $M$. Note that
\[
  G \subseteq \bigl( Y_t = \lim_{q \downarrow\downarrow t} M_q \ \text{for all } t \ge 0 \bigr),
\]
and therefore $Y$ is almost surely cadlag. Since the filtration is right-continuous, it is clear that $Y$ is adapted, so $Y$ is an $\mathcal{F}_t$-martingale. Thus, under the assumption of right-continuity, we have been able to use Rogers & Williams (2000a), Theorem II.67.7, to obtain a version $Y$ of $M$ which is a martingale and almost surely cadlag.

We will now consider what can be done to make sure that all paths of our version of $M$ are cadlag. An obvious idea would be to put $Y' = 1_G Y$. This corresponds to letting $Y'$ be equal to $Y$ on $G$, where $Y$ is guaranteed to be cadlag, and zero otherwise. All paths of $Y'$ are cadlag. However, since we have no guarantee that $G \in \mathcal{F}_t$, $Y'$ is no longer adapted. This is what ruins our ability to obtain all paths cadlag in the uncompleted case: the set of non-cadlag paths does not interact well with the filtration.

However, if we assume that $\mathcal{F}_t$ contains all $P$-null sets, we can use the fact that $P(G) = 1$ to see that $Y'$ is adapted. Thus, in this case, we can obtain a version $Y'$ of $M$ which is a martingale and has all paths cadlag.

We have now argued for the three claims we made earlier. A general moral of our arguments is that if $X$ is an adapted process, then we can change $X$ on a null set and retain adaptedness if the filtration contains all null sets.

Upsides and downsides of the usual conditions. We now discuss reasons for assuming or not assuming the usual conditions.

The upsides of the usual conditions are:

1. Path properties hold for all $\omega$, so there are fewer null sets to worry about.
2. Proofs of adaptedness are occasionally simpler.
3. Most of the literature assumes the usual conditions, making it easier to refer to results.

The downsides of the usual conditions are:

1. We need to spend time checking that, say, the Brownian motion can be taken to be with respect to a filtration satisfying the usual conditions.
2. Density proofs are occasionally harder.

To see how the usual conditions can make density proofs harder, look at the proof of Lemma 3.7.5. To see how proofs of adaptedness can be harder without the usual conditions, we go back to the construction seen earlier, $Y_t(\omega) = \lim_{q \downarrow\downarrow t} M_q(\omega)$ whenever the limit exists and zero otherwise. Had we assumed the usual conditions, we could simply have defined the entire trajectory of $Y$ on some almost sure set and put it to zero otherwise. This phenomenon would also cause some trouble when defining the stochastic integral for integrators of finite variation.


As null sets really are a frustrating aspect of the entire theory, adding almost no real value but constantly complicating proofs, we have chosen to invest a bit of extra effort in introducing the usual conditions, in return obtaining fewer null sets to worry about. We could have developed the entire theory without the usual conditions, but then all our stochastic integrals and other processes would only be continuous almost surely.

Some would say that another downside of the usual conditions is that the filtration loses some of its interpretation as a "carrier of information", in the sense that $\mathcal{F}_t$ is usually interpreted as "the information available at time $t$". We do not find this argument to be particularly weighty. Obviously, the null sets $\mathcal{N}$ contain sets from all of $\mathcal{F}$, in particular sets from $\mathcal{F}_t$ for any $t \ge 0$. But these are sets of probability zero. It would therefore seem that the information value in these sets is quite minimal. Since completing the filtration does not have any negative impact on actual modelling issues, it seems unreasonable to maintain that the interpretation of the filtration is reduced.

There do seem to be situations, though, where the usual conditions should not be assumed. When considering the martingale problem, Jacod & Shiryaev (1987) in Section III.1 specifically note that they are considering filtrations that do not necessarily satisfy the usual conditions. However, Rogers & Williams (2000b) also consider the martingale problem in Section V.19, even though they have only developed the stochastic integral under the usual conditions, and seem to get through with this. Changes of measure can also present situations where the usual conditions are inappropriate; see the comments in Rogers & Williams (2000b), Section IV.38, page 82.

3.10 Notes

In this section, we will review other accounts of the theory in the literature, and we will give references to extensions of the theory. We will also compare our methods to those found in other books and discuss the merits and demerits of our methods.

Other accounts of the theory of stochastic integration. There are many excellent sources for the theory of stochastic integration. We will consider the approaches to the stochastic integral taken in Øksendal (2005), Rogers & Williams (2000b), Protter (2005) and Karatzas & Shreve (1988).

In Øksendal (2005), the integrators are restricted to processes of the type given by $\int_0^t Y_s\,ds + \int_0^t Z_s\,dW_s$. The integral is developed over compact intervals and the viewpoint is more variable-oriented and less process-oriented. The level of rigor is lower, but the text is easier to comprehend and gives a good overview of the theory.

In Rogers & Williams (2000b), a very systematic development of the integral is made, resulting in a large generality of integrators. As integrators, the authors first take elements of $c\mathcal{M}^2_0$. They develop the integral for integrands in $b\mathcal{E}$, and then extend to the space of locally bounded predictable integrands, $L_0(b\mathcal{P})$. Here, the predictable $\sigma$-algebra is the $\sigma$-algebra on $(0,\infty) \times \Omega$ generated by the left-continuous processes, or equivalently, generated by $b\mathcal{E}$. This $\sigma$-algebra is strictly smaller than $\Sigma_\pi$, and thus the integrands of Rogers & Williams (2000b) are less general than ours. However, the extension method is simpler, based on the monotone class theorem. Their greater range of integrators comes at the cost of having to develop the quadratic variation in greater detail. After constructing the integral for integrators in $c\mathcal{M}^2_0$, they later develop a much more general theory, where the integrators are semimartingales, meaning sums of a local martingale and a finite variation process.

In Protter (2005), a kind of converse viewpoint to other books is taken. Instead of developing the integral for more and more general processes, the author defines a semimartingale as a process $X$ making the integral mapping with respect to $X$ continuous under suitable topologies. In the first part of the book, the integral is developed as a mapping from left-continuous processes to right-continuous processes. In time, it is shown that this integral leads to the same result as the conventional approach. In later parts of the book, the space of integrands is extended to the predictable processes.

Finally, in Karatzas & Shreve (1988), the integral is developed in the usual straightforward manner as in Rogers & Williams (2000b), but here, only continuous integrators are considered. As in Rogers & Williams (2000b), the authors begin with processes in $c\mathcal{M}^2_0$ as integrators, but in this case, the space of integrands is larger than in Rogers & Williams (2000b). The account is very detailed and highly recommended.

Extensions of the theory. As noted above, the theory we have presented can be extended considerably. Rogers & Williams (2000b), Protter (2005) and Karatzas & Shreve (1988) all consider more general integrators than we do here. A very pedagogical source for the general theory is Elliott (1982), though the number of misprints can be frustrating.


The idea of extending the stochastic integral begs the question of whether results such as Itô's representation theorem and Girsanov's theorem can also be extended to more general situations. To a large extent, the answer is yes. For extensions of the representation theorem, see Section III.4 of Jacod & Shiryaev (1987) or Section IV.3 of Protter (2005). Extensions of the Girsanov theorem can be found in Section III.3 of Jacod & Shiryaev (1987), Section III.8 of Protter (2005), and also in Rogers & Williams (2000b) and Karatzas & Shreve (1988). One immediate extension is the opportunity to prove Girsanov's theorem in a version where $P$ and $Q$ are not equivalent on $\mathcal{F}_\infty$, but only on $\mathcal{F}_t$ for all $t \ge 0$. Having proved this extension would have made our applications to mathematical finance easier. Alas, this realization came too late.

The search for results like Kazamaki's or Novikov's criteria, guaranteeing that an exponential martingale is a uniformly integrable martingale, can reasonably be said to be an active field of research. See Protter & Shimbo (2006) for recent results and further references. For an example of a situation where Kazamaki's criterion applies but Novikov's criterion does not, see Revuz & Yor (1998), page 336.

Discussion of the present account. The account of the theory developed here has the same level of ambition as, say, Øksendal (2005) or Steele (2000), but with a higher level of rigor. It is an attempt to combine the methods of Steele (2000) with the stringency of Rogers & Williams (2000b) and Karatzas & Shreve (1988).

The restriction to continuous integrators simplifies the theory immensely, more or less to a point beyond comparison. One consequence of this restriction is that we can take our integrands to be progressively measurable instead of predictably measurable.

Not only is progressive measurability less restrictive than predictable measurability, it also removes some of the trouble with the time point zero. Traditionally, the role of zero has not been well-liked. In Dellacherie & Meyer (1975), Section IV.61, the authors claim that zero plays "the devil's role", and in the introduction to Chapter IV of Rogers & Williams (2000b), it is "consigned to hell". These problems very much have something to do with the predictable $\sigma$-algebra; this is most obvious in Rogers & Williams (2000b), where the predictable $\sigma$-algebra is defined as a $\sigma$-algebra on $(0,\infty) \times \Omega$ instead of $[0,\infty) \times \Omega$. Even in our case, however, there are some considerations to be made. The fundamental problem with zero is that the intuitive role of an integrator $X$ should only depend on differences $X_t - X_s$, interpreted as the measure assigned to $(s,t]$. The existence of a value at zero is therefore in fact something of a nuisance. In our definition of the integral, we have chosen to assign any non-zero value at zero of the integrator $X$ to the finite-variation part of $X$ and thus deferred the problems to results from ordinary measure theory.

In fact, a good deal of our results can be extended to integrators which are not even progressively measurable. This is due to a result, the Chung-Doob-Meyer theorem, stating that for any measurable and adapted process, there exists a progressive version of that process. This would enable us to define the stochastic integral with respect to all measurable and adapted processes satisfying appropriate integrability conditions. The original proof of the Chung-Doob-Meyer theorem was quite difficult and can be found in Dellacherie & Meyer (1975), Section 30 of Chapter IV, page 99. Recently, a much simpler proof has surfaced; see Kaden & Potthoff (2004). Because we have assumed the usual conditions, we could easily have adapted our results to consider measurable and adapted integrands. Without the usual conditions, however, the integration of measurable and adapted processes causes problems with adaptedness. In essence, progressive measurability is the "right" measurability concept for dealing with pathwise Lebesgue integration.

Now, as stated in the introduction, part of the purpose of the account given here was to explore what it would be like to develop the simple theory of stochastic integration with respect to Brownian integrators, just as in Steele (2000), but with complete rigor and in a self-contained manner. The reason for doing so is mainly to avoid the trouble of having to prove the existence of the quadratic variation process for general continuous local martingales. This could potentially allow for a very accessible presentation of the basic theory of stochastic integration, covering many of the results necessary for, say, basic mathematical finance. However, as we have seen, the lower level of generality has its price. The main demerits are:

1. The lack of general martingales makes the theory less elegant.
2. The development of the integral through associativity in Section 3.3 is tedious. The need to consider $n$-dimensional Brownian motion and corresponding $n$-dimensional integrands makes the notation cumbersome.
3. The proof of Girsanov's theorem in Section 3.8 is very lengthy.

The first two demerits more or less speak for themselves. The third demerit deserves some comments. In Øksendal (2005), the integral is also only developed for Brownian integrators, but the proof of the Girsanov theorem given there is considerably shorter. This is because the author takes the Lévy characterisation of Brownian motion as given, and thus his account is not self-contained.

The modern proof of the Lévy characterisation theorem uses a more general theory of stochastic integration than presented here; see Theorem 33.1 of Rogers & Williams (2000b). There is an older proof, however, not based on stochastic integration theory, but it is much more cumbersome. A one-dimensional proof is given in Doob (1953). If we were to base our proof of the Girsanov theorem on the Lévy characterisation theorem, our account would become even longer than it is at present. The proof we have given is based on the proof in Steele (2000) for the case where the drift is bounded. Steele (2000) claims on page 224 that the proof for the bounded case can immediately be extended to the general case. It is quite unclear how this should be possible, however, since it would require the result that if $\mathcal{E}(M)$ is a martingale and $f$ is bounded and deterministic, then $\mathcal{E}(M + f \cdot W)$ is a martingale as well. Our result extends the method in Steele (2000) by a localisation argument, and it is to a large degree this localisation which encumbers the proof.

If we had the theory of stochastic integration for general continuous local martingales at our disposal, we could have used the arguments of Rogers & Williams (2000b), Section IV.33 and Section IV.38, to give short and elegant proofs of both the Lévy characterisation theorem and the Girsanov theorem.

Our conclusion is that when a high level of rigor is required, it is quite doubtful whether there is any use in developing the theory only for the case of Brownian motion instead of considering general continuous local martingales. A theory at the level of Karatzas & Shreve (1988) would therefore be preferable as an introduction, with works such as Øksendal (2005) and Steele (2000) being useful for obtaining a heuristic overview of the theory.




Chapter 4

The Malliavin Calculus

In this section, we will define the Malliavin derivative and prove its basic properties. The Malliavin derivative and the theory surrounding it are collectively known as the Malliavin calculus. The Malliavin derivative is an attempt to define a derivative of a stochastic variable $X$ in the direction of the underlying probability space $\Omega$. Obviously, this makes no sense in the classical sense of differentiability unless $\Omega$ has sufficient structure. The Malliavin derivative is therefore defined for appropriate variables on general probability spaces not by a limiting procedure, as in the case of classical derivative operators, but instead by analogy with a situation where the underlying space has sufficient topological structure.

We will not be able to develop the theory sufficiently far to prove the results which are applied to mathematical finance. Therefore, the exposition given here is mostly designed to give a rigorous introduction to the fundamentals of the theory.

Before beginning the development of the Malliavin calculus, we make some motivating remarks on the definition of the derivative. We are interested in defining the derivative on probability spaces $(\Omega, \mathcal{F}, P)$ with a one-dimensional Wiener process $W$ on $[0,T]$, where the variables to be differentiated are in some suitable sense related to $W$. With this in mind, we first consider the case where our probability space is $C_0[0,T]$, the space of continuous real functions on $[0,T]$ starting at zero, endowed with the Wiener measure $P$. $C_0[0,T]$ is a Banach space under the uniform norm. Let $W$ be the identity mapping; $W$ is then a Wiener process on $[0,T]$. We want to identify a differentiability concept for variables of the form $\omega \mapsto W_t(\omega)$ which can be generalized to the setting of an abstract probability space with a Wiener process. To this end, consider a mapping $X$ from $C_0[0,T]$ to $\mathbb{R}$. Immediately available differentiability concepts for the mapping are:

1. The Fréchet derivative: the Fréchet derivative of $X$ at $\omega$ is the linear operator $A_\omega : C_0[0,T] \to \mathbb{R}$ such that $\lim_{h \to 0} \frac{|X(\omega+h) - X(\omega) - A_\omega(h)|}{\|h\|} = 0$.

2. The Gâteaux directional derivatives: the derivative in direction $h \in C_0[0,T]$ at $\omega$ is the element $D_h X(\omega) \in \mathbb{R}$ such that $\lim_{\varepsilon \to 0} \frac{X(\omega + \varepsilon h) - X(\omega)}{\varepsilon} = D_h X(\omega)$.

Here, the Fréchet derivative can obviously be ruled out as a candidate for generalization because of the direct dependence of the derivative on the linear structure of $C_0[0,T]$. Considering the Gâteaux derivative, we have that for $h \in C_0[0,T]$, the Gâteaux derivative of $W_t$ in direction $h$ is
\[
  D_h W_t(\omega)
  = \lim_{\varepsilon \to 0} \frac{W_t(\omega + \varepsilon h) - W_t(\omega)}{\varepsilon}
  = \lim_{\varepsilon \to 0} \frac{\omega_t + \varepsilon h_t - \omega_t}{\varepsilon}
  = h_t.
\]
This means that this derivative is not amenable to direct generalization either, because the derivative is an element of the underlying space, depending directly on its properties as a function space. Consider, however, the case where $h_t = \int_0^t g(s)\,ds$ for some $g \in L^1[0,T]$. Then
\[
  D_h W_t(\omega) = h_t = \int_0^T 1_{[0,t]}(s) g(s)\,ds.
\]
In this case, then, the Gâteaux derivative for any $\omega \in C_0[0,T]$ is actually characterized by the mapping $1_{[0,t]}$ from $[0,T]$ to $\mathbb{R}$. We can therefore consider $1_{[0,t]}$ as a kind of derivative of $W_t$. Extending this concept to general mappings $X : C_0[0,T] \to \mathbb{R}$, we can say that if there exists a mapping $f : [0,T] \to \mathbb{R}$ such that for $h \in C_0[0,T]$ with $h(t) = \int_0^t g(s)\,ds$, it holds that $D_h X(\omega) = \int_0^T f(s) g(s)\,ds$, then $f$ is the derivative of $X$, and we define $D^F X = f$.

To interpret this derivative, fix $y \in [0,T]$ and consider the case $h_\varepsilon(t) = \int_0^t \psi_\varepsilon(s - y)\,ds$, where $(\psi_\varepsilon)$ is a Dirac family, see Appendix A. This means that $h_\varepsilon$ is $0$ almost all the way on $[0,y]$ and $1$ almost all the way on $[y,T]$. Then
\[
  D_{h_\varepsilon} X(\omega) = \int_0^T D^F X(s) \psi_\varepsilon(s - y)\,ds \approx D^F X(y).
\]
In other words, $D^F X(y)$ can be thought of as measuring the sensitivity of $X$ to parallel shifts of the argument $\omega$ of the form $1_{[y,T]}$.
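To make this notion concrete, we include a small worked example of our own; it is not needed for the later development. Consider the mapping $X : C_0[0,T] \to \mathbb{R}$ given by $X(\omega) = \int_0^T \omega_s\,ds$. For $h(t) = \int_0^t g(s)\,ds$ with $g \in L^1[0,T]$, linearity of $X$ and Fubini's theorem give
\[
  D_h X(\omega)
  = \lim_{\varepsilon \to 0} \frac{X(\omega + \varepsilon h) - X(\omega)}{\varepsilon}
  = \int_0^T h_s\,ds
  = \int_0^T \int_0^s g(u)\,du\,ds
  = \int_0^T (T - u) g(u)\,du,
\]
so $D^F X(u) = T - u$. This matches the interpretation above: a shift of the form $1_{[y,T]}$ changes the value of $X$ by $T - y$, so $X$ is most sensitive to shifts applied early in the interval.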


Led by the above ideas, consider now a general probability triple $(\Omega, \mathcal{F}, P)$ with a Wiener process $W$ on $[0,T]$. By $\mathcal{F}_T$, we denote the usual augmentation of the $\sigma$-algebra generated by $W$. We want to define the Malliavin derivative of $W_t$ as the mapping $D W_t = 1_{[0,t]}$ from $[0,T]$ to $\mathbb{R}$. Since this derivative in the abstract setting is not defined by a limiting procedure, but only by analogy with the case of $C_0[0,T]$, we need to impose requirements on $D$ in order to extend it meaningfully to other variables than $W_t$. A suitable starting point is to require that a chain rule holds for $D$: if $f$ is continuously differentiable with sufficient growth conditions, namely polynomial growth of the mapping itself and its partial derivatives, we would like that
\[
  D f(W_{t_1}, \ldots, W_{t_n})
  = \sum_{k=1}^n \frac{\partial f}{\partial x_k}(W_{t_1}, \ldots, W_{t_n})\, D W_{t_k},
\]
where we put $D W_{t_k} = 1_{[0,t_k]}$. It turns out that requiring this form of the derivative of $f(W_{t_1}, \ldots, W_{t_n})$ is sufficient to obtain a rich theory for the operator $D$. The derivative given above is a stochastic function from $[0,T]$ to $\mathbb{R}$. With sufficient growth conditions on $f$, it is an element of $L^p([0,T] \times \Omega)$, $p \ge 1$.

Our immediate goal is to formalize the definition of the derivative given above and consider its properties. To do so, we begin by accurately specifying the immediate domain of the derivative operator and considering its properties. After this, we will rigorously define the derivative operator.

4.1 The spaces $\mathcal{S}$ and $L^p(\Pi)$

We will now define the space $\mathcal{S}$ which will be the initial domain of the Malliavin derivative. We work in the context of a filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ with a one-dimensional Brownian motion $W$, where the filtration is the augmented filtration induced by the Brownian motion. A multi-index of order $n$ is an element $\alpha \in \mathbb{N}_0^n$. The degree of the multi-index is $|\alpha| = \sum_{k=1}^n \alpha_k$. A polynomial in $n$ variables of degree $k$ is a map $p : \mathbb{R}^n \to \mathbb{R}$, $p(x) = \sum_{|\alpha| \le k} a_\alpha x^\alpha$. The sum in the above runs over all multi-indices $\alpha$ with $|\alpha| \le k$, with $x^\alpha = \prod_{i=1}^n x_i^{\alpha_i}$. The space of polynomials of degree $k$ in any number of variables is denoted $P_k$. Furthermore, we use the following notation.

• $C^1(\mathbb{R}^n)$ are the mappings $f : \mathbb{R}^n \to \mathbb{R}$ which are continuously differentiable.
• $C^\infty(\mathbb{R}^n)$ are the mappings $f : \mathbb{R}^n \to \mathbb{R}$ which are differentiable infinitely often.
• $C_p^\infty(\mathbb{R}^n)$ are the elements $f \in C^\infty(\mathbb{R}^n)$ such that $f$ and its partial derivatives are dominated by polynomials.
• $C_c^\infty(\mathbb{R}^n)$ are the elements of $C^\infty(\mathbb{R}^n)$ with compact support.

Clearly, then, $C_c^\infty(\mathbb{R}^n) \subseteq C_p^\infty(\mathbb{R}^n) \subseteq C^\infty(\mathbb{R}^n) \subseteq C^1(\mathbb{R}^n)$. For basic results on these spaces, see Appendix A. We are now ready to define the space $\mathcal{S}$.

Definition 4.1.1. If $t \in [0,T]^n$, we write $W_t = (W_{t_1}, \ldots, W_{t_n})$. By $\mathcal{S}$, we then denote the space of variables $f(W_t)$, where $f \in C_p^\infty(\mathbb{R}^n)$ and $t \in [0,T]^n$.

Our aim in the following regarding $\mathcal{S}$ is to develop a canonical representation for elements of $\mathcal{S}$, to develop results yielding flexible representations of elements of $\mathcal{S}$, and to prove that $\mathcal{S}$ is an algebra which is dense in $L^p(\mathcal{F}_T)$, $p \ge 1$. These results are somewhat tedious, but they are necessary for what will follow.

Lemma 4.1.2. Let $t \in [0,T]^n$, $s \in [0,T]^m$ and $F \in \mathcal{S}$ with $F = f(W_t)$. Assume that $\{t_1, \ldots, t_n\} \subseteq \{s_1, \ldots, s_m\}$. There exists $g \in C_p^\infty(\mathbb{R}^m)$ such that $F = g(W_s)$.

Comment 4.1.3. Note that this lemma not only allows us to extend the coordinates which $F$ depends on, but also allows us to reorder the coordinates. ◦

Proof. Since $\{t_1, \ldots, t_n\} \subseteq \{s_1, \ldots, s_m\}$, we have $n \le m$ and there exists a mapping $\sigma : \{1, \ldots, n\} \to \{1, \ldots, m\}$ such that $t_k = s_{\sigma(k)}$ for $k \le n$. We then simply define $g(x_1, \ldots, x_m) = f(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$. Then $g \in C_p^\infty(\mathbb{R}^m)$ and
\[
  g(W_s) = g(W_{s_1}, \ldots, W_{s_m})
         = f(W_{s_{\sigma(1)}}, \ldots, W_{s_{\sigma(n)}})
         = f(W_{t_1}, \ldots, W_{t_n})
         = f(W_t).
\]

Lemma 4.1.4. Let $F \in \mathcal{S}$ with $F = f(W_t)$ for $f \in C_p^\infty(\mathbb{R}^n)$. If $t_1 = 0$, there exists $g \in C_p^\infty(\mathbb{R}^{n-1})$ such that $F = g(W_{t_2}, \ldots, W_{t_n})$.

Proof. Define $g(x_2, \ldots, x_n) = f(0, x_2, \ldots, x_n)$. Because we have $W_0 = 0$, it holds that $F = f(W_{t_1}, \ldots, W_{t_n}) = g(W_{t_2}, \ldots, W_{t_n})$.
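As a small illustration of how these representation results are used (an example of our own, not from the original development), let $0 < s < t \le T$ and consider $F = W_t W_s^2$. One representation is $F = f(W_0, W_t, W_s)$ with $f(x_1, x_2, x_3) = x_2 x_3^2 + x_1 \in C_p^\infty(\mathbb{R}^3)$. Lemma 4.1.4 removes the superfluous zero coordinate, giving $F = g(W_t, W_s)$ with $g(x_1, x_2) = x_1 x_2^2$, and Lemma 4.1.2 then reorders the coordinates, giving $F = h(W_s, W_t)$ with $h(x_1, x_2) = x_2 x_1^2$, so that the time points appear in strictly increasing order.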


4.1 The spaces S <strong>and</strong> L p (Π) 131Lemma 4.1.5. Let F ∈ S with F = f(W t ) <strong>for</strong> f ∈ C ∞ p (R n ). If t i = t j <strong>for</strong> some i ≠ j,there is g ∈ C ∞ p (R n−1 ) such that F = g(W u ), where u = (t 1 , . . . , t j−1 , t j+1 . . . , t n ).Proof. Define A : R n−1 → R n by Ax = (x 1 , . . . , x j−1 , x i , x j+1 , . . . , x n ). Then A islinear. Putting g(x) = f(Ax), then g ∈ C ∞ p (R n ). Clearly, AW u = W t <strong>and</strong> there<strong>for</strong>eF = f(W t ) = f(AW u ) = g(W u ), as desired.Corollary 4.1.6. Let F ∈ S. There exists t ∈ [0, T ] n such that 0 < t 1 < · · · < t n <strong>and</strong>f ∈ C ∞ p (R n ) such that F = f(W t ).Proof. This follows by combining Lemma 4.1.2, Lemma 4.1.4 <strong>and</strong> Lemma 4.1.5.Corollary 4.1.6 states that any element of S has a representation where we can assumethat all the coordinates of the Wiener process in the element are different, positive<strong>and</strong> ordered. This observation will make our lives a good deal easier in the following.Our next result shows that any element of S can be written as a trans<strong>for</strong>mation of ann-dimensional st<strong>and</strong>ard normal variable.Lemma 4.1.7. Let F ∈ S with F = f(W t ) <strong>for</strong> f ∈ Cp ∞ (R n ), where 0 < t 1 < · · · < t n .Define A : R n → R n by Ax = ( √ x1t1, √ x2−x1t2−t 1, . . . , √ xn−xn−1tn −t n−1). Then A is invertible,f ◦ A −1 ∈ Cp ∞ (R n ) <strong>and</strong> F = (f ◦ A −1 )(AW t ), where AW t has the st<strong>and</strong>ard normaldistribution in n dimensions.Proof. Put t 0 = 0. A is invertible with inverse given by (A −1 x) k = ∑ ki=0√tk − t k−1 x k .Since A −1 is linear, f ◦ A −1 ∈ C ∞ p (R n ). It is clear that AW t is st<strong>and</strong>ard normal.We are now done with our results on representations of elements of S. Be<strong>for</strong>e proceeding,we prove that S is an algebra which is dense in L p (F T ).Lemma 4.1.8. S is an algebra, <strong>and</strong> S ⊆ L p (Ω) <strong>for</strong> all p ≥ 1.Proof. S is an algebra. S is a space of real r<strong>and</strong>om variables. Since S is obviouslynonempty, to prove the first claim, we need to show that S is stable under linearcombinations <strong>and</strong> multiplication. Let F, G ∈ S with F = f(W t ) <strong>and</strong> G = g(W s ) wheref ∈ C ∞ p (R n ) <strong>and</strong> g ∈ C ∞ p (R m ). Let λ, µ ∈ R. Then,λF + µG = (λf ⊕ µg)(W t1 , . . . , W tn , W s1 , . . . , W sm ),


where $\lambda f \oplus \mu g \in C_p^\infty(\mathbb{R}^{n+m})$. In the same manner,
\[
FG = (f \otimes g)(W_{t_1},\ldots,W_{t_n},W_{s_1},\ldots,W_{s_m}),
\]
where $f \otimes g \in C_p^\infty(\mathbb{R}^{n+m})$. This shows that $S$ is an algebra.

$S$ is a subset of $L^p(\Omega)$. Next, we show that $S \subseteq L^p(\Omega)$. Let $F \in S$ with $F = f(W_t)$. By Corollary 4.1.6, we can assume $0 < t_1 < \cdots < t_n$. By Lemma 4.1.7, there then exists $g \in C_p^\infty(\mathbb{R}^n)$ such that $F = g(X)$, where $X$ is $n$-dimensionally standard normally distributed. Since $g \in C_p^\infty(\mathbb{R}^n)$, $|g|^p$ is dominated by a polynomial; that is, there is a polynomial $q$ in $n$ variables with $|g|^p \le q$, say $q(x) = \sum_{|\alpha| \le m} a_\alpha x^\alpha$. We then find
\[
E|F|^p = \int |g(x)|^p \varphi_n(x)\,dx \le \int q(x)\varphi_n(x)\,dx = \sum_{|\alpha| \le m} a_\alpha \int \prod_{k=1}^n x_k^{\alpha_k}\varphi_n(x)\,dx = \sum_{|\alpha| \le m} a_\alpha \prod_{k=1}^n \int x_k^{\alpha_k}\varphi(x_k)\,dx_k,
\]
which is finite, since the normal distribution has moments of all orders.

Lemma 4.1.9. The variables of the form $f(W_t)$ for $f \in C_c^\infty(\mathbb{R}^n)$ are dense in $L^p(\mathcal{F}_T)$ for $p \ge 1$.

Proof. As in the proof of Lemma 3.7.5, we find that with $\mathcal{G}_T$ the $\sigma$-algebra generated by $W$, $\mathcal{F}_T \subseteq \sigma(\mathcal{G}_T, \mathcal{N})$, so it suffices to show that the variables given in the statement of the lemma are dense in $L^p(\mathcal{G}_T)$.

We first prove that any $X \in L^p(\mathcal{G}_T)$ can be approximated by variables depending only on finitely many coordinates of the Wiener process. Let $X \in L^p(\mathcal{G}_T)$ be given, let $\{t_k\}_{k \ge 1}$ be dense in $[0,T]$ and let $\mathcal{H}_n = \sigma(W_{t_1},\ldots,W_{t_n})$. Then $(\mathcal{H}_n)$ is an increasing family of $\sigma$-algebras, and $\mathcal{G}_T = \sigma(\cup_{n \ge 1}\mathcal{H}_n)$. Therefore, $E(X \mid \mathcal{H}_n)$ is a discrete-time martingale with limit $X$, convergent in $L^p$. By the Doob-Dynkin lemma B.6.7, $E(X \mid \mathcal{H}_n) = g(W_{t_1},\ldots,W_{t_n})$ for some measurable $g: \mathbb{R}^n \to \mathbb{R}$.

Now consider $X \in L^p(\mathcal{G}_T)$ with $X = g(W_t)$ for some measurable $g: \mathbb{R}^n \to \mathbb{R}$ and $t \in [0,T]^n$. Let $\mu$ be the distribution of $W_t$. Since $\mu$ is bounded, $\mu$ is a Radon measure


on $\mathbb{R}^n$, and $g \in L^p(\mu)$ by assumption. By Theorem A.4.5, $g$ can be approximated in $L^p(\mu)$ by mappings $(g_n) \subseteq C_c^\infty(\mathbb{R}^n)$. We then obtain
\[
\lim_n \|g(W_t) - g_n(W_t)\|_p = \lim_n \|g - g_n\|_{L^p(\mu)} = 0,
\]
showing the claim of the lemma.

Corollary 4.1.10. The space $S$ is dense in $L^p(\mathcal{F}_T)$ for any $p \ge 1$.

Proof. This follows immediately from Lemma 4.1.9, since $C_c^\infty(\mathbb{R}^n) \subseteq C_p^\infty(\mathbb{R}^n)$.

This concludes our basic results on $S$, which we will use in the following sections when developing the Malliavin derivative. The important points of our work are these:

• By Corollary 4.1.6, we can choose a particularly nice form of any $F \in S$, namely $F = f(W_t)$ where $0 < t_1 < \cdots < t_n$. Also, by Lemma 4.1.2 we can extend to more coordinates whenever we wish. When working with more than one element of $S$, this allows us to assume that they depend on the same underlying coordinates of the Wiener process.

• By combining Corollary 4.1.6 and Lemma 4.1.7, we can always write any $F \in S$ as a $C_p^\infty$ transformation of a standard normal variable.

• By Lemma 4.1.8 and Corollary 4.1.10, $S$ is an algebra, dense in $L^p(\mathcal{F}_T)$ for any $p \ge 1$.

Before we proceed to the next section, we show some basic results on the space $L^p([0,T] \times \Omega, \mathcal{B}[0,T] \otimes \mathcal{F}_T, \lambda \otimes P)$. We will use the shorthands $\Pi = \mathcal{B}[0,T] \otimes \mathcal{F}_T$ and $\eta = \lambda \otimes P$, and write $L^p(\Pi)$ for $L^p([0,T] \times \Omega, \mathcal{B}[0,T] \otimes \mathcal{F}_T, \lambda \otimes P)$. When $p = 2$, we denote the inner product on $L^2(\Pi)$ by $\langle \cdot, \cdot \rangle_\Pi$.

Lemma 4.1.11. The variables of the form $f(W_t)$ for $f \in C_c^\infty(\mathbb{R}^n)$ generate $\mathcal{F}_T$.

Proof. This follows immediately from Lemma 4.1.9, since that lemma yields that for any $A \in \mathcal{F}_T$, $1_A$ can be approximated pointwise almost surely by elements of the form $f(W_t)$, where $f \in C_c^\infty(\mathbb{R}^n)$.

Lemma 4.1.12. The functions $1_{[s,t]} \otimes f(W_u)$, where $0 \le s \le t \le T$, $u \in [0,T]^n$ and $f \in C_c^\infty(\mathbb{R}^n)$, form a class which is stable under multiplication and generates $\Pi$.


Proof. Clearly, the class is stable under multiplication. We show the statement about the generated $\sigma$-algebra by applying Lemma B.6.4. Let $E$ be the class of mappings $f(W_u)$ where $f \in C_c^\infty(\mathbb{R}^n)$ and $u \in [0,T]^n$, and let $K$ be the class of functions $1_{[s,t]}$ for $0 \le s \le t \le T$.

By Lemma 4.1.11, $\sigma(E) = \mathcal{F}_T$. Since $f(W_0)$ is constant for any $f \in C_c^\infty(\mathbb{R})$, $E$ contains constant mappings. Also, clearly $\sigma(K) = \mathcal{B}[0,T]$, and the mapping which is constantly $1$ on $[0,T]$ is in $K$. The hypotheses of Lemma B.6.4 are then fulfilled, and the conclusion follows.

4.2 The Malliavin Derivative on $S$ and $D_{1,p}$

With the results of Section 4.1, we are ready to define and develop the Malliavin derivative. Letting $F \in S$ with $F = f(W_t)$, where $f \in C_p^\infty(\mathbb{R}^n)$, we want to put
\[
DF = \sum_{k=1}^n \frac{\partial f}{\partial x_k}(W_t)\,1_{[0,t_k]}.
\]
We begin by showing that this is well-defined, in the sense that if we have two different representations $f(W_t) = g(W_s)$, the derivative yields the same result.

Lemma 4.2.1. Let $F \in S$, and assume that we have two representations,
\[
F = f_1(W_t), \qquad F = f_2(W_s),
\]
where $f_1 \in C_p^\infty(\mathbb{R}^n)$ and $f_2 \in C_p^\infty(\mathbb{R}^m)$. It then holds that
\[
\sum_{k=1}^n \frac{\partial f_1}{\partial x_k}(W_t)\,1_{[0,t_k]} = \sum_{k=1}^m \frac{\partial f_2}{\partial x_k}(W_s)\,1_{[0,s_k]}.
\]

Proof. We proceed in two steps, first showing the equality for particularly simple representations of $F$ and then proceeding to the general case.

Step 1. First assume that $F = f_1(W_t) = f_2(W_t)$ for some $f_1, f_2 \in C_p^\infty(\mathbb{R}^n)$, where $0 < t_1 < \cdots < t_n$. Define $A: \mathbb{R}^n \to \mathbb{R}^n$ by
\[
Ax = \left( \frac{x_1}{\sqrt{t_1}}, \frac{x_2 - x_1}{\sqrt{t_2 - t_1}}, \ldots, \frac{x_n - x_{n-1}}{\sqrt{t_n - t_{n-1}}} \right).
\]
By Lemma 4.1.7, $A$ is invertible, and $AW_t$ has the standard normal distribution in $n$ dimensions. Put $g_1 = f_1 \circ A^{-1}$ and $g_2 = f_2 \circ A^{-1}$. Then $g_1(AW_t) = g_2(AW_t)$.


This implies that $g_1$ and $g_2$ are equal almost surely with respect to the $n$-dimensional standard normal measure. Since this measure has a positive density with respect to the Lebesgue measure, $g_1$ and $g_2$ are equal Lebesgue almost everywhere. Since both mappings are continuous, we conclude $g_1 = g_2$. Because $f_1 = g_1 \circ A$ and $f_2 = g_2 \circ A$, this implies $f_1 = f_2$ and therefore
\[
\sum_{k=1}^n \frac{\partial f_1}{\partial x_k}(W_t)\,1_{[0,t_k]} = \sum_{k=1}^n \frac{\partial f_2}{\partial x_k}(W_t)\,1_{[0,t_k]},
\]
as desired.

Step 2. Now assume $F = f_1(W_t)$ and $F = f_2(W_s)$. Defining $u = (t_1,\ldots,t_n,s_1,\ldots,s_m)$ and putting
\[
g_1(x_1,\ldots,x_{n+m}) = f_1(x_1,\ldots,x_n), \qquad g_2(x_1,\ldots,x_{n+m}) = f_2(x_{n+1},\ldots,x_{n+m}),
\]
we obtain $f_1(W_t) = g_1(W_u)$ and $f_2(W_s) = g_2(W_u)$. By what we have already shown,
\[
\sum_{k=1}^{n+m} \frac{\partial g_1}{\partial x_k}(W_u)\,1_{[0,u_k]} = \sum_{k=1}^{n+m} \frac{\partial g_2}{\partial x_k}(W_u)\,1_{[0,u_k]}.
\]
Therefore, we conclude
\[
\sum_{k=1}^n \frac{\partial f_1}{\partial x_k}(W_t)\,1_{[0,t_k]} = \sum_{k=1}^{n+m} \frac{\partial g_1}{\partial x_k}(W_u)\,1_{[0,u_k]} = \sum_{k=1}^{n+m} \frac{\partial g_2}{\partial x_k}(W_u)\,1_{[0,u_k]} = \sum_{k=1}^m \frac{\partial f_2}{\partial x_k}(W_s)\,1_{[0,s_k]},
\]
showing the desired result.

Definition 4.2.2. Let $F \in S$ with $F = f(W_{t_1},\ldots,W_{t_n})$. We define the Malliavin derivative of $F$ as the stochastic process on $[0,T]$ given by
\[
DF = \sum_{k=1}^n \frac{\partial f}{\partial x_k}(W_{t_1},\ldots,W_{t_n})\,1_{[0,t_k]}.
\]

Comment 4.2.3. The mapping defined above is well-defined by Lemma 4.2.1. ◦

We have now defined the Malliavin derivative $D$ on $S$. Next, we will investigate its basic properties and extend it to a larger space.
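As a concrete illustration of Definition 4.2.2, the following Python sketch computes the step-function path $s \mapsto D_sF$ for one element $F = f(W_{t_1}, W_{t_2})$ of $S$ on a simulated Brownian path. The discretisation and the choice of $f$ and of the time points are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 1.0, 1000
dt = T / N
grid = np.linspace(0.0, T, N + 1)
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), N))])

t = (0.4, 0.9)
Wt = W[[int(round(u / dt)) for u in t]]

f      = lambda x: x[0] ** 2 + x[0] * x[1]            # f in C_p^infinity(R^2)
grad_f = lambda x: np.array([2 * x[0] + x[1], x[0]])  # its partial derivatives

# D_s F on the grid: a step function, constant between the time points t_k.
DF = sum(grad_f(Wt)[k] * (grid <= t[k]) for k in range(len(t)))

# Piecewise constant: both partials contribute on [0, 0.4], only df/dx_2
# contributes on (0.4, 0.9], and DF vanishes on (0.9, 1].
print(DF[0], DF[N // 2], DF[-1])
```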


Lemma 4.2.4. The mapping $D$ is linear and maps into $L^p(\Pi)$ for all $p \ge 1$.

Proof. Let $F, G \in S$ with $F = f(W_t)$ and $G = g(W_s)$, and let $\lambda, \mu \in \mathbb{R}$. From Lemma 4.1.8 and its proof, we have
\[
D(\lambda F + \mu G) = D\big[(\lambda f \oplus \mu g)(W_{t_1},\ldots,W_{t_n},W_{s_1},\ldots,W_{s_m})\big] = \sum_{k=1}^n \lambda\frac{\partial f}{\partial x_k}(W_t)\,1_{[0,t_k]} + \sum_{k=1}^m \mu\frac{\partial g}{\partial x_k}(W_s)\,1_{[0,s_k]} = \lambda DF + \mu DG.
\]
This shows linearity. To show the second statement, assume $F = f(W_t)$. Then
\[
\|DF\|_p = \Big\| \sum_{k=1}^n \frac{\partial f}{\partial x_k}(W_t)\,1_{[0,t_k]} \Big\|_p \le T^{1/p}\sum_{k=1}^n \Big\| \frac{\partial f}{\partial x_k}(W_t) \Big\|_p.
\]
Since $f \in C_p^\infty(\mathbb{R}^n)$, $\frac{\partial f}{\partial x_k}$ has polynomial growth, so $\frac{\partial f}{\partial x_k}(W_t) \in L^p(\mathcal{F}_T)$, proving the lemma.

From Corollary 4.1.10, we see that we have defined the derivative on a dense subspace of $L^p(\mathcal{F}_T)$. Nonetheless, even dense sets can be quite slim, and we need to extend the operator $D$ to a larger space before it can be of any actual use. To do so, we show that $D$ is closable; we can then extend it by taking the closure. We first need a few lemmas.

Lemma 4.2.5. Let $F \in S$ with $F = f(W_t)$ for $f \in C_p^\infty(\mathbb{R}^n)$, where $t = (t_1,\ldots,t_n)$ and $0 = t_0 < t_1 < \cdots < t_n$. Let $A: \mathbb{R}^n \to \mathbb{R}^n$ be linear and invertible with matrix $A'$. Then
\[
\sum_{k=1}^n \frac{\partial f}{\partial x_k}(W_t)\,1_{[0,t_k]} = \sum_{k=1}^n\sum_{i=1}^n \frac{\partial (f \circ A^{-1})}{\partial x_i}(AW_t)\,A'_{ik}\,1_{[0,t_k]}.
\]

Proof. This follows from
\[
\sum_{k=1}^n \frac{\partial f}{\partial x_k}(W_t)\,1_{[0,t_k]} = \sum_{k=1}^n \frac{\partial}{\partial x_k}(f \circ A^{-1} \circ A)(W_t)\,1_{[0,t_k]} = \sum_{k=1}^n\sum_{i=1}^n \frac{\partial (f \circ A^{-1})}{\partial x_i}(AW_t)\frac{\partial A_i}{\partial x_k}(W_t)\,1_{[0,t_k]} = \sum_{k=1}^n\sum_{i=1}^n \frac{\partial (f \circ A^{-1})}{\partial x_i}(AW_t)\,A'_{ik}\,1_{[0,t_k]},
\]
where we have applied the chain rule.
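The linear mapping $A$ from Lemma 4.1.7, which reappears in Lemma 4.2.5 and in several proofs below, simply standardises the increments of the Wiener process. The following Python sketch builds its matrix for a few time points and checks by simulation that $AW_t$ is approximately standard normal; the time points and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.array([0.2, 0.5, 0.6, 1.0])
n, n_paths = len(t), 100_000

# Simulate (W_{t_1}, ..., W_{t_n}) directly from independent increments.
increments = rng.normal(0.0, np.sqrt(np.diff(np.concatenate([[0.0], t]))),
                        size=(n_paths, n))
Wt = np.cumsum(increments, axis=1)

# The matrix of A: (Ax)_k = (x_k - x_{k-1}) / sqrt(t_k - t_{k-1}).
A = np.zeros((n, n))
for k in range(n):
    A[k, k] = 1.0 / np.sqrt(t[k] - (t[k - 1] if k > 0 else 0.0))
    if k > 0:
        A[k, k - 1] = -A[k, k]

Z = Wt @ A.T                                   # rows are samples of A W_t
print(np.round(np.cov(Z, rowvar=False), 2))    # approximately the identity matrix
print(np.round(Z.mean(axis=0), 2))             # approximately zero
```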


Lemma 4.2.6. Let $f \in C_p^\infty(\mathbb{R}^n)$ and let $\varphi_n$ be the $n$-dimensional standard normal density. Then
\[
\int \frac{\partial f}{\partial x_k}(x)\varphi_n(x)\,dx = \int f(x)x_k\varphi_n(x)\,dx.
\]

Proof. First note that the integrals are well-defined, since $\varphi_n$ decays faster than any polynomial. We begin by showing that it suffices to prove the result in the case $k = 1$; this will make the proof notationally easier. To this end, note that the Lebesgue measure is invariant under orthonormal transformations. Define $U: \mathbb{R}^n \to \mathbb{R}^n$ as the linear transformation exchanging the first and $k$'th coordinates, leaving all other coordinates unchanged. $U$ is then orthonormal, and
\[
\int \frac{\partial f}{\partial x_k}(x)\varphi_n(x)\,dx = \int \frac{\partial f}{\partial x_k}(Ux)\varphi_n(Ux)\,dx = \int \sum_{i=1}^n \frac{\partial f}{\partial x_i}(Ux)\frac{\partial U_i}{\partial x_1}(x)\varphi_n(Ux)\,dx = \int \frac{\partial (f \circ U)}{\partial x_1}(x)\varphi_n(x)\,dx.
\]
Likewise,
\[
\int f(x)x_k\varphi_n(x)\,dx = \int f(Ux)(Ux)_k\varphi_n(Ux)\,dx = \int (f \circ U)(x)x_1\varphi_n(x)\,dx.
\]
Therefore, it suffices to show the result for $k = 1$ for mappings of the form $f \circ U$ where $f \in C_p^\infty(\mathbb{R}^n)$. But such mappings are in $C_p^\infty(\mathbb{R}^n)$ as well, and we conclude that it suffices to show the result in the case $k = 1$ for $f \in C_p^\infty(\mathbb{R}^n)$. In that case, we obtain
\[
\int \frac{\partial f}{\partial x_1}(x)\varphi_n(x)\,dx = \lim_{M\to\infty}\int_{[-M,M]^n} \frac{\partial f}{\partial x_1}(x)\varphi_n(x)\,dx = \lim_{M\to\infty}\int_{[-M,M]^{n-1}}\int_{-M}^M \frac{\partial f}{\partial x_1}(x)\varphi_n(x)\,dx_1\cdots dx_n = \lim_{M\to\infty}\int_{[-M,M]^{n-1}} \big[f(x)\varphi_n(x)\big]_{x_1=-M}^{x_1=M} - \int_{-M}^M f(x)\frac{\partial\varphi_n}{\partial x_1}(x)\,dx_1\,dx_2\cdots dx_n.
\]
We wish to use dominated convergence to get rid of the first term. Defining $x'$ by


$x' = (x_2,\ldots,x_n)$, we obtain
\[
\int_{[-M,M]^{n-1}} \Big| \big[f(x)\varphi_n(x)\big]_{x_1=-M}^{x_1=M} \Big|\,dx_2\cdots dx_n \le \int \Big| \big[f(x)\varphi_n(x)\big]_{x_1=-M}^{x_1=M} \Big|\,dx' \le \int |f(M,x')|\varphi_n(M,x') + |f(-M,x')|\varphi_n(-M,x')\,dx' = \int |f(M,x')|\varphi(M)\varphi_{n-1}(x')\,dx' + \int |f(-M,x')|\varphi(-M)\varphi_{n-1}(x')\,dx'.
\]
Let $p$ with $p(x) = \sum_{k=0}^m a_k\prod_{i=1}^n x_i^{\alpha^k_i}$ be a polynomial such that $|f| \le p$. Then
\[
\sup_{M\in\mathbb{R}} |f(M,x')|\varphi(M) \le \sum_{k=0}^m |a_k|\sup_{M\in\mathbb{R}}\varphi(M)M^{\alpha^k_1}\prod_{i=2}^n |x_i|^{\alpha^k_i} = \sum_{k=0}^m |a_k|\Big( \sup_{M\in\mathbb{R}} |M|^{\alpha^k_1}\varphi(M) \Big)\prod_{i=2}^n |x_i|^{\alpha^k_i},
\]
where the suprema are finite since $\varphi$ decays faster than any exponential. Therefore, $\sup_{M\in\mathbb{R}}|f(M,x')|\varphi(M)$ has polynomial growth in $x'$. From this we conclude that $\sup_{M\in\mathbb{R}}|f(M,x')|\varphi(M)\varphi_{n-1}(x')$ is integrable, and by dominated convergence we therefore get
\[
\lim_{M\to\infty}\int_{[-M,M]^{n-1}} \big[f(x)\varphi_n(x)\big]_{x_1=-M}^{x_1=M}\,dx' = \int \lim_{M\to\infty} 1_{[-M,M]^{n-1}}(x')(f\varphi_n)(M,x') - 1_{[-M,M]^{n-1}}(x')(f\varphi_n)(-M,x')\,dx' = \int \lim_{M\to\infty}(f\varphi_n)(M,x') - (f\varphi_n)(-M,x')\,dx' = 0.
\]
We may therefore conclude
\[
\int \frac{\partial f}{\partial x_1}(x)\varphi_n(x)\,dx = -\lim_{M\to\infty}\int_{[-M,M]^{n-1}}\int_{-M}^M f(x)\frac{\partial\varphi_n}{\partial x_1}(x)\,dx_1\,dx_2\cdots dx_n = \lim_{M\to\infty}\int_{[-M,M]^n} f(x)x_1\varphi_n(x)\,dx = \int f(x)x_1\varphi_n(x)\,dx,
\]
as desired.
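Lemma 4.2.6 is easy to check numerically. The following Monte Carlo sketch compares both sides of the identity for one choice of $f$ and one coordinate; the function, the coordinate and the sample size are illustrative assumptions.

```python
import numpy as np

# Monte Carlo check of Lemma 4.2.6 (Gaussian integration by parts):
# E[df/dx_k(X)] = E[f(X) X_k] for X standard normal in R^n.
rng = np.random.default_rng(3)
n, n_samples, k = 3, 1_000_000, 1           # k indexes the coordinate (0-based)

f      = lambda x: x[:, 0] ** 2 * x[:, 1] + x[:, 2]
df_dxk = lambda x: x[:, 0] ** 2             # derivative with respect to x_2

X = rng.standard_normal((n_samples, n))
lhs = df_dxk(X).mean()                      # E[df/dx_k(X)]
rhs = (f(X) * X[:, k]).mean()               # E[f(X) X_k]
print(lhs, rhs)                             # both are close to E[X_1^2] = 1
```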


Comment 4.2.7. The result of Lemma 4.2.6 is a version of a result sometimes known as Stein's lemma in the literature; see for example Zou et al. (2007). See Stein (1981), Lemma 1 and Lemma 2, for proofs of Stein's lemma. ◦

Lemma 4.2.8. Consider $G = g(W_u) \otimes 1_{[s,t]}$ with $g \in C_c^\infty(\mathbb{R}^n)$ and $u \in [0,T]^n$. There is some constant $C$ such that for any $F \in S$, $|\int G(DF)\,d\eta| \le C\|F\|_p$.

Proof. By Lemma 4.1.2, we can assume that $F = f(W_u)$ for some $f \in C_p^\infty(\mathbb{R}^n)$, and we can assume that $0 < u_1 < \cdots < u_n$. Define
\[
Ax = \left( \frac{x_1}{\sqrt{u_1}}, \frac{x_2-x_1}{\sqrt{u_2-u_1}}, \ldots, \frac{x_n-x_{n-1}}{\sqrt{u_n-u_{n-1}}} \right);
\]
by Lemma 4.2.5 we then have
\[
\sum_{k=1}^n \frac{\partial f}{\partial x_k}(W_u)\,1_{[0,u_k]} = \sum_{k=1}^n\sum_{i=1}^n \frac{\partial (f\circ A^{-1})}{\partial x_i}(AW_u)\,A'_{ik}\,1_{[0,u_k]},
\]
where $A'$ is the matrix for the linear mapping $A$. Defining $f' = f \circ A^{-1}$ and $g' = g \circ A^{-1}$, we then obtain
\[
\int G\,DF\,d\eta = \int G\sum_{k=1}^n\sum_{i=1}^n \frac{\partial f'}{\partial x_i}(AW_u)\,A'_{ik}\,1_{[0,u_k]}\,d\eta = \sum_{k=1}^n\sum_{i=1}^n A'_{ik}\int g(W_u)\frac{\partial f'}{\partial x_i}(AW_u)\,1_{[s,t]}1_{[0,u_k]}\,d\eta = \sum_{k=1}^n\sum_{i=1}^n A'_{ik}\Big( \int 1_{[s,t]}1_{[0,u_k]}\,d\lambda \Big)E\Big( g'(AW_u)\frac{\partial f'}{\partial x_i}(AW_u) \Big).
\]
In particular, then,
\[
\Big| \int G\,DF\,d\eta \Big| \le nT\|A'\|_\infty\sum_{i=1}^n \Big| E\Big( g'(AW_u)\frac{\partial f'}{\partial x_i}(AW_u) \Big) \Big|.
\]
We consider the mean values in the sums. Using Lemma 4.2.6, we obtain
\[
E\Big( g'(AW_u)\frac{\partial f'}{\partial x_i}(AW_u) \Big) = \int g'(x)\frac{\partial f'}{\partial x_i}(x)\varphi_n(x)\,dx = \int \frac{\partial (g'f')}{\partial x_i}(x)\varphi_n(x)\,dx - \int \frac{\partial g'}{\partial x_i}(x)f'(x)\varphi_n(x)\,dx = \int g'(x)f'(x)x_i\varphi_n(x)\,dx - \int \frac{\partial g'}{\partial x_i}(x)f'(x)\varphi_n(x)\,dx.
\]


Recalling that $F = f'(AW_u)$, we can use Hölder's inequality to obtain
\[
\Big| E\Big( g'(AW_u)\frac{\partial f'}{\partial x_i}(AW_u) \Big) \Big| = \Big| \int g'(x)f'(x)x_i\varphi_n(x)\,dx - \int \frac{\partial g'}{\partial x_i}(x)f'(x)\varphi_n(x)\,dx \Big| = \Big| E\,F\Big( g'(AW_u)(AW_u)_i - \frac{\partial g'}{\partial x_i}(AW_u) \Big) \Big| \le \|F\|_p\Big\| g'(AW_u)(AW_u)_i - \frac{\partial g'}{\partial x_i}(AW_u) \Big\|_q.
\]
Since $g$ has compact support, so do $g'$ and $\frac{\partial g'}{\partial x_i}$. Therefore, the latter factor in the above is finite. Putting
\[
C = nT\|A'\|_\infty\sum_{i=1}^n \Big\| g'(AW_u)(AW_u)_i - \frac{\partial g'}{\partial x_i}(AW_u) \Big\|_q,
\]
we have proved $|\int G\,DF\,d\eta| \le C\|F\|_p$, as desired.

Theorem 4.2.9. The operator $D$ is closable from $S$ to $L^p(\Pi)$.

Proof. Let a sequence $(F_n) \subseteq S$ be given such that $F_n$ converges to zero in $L^p(\mathcal{F}_T)$ and such that $DF_n$ converges in $L^p(\Pi)$. We need to prove that the limit of $DF_n$ is zero. Let $\xi$ be the limit. By Theorem A.2.5, it suffices to prove that $\int \xi G\,d\eta = 0$ for any $G \in L^q(\Pi)$.

Let $\mathcal{H}$ be the class of elements $G \in L^q(\Pi)$ such that $\int \xi G\,d\eta = 0$. We will prove that $\mathcal{H} = L^q(\Pi)$: we first prove that $\mathcal{H}$ contains all bounded mappings in $L^q(\Pi)$, and we then finish the proof by a closedness argument.

Step 1: $\mathcal{H}$ contains all bounded mappings. We will use the monotone class theorem. To this end, note that $b\mathcal{H}$, the class of bounded elements of $\mathcal{H}$, is a monotone vector space by dominated convergence. Defining $\mathcal{K}$ as the class of mappings $g(W_u) \otimes 1_{[s,t]}$ where $g \in C_c^\infty(\mathbb{R}^n)$ and $u \in [0,T]^n$, we know from Lemma 4.1.12 that $\mathcal{K}$ is a multiplicative class generating $\Pi$. If we can show that $\mathcal{K} \subseteq b\mathcal{H}$, it will follow from the monotone class theorem B.1.2 that $b\mathcal{H}$ contains all bounded mappings in $L^q(\Pi)$.

Let $G \in \mathcal{K}$ be given with $G = g(W_u) \otimes 1_{[s,t]}$. Our ambition is to show $G \in \mathcal{H}$. By Lemma 4.2.8, there exists a constant $C$ such that for any $F \in S$, it holds that


$|\int G(DF)\,d\eta| \le C\|F\|_p$. Since $DF_n$ converges to $\xi$ in $L^p(\Pi)$ and $G \in L^q(\Pi)$, the mapping $X \mapsto \int XG\,d\eta$ is continuous on $L^p(\Pi)$, and we obtain
\[
\Big| \int \xi G\,d\eta \Big| = \lim_n \Big| \int G(DF_n)\,d\eta \Big| \le \lim_n C\|F_n\|_p = 0.
\]
Thus $\int G\xi\,d\eta = 0$. We conclude $G \in b\mathcal{H}$ and therefore $\mathcal{K} \subseteq b\mathcal{H}$.

Step 2: $\mathcal{H}$ is closed. Since the bounded elements of $L^q(\Pi)$ are dense in $L^q(\Pi)$, it will follow that $\mathcal{H} = L^q(\Pi)$ if we can prove that $\mathcal{H}$ is closed. To do so, assume that $(G_n) \subseteq \mathcal{H}$ converges to $G$ in $L^q(\Pi)$. Then $\lim \|\xi G_n - \xi G\|_1 \le \lim \|\xi\|_p\|G_n - G\|_q = 0$ by Hölder's inequality, so $\int \xi G\,d\eta = \lim \int \xi G_n\,d\eta = 0$, and $\mathcal{H}$ is closed. It follows that $\mathcal{H} = L^q(\Pi)$, and we may finally conclude that $\xi = 0$ and therefore that $D$ is closable.

We may now conclude from Theorem A.7.3 that $D$ has a unique closed extension from $S$ to the subspace $D_{1,p}$ of $L^p(\mathcal{F}_T)$ defined as the set of $F \in L^p(\mathcal{F}_T)$ for which there exists a sequence $(F_n)$ in $S$ converging to $F$ such that $DF_n$ converges as well; in this case the value of $DF$ is the limit of $DF_n$. Thus $D$ maps $D_{1,p}$ into $L^p(\Pi)$. Note that, endowing the space $D_{1,p}$ with the norm given by $\|F\|_{1,p} = \|F\|_p + \|DF\|_p$, $S$ is a dense subspace of $D_{1,p}$. This is another way to think of the extension of $D$ from $S$ to $D_{1,p}$.

We will mainly be interested in the properties of the operator from $D_{1,2}$ to $L^2(\Pi)$. We will consider this case in detail in the next section. First, we will spend a moment reviewing our progress. We have succeeded in defining, for any $p \ge 1$, a linear operator $D: D_{1,p} \to L^p(\Pi)$ with the properties that

• for $F \in S$ with $F = f(W_t)$, $DF = \sum_{k=1}^n \frac{\partial f}{\partial x_k}(W_t)\,1_{[0,t_k]}$;

• when considering $D_{1,p}$ under the norm $\|\cdot\|_p$, $D$ is closed.

These two properties are more or less all there is to the definition. The first property shows how to calculate $D$ on $S$, and the second property shows how to infer values of $D$ outside of $S$ from the values on $S$. Note that our definition of $D_{1,p}$ is different from the one seen in Nualart (2006). Nualart (2006) considers the construction of the Malliavin derivative based on isonormal Gaussian processes, which can be seen as a generalization of Brownian motion. This allows certain proofs the clarity of higher abstraction, but the downside is that the theory becomes harder to relate to real problems, and furthermore that the development of the derivative operator is tied up with the $L^2$-structure of the concept of isonormal processes. For a Brownian motion, the


space of smooth functionals considered in Nualart (2006) would be
\[
\Big\{ f\Big( \int_0^T h_1(t)\,dW(t), \ldots, \int_0^T h_n(t)\,dW(t) \Big) \,\Big|\, h_1,\ldots,h_n \in L^2[0,T],\ f \in C_p^\infty(\mathbb{R}^n) \Big\},
\]
and the space $D_{1,p}$ would be the completion of this space with respect to the norm
\[
\|F\|_{1,p} = \Big( E|F|^p + E\|DF\|^p_{L^2[0,T]} \Big)^{1/p},
\]
where $DF$ is considered as a stochastic variable with values in $L^2[0,T]$.

4.3 The Malliavin Derivative on $D_{1,2}$

We will now consider specifically the properties of the Malliavin derivative as an operator $D: D_{1,2} \to L^2(\Pi)$. We will prove two results allowing us easy manipulation of the Malliavin derivative: the chain rule and the integration-by-parts formula. Furthermore, we will prove that $D_{1,2}$ has some degree of richness by showing that it contains all stochastic integrals of deterministic functions, and we will show how the derivatives of such integrals are given.

We begin by proving the chain rule. The proof of the general case is rather long, so we split it in two. We first prove in Theorem 4.3.1 the chain rule as it is formulated in Nualart (2006), for transformations having bounded partial derivatives. After this, we prove in Theorem 4.3.5 a generalized chain rule.

Theorem 4.3.1. Let $F = (F_1,\ldots,F_n)$, where $F_1,\ldots,F_n \in D_{1,2}$, and let $\varphi \in C^1(\mathbb{R}^n)$ have bounded partial derivatives. Then $\varphi(F) \in D_{1,2}$ and
\[
D\varphi(F) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k.
\]

Comment 4.3.2. In the above, $\frac{\partial\varphi}{\partial x_k}(F)$ is a mapping on $\Omega$ and $DF_k$ is a mapping on $[0,T]\times\Omega$. The multiplication is understood to be pointwise, in the sense
\[
\Big( \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big)(t,\omega) = \frac{\partial\varphi}{\partial x_k}(F)(\omega)\,DF_k(t,\omega). \qquad \circ
\]


Proof. The conclusion is well-defined: since each $\frac{\partial\varphi}{\partial x_k}$ is bounded, $\varphi(F)$ is an element of $L^2(\mathcal{F}_T)$ and the right-hand side is an element of $L^2(\Pi)$. We prove the theorem in three steps:

1. The case where $\varphi \in C_p^\infty(\mathbb{R}^n)$ and $F \in S^n$.
2. The case where $\varphi \in C_p^\infty(\mathbb{R}^n)$ with bounded partial derivatives and $F \in D_{1,2}^n$.
3. The case where $\varphi \in C^1(\mathbb{R}^n)$ with bounded partial derivatives and $F \in D_{1,2}^n$.

Step 1: $C_p^\infty$ transformations of $S$. Consider the case where $\varphi \in C_p^\infty(\mathbb{R}^n)$ and $F \in S^n$. By Lemma 4.1.2, we can assume that the $F_k$ are transformations of the same coordinates of the Wiener process, $F_k = f_k(W_t)$ where $t \in [0,T]^m$. Putting $f = (f_1,\ldots,f_n)$, we have $\varphi \circ f \in C_p^\infty(\mathbb{R}^m)$ and $\varphi(F) = (\varphi \circ f)(W_t)$, so $\varphi(F) \in S$ and therefore
\[
D\varphi(F) = \sum_{i=1}^m \frac{\partial(\varphi\circ f)}{\partial x_i}(W_t)\,1_{[0,t_i]} = \sum_{i=1}^m\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(f(W_t))\frac{\partial f_k}{\partial x_i}(W_t)\,1_{[0,t_i]} = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(f(W_t))\sum_{i=1}^m \frac{\partial f_k}{\partial x_i}(W_t)\,1_{[0,t_i]} = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k.
\]

Step 2: Special $C_p^\infty$ transformations of $D_{1,2}$. We next consider the case where $F \in D_{1,2}^n$ and $\varphi \in C_p^\infty(\mathbb{R}^n)$ with bounded partial derivatives. Since $D$ is closed, we know that there exist sequences $(F^j_k)_j \subseteq S$ such that $F^j_k$ converges to $F_k$ in $L^2(\mathcal{F}_T)$ and $DF^j_k$ converges to $DF_k$ in $L^2(\Pi)$. Then, by what we have already shown, $\varphi(F^j) \in S$ and $D\varphi(F^j) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F^j)\,DF^j_k$. Our plan is to show that $\varphi(F^j)$ converges to $\varphi(F)$ in $L^2(\mathcal{F}_T)$ and that $D\varphi(F^j)$ converges to $\sum_{i=1}^n \frac{\partial\varphi}{\partial x_i}(F)\,DF_i$ in $L^2(\Pi)$. By the closedness of $D$, this will imply that $\varphi(F) \in D_{1,2}$ and $D\varphi(F) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$.

By assumption, $F^j_k$ converges to $F_k$ in $L^2(\mathcal{F}_T)$ and $DF^j_k$ converges in $L^2(\Pi)$ to $DF_k$. By picking a subsequence, we may assume that we also have almost sure convergence. By the mean value theorem, Lemma A.2.2, for an appropriate intermediate point $\xi$,
\[
\|\varphi(F^j) - \varphi(F)\|_2 = \Big\| \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(\xi)(F^j_k - F_k) \Big\|_2 \le \max_{k\le n}\Big\| \frac{\partial\varphi}{\partial x_k} \Big\|_\infty\Big\| \sum_{k=1}^n |F^j_k - F_k| \Big\|_2,
\]


so $\varphi(F^j)$ converges to $\varphi(F)$ in $L^2(\mathcal{F}_T)$. To show the other convergence, note that
\[
\Big\| D\varphi(F^j) - \sum_{i=1}^n \frac{\partial\varphi}{\partial x_i}(F)\,DF_i \Big\|_2 = \Big\| \sum_{i=1}^n \frac{\partial\varphi}{\partial x_i}(F^j)\,DF^j_i - \sum_{i=1}^n \frac{\partial\varphi}{\partial x_i}(F)\,DF_i \Big\|_2 \le \sum_{i=1}^n \Big\| \frac{\partial\varphi}{\partial x_i}(F^j)\,DF^j_i - \frac{\partial\varphi}{\partial x_i}(F)\,DF_i \Big\|_2,
\]
where, for each $i \le n$,
\[
\Big\| \frac{\partial\varphi}{\partial x_i}(F^j)\,DF^j_i - \frac{\partial\varphi}{\partial x_i}(F)\,DF_i \Big\|_2 \le \Big\| \frac{\partial\varphi}{\partial x_i}(F^j)(DF^j_i - DF_i) \Big\|_2 + \Big\| \Big( \frac{\partial\varphi}{\partial x_i}(F^j) - \frac{\partial\varphi}{\partial x_i}(F) \Big)DF_i \Big\|_2 \le \Big\| \frac{\partial\varphi}{\partial x_i} \Big\|_\infty\big\| DF^j_i - DF_i \big\|_2 + \Big\| \Big( \frac{\partial\varphi}{\partial x_i}(F^j) - \frac{\partial\varphi}{\partial x_i}(F) \Big)DF_i \Big\|_2.
\]
Here, the first term converges to zero by assumption. Regarding the second term, we know that $F^j \to F$ almost surely, and by continuity, $\frac{\partial\varphi}{\partial x_i}(F^j) \to \frac{\partial\varphi}{\partial x_i}(F)$ almost surely, bounded by a constant since the partial derivatives are bounded. Dominated convergence therefore yields that the second term in the above converges to zero as well. We conclude that $\varphi(F) \in D_{1,2}$ and $D\varphi(F) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$.

Step 3: $C^1$ transformations of $D_{1,2}$. Now assume that $\varphi \in C^1(\mathbb{R}^n)$ has bounded partial derivatives, and let $F \in D_{1,2}^n$. From Lemma A.4.3, we know that there exist functions $\varphi_m \in C_p^\infty(\mathbb{R}^n)$ such that $\varphi_m$ converges uniformly to $\varphi$, the partial derivatives of $\varphi_m$ converge pointwise to the partial derivatives of $\varphi$, and $\|\frac{\partial\varphi_m}{\partial x_k}\|_\infty \le \|\frac{\partial\varphi}{\partial x_k}\|_\infty$. By what we have already shown, $\varphi_m(F) \in D_{1,2}$ and $D\varphi_m(F) = \sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k$. We want to argue that $\varphi_m(F)$ converges to $\varphi(F)$ in $L^2(\mathcal{F}_T)$ and that $D\varphi_m(F)$ converges to $\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$ in $L^2(\Pi)$. As in the previous step, by closedness of $D$, this will yield that $\varphi(F) \in D_{1,2}$ and that the derivative is given by $D\varphi(F) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$.

Clearly, since $\|\varphi(F) - \varphi_m(F)\|_2 \le \|\varphi - \varphi_m\|_\infty$, $\varphi_m(F)$ converges to $\varphi(F)$ in $L^2(\mathcal{F}_T)$. To show the second limit statement, we note
\[
\Big\| D\varphi_m(F) - \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big\|_2 \le \sum_{k=1}^n \Big\| \Big( \frac{\partial\varphi_m}{\partial x_k}(F) - \frac{\partial\varphi}{\partial x_k}(F) \Big)DF_k \Big\|_2.
\]
Since $\frac{\partial\varphi_m}{\partial x_k}$ converges pointwise to $\frac{\partial\varphi}{\partial x_k}$, clearly $\frac{\partial\varphi_m}{\partial x_k}(F) \to \frac{\partial\varphi}{\partial x_k}(F)$ almost surely, and this convergence is bounded. The dominated convergence theorem therefore yields that the above sum converges to zero.
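The content of the chain rule can be illustrated on elements of $S$, where both sides can be computed explicitly. In the following Python sketch, $F_1 = W_{t_1}W_{t_2}$, $F_2 = W_{t_1}^2$ and $\varphi(y_1,y_2) = \sin(y_1+y_2)$, which has bounded partial derivatives; the simulated path, the time points and the choice of $\varphi$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 1.0, 1000
dt = T / N
grid = np.linspace(0.0, T, N + 1)
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), N))])
t1, t2 = 0.3, 0.8
W1, W2 = W[int(t1 / dt)], W[int(t2 / dt)]

ind1, ind2 = (grid <= t1).astype(float), (grid <= t2).astype(float)

# D F_1 and D F_2 from Definition 4.2.2 (F_1 = f(W_{t1}, W_{t2}), f(x) = x_1 x_2).
DF1 = W2 * ind1 + W1 * ind2
DF2 = 2 * W1 * ind1

# Right-hand side of the chain rule: sum_k dphi/dy_k(F) DF_k.
F1, F2 = W1 * W2, W1 ** 2
rhs = np.cos(F1 + F2) * DF1 + np.cos(F1 + F2) * DF2

# Left-hand side: differentiate phi(F) directly as h(W_{t1}, W_{t2}) in S,
# where h(x_1, x_2) = sin(x_1 x_2 + x_1^2).
lhs = np.cos(W1 * W2 + W1 ** 2) * ((W2 + 2 * W1) * ind1 + W1 * ind2)

assert np.allclose(lhs, rhs)
```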


The chain rule of Theorem 4.3.1 is sufficient in many cases, but it will be convenient in the following to have an extension which does not require boundedness of the partial derivatives of the transformation, but only requires enough integrability to ensure that the statement of the chain rule is well-defined. This is what we now set out to prove.

Lemma 4.3.3. Let $M > 0$ and $\varepsilon > 0$. There exists a mapping $f \in C_c^\infty(\mathbb{R}^n)$ with partial derivatives bounded by $1+\varepsilon$ such that $[-M,M]^n \prec f \prec (-(M+1), M+1)^n$.

Proof. Let $\varepsilon > 0$ be given, put $M_{\varepsilon-} = M - \varepsilon$ and $M_{\varepsilon+} = M + \varepsilon$, and define the sets $K_\varepsilon = [-M_{\varepsilon+}, M_{\varepsilon+}]^n$ and $V_\varepsilon = (-(M_{\varepsilon-}+1), M_{\varepsilon-}+1)^n$. We begin by identifying a $d_\infty$-Lipschitz mapping $g \in C_c(\mathbb{R}^n)$ with $K_\varepsilon \prec g \prec V_\varepsilon$.

To this end, we put $g(x) = \min\{\frac{1}{1-2\varepsilon}d_\infty(x, V_\varepsilon^c), 1\}$. Clearly, $g$ is continuous and maps into $[0,1]$. Since $g$ is zero on $V_\varepsilon^c$ and $V_\varepsilon$ has compact closure, $g$ has compact support. And if $x \in K_\varepsilon$, we obtain
\[
g(x) = \min\Big\{ \frac{1}{1-2\varepsilon}d_\infty(x, V_\varepsilon^c), 1 \Big\} \ge \min\Big\{ \frac{1}{1-2\varepsilon}d_\infty(K_\varepsilon, V_\varepsilon^c), 1 \Big\} = 1.
\]
We conclude $K_\varepsilon \prec g \prec V_\varepsilon$. We know that $|d_\infty(x, V_\varepsilon^c) - d_\infty(y, V_\varepsilon^c)| \le d_\infty(x,y)$ by the inverse triangle inequality, so $x \mapsto d_\infty(x, V_\varepsilon^c)$ is $d_\infty$-Lipschitz continuous with Lipschitz constant $1$. Therefore, the mapping $x \mapsto \frac{1}{1-2\varepsilon}d_\infty(x, V_\varepsilon^c)$ is $d_\infty$-Lipschitz continuous with Lipschitz constant $\frac{1}{1-2\varepsilon}$. Since $x \mapsto \min\{x,1\}$ is a contraction, $g$ is $d_\infty$-Lipschitz continuous with Lipschitz constant $\frac{1}{1-2\varepsilon}$.

Now let $(\psi_\varepsilon)$ be a Dirac family with respect to $\|\cdot\|_\infty$ and let $f = g * \psi_\varepsilon$. We claim that $f$ satisfies the properties in the lemma. Let $K = [-M,M]^n$ and $V = (-(M+1), M+1)^n$; we want to show that $f \in C_c^\infty(\mathbb{R}^n)$, that $K \prec f \prec V$, and that $f$ has partial derivatives bounded by $\frac{1}{1-2\varepsilon}$. By Lemma A.3.2, $f$ is infinitely differentiable, and it is clear that $f$ takes values in $[0,1]$. If $x \in V^c$, we have $x - y \in V_\varepsilon^c$ for any $y$ with $\|y\|_\infty \le \varepsilon$. Therefore,
\[
f(x) = \int g(x-y)\psi_\varepsilon(y)\,dy = \int 1_{(\|y\|_\infty\le\varepsilon)}g(x-y)\psi_\varepsilon(y)\,dy = 0.
\]
Likewise, if $x \in K$, then $x - y \in K_\varepsilon$ for any $y$ with $\|y\|_\infty \le \varepsilon$, and thus
\[
f(x) = \int g(x-y)\psi_\varepsilon(y)\,dy = \int 1_{(\|y\|_\infty\le\varepsilon)}g(x-y)\psi_\varepsilon(y)\,dy = 1.
\]
We have now shown $K \prec f \prec V$. Since $V$ has compact closure, it follows in particular that $f$ has compact support. It remains to prove the bound on the partial derivatives.


To this end, we note
\[
|f(x) - f(y)| = \Big| \int g(x-z)\psi_\varepsilon(z)\,dz - \int g(y-z)\psi_\varepsilon(z)\,dz \Big| \le \int |g(x-z) - g(y-z)|\psi_\varepsilon(z)\,dz \le \frac{d_\infty(x,y)}{1-2\varepsilon}.
\]
We therefore obtain, with $e_k$ denoting the unit vector in the $k$'th direction,
\[
\Big| \frac{\partial f}{\partial x_k}(x) \Big| = \lim_{h\to 0+}\frac{|f(x+he_k) - f(x)|}{h} \le \lim_{h\to 0+}\frac{1}{1-2\varepsilon}\frac{\|he_k\|_\infty}{h} = \frac{1}{1-2\varepsilon}.
\]
We have now identified a mapping $f \in C_c^\infty(\mathbb{R}^n)$ with $K \prec f \prec V$ and partial derivatives bounded by $\frac{1}{1-2\varepsilon}$. Since $\varepsilon > 0$ was arbitrary, we conclude that for any $\varepsilon > 0$ there exists $f \in C_c^\infty(\mathbb{R}^n)$ with $K \prec f \prec V$ and partial derivatives bounded by $1+\varepsilon$. This shows the lemma.

Comment 4.3.4. Instead of explicitly constructing the Lipschitz mapping, we could also have applied the extended Urysohn lemma of Theorem C.2.2. This would have yielded a weaker bound for the partial derivatives, but the nature of the argument would be more general. ◦

Theorem 4.3.5 (Chain rule). Let $F = (F_1,\ldots,F_n)$, where $F_1,\ldots,F_n \in D_{1,2}$, and let $\varphi \in C^1(\mathbb{R}^n)$. Assume that $\varphi(F) \in L^2(\mathcal{F}_T)$ and $\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \in L^2(\Pi)$. Then $\varphi(F) \in D_{1,2}$ and $D\varphi(F) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$.

Proof. We first prove the theorem in the case where $\varphi$ is bounded, and then extend to general $\varphi$.

Step 1: The bounded case. Assume that $\varphi \in C^1(\mathbb{R}^n)$ is bounded and that $\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \in L^2(\Pi)$. In this case, obviously $\varphi(F) \in L^2(\mathcal{F}_T)$. We will employ the bump functions analyzed in Lemma 4.3.3. Let $\varepsilon > 0$ and define the sets $K_m$ and $V_m$ by $K_m = [-m,m]^n$ and $V_m = (-m-1, m+1)^n$. By Lemma 4.3.3, there exist mappings $(c_m)$ in $C_c^\infty(\mathbb{R}^n)$ with $K_m \prec c_m \prec V_m$ such that the partial derivatives of $c_m$ are bounded by $1+\varepsilon$.

Define $\varphi_m$ by putting $\varphi_m(x) = c_m(x)\varphi(x)$. We will argue that $\varphi_m \in C^1(\mathbb{R}^n)$ with bounded partial derivatives. It is clear that $\varphi_m$ is continuously differentiable, and
\[
\frac{\partial\varphi_m}{\partial x_k}(x) = \frac{\partial c_m}{\partial x_k}(x)\varphi(x) + c_m(x)\frac{\partial\varphi}{\partial x_k}(x).
\]


Since $c_m \in C_c^\infty(\mathbb{R}^n)$, both $c_m$ and $\frac{\partial c_m}{\partial x_k}$ have compact support. It follows that $\frac{\partial\varphi_m}{\partial x_k}$ has compact support, and since it is also continuous, it is bounded. We can therefore apply the ordinary chain rule of Theorem 4.3.1 to conclude that $\varphi_m(F) \in D_{1,2}$ and $D\varphi_m(F) = \sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k$. We will use the closedness of $D$ to extend this result to $\varphi(F)$: we will show that $\varphi_m(F)$ converges to $\varphi(F)$ in $L^2(\mathcal{F}_T)$ and that $\sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k$ converges to $\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$ in $L^2(\Pi)$. Closedness of $D$ will then yield $\varphi(F) \in D_{1,2}$ and $D\varphi(F) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$.

Since $c_m$ converges pointwise to $1$, $\varphi_m$ converges pointwise to $\varphi$. Therefore, $\varphi_m(F) \to \varphi(F)$ almost surely. Because $\|c_m\|_\infty \le 1$, dominated convergence yields that $\varphi_m(F)$ converges in $L^2(\mathcal{F}_T)$ to $\varphi(F)$. To show the corresponding result for the derivatives, first note that $\frac{\partial c_m}{\partial x_k}$ is zero on $K_m^\circ$. Since $c_m$ is one on $K_m^\circ$, we find that $\frac{\partial\varphi_m}{\partial x_k}$ and $\frac{\partial\varphi}{\partial x_k}$ are equal on $K_m^\circ$. We therefore obtain
\[
\Big\| \sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k - \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big\|_2 = \Big\| \sum_{k=1}^n 1_{(K_m^\circ)^c}(F)\Big( \frac{\partial\varphi_m}{\partial x_k}(F) - \frac{\partial\varphi}{\partial x_k}(F) \Big)DF_k \Big\|_2 = \Big\| \sum_{k=1}^n 1_{(K_m^\circ)^c}(F)\Big( \frac{\partial c_m}{\partial x_k}(F)\varphi(F) + (c_m(F)-1)\frac{\partial\varphi}{\partial x_k}(F) \Big)DF_k \Big\|_2 \le \Big\| \sum_{k=1}^n 1_{(K_m^\circ)^c}(F)\frac{\partial c_m}{\partial x_k}(F)\varphi(F)\,DF_k \Big\|_2 + \Big\| \sum_{k=1}^n 1_{(K_m^\circ)^c}(F)(c_m(F)-1)\frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big\|_2.
\]
We wish to show that each of these terms tends to zero by dominated convergence. Considering the first term, by the definition of $c_m$ we obtain
\[
\Big| \sum_{k=1}^n 1_{(K_m^\circ)^c}(F)\frac{\partial c_m}{\partial x_k}(F)\varphi(F)\,DF_k \Big| \le (1+\varepsilon)\|\varphi\|_\infty\sum_{k=1}^n |DF_k|,
\]
which is square-integrable. Likewise, for the second term we have the square-integrable bound
\[
\Big| \sum_{k=1}^n 1_{(K_m^\circ)^c}(F)(c_m(F)-1)\frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big| = \big| 1_{(K_m^\circ)^c}(F)(c_m(F)-1) \big|\Big| \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big| \le \Big| \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big|.
\]
Since $K_m^\circ$ increases to $\mathbb{R}^n$, dominated convergence using the two bounds obtained above shows that both norms tend to zero, and therefore we may finally conclude


that $\sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k$ tends to $\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$ in $L^2(\Pi)$. Closedness of $D$ then implies that $\varphi(F) \in D_{1,2}$ and $D\varphi(F) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$.

Step 2: The general case. Now let $\varphi \in C^1(\mathbb{R}^n)$ be arbitrary and assume that $\varphi(F) \in L^2(\mathcal{F}_T)$ and $\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \in L^2(\Pi)$. We still consider some fixed $\varepsilon > 0$. Define the sets $K_m = [-m,m]$ and $V_m = (-(m+1), m+1)$, and let $c_m$ be the Lipschitz element of $C_c^\infty(\mathbb{R})$ with $K_m \prec c_m \prec V_m$ and derivative bounded by $1+\varepsilon$, which exists by Lemma 4.3.3. Let $C_m$ be the antiderivative of $c_m$ which is zero at zero, given by $C_m(x) = \int_0^x c_m(y)\,dy$. Note that since $c_m$ is bounded by one and is zero outside of $V_m$, $\|C_m\|_\infty \le m+1$; in particular, $C_m$ is bounded. Defining $\varphi_m(x) = C_m(\varphi(x))$, it is then clear that $\varphi_m$ is bounded. Furthermore, $\varphi_m \in C^1(\mathbb{R}^n)$ and
\[
\frac{\partial\varphi_m}{\partial x_k}(x) = c_m(\varphi(x))\frac{\partial\varphi}{\partial x_k}(x).
\]
We therefore have
\[
\Big| \sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k \Big| = \Big| \sum_{k=1}^n c_m(\varphi(F))\frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big| \le \Big| \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big|,
\]
so $\sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k \in L^2(\Pi)$. Thus, $\varphi_m$ is covered by the previous step of the proof, and we may conclude that $\varphi_m(F) \in D_{1,2}$ and $D\varphi_m(F) = \sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k$. As in the previous step, we will extend this result to $\varphi$ by applying the closedness of $D$. We therefore need to prove convergence in $L^2(\mathcal{F}_T)$ of $\varphi_m(F)$ to $\varphi(F)$, and convergence in $L^2(\Pi)$ of $\sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k$ to $\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$.

To this end, note that since $c_m$ is bounded by one, $|C_m(x)| \le |x|$, and for $x \in [-m,m]$, $C_m(x) = \int_0^x c_m(y)\,dy = x$. Therefore, $\varphi_m(x) = \varphi(x)$ whenever $\varphi(x) \in [-m,m]$, and
\[
\|\varphi_m(F) - \varphi(F)\|_2 = \|1_{[-m,m]^c}(\varphi(F))(\varphi_m(F) - \varphi(F))\|_2 = \|1_{[-m,m]^c}(\varphi(F))(C_m(\varphi(F)) - \varphi(F))\|_2.
\]
Since $|C_m(\varphi(F)) - \varphi(F)| \le |C_m(\varphi(F))| + |\varphi(F)| \le 2|\varphi(F)|$ by the contraction property of $C_m$, we conclude by dominated convergence that the above tends to zero, so $\varphi_m(F)$ tends to $\varphi(F)$ in $L^2(\mathcal{F}_T)$. Likewise, for the derivatives we obtain
\[
\Big\| \sum_{k=1}^n \frac{\partial\varphi_m}{\partial x_k}(F)\,DF_k - \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big\|_2 = \Big\| \sum_{k=1}^n c_m(\varphi(F))\frac{\partial\varphi}{\partial x_k}(F)\,DF_k - \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big\|_2 = \Big\| (c_m(\varphi(F)) - 1)\sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k \Big\|_2,
\]


and since $c_m(\varphi(F)) - 1$ tends to zero, bounded by the constant one, we conclude by dominated convergence that the above tends to zero. By the closedness of $D$, we finally obtain $\varphi(F) \in D_{1,2}$ and $D\varphi(F) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(F)\,DF_k$, proving the theorem.

Corollary 4.3.6. Let $F, G \in D_{1,2}$. If the integrability conditions $FG \in L^2(\mathcal{F}_T)$ and $(DF)G + F(DG) \in L^2(\Pi)$ hold, then $FG \in D_{1,2}$ and $D(FG) = (DF)G + F(DG)$.

Proof. Putting $\varphi(x,y) = xy$, the chain rule of Theorem 4.3.5 yields that $FG \in D_{1,2}$ and $D(FG) = \frac{\partial\varphi}{\partial x_1}(F,G)\,DF + \frac{\partial\varphi}{\partial x_2}(F,G)\,DG = G(DF) + F(DG)$.

We are now done with the proof of the chain rule for the Malliavin derivative. Next, we prove that stochastic integrals with deterministic integrands are in the domain of $D$. This will allow us to reconcile our definition of the Malliavin derivative with the one found in Nualart (2006).

Lemma 4.3.7. Let $h \in L^2[0,T]$. It then holds that $\int_0^T h(t)\,dW(t) \in D_{1,2}$ and
\[
D\int_0^T h(t)\,dW(t) = h.
\]

Proof. First assume that $h$ is continuous. We define $h_n(t) = \sum_{k=1}^n h(\tfrac{kT}{n})1_{(\frac{(k-1)T}{n}, \frac{kT}{n}]}(t)$. Since $h$ is continuous, $h$ is bounded, and it follows that $h_n \in b\mathcal{E}$. We may therefore conclude $\int_0^T h_n(t)\,dW(t) \in S$ and
\[
D_s\int_0^T h_n(t)\,dW(t) = D_s\sum_{k=1}^n h\Big(\frac{kT}{n}\Big)\Big( W_{\frac{kT}{n}} - W_{\frac{(k-1)T}{n}} \Big) = \sum_{k=1}^n h\Big(\frac{kT}{n}\Big)1_{(\frac{(k-1)T}{n}, \frac{kT}{n}]}(s) = h_n(s).
\]
Next, note that $h_n$ converges pointwise to $h$ for all $t \in (0,T]$. By boundedness of $h$, $h_n$ converges in $L^2[0,T]$ to $h$. By continuity of the stochastic integral, $\int_0^T h_n(t)\,dW(t)$ converges in $L^2(\mathcal{F}_T)$ to $\int_0^T h(t)\,dW(t)$. Closedness then yields $\int_0^T h(t)\,dW(t) \in D_{1,2}$ and $D\int_0^T h(t)\,dW(t) = h$, as desired. This shows the result for continuous $h$.

Next, consider a general $h \in L^2[0,T]$. There exist continuous $h_n$ converging in $L^2[0,T]$ to $h$. By the previous step, $D\int_0^T h_n(t)\,dW_t = h_n$. By the Itô isometry, $\int_0^T h_n(t)\,dW_t$ converges to $\int_0^T h(t)\,dW_t$ in $L^2(\mathcal{F}_T)$, and $D\int_0^T h_n(t)\,dW_t = h_n$ converges to $h$ in $L^2(\Pi)$. We conclude that $\int_0^T h(t)\,dW_t \in D_{1,2}$ and $D\int_0^T h(t)\,dW_t = h$.
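The approximation used in the first part of the proof is easy to visualise numerically. The following Python sketch builds the elementary integral for a continuous $h$ together with the step function $h_n$, which is the Malliavin derivative of the approximation and converges to $h$ in $L^2[0,T]$; the grid, the value of $n$ and the choice of $h$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 1.0, 50
h = lambda t: np.sin(2 * np.pi * t)         # a continuous integrand on [0, T]

t_right = np.arange(1, n + 1) * T / n       # partition points kT/n
dW = rng.normal(0.0, np.sqrt(T / n), n)     # Brownian increments over the partition

# The elementary integral int_0^T h_n dW, an element of S.
theta_h_n = np.sum(h(t_right) * dW)

# Its Malliavin derivative is s -> h_n(s) = h(kT/n) on ((k-1)T/n, kT/n];
# on a fine grid this step function is close to h itself.
s = np.linspace(1e-6, T, 2000)
h_n_of_s = h(t_right[np.minimum((np.ceil(s * n / T) - 1).astype(int), n - 1)])
print(np.sqrt(np.mean((h_n_of_s - h(s)) ** 2)))   # small L^2-type error
```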


In the following, we will for brevity denote the Brownian stochastic integral operator on $[0,T]$ by $\theta$; that is, whenever $X \in L^2(\Pi)$ is progressively measurable, we will write $\theta(X) = \theta X = \int_0^T X_s\,dW_s$. Furthermore, if $X_1,\ldots,X_n \in L^2(\Pi)$ and we put $X = (X_1,\ldots,X_n)$, we will write $\theta X$ for $(\theta X_1,\ldots,\theta X_n)$.

Now consider $h_1,\ldots,h_n \in L^2[0,T]$ and let $\varphi \in C_p^\infty(\mathbb{R}^n)$. Since $\theta h$ has a normal distribution by Lemma 3.7.4, it follows that $\varphi(\theta h) \in L^2(\mathcal{F}_T)$. Furthermore, since $h$ is deterministic,
\[
\Big\| \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(\theta h)h_k \Big\|_2 \le \sum_{k=1}^n \Big\| \frac{\partial\varphi}{\partial x_k}(\theta h)h_k \Big\|_2 = \sum_{k=1}^n \Big\| \frac{\partial\varphi}{\partial x_k}(\theta h) \Big\|_2\|h_k\|_2,
\]
which is finite. By Lemma 4.3.7 and Theorem 4.3.5, we conclude that $\varphi(\theta h) \in D_{1,2}$ and $D\varphi(\theta h) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(\theta h)h_k$. This result shows that our method of introducing the Malliavin derivative only for variables $\varphi(W_t)$ with $\varphi \in C_p^\infty(\mathbb{R}^n)$ yields the same result as the method in Nualart (2006), where the Malliavin derivative is introduced for variables $\varphi(\theta h)$ with $\varphi \in C_p^\infty(\mathbb{R}^n)$. The two methods are thus equivalent.

We let $S_h$ denote the space of variables $\varphi(\theta h)$, where $h = (h_1,\ldots,h_n)$, $h_k \in L^2[0,T]$ and $\varphi \in C_p^\infty(\mathbb{R}^n)$. Clearly, $S \subseteq S_h$. Our final goal of this section is to prove a few versions of the integration-by-parts formula. This formula will be very useful in the next section, where we develop the Hilbert space theory of the Malliavin calculus. If $X \in L^2(\Pi)$ and $h \in L^2[0,T]$, we will write $\langle X, h\rangle_{[0,T]}(\omega) = \int_0^T X(\omega,t)h(t)\,dt$.

Theorem 4.3.8 (Integration-by-parts formula). Let $h \in L^2[0,T]$ and $F \in D_{1,2}$. Then $E\langle DF, h\rangle_{[0,T]} = E\,F(\theta h)$, where $F(\theta h)(\omega) = F(\omega)(\theta h)(\omega)$.

Proof. We first check that the conclusion is well-defined. Considering $h$ as an element of $L^2(\Pi)$, the left-hand side is simply the inner product in $L^2(\Pi)$ of two elements of $L^2(\Pi)$. And since $F$ and $\theta h$ are in $L^2(\mathcal{F}_T)$, $F(\theta h) \in L^1(\mathcal{F}_T)$. Thus, all the integrals are well-defined.

By linearity, it suffices to prove the result in the case where $h$ has unit norm. First consider the case where $F \in S_h$ with $F = f(\theta g)$, $g \in (L^2[0,T])^n$. By using a linear transformation, we can assume without loss of generality that $g_1 = h$ and that $g_1,\ldots,g_n$ are orthonormal. We then obtain
\[
E\langle DF, h\rangle_{[0,T]} = E\Big\langle \sum_{k=1}^n \frac{\partial f}{\partial x_k}(\theta g)g_k, g_1 \Big\rangle_{[0,T]} = E\sum_{k=1}^n \frac{\partial f}{\partial x_k}(\theta g)\langle g_k, g_1\rangle_{[0,T]} = E\frac{\partial f}{\partial x_1}(\theta g).
\]


Since $g_1,\ldots,g_n$ are orthonormal, $\theta g$ is standard normally distributed. Therefore, we can use Lemma 4.2.6 to obtain
\[
E\frac{\partial f}{\partial x_1}(\theta g) = \int \frac{\partial f}{\partial x_1}(y)\varphi_n(y)\,dy = \int f(y)y_1\varphi_n(y)\,dy = E\,f(\theta g)(\theta g)_1 = E\,F\theta(h).
\]
This proves the claim in the case where $F \in S_h$. Now consider the case where $F \in D_{1,2}$. Let $F_n$ be a sequence in $S_h$ such that $F_n$ converges to $F$ and $DF_n$ converges to $DF$; this is possible by the closedness definition of $D$ and our earlier observation that $S \subseteq S_h$. By continuity of the inner products, we find
\[
E\langle DF, h\rangle_{[0,T]} = E\langle \lim_n DF_n, h\rangle_{[0,T]} = \lim_n E\langle DF_n, h\rangle_{[0,T]} = \lim_n E\,F_n\theta(h) = E\lim_n F_n\theta(h) = E\,F\theta(h),
\]
as desired.

Theorem 4.3.8 is the fundamental result known as the integration-by-parts formula, even though it may not seem to bear much similarity to the integration-by-parts formula of ordinary calculus. In practice, we will not really use the integration-by-parts formula in the form given in Theorem 4.3.8; rather, we will be using the following two corollaries.

Corollary 4.3.9. Let $F, G \in D_{1,2}$ with $FG \in L^2(\mathcal{F}_T)$ and $F(DG) + G(DF) \in L^2(\Pi)$. Then
\[
E(G\langle DF, h\rangle_{[0,T]}) = E(FG(\theta h)) - E\,F\langle DG, h\rangle_{[0,T]}.
\]

Comment 4.3.10. Note that the left-hand side of the formula above can also be written as $\int\int_{[0,T]} DF(\omega,t)G(\omega)h(t)\,dt\,dP(\omega)$. This makes the formula useful for identifying conditional expectations, which in the Hilbert space context correspond to orthogonal projections. ◦

Proof. By Corollary 4.3.6, $FG \in D_{1,2}$ and $D(FG) = F(DG) + G(DF)$. Therefore, Theorem 4.3.8 yields
\[
E(FG(\theta h)) = E\langle D(FG), h\rangle_{[0,T]} = E\langle F(DG), h\rangle_{[0,T]} + E\langle G(DF), h\rangle_{[0,T]} = E\,F\langle DG, h\rangle_{[0,T]} + E\,G\langle DF, h\rangle_{[0,T]},
\]


and rearranging yields the result.

Corollary 4.3.11. Let $F \in D_{1,2}$ and $G \in S_h$. Then
\[
E(G\langle DF, h\rangle_{[0,T]}) = E(FG(\theta h)) - E\,F\langle DG, h\rangle_{[0,T]}.
\]

Comment 4.3.12. The point of considering $G \in S_h$ is to avoid the integrability requirements of Corollary 4.3.9. Particularly important for our later applications is that we avoid the requirement $FG \in L^2(\mathcal{F}_T)$. ◦

Proof. Let $F_n$ be a sequence in $S_h$ such that $F_n$ converges to $F$ and $DF_n$ converges to $DF$. Then $F_n$ and $G$ are both in $S_h$, so $F_n$ and $G$ are both $C_p^\infty$ transformations of Gaussian variables of the form $\theta h$. Since such transformations and their partial derivatives have polynomial growth, and Gaussian vectors have moments of all orders, we conclude $F_nG \in L^2(\mathcal{F}_T)$ and $F_n(DG) + G(DF_n) \in L^2(\Pi)$. From Corollary 4.3.9, we then have
\[
E(G\langle DF_n, h\rangle_{[0,T]}) = E(F_nG(\theta h)) - E\,F_n\langle DG, h\rangle_{[0,T]}.
\]
We want to show that each of these terms converges as $F_n$ converges to $F$.

First term. We begin by considering the term on the left-hand side. Because we have $G \in L^2(\mathcal{F}_T)$ and $h \in L^2[0,T]$, $G \otimes h \in L^2(\Pi)$. $DF_n$ converges to $DF$ in $L^2(\Pi)$, so by the Cauchy-Schwarz inequality, $DF_n(G\otimes h)$ converges to $DF(G\otimes h)$ in $L^1(\Pi)$. Noting that
\[
E\big| G\langle DF_n, h\rangle_{[0,T]} - G\langle DF, h\rangle_{[0,T]} \big| = E\Big| \int_0^T (G\otimes h)DF_n\,d\lambda - \int_0^T (G\otimes h)DF\,d\lambda \Big| \le \|DF_n(G\otimes h) - DF(G\otimes h)\|_1,
\]
we obtain that $G\langle DF_n, h\rangle_{[0,T]}$ converges in $L^1(\mathcal{F}_T)$ to $G\langle DF, h\rangle_{[0,T]}$, and therefore $E(G\langle DF_n, h\rangle_{[0,T]})$ converges to $E(G\langle DF, h\rangle_{[0,T]})$, as desired.

Second term. Next, consider the first term on the right-hand side. Since $G(\theta h)$ is in $L^2(\mathcal{F}_T)$, the Cauchy-Schwarz inequality implies that $F_nG(\theta h)$ tends to $FG(\theta h)$ in $L^1(\mathcal{F}_T)$. Therefore $E(F_nG(\theta h))$ converges to $E(FG(\theta h))$.

Third term. Finally, consider the second term on the right-hand side. We proceed as for the term on the left-hand side and first note that since $F_n \in L^2(\mathcal{F}_T)$ and $h \in L^2[0,T]$, $F_n \otimes h \in L^2(\Pi)$. We also obtain $\|F_n\otimes h - F\otimes h\|_2^2 = \|F_n - F\|_2^2\|h\|_2^2$,


showing that $F_n\otimes h$ converges in $L^2(\Pi)$ to $F\otimes h$. Therefore, $(F_n\otimes h)DG$ converges in $L^1(\Pi)$ to $(F\otimes h)DG$. Since we have, as for the first term considered,
\[
E\big| F_n\langle DG, h\rangle_{[0,T]} - F\langle DG, h\rangle_{[0,T]} \big| = E\Big| \int_0^T (F_n\otimes h)DG\,d\lambda - \int_0^T (F\otimes h)DG\,d\lambda \Big| \le \|(F_n\otimes h)DG - (F\otimes h)DG\|_1,
\]
we conclude that $F_n\langle DG, h\rangle_{[0,T]}$ tends to $F\langle DG, h\rangle_{[0,T]}$ in $L^1(\mathcal{F}_T)$, and therefore $E(F_n\langle DG, h\rangle_{[0,T]})$ tends to $E(F\langle DG, h\rangle_{[0,T]})$.

Conclusion. We have now argued that
\[
E(FG\theta(h)) = \lim_n E(F_nG\theta(h)), \qquad E(F\langle DG, h\rangle_{[0,T]}) = \lim_n E(F_n\langle DG, h\rangle_{[0,T]}), \qquad E(G\langle DF, h\rangle_{[0,T]}) = \lim_n E(G\langle DF_n, h\rangle_{[0,T]}),
\]
and we may therefore conclude $E(G\langle DF, h\rangle_{[0,T]}) = E(FG\theta(h)) - E\,F\langle DG, h\rangle_{[0,T]}$ by taking limits. This is the result we set out to prove.

We have now come to the conclusion of this section. Before proceeding, we review our progress. We have developed some of the fundamental results on the Malliavin derivative on $D_{1,2}$. In Lemma 4.3.7, we checked that stochastic integrals with deterministic integrands are in $D_{1,2}$; this allowed us to reconcile our Malliavin derivative with that of Nualart (2006). Also, in Theorem 4.3.5 and Theorem 4.3.8, we proved the chain rule and the integration-by-parts formula, respectively. These two results will be indispensable in the coming sections. The chain rule tells us that, as long as natural integrability conditions are satisfied, $D_{1,2}$ is stable under $C^1$ transformations. The integration-by-parts formula is essential because it transfers an expectation involving $DF$ into an expectation involving only $F$. This will be of particular use in the form of Corollary 4.3.11.

Our next goal is to introduce a series of subspaces of $L^2(\mathcal{F}_T)$ and $L^2(\Pi)$ and to investigate the special properties of $D$ related to these subspaces. These results are very important for the general theory of the Malliavin calculus. Unfortunately, we will not be able to see their full impact with the limited theory we detail here. However, we will take the time to obtain one corollary from the theory, namely an extension of the chain rule to Lipschitz transformations.
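The integration-by-parts formula of Theorem 4.3.8 can also be verified by simulation. The Python sketch below checks it for $F = f(W_{t_1}, W_{t_2}) \in S$ and $h \equiv 1$ on $[0,T]$, so that $\theta(h) = W_T$ and $\langle DF, h\rangle_{[0,T]} = \sum_k \frac{\partial f}{\partial x_k}(W_t)\,t_k$; the choice of $f$, of $h$ and of the time points are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n_paths = 1.0, 1_000_000
t1, t2 = 0.4, 1.0

# Simulate (W_{t1}, W_{t2}) jointly; here t2 = T, so W_{t2} = W_T = theta(h).
W1 = rng.normal(0.0, np.sqrt(t1), n_paths)
W2 = W1 + rng.normal(0.0, np.sqrt(t2 - t1), n_paths)
WT = W2

f      = lambda x1, x2: x1 ** 2 * x2
df_dx1 = lambda x1, x2: 2 * x1 * x2
df_dx2 = lambda x1, x2: x1 ** 2

lhs = (df_dx1(W1, W2) * t1 + df_dx2(W1, W2) * t2).mean()   # E<DF, h>
rhs = (f(W1, W2) * WT).mean()                              # E[F theta(h)]
print(lhs, rhs)   # both approximately 2*t1**2 + t1*t2 = 0.72 for these time points
```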


4.4 Orthogonal subspaces of $L^2(\mathcal{F}_T)$ and $L^2(\Pi)$

In this section, we will introduce orthogonal decompositions of $L^2(\mathcal{F}_T)$ and $L^2(\Pi)$ given by
\[
L^2(\mathcal{F}_T) = \bigoplus_{n=0}^\infty \mathcal{H}_n \qquad \text{and} \qquad L^2(\Pi) = \bigoplus_{n=0}^\infty \mathcal{H}_n(\Pi).
\]
We will see that the subspaces $\mathcal{H}_n$ have orthonormal bases for which the Malliavin derivative can be explicitly computed, and that the images of these subspaces under $D$ lie in $\mathcal{H}_n(\Pi)$. These properties will allow us a characterization of $D_{1,2}$ in terms of the projections onto the subspaces $\mathcal{H}_n$ and $\mathcal{H}_n(\Pi)$, which can lead to some very useful results. In order to give an idea of what is to come, we provide the following graphical overview.

[Diagram: on the left, $\mathcal{H}_n'$ and $\mathcal{H}_n'' = \operatorname{span}\mathbb{H}_n$ both have closure $\mathcal{H}_n$; on the right, $\mathcal{H}_n'(\Pi)$ and $\mathcal{H}_n''(\Pi) = \operatorname{span}\mathbb{H}_n(\Pi)$ both have closure $\mathcal{H}_n(\Pi)$; the Malliavin derivative $D$ maps $\mathcal{H}_n$ into $\mathcal{H}_n(\Pi)$.]

Consider the left-hand part of the diagram. It is meant to state that $\mathcal{H}_n$ is the closure of $\mathcal{H}_n'$ as well as of $\mathcal{H}_n''$, and that $\mathcal{H}_n''$ is the span of $\mathbb{H}_n$. On the right-hand part, $\mathcal{H}_n(\Pi)$ is the closure of $\mathcal{H}_n'(\Pi)$ and of $\mathcal{H}_n''(\Pi)$, and $\mathcal{H}_n''(\Pi)$ is the span of $\mathbb{H}_n(\Pi)$. The connection between the left-hand and right-hand parts is the Malliavin derivative, which maps $\mathcal{H}_n$ into $\mathcal{H}_n(\Pi)$.

Our plan is first to formally introduce $\mathcal{H}_n$, $\mathcal{H}_n'$, $\mathcal{H}_n''$ and $\mathbb{H}_n$ and consider their basic properties. After this, we introduce $\mathcal{H}_n(\Pi)$, $\mathcal{H}_n'(\Pi)$, $\mathcal{H}_n''(\Pi)$ and $\mathbb{H}_n(\Pi)$. Finally, we prove that $D$ maps $\mathcal{H}_n$ into $\mathcal{H}_n(\Pi)$ by explicitly calculating the Malliavin derivative of the elements in $\mathbb{H}_n$. We then apply our results to obtain a characterization of the elements of the space $D_{1,2}$ in terms of the projections onto the spaces $\mathcal{H}_n$.

The subspaces $\mathcal{H}_n$ are based on the Hermite polynomials. As described in Appendix A.5, the $n$'th Hermite polynomial is defined by
\[
H_n(x) = (-1)^n e^{\frac{x^2}{2}}\frac{d^n}{dx^n}e^{-\frac{x^2}{2}},
\]
with $H_0(x) = 1$ and $H_{-1}(x) = 0$. See the appendix for elementary properties of the Hermite polynomials. This section will use a good deal of Hilbert space theory; see Appendix A.6 for a review of the results used. We now define some of the subspaces of $L^2(\mathcal{F}_T)$ under consideration. Recall that in the previous section we defined $\theta$ as the Brownian stochastic integral operator on $[0,T]$.
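Since the Hermite polynomials drive everything in this section, a small numerical sanity check may be helpful. The Python sketch below generates $H_n$ through the standard three-term recurrence $H_{n+1}(x) = xH_n(x) - nH_{n-1}(x)$ (an equivalent form of the definition above, stated here as an assumption since it is not quoted in this section) and verifies by Monte Carlo that $E[H_n(X)H_m(X)] = n!\,1_{(n=m)}$ for a standard normal $X$, which is the case $h = g$ of Lemma 4.4.2 below.

```python
import numpy as np
from math import factorial

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n evaluated at x (array-valued)."""
    h_prev, h = np.zeros_like(x), np.ones_like(x)      # H_{-1} = 0, H_0 = 1
    for k in range(n):
        h_prev, h = h, x * h - k * h_prev
    return h

rng = np.random.default_rng(7)
X = rng.standard_normal(2_000_000)
for n in range(4):
    for m in range(4):
        est = (hermite(n, X) * hermite(m, X)).mean()
        exact = factorial(n) if n == m else 0.0
        print(n, m, round(est, 2), exact)   # estimates are close to n! * 1_{n=m}
```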


Definition 4.4.1. We let $\mathcal{H}_n'$ be the linear span of the variables $H_n(\theta h)$ for $h \in L^2[0,T]$ with $\|h\|_2 = 1$, and let $\mathcal{H}_n$ be the closure of $\mathcal{H}_n'$ in $L^2(\mathcal{F}_T)$.

Our first goal is to show that the subspaces $(\mathcal{H}_n)$ are mutually orthogonal and that their orthogonal sum is $L^2(\mathcal{F}_T)$. The following lemma will be fundamental to this endeavor.

Lemma 4.4.2. For $h, g \in L^2[0,T]$ with $\|h\|_2 = \|g\|_2 = 1$, it holds that
\[
E\,H_n(\theta h)H_m(\theta g) =
\begin{cases}
0 & \text{if } n \neq m, \\
n!\langle h, g\rangle^n & \text{if } n = m.
\end{cases}
\]

Proof. Let $n, m \ge 1$ be given. By Lemma A.5.2, we have
\[
E\,H_n(\theta h)H_m(\theta g) = E\left( \frac{\partial^n}{\partial s^n}\exp\Big(s\theta h - \frac{s^2}{2}\Big)\Big|_{s=0}\cdot\frac{\partial^m}{\partial t^m}\exp\Big(t\theta g - \frac{t^2}{2}\Big)\Big|_{t=0} \right) = E\left( \frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big)\Big|_{s=0,t=0} \right).
\]
Our plan is to interchange the differentiation and integration above and to explicitly calculate and differentiate the resulting mean.

Step 1: Interchange of differentiation and integration. We first note that, by a simple induction proof,
\[
\frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}\exp\Big(sx - \frac{s^2}{2}\Big)\exp\Big(ty - \frac{t^2}{2}\Big) = q_n(s,x)q_m(t,y)\exp\Big(sx - \frac{s^2}{2}\Big)\exp\Big(ty - \frac{t^2}{2}\Big),
\]
where $q_n$ is some polynomial in two variables. We will show that for any $s, t \in \mathbb{R}$, it holds that
\[
E\frac{\partial}{\partial s}\Big[ q_n(s,\theta h)q_m(t,\theta g)\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) \Big] = \frac{\partial}{\partial s}E\Big[ q_n(s,\theta h)q_m(t,\theta g)\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) \Big].
\]
By symmetry, if this is true, then the same result will also hold when differentiating with respect to $t$. Applying this result $n+m$ times will yield the desired exchange of differentiation and integration. To show the result, fix $t \in \mathbb{R}$ and consider


a bounded interval $[a,b]$, and define $c = \min\{|a|,|b|\}$. We then have, with $r_{n+1}$ some polynomial and $C$ and $C_t'$ some suitably large constants,
\[
\sup_{a\le s\le b}\Big| \frac{\partial}{\partial s}\Big[ q_n(s,x)q_m(t,y)\exp\Big(sx - \frac{s^2}{2}\Big)\exp\Big(ty - \frac{t^2}{2}\Big) \Big] \Big| = \sup_{a\le s\le b}\Big| q_{n+1}(s,x)q_m(t,y)\exp\Big(sx - \frac{s^2}{2}\Big)\exp\Big(ty - \frac{t^2}{2}\Big) \Big| \le r_{n+1}(|x|)q_m(|t|,|y|)\exp\Big(bx - \frac{c^2}{2}\Big)\exp\Big(ty - \frac{t^2}{2}\Big) \le \big(C + \exp(|x|)\big)\big(C_t' + \exp(|y|)\big)\exp\Big(bx - \frac{c^2}{2}\Big)\exp\Big(ty - \frac{t^2}{2}\Big).
\]
We have exploited that exponentials grow faster than any polynomial. Now, since $E\exp(\sum_{k=1}^n\lambda_k|X_k|)$ is finite whenever $\lambda \in \mathbb{R}^n$ and $X$ has an $n$-dimensional normal distribution, we find
\[
E\big(C + \exp(|\theta h|)\big)\big(C_t' + \exp(|\theta g|)\big)\exp\Big(b\theta h - \frac{c^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) < \infty.
\]
Ordinary results for exchanging differentiation and integration now certify that
\[
E\frac{\partial}{\partial s}\Big[ q_n(s,\theta h)q_m(t,\theta g)\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) \Big] = \frac{\partial}{\partial s}E\Big[ q_n(s,\theta h)q_m(t,\theta g)\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) \Big],
\]
as desired. We can then conclude
\[
E\Big( \frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) \Big) = \frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}E\Big( \exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) \Big),
\]
and finally,
\[
E\Big( \frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big)\Big|_{s=0,t=0} \Big) = \frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}E\Big( \exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) \Big)\Big|_{s=0,t=0}.
\]

Step 2: Calculating the mean. We now set to work to calculate $E\exp(s\theta h - \frac{s^2}{2})\exp(t\theta g - \frac{t^2}{2})$.


By Lemma 3.7.4, $(\theta h, \theta g)$ is normally distributed with mean zero, $V(\theta h) = 1$, $V(\theta g) = 1$ and $\mathrm{Cov}(\theta h, \theta g) = \langle h, g\rangle$. We conclude that for any $s, t \in \mathbb{R}$, $s\theta h + t\theta g$ is normally distributed with mean zero and variance $s^2 + t^2 + 2st\langle h,g\rangle$. Using the formula for the Laplace transform of the normal distribution, we then find
\[
E\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) = \exp\Big(-\frac{s^2+t^2}{2}\Big)E\exp(s\theta h + t\theta g) = \exp\Big(-\frac{s^2+t^2}{2}\Big)\exp\Big(\frac{1}{2}(s^2+t^2+2st\langle h,g\rangle)\Big) = \exp(st\langle h,g\rangle).
\]

Step 3: Conclusion. With $\alpha = \max\{n,m\}$, we now obtain
\[
\frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}E\exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) = \frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}\exp(st\langle h,g\rangle) = \frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}\sum_{k=0}^\infty \frac{\langle h,g\rangle^k}{k!}s^kt^k = \sum_{k=\alpha}^\infty \frac{\langle h,g\rangle^k}{k!}\frac{k!}{(k-n)!}\frac{k!}{(k-m)!}s^{k-n}t^{k-m}.
\]
Combining our results, we conclude
\[
E\,H_n(\theta h)H_m(\theta g) = \frac{\partial^n}{\partial s^n}\frac{\partial^m}{\partial t^m}E\Big( \exp\Big(s\theta h - \frac{s^2}{2}\Big)\exp\Big(t\theta g - \frac{t^2}{2}\Big) \Big)\Big|_{s=0,t=0} = \sum_{k=\alpha}^\infty \frac{\langle h,g\rangle^k}{k!}\frac{k!}{(k-n)!}\frac{k!}{(k-m)!}s^{k-n}t^{k-m}\Big|_{s=t=0}.
\]
At $s = t = 0$, only a term with $k = n$ and $k = m$ can contribute, so the final expression is zero if $n \neq m$ and $n!\langle h,g\rangle^n$ otherwise. This shows the claim of the lemma.

Theorem 4.4.3. The subspaces $(\mathcal{H}_n)$ are orthogonal, and $L^2(\mathcal{F}_T) = \bigoplus_{n=0}^\infty \mathcal{H}_n$.

Proof. We first prove that the subspaces are mutually orthogonal. Let $n, m \ge 1$ be given with $n \neq m$ and let $h, g \in L^2[0,T]$ with $\|h\|_2 = \|g\|_2 = 1$. By Lemma 4.4.2,


$H_n(\theta h)$ and $H_m(\theta g)$ are orthogonal. Since $\mathcal{H}_n$ is the closure of the span of the elements of the form $H_n(\theta h)$, $\|h\|_2 = 1$, Lemma A.6.2 yields that $\mathcal{H}_n \perp \mathcal{H}_m$.

Next, we must show $L^2(\mathcal{F}_T) = \bigoplus_{n=0}^\infty \mathcal{H}_n$. By Lemma A.6.11, $\bigoplus_{n=0}^\infty \mathcal{H}_n$ is the closure of $\operatorname{span}\cup_{n=0}^\infty \mathcal{H}_n$. It therefore suffices to show that the subspace $\operatorname{span}\cup_{n=0}^\infty \mathcal{H}_n$ is dense in $L^2(\mathcal{F}_T)$. To do so, it will by Lemma A.6.4 suffice to show that the orthogonal complement of $\operatorname{span}\cup_{n=0}^\infty \mathcal{H}_n$ is $\{0\}$, and to do so, it will by Lemma A.6.2 suffice to show that the orthogonal complement of $\cup_{n=0}^\infty \mathcal{H}_n$ is $\{0\}$. This is therefore what we set out to do.

Therefore, let $F \in L^2(\mathcal{F}_T)$ be given, orthogonal to $\cup_{n=0}^\infty \mathcal{H}_n$. This means that for any $n \ge 0$ and $h$ with $\|h\|_2 = 1$, $E\,FH_n(\theta h) = 0$. Let $n$ be given. By Lemma A.5.6, there are coefficients such that
\[
x^n = \sum_{k=0}^n \lambda_k H_k(x).
\]
We then obtain $E\,F(\theta h)^n = \sum_{k=0}^n \lambda_k E\,FH_k(\theta h) = 0$. By scaling, we can conclude that $E\,F(\theta h)^n = 0$ for any $h \in L^2[0,T]$. Therefore, in the following, $h$ will denote an arbitrary element of $L^2[0,T]$, not necessarily of unit norm. We will show $E\,F\exp(\theta h) = 0$. To do so, note that by Lemma B.3.2,
\[
\sum_{n=0}^\infty \Big\| \frac{(\theta h)^n}{n!} \Big\|_2 = \sum_{n=0}^\infty \frac{1}{n!}\big( E(\theta h)^{2n} \big)^{\frac{1}{2}} = \sum_{n=0}^\infty \frac{1}{n!}\Big( \|h\|_2^{2n}\frac{(2n)!}{2^nn!} \Big)^{\frac{1}{2}} = \sum_{n=0}^\infty \frac{\|h\|_2^n}{\sqrt{n!}}\Big( \frac{(2n)!}{2^n(n!)^2} \Big)^{\frac{1}{2}} \le \sum_{n=0}^\infty \frac{(\sqrt{2}\|h\|_2)^n}{\sqrt{n!}},
\]
where we have used that $\frac{(2n)!}{2^n(n!)^2} = 2^{-n}\binom{2n}{n} \le 2^n$. The series $\sum_{n=0}^\infty \frac{(\sqrt{2}\|h\|_2)^n}{\sqrt{n!}}$ is convergent by the quotient test, and by completeness, we can conclude that $\sum_{n=0}^\infty \frac{(\theta h)^n}{n!}$ is convergent in $L^2(\mathcal{F}_T)$. Since it also converges almost surely to $\exp(\theta h)$, this is also the limit in $L^2(\mathcal{F}_T)$. By the Cauchy-Schwarz inequality, we find that $F\sum_{k=0}^n \frac{(\theta h)^k}{k!}$ converges in $L^1(\mathcal{F}_T)$ to $F\exp(\theta h)$, and therefore
\[
E\,F\exp(\theta h) = \lim_n E\,F\sum_{k=0}^n \frac{(\theta h)^k}{k!} = 0.
\]
Since the variables of the form $\exp(\theta h)$, where $h \in L^2[0,T]$, form a dense subset of


$L^2(\mathcal{F}_T)$ by Theorem B.6.2, their span forms a dense subspace of $L^2(\mathcal{F}_T)$. Since $F$ is orthogonal to this subspace, Lemma A.6.4 yields $F = 0$, as desired.

Next, we introduce the orthonormal basis $\mathbb{H}_n$ for the subspace $\mathcal{H}_n$. We are going to need an orthonormal basis for $L^2[0,T]$. Since $\mathcal{B}[0,T]$ is countably generated, Lemma B.4.1 yields that $L^2[0,T]$ is separable, and therefore any orthonormal basis is countable. Let $(e_n)$ be such a basis; we will hold this basis fixed for the remainder of the section.

We let $I$ be the set of sequences with values in $\mathbb{N}_0$ which are zero from a point onwards. We call the elements of $I$ multi-indices. If $a \in I$, we put $|a| = \sum_{n=1}^\infty a_n$ and call $|a|$ the degree of the multi-index. We also define $a! = \prod_{n=1}^\infty a_n!$. We let $I_n$ be the multi-indices of degree $n$ and let $I_n^m$ be the multi-indices $a$ of degree $n$ such that $a_k = 0$ whenever $k > m$. We say that the multi-indices in $I_n^m$ have order $m$.

Definition 4.4.4. For any multi-index $a$, we define $\Phi_a = \frac{1}{\sqrt{a!}}\prod_{k=1}^\infty H_{a_k}(\theta e_k)$. We write $\mathbb{H} = \{\Phi_a \mid a \in I\}$ and $\mathbb{H}_n = \{\Phi_a \mid |a| = n\}$. We let $\mathcal{H}_n''$ denote the span of $\mathbb{H}_n$.

We begin by showing that $\mathbb{H}$ is an orthonormal set. Afterwards, we will work on showing that $\mathbb{H}_n$ is an orthonormal basis for $\mathcal{H}_n$.

Lemma 4.4.5. $\mathbb{H}$ is an orthonormal set in $L^2(\mathcal{F}_T)$.

Proof. We need to show that each $\Phi_a$ has unit norm and that the elements of the family are mutually orthogonal.

Unit norm. Let a multi-index $a$ be given. Since $(e_n)$ is orthonormal, the variables $(\theta e_n)$ are mutually independent. Recalling that $a_k$ is zero from a point onwards, and using Lemma 3.7.4 and Lemma 4.4.2, we obtain
\[
\|\Phi_a\|_2^2 = E\Big( \frac{1}{\sqrt{a!}}\prod_{n=1}^\infty H_{a_n}(\theta e_n) \Big)^2 = \frac{1}{a!}\prod_{n=1}^\infty E\,H_{a_n}(\theta e_n)^2 = \frac{1}{a!}\prod_{n=1}^\infty a_n!\langle e_n, e_n\rangle^{a_n} = 1.
\]


Orthogonality. Next, consider two multi-indices $a$ and $b$ with $a \neq b$. We then find
\[
\langle\Phi_a, \Phi_b\rangle = E\Big( \frac{1}{\sqrt{a!}}\prod_{n=1}^\infty H_{a_n}(\theta e_n) \Big)\Big( \frac{1}{\sqrt{b!}}\prod_{n=1}^\infty H_{b_n}(\theta e_n) \Big) = \frac{1}{\sqrt{a!}\sqrt{b!}}E\prod_{n=1}^\infty H_{a_n}(\theta e_n)H_{b_n}(\theta e_n) = \frac{1}{\sqrt{a!}\sqrt{b!}}\prod_{n=1}^\infty E\,H_{a_n}(\theta e_n)H_{b_n}(\theta e_n).
\]
Now, since there is some $n$ such that $a_n \neq b_n$, it follows from Lemma 4.4.2 that for this $n$, $E\,H_{a_n}(\theta e_n)H_{b_n}(\theta e_n) = 0$. Therefore, the above is zero, and $\Phi_a$ and $\Phi_b$ are orthogonal.

In order to show that $\mathbb{H}_n$ is an orthonormal basis for $\mathcal{H}_n$, we will need to add a few more spaces to our growing collection of subspaces of $L^2(\mathcal{F}_T)$. Recall that $P_n$ denotes the polynomials of degree less than or equal to $n$; that is, $P_n$ is the family of polynomials of degree less than or equal to $n$ in $k$ variables, for any $k \ge 1$.

Definition 4.4.6. We let $\mathcal{P}_n'$ be the linear span of the variables $p(\theta h)$, where $p \in P_n$ is a polynomial in $k$ variables and $h \in (L^2[0,T])^k$, and we let $\mathcal{P}_n$ be the closure of $\mathcal{P}_n'$.

Lemma 4.4.7. Consider $h \in (L^2[0,T])^n$ with $h = (h_1,\ldots,h_n)$. Let $(h^k)$ be a sequence with $h^k = (h^k_1,\ldots,h^k_n)$ converging to $h$, and let $f \in C_p^1(\mathbb{R}^n)$. Then $f(\theta h^k)$ converges to $f(\theta h)$ in $L^2(\mathcal{F}_T)$.

Proof. This follows from Lemma B.3.4.

Lemma 4.4.8. Let $\mathbb{P}_n$ be the family of variables $p(\theta e)$, where $p \in P_n$ is a polynomial in $k$ variables and $e$ is a vector of distinct elements from the orthonormal basis $(e_n)$. Then the closure of the span of $\mathbb{P}_n$ is equal to $\mathcal{P}_n$.

Proof. Obviously, $\mathbb{P}_n \subseteq \mathcal{P}_n'$, and since $\mathcal{P}_n$ is closed, the closure of $\operatorname{span}\mathbb{P}_n$ is contained in $\mathcal{P}_n$. We need to prove the other inclusion. We first consider the case $F = p(\theta h)$, where $p \in P_n$ is a polynomial of degree at most $n$ in $k$ variables and $h = (h_1,\ldots,h_k)$ has coordinates in $\operatorname{span}\{e_i\}_{i\ge1}$. We want to prove that $F \in \operatorname{span}\mathbb{P}_n$. To do so, first observe that there exists an orthonormal subset $\{g_1,\ldots,g_m\}$ of distinct elements from $\{e_i\}_{i\ge1}$ such that $h_j = \sum_{i=1}^m \lambda^j_i g_i$ for some coefficients $\lambda^j_i$. We then obtain
\[
p(\theta h) = p\Big( \sum_{i=1}^m \lambda^1_i\theta g_i, \ldots, \sum_{i=1}^m \lambda^k_i\theta g_i \Big) = q(\theta g_1,\ldots,\theta g_m),
\]


4.4 Orthogonal subspaces of L 2 (F T ) <strong>and</strong> L 2 (Π) 161where q(x 1 , . . . , x m ) = p( ∑ mi=1 λ1 i x i, . . . , ∑ mi=1 λk i x i) is a polynomium of degree lessthan or equal to n in m variables. We conclude p(θh) ∈ P n . We have now proven thatp(θh) is in span P n when the coordinates of h are in span {e i } i≥1 .Next, consider F = p(θh), where h ∈ (L 2 [0, T ]) k , h = (h 1 , . . . , h k ). We need to provethat p(θh) is in span P n . To this end, note that since (e n ) is an orthonormal basis, thereexists sequences (h i j ) in span {e i} i≥1 <strong>for</strong> j ≤ k such that h i j converges to h j. Puttingh i = (h i 1, . . . , h i k ), we then find by Lemma 4.4.7 that p(θhi ) converges in L 2 to p(θh).Since we have already proven that p(θh i ) ∈ span P n , this shows that p(θh) ∈ span P n .By linearity, we may now conclude P ′ n ⊆ span P n <strong>and</strong> there<strong>for</strong>e also P n ⊆ span P n , asdesired.Lemma 4.4.9. For each n ≥ 0, P n = ⊕ n k=0 H k.Proof. We first show the easier inclusion ⊕ n k=0 H k ⊆ P n . Let k ≤ n. By Lemma A.5.6,the k’th Hermite polynomial is a polynomial of degree k, there<strong>for</strong>e an element of P n .Thus, H ′ k ⊆ P′ n ⊆ P n . Since P n is closed, this implies H k ⊆ P n , <strong>and</strong> we there<strong>for</strong>eimmediately obtain ⊕ n k=0 H k ⊆ P n , as desired.Now consider the other inclusion, P n ⊆ ⊕ n k=0 H k. From Theorem 4.4.3 <strong>and</strong> LemmaA.6.16, we have the orthogonal decomposition(∞⊕n⊕) ( ∞)⊕L 2 (F T ) = H k = H k ⊕ H k .k=0k=0k=n+1By Lemma A.6.14, we there<strong>for</strong>e find that to show P n ⊆ ⊕ ∞ k=0 H k, it will suffice toshow that P n is orthogonal to ⊕ ∞ k=n+1 H k. And to do so, it will by Lemma A.6.11 <strong>and</strong>Lemma A.6.2 suffice to show that P n is orthogonal to Hk ′ <strong>for</strong> any k > n.There<strong>for</strong>e, let k > n be given, <strong>and</strong> let ‖h‖ = 1. We will show that P n is orthogonalto H k (θh). To this end, it will suffice to show that P n ′ is orthogonal to H k (θh).Let F ∈ P n ′ with F = p(θg), where g = (g 1 , . . . , g m ) are in L 2 [0, T ]. Using thesame technique as in the proof of Lemma 4.4.8, we can assume that g 1 , . . . , g m areorthonormal. In fact, by the Gram-Schmidt orthonormalization procedure, we caneven assume g 1 = h.We now prove EF H k (θh) = 0. By Lemma A.5.6, we have, <strong>for</strong> any i ≥ 0,i∑x i = µ i jH j (x)j=0


162 The Malliavin Calculus<strong>for</strong> some coefficients µ i j ∈ R. Recalling that Im n denotes the multi-indicies of order mwith degree n, we then have, <strong>for</strong> some coefficients λ a ,There<strong>for</strong>e, we obtainp(θg) = ∑ ∏ mλ a (θg i ) a ia∈I m na∈I m ni=1= ∑ ∏m ∑a iλ a µ a ij H j(θg i )= ∑EF H k (θh) = ∑a∈I m na∈I m n∑a 1λ ai=1 j=0∑a 1λ aj 1 =0j 1 =0· · ·· · ·a m∑j m =0a m∑m∏j m =0 i=1EH k (θh)µ a ij iH ji (θg i ).m∏i=1µ a ij iH ji (θg i ).Now consider the expression EH k (θh) ∏ mi=1 µai j iH ji (θg i ), we wish to argue that this isequal to zero. Since g 1 , . . . , g m are orthonormal, θg 1 , . . . , θg m are independent. Wethere<strong>for</strong>e find, recalling that g 1 = h,EH k (θh)m∏i=1µ a ij iH ji (θg i ) = E(H k (θh)H j1 (θh))m∏i=2Eµ a ij iH ji (θg i ).Since |a| ≤ n, a 1 ≤ n <strong>and</strong> in particular j 1 ≤ n. Since k > n, we may conclude byLemma 4.4.2 that E(H k (θh)H j1 (θh)) = 0, <strong>and</strong> the above expression is there<strong>for</strong>e zero.Thus, EF H k (θh) = 0 <strong>and</strong> we conclude that P n is orthogonal to Hk ′ <strong>for</strong> k > n. As aconsequence of this, P n is orthogonal to ⊕ ∞ k=n+1 H k <strong>and</strong> there<strong>for</strong>e P n ⊆ ⊕ n k=0 H k, asdesired.Theorem 4.4.10. For each n, H n is an orthonormal basis of H n .Proof. From Lemma 4.4.5, we already know that H n is an orthonormal set, so it willsuffice to show that the span H ′′ n is dense in H n . Let K n be the closure of H ′′ n. Wewish to show K n = H n . To do so, we first prove P n = ⊕ n k=0 K k. Since Φ a <strong>for</strong> |a| = kis a polynomial trans<strong>for</strong>mation of degree k of variables θ(e i ), it is clear that H k ⊆ P n<strong>for</strong> any k ≤ n <strong>and</strong> there<strong>for</strong>e K k ⊆ P n . In particular, ⊕ n k=0 K k ⊆ P n . We need to provethe other inclusion.By Lemma 4.4.8, it will suffice to show P n ⊆ ⊕ n k=0 K k. There<strong>for</strong>e, let F ∈ P nwith F = p(θe), where e = (e k1 , . . . , e kn ) are distinct <strong>and</strong> p(x) = ∑ a∈I λ kx a is amn


4.4 Orthogonal subspaces of L 2 (F T ) <strong>and</strong> L 2 (Π) 163polynomial of degree n in m variables. By Lemma A.5.6, we have, <strong>for</strong> any n ≥ 0,x n = ∑ nk=0 µn k H k(x), <strong>for</strong> some coefficients µ n k∈ R. We obtainp(θ(e k1 ), . . . , θ(e kn )) = ∑a∈I m n= ∑a∈I m n= ∑a∈I m n= ∑λ aλ an∏i=1(θe ki ) a i∏n ∑a ii=1 j=0∑a 1λ a∑a 1a∈I m n j 1 =0j 1 =0· · ·· · ·µ a ij H j(θe ki )a n∑n∏j n =0 i=1a n∑j n =0(∏nλ ai=1µ a ij iH ji (θe ki )√ji !µ aij i) n∏i=11√ji ! H j i(θe ki ).Now, with |j| = ∑ nk=1 j k, the innermost product of the above is in H |j| . Since |a| ≤ n<strong>and</strong> j i ≤ a i <strong>for</strong> i ≤ n, |j| ≤ n. We may there<strong>for</strong>e conclude that the above is in thespan of ∪ n k=0 H k, there<strong>for</strong>e in ⊕ n k=0 K k.We have now shown P n = ⊕ n k=0 K k. Recall that from Lemma 4.4.9, we also haveP n = ⊕ n k=0 H k. We may then concludeP n−1 ⊕ K n =n⊕K k =k=0n⊕H k = P n−1 ⊕ H n .Now, since P n−1 = ⊕ n−1k=0 K k <strong>and</strong> P n−1 = ⊕ n−1k=0 H k, we conclude that K n <strong>and</strong> P n−1 areorthogonal <strong>and</strong> P n−1 <strong>and</strong> H n are orthogonal. There<strong>for</strong>e, by Lemma A.6.15, K n = H n<strong>and</strong> there<strong>for</strong>e span H n = cl H ′′ n = K n = H n , as desired.k=0Our results so far are the following: The subspaces H n are mutually orthogonal, hasorthonormal bases H n , <strong>and</strong> L 2 (F T ) = ⊕ ∞ n=0H n . We also have two subspaces H ′ n <strong>and</strong>H ′′ n which are dense in H n .We will now use the families H n , H ′ n, H ′′ n <strong>and</strong> H n to define analogous families H n (Π),H ′ n(Π), H ′′ n(Π) <strong>and</strong> H n (Π) in L 2 (Π). These sets will have properties much alike to theircounterparts in L 2 (F T ), <strong>and</strong> will eventually be linked to these through the Malliavinderivative.Definition 4.4.11. By H ′ n(Π), we denote the span of the variables of the <strong>for</strong>m F ⊗ h,where F ∈ H n <strong>and</strong> h ∈ L 2 [0, T ]. By H n (Π), we denote the closure of H ′ n(Π).


Theorem 4.4.12. Defining $\mathcal{H}_n(\Pi)$ as the family of elements $\Phi_a \otimes e_i$, where $|a| = n$ and $i \in \mathbb{N}$, $\mathcal{H}_n(\Pi)$ is an orthonormal basis for $H_n(\Pi)$. In particular, the span $H_n''(\Pi)$ of $\mathcal{H}_n(\Pi)$ is dense in $H_n(\Pi)$.

Proof. It is clear that $\|\Phi_a \otimes e_i\|^2 = \|\Phi_a\|^2 \|e_i\|^2 = 1$. For any $a, b \in I_n$ and $i, j \in \mathbb{N}$, the Fubini theorem yields $\langle \Phi_a \otimes e_i, \Phi_b \otimes e_j \rangle = \langle \Phi_a, \Phi_b \rangle \langle e_i, e_j \rangle$, so if $a \neq b$ or $i \neq j$, $\Phi_a \otimes e_i$ and $\Phi_b \otimes e_j$ are orthogonal. Thus, $\mathcal{H}_n(\Pi)$ is orthonormal.

It remains to show that the span of $\mathcal{H}_n(\Pi)$ is dense in $H_n(\Pi)$. To this end, it will suffice to show that $H_n'(\Pi)$ is contained in the closed span of $\mathcal{H}_n(\Pi)$. Therefore, consider $F \in H_n$ and $h \in L^2[0,T]$. We need to demonstrate that $F \otimes h$ lies in the closed span of $\mathcal{H}_n(\Pi)$. Now, there are $h_n \in \operatorname{span}\{e_i\}_{i \geq 1}$ such that $h_n$ tends to $h$, and therefore
\[ \lim_n \|F \otimes h - F \otimes h_n\|^2 = \lim_n \|F\|^2 \|h - h_n\|^2 = 0. \]
Furthermore, there are $F_n \in \operatorname{span} \mathcal{H}_n$ such that $F_n$ tends to $F$. Fixing $n$, we then find
\[ \lim_k \|F \otimes h_n - F_k \otimes h_n\|^2 = \lim_k \|F - F_k\|^2 \|h_n\|^2 = 0. \]
Since $F_k \in \operatorname{span} \mathcal{H}_n$ and $h_n \in \operatorname{span}\{e_i\}_{i \geq 1}$, we obtain $F_k \otimes h_n \in \operatorname{span} \mathcal{H}_n(\Pi)$. Now, since $F_k \otimes h_n$ tends to $F \otimes h_n$ as $k$ tends to infinity, $F \otimes h_n$ is in the closed span of $\mathcal{H}_n(\Pi)$. And since $F \otimes h_n$ tends to $F \otimes h$ as $n$ tends to infinity, $F \otimes h$ is in the closed span of $\mathcal{H}_n(\Pi)$. This shows the statement of the theorem.

Theorem 4.4.13. The spaces $H_n(\Pi)$ are orthogonal and $L^2(\Pi) = \bigoplus_{n=0}^\infty H_n(\Pi)$.

Proof. Since the spaces $H_n$ are orthogonal, it is clear that the spaces $H_n'(\Pi)$ are orthogonal. Therefore, the $H_n(\Pi)$ are orthogonal as well.

In order to show that $L^2(\Pi) = \bigoplus_{n=0}^\infty H_n(\Pi)$, as in Theorem 4.4.3 it will suffice to show that the orthogonal complement of $\cup_{n=0}^\infty H_n(\Pi)$ is zero. Therefore, let $X \in L^2(\Pi)$, and assume that $X$ is orthogonal to $H_n(\Pi)$ for every $n \geq 0$. In particular, $X$ is orthogonal to $F \otimes h$ for any $F \in H_n$ and any $h \in L^2[0,T]$. Since $L^2(\mathcal{F}_T) = \bigoplus_{k=0}^\infty H_k$, we conclude by continuity of the inner product that $X$ is orthogonal to $F \otimes h$ for any $F \in L^2(\mathcal{F}_T)$ and any $h \in L^2[0,T]$. By Lemma B.4.2, $X$ is zero almost surely.

Next, we will investigate the interplay between $H_n$, $H_n(\Pi)$ and $D$. Here, our orthonormal bases $\mathcal{H}_n$ will prove essential.


4.4 Orthogonal subspaces of L 2 (F T ) <strong>and</strong> L 2 (Π) 165Theorem 4.4.14. H ⊆ D 1,2 , <strong>and</strong> <strong>for</strong> any multi-index a,DΦ a =∞∑αke a k ,k=1where αk a = √ a ka!H ak −1(θe k ) ∏ ∞i≠k H a i(θe i ), with the convention that H −1 = 0. Theelements (DΦ a ) are mutually orthogonal, <strong>and</strong> ‖DΦ a ‖ 2 = |a|.Comment 4.4.15 Since a k is zero from some point onwards, the infinite product inαk a only has finitely many nontrivial factors, <strong>and</strong> is there<strong>for</strong>e well-defined. Similarly,we see that αk a is zero from some point onwards, <strong>and</strong> there<strong>for</strong>e the infinite sum in theexpression <strong>for</strong> DΦ a is well-defined, only having finitely many nontrivial terms. ◦Proof. We need to prove four things: That Φ a ∈ D 1,2 , the <strong>for</strong>m of DΦ a , orthogonality<strong>and</strong> the norm of ‖DΦ a ‖.Computation of DΦ a . Let a ∈ I <strong>and</strong> assume that a k = 0 <strong>for</strong> k > n. ThenΦ a = 1 √a!∏ nk=1 H a n(θe n ), so Φ a = f(H a1 (θe 1 ), . . . , H an (θe n )), where f : R n → Rwith f(x) = 1 √a!∏ nk=1 x k. By the chain rule, Φ a ∈ D 1,2 <strong>and</strong>DΦ a =====n∑ ∂f(H a1 (θe 1 ), . . . , H an (θe n )) DH ak (θe k )∂x k⎛⎞n∑⎝√ 1 ∏n H ai (θe i ) ⎠ a k H ak −1(θe k )e ka!k=1k=1n∑k=1i≠k⎛a√ k⎝H ak −1(θe k )a!n∑αke a kk=1∞∑αke a k ,k=1⎞n∏H ai (θe i ) ⎠ e kwhere we have used Lemma A.5.3. Note that in the case where a k = 0, H ak = 1, soH a ′ k= 0 = H −1 . The above computation is ther<strong>for</strong>e valid no matter whether a k iszero or nonzero.Orthogonality. Let a <strong>and</strong> b be multi-indices. Recall that the inner product onL 2 (Π) is denoted 〈·, ·〉 Π . We have, using Lemma 3.7.4 <strong>and</strong> recalling that in the sumsi≠k


166 The Malliavin Calculus<strong>and</strong> products, only finitely many terms <strong>and</strong> factors are nontrivial,〈Φ a , Φ b 〉 Π = E〈Φ a , Φ b 〉 [0,T ]〈∑ ∞ ∞∑〉= E αke a k , αke b k===k=1∞∑Eαkα a kbk=1k=1[0,T ]⎛⎞ ⎛⎞∞∑ a k b√ √ k∞∏∞∏E ⎝H ak −1(θe k ) H ai (θe i ) ⎠ ⎝H bk −1(θe k ) H bi (θe i ) ⎠a! b!k=1i≠k⎛⎞∞∑ a k b√ √ k∞∏(EH ak −1(θe k )H bk −1(θe k )) ⎝ EH ai (θe i )H bi (θe i ) ⎠ .a! b!k=1In particular, if a ≠ b, 〈Φ a , Φ b 〉 Π = 0 by Lemma 4.4.2, showing orthogonality of distinctΦ a <strong>and</strong> Φ b .i≠ki≠kNorm of DΦ a . Finally, we find‖Φ a ‖ 2 Π ===∞∑ a 2 ka! EH ∏∞a k −1(θe k ) 2 EH ai (θe i ) 2k=1∞∑k=1∞∑k=1= |a|.a 2 ka! (a k − 1)!a 2 k (a k − 1)!a k !∞∏a i !i≠ki≠kTheorem 4.4.14 has several corollaries which gives us a good deal of insight into thenature of the Malliavin derivative.Corollary 4.4.16. It holds that H n ⊆ D 1,2 , <strong>and</strong> <strong>for</strong> any F ∈ H n , ‖DF ‖ = √ n‖F ‖.Proof. First consider F ∈ H ′′ n with F = ∑ mk=1 λ kΦ ak , where a k ∈ I n <strong>for</strong> k ≤ m. ByLemma 4.4.14 <strong>and</strong> the Pythagoras Theorem, we obtain F ∈ D 1,2 <strong>and</strong>m∑m∑m∑m∑‖DF ‖ 2 = λ 2 k‖DΦ ak ‖ 2 = λ 2 k|a k | = n λ 2 k = n λ 2 k‖Φ ak ‖ 2 = n‖F ‖ 2 .k=1k=1k=1k=1


We have now shown that $H_n'' \subseteq D_{1,2}$ and that for $F \in H_n''$, $\|DF\| = \sqrt{n}\|F\|$. We need to extend this to $H_n$. Therefore, let $F \in H_n$ be given, and let $F_n$ be a sequence in $H_n''$ converging to $F$. By what we have already shown, $\|DF_n - DF_m\| = \sqrt{n}\|F_n - F_m\|$. Since $(F_n)$ is a Cauchy sequence, we conclude that $DF_n$ is a Cauchy sequence as well. By completeness, there exists a limit $X$ of $DF_n$. By the closedness of $D$, we conclude that $F \in D_{1,2}$ and $DF = X$. We then have
\[ \|DF\| = \|\lim DF_n\| = \lim \|DF_n\| = \sqrt{n} \lim \|F_n\| = \sqrt{n}\|F\|. \]
This shows the claims of the corollary.

Corollary 4.4.17. $D$ maps $H_n$ into $H_{n-1}(\Pi)$.

Proof. Let $\Phi_a \in \mathcal{H}_n$ be given. By Theorem 4.4.14,
\[ D\Phi_a = \sum_{k=1}^\infty \alpha^a_k e_k, \]
where $\alpha^a_k = \frac{a_k}{\sqrt{a!}} H_{a_k-1}(\theta e_k) \prod_{i \neq k} H_{a_i}(\theta e_i)$. By inspection, whenever $\alpha^a_k \neq 0$ we have $\alpha^a_k e_k \in H_{n-1}''(\Pi) \subseteq H_{n-1}(\Pi)$. By Corollary 4.4.16, $D$ is continuous on $H_n$. Since $H_{n-1}(\Pi)$ is a closed linear space, the result therefore extends from $\mathcal{H}_n$ to $H_n$.

Corollary 4.4.18. The images of $H_n$ and $H_m$ under $D$ are orthogonal whenever $n \neq m$.

Proof. This follows by combining Corollary 4.4.17 and Theorem 4.4.13.

We have now introduced all of the subspaces and orthogonal sets described in the diagram in the beginning of the section, and we have discussed their basic properties. We are now ready to begin work on a characterisation of $D_{1,2}$ in terms of the projections on the subspaces $H_n$. Let $P_n$ be the orthogonal projection in $L^2(\mathcal{F}_T)$ on $H_n$, and let $P_n^\Pi$ be the orthogonal projection in $L^2(\Pi)$ on $H_n(\Pi)$.

Lemma 4.4.19. If $F \in H_n''$ and $h \in L^2[0,T]$, then $F\,\theta(h) \in H_{n-1} \oplus H_{n+1}$.

Proof. We prove the result by first considering some simpler cases and then extending using density arguments.


168 The Malliavin CalculusStep 1: The case F ∈ H n <strong>and</strong> h = e p . First consider the case where F ∈ H n withF = Φ a <strong>and</strong> h = e p <strong>for</strong> some p ∈ N. Supposing that a k = 0 <strong>for</strong> k > m, we can writeΦ a = √ 1 ∏ ma! i=1 H a i(θe i ). In the case p ≤ m, we find, using Lemma A.5.4,Φ a θ(e p ) ===1√a!H ap (θe p )θ(e p ) ∏ i≠pH ai (θe i )1 (√ Hap+1(θe p ) + a p H ap−1(θe p ) ) ∏H ai (θe i )a!1√ H ap+1(θe p ) ∏ H ai (θe i ) + √ 1 a p H ap−1(θe p ) ∏ H ai (θe i ),a!i≠pa!i≠pi≠pwhich is in H n−1 ⊕ H n+1 . If p > m, we obtain Φ a θ(e p ) = 1 √a!H 1 (θe p ) ∏ mi=1 H a i(θe i ),yielding Φ a θ(e p ) ∈ H n+1 ⊆ H n−1 ⊕ H n+1 .Step 2: The case F ∈ H n <strong>and</strong> h ∈ L 2 [0, T ]. By linearity, it is clear that the resultextends to F = Φ a <strong>and</strong> h in the span of {e i } i≥1 . Consider a general h ∈ L 2 [0, T ], <strong>and</strong>let h n be in the span of {e i } i≥1 , converging to h. Now, the mapping()1 ∏mf(x) = √ H ai (x i )a!i=1x m+1is in C ∞ p (R m+1 ), <strong>and</strong> furthermore we have the equalities Φ a θ(h) = f(θe 1 , . . . , θe m , θh)<strong>and</strong> Φ a θ(h n ) = f(θe 1 , . . . , θe m , θh n ). There<strong>for</strong>e, Lemma 4.4.7 yields that Φ a θ(h n )converges in L 2 (F T ) to Φ a θ(h). Since H n−1 ⊕ H n+1 is a closed subspace, we mayconclude Φ a θ(h) ∈ H n−1 ⊕ H n+1 .Step 3: The case F ∈ H ′′ n <strong>and</strong> h ∈ L 2 [0, T ]. The remaining extension followsdirectly by linearity as in the previous step.Lemma 4.4.20. If X ∈ H n (Π) <strong>and</strong> h ∈ L 2 [0, T ], 〈X, h〉 [0,T ] ∈ H n .Proof. Let h ∈ L 2 [0, T ]. First note that <strong>for</strong> any F ∈ H n <strong>and</strong> h ′ ∈ L 2 [0, T ], we find〈F ⊗ h ′ , h〉 [0,T ] = F 〈h ′ , h〉 ∈ H n , showing the result in this simple case. Since themapping X ↦→ 〈X, h〉 [0,T ] is linear <strong>and</strong> continuous <strong>and</strong> H n is a closed subspace, theresult extends to X ∈ H n (Π).Theorem 4.4.21. Let F ∈ D 1,2 . Then P Π n DF = DP n+1 F .Proof. Our plan is to show that 〈Z, DF 〉 = 〈Z, DP n+1 F 〉 <strong>for</strong> any Z ∈ H n (Π) <strong>and</strong> thenuse Lemma A.6.18 to obtain the conclusion. Showing this equality in full generality


4.4 Orthogonal subspaces of L 2 (F T ) <strong>and</strong> L 2 (Π) 169will be done in two steps, first demonstrating it in a simple case <strong>and</strong> then extendingby linearity <strong>and</strong> continuity.Step 1: The equality <strong>for</strong> G ⊗ h. Let G ∈ H ′′ n <strong>and</strong> let h ∈ L 2 [0, T ]. We will showthat 〈G ⊗ h, DF 〉 = 〈G ⊗ h, DP n+1 F 〉. Since H ′′ n ⊆ S h , we can use Corollary 4.3.11<strong>and</strong> obtain〈G ⊗ h, DF 〉 = E(G〈DF, h〉 [0,T ] )= EF G(θh) − EF 〈DG, h〉 [0,T ] .Using the orthogonal decomposition of L 2 (F T ) proven in Theorem 4.4.3, LemmaA.6.12 yields F = ∑ ∞k=0 P kF , where the limit is in L 2 (F T ). Using the Cauchy-Schwartz inequality, we then obtain the two results∞∑‖G(θh)P k F ‖ 1 ≤ ‖G(θh)‖ 2k=0∞ ∑k=0‖P k F ‖ 2 < ∞∞∑‖〈DG, h〉 [0,T ] P k F ‖ 1 ≤ ‖〈DG, h〉 [0,T ] ‖ 2k=0∞ ∑k=0‖P k F ‖ 2 < ∞,so ∑ ∞k=0 G(θh)P kF <strong>and</strong> ∑ ∞k=0 〈DG, h〉 [0,T ]P k F are convergent in L 1 . Since convergencein L p implies convergence in probability, we conclude by uniqueness of limits that∞∑∞∑G(θh)P k F = G(θh) P k Fk=0k=0∞∑∑∞〈DG, h〉 [0,T ] P k F = 〈DG, h〉 [0,T ] P k F,k=0where the limits on the left are in L 1 (F T ) <strong>and</strong> the limits on the right are in L 2 (F T ).This impliesk=0E(G〈DF, h〉 [0,T ] ) = EF G(θh) − EF 〈DG, h〉 [0,T ]( ∞) (∑∞)∑= E G(θh)P k F − E 〈DG, h〉 [0,T ] P k F=k=0k=0k=0∞∑∞∑EG(θh)P k F − E〈DG, h〉 [0,T ] P k F,where we have used the L 1 -convergence to interchange sum <strong>and</strong> integral. By assumption,G ∈ H ′′ n, so DG ∈ H n−1 (Π) by Corollary 4.4.17, <strong>and</strong> 〈DG, h〉 [0,T ] ∈ H n−1 byLemma 4.4.20. Furthermore, by Lemma 4.4.19, Gθ(h) ∈ H n−1 ⊕ H n+1 . By orthogo-k=0


170 The Malliavin Calculusnality of the H n , we then find∞∑∞∑EG(θh)P k F − E〈DG, h〉 [0,T ] P k Fk=0k=0= EG(θh)P n−1 F + EG(θh)P n+1 F − E〈DG, h〉 [0,T ] P n−1 F.By the integration-by-parts <strong>for</strong>mula of Corollary 4.3.11,EG(θh)P n−1 F − E〈DG, h〉 [0,T ] P n−1 F = EG〈DP n−1 F, h〉 [0,T ] ,which is zero since G ∈ H ′′ n <strong>and</strong> 〈DP n−1 F, h〉 [0,T ] ∈ H n−2 by Lemma 4.4.19. Since alsoE〈DG, h〉 [0,T ] P n+1 F = 0 by the same lemma, we can use Corollary 4.3.11 once againto obtainEG(θh)P n−1 F + EG(θh)P n+1 F − E〈DG, h〉 [0,T ] P n−1 F= EG(θh)P n+1 F= EG(θh)P n+1 F − E〈DG, h〉 [0,T ] P n+1 F= EG〈DP n+1 F, h〉 [0,T ]= 〈G ⊗ h, DP n+1 F 〉.All in all, we have proven <strong>for</strong> G ∈ H ′′ n that〈G ⊗ h, DF 〉 = 〈G ⊗ h, DP n+1 F 〉.Step 2: Conclusions. Noting that both the left-h<strong>and</strong> side <strong>and</strong> the right-h<strong>and</strong> sidein the equality above is continuous <strong>and</strong> linear in G, we can extend it to hold notonly <strong>for</strong> G ∈ H n, ′′ but also <strong>for</strong> G ∈ H n . Since the variables of the <strong>for</strong>m G ⊗ h<strong>for</strong> G ∈ H n <strong>and</strong> h ∈ L 2 [0, T ] has dense span in H n (Π) by definition, we conclude,again by continuity <strong>and</strong> linearity of the inner product, that 〈Z, DF 〉 = 〈Z, DP n+1 F 〉<strong>for</strong> all Z ∈ H n (Π). Since DP n+1 F ∈ H n (Π) by Corollary 4.4.17, Lemma A.6.18finally yields that DP n+1 F is the orthogonal projection of DF onto H n (Π), that is,Pn Π DF = DP n+1 F .Theorem 4.4.22. Let F ∈ L 2 (F T ). F is in D 1,2 if <strong>and</strong> only if ∑ ∞n=1 n‖P nF ‖ 2 isfinite. In the affirmative case, DF = ∑ ∞n=0 DP nF , <strong>and</strong> in particular the norm is givenby ‖DF ‖ 2 = ∑ ∞n=1 n‖P nF ‖ 2 .Proof. First assume that F ∈ L 2 (F T ) <strong>and</strong> that ∑ ∞n=1 n‖P nF ‖ 2 2 < ∞. We have toshow that F ∈ D 1,2 <strong>and</strong> identify DF <strong>and</strong> ‖DF ‖ 2 . By Lemma A.6.12, F = ∑ ∞n=0 P nF .


Define $F_k = \sum_{n=0}^k P_n F$; then $F_k$ converges to $F$ in $L^2(\mathcal{F}_T)$. We will argue that $DF_k$ converges as well; this will imply $F \in D_{1,2}$.

Obviously, $P_n F \in H_n$, so by Corollary 4.4.16, $P_n F \in D_{1,2}$ and $\|DP_n F\|^2 = n\|P_n F\|^2$. Therefore, we find $\sum_{n=0}^\infty \|DP_n F\|^2 = \sum_{n=0}^\infty n\|P_n F\|^2 < \infty$, which, since the terms $DP_nF$ are mutually orthogonal by Corollary 4.4.18, shows that the series $\sum_{n=0}^\infty DP_n F$ is convergent in $L^2(\Pi)$, meaning that the sequence $DF_k$ is convergent. By closedness of $D$, we conclude $F \in D_{1,2}$ and $DF = \sum_{n=0}^\infty DP_n F$. We may therefore also conclude $\|DF\|^2 = \sum_{n=0}^\infty \|DP_n F\|^2 = \sum_{n=1}^\infty n\|P_n F\|^2$.

It remains to show that $\sum_{n=1}^\infty n\|P_n F\|^2 < \infty$ is also necessary to obtain $F \in D_{1,2}$. Assume that $F \in D_{1,2}$; we need to prove that the sum is convergent. To this end, note that using the orthogonal decomposition of $L^2(\Pi)$ from Theorem 4.4.13 and applying Lemma A.6.12 and Theorem 4.4.21,
\[ DF = \sum_{n=0}^\infty P_n^\Pi DF = \sum_{n=0}^\infty DP_{n+1} F = \sum_{n=1}^\infty DP_n F. \]
In particular, the sum on the right is convergent, and therefore $\sum_{n=1}^\infty \|DP_n F\|^2$ is finite. By Corollary 4.4.16, $\|DP_n F\|^2 = n\|P_n F\|^2$, and we conclude that $\sum_{n=1}^\infty n\|P_n F\|^2$ is finite, as desired.

Theorem 4.4.22 is the promised characterisation of $D_{1,2}$ in terms of the subspaces $H_n$. As our final result on the Malliavin calculus, we will show how the theorem can be used to obtain an extension of the chain rule for Lipschitz mappings. We are going to employ some weak convergence results; see Appendix A.6 for an overview.

Lemma 4.4.23. Let $F_n$ be a sequence in $D_{1,2}$ converging to $F$ and assume that $DF_n$ is bounded in $L^2(\Pi)$. Then $F \in D_{1,2}$, and there is a subsequence of $DF_n$ converging weakly to $DF$.

Proof. We will use Theorem 4.4.22 to argue that $F \in D_{1,2}$. To do so, we need to prove that $\sum_{m=1}^\infty m\|P_m F\|^2$ is finite. To this end, first note that since $DF_n$ is bounded in $L^2(\Pi)$, by Theorem A.6.21 there exists a subsequence $DF_{n_k}$ converging weakly to some $\alpha \in L^2(\Pi)$. Convergence of $F_{n_k}$ to $F$ and Corollary 4.4.16 allow us to conclude
\[ \sum_{m=1}^\infty m\|P_m F\|^2 = \sum_{m=1}^\infty \lim_k m\|P_m F_{n_k}\|^2 = \sum_{m=1}^\infty \lim_k \|DP_m F_{n_k}\|^2. \]
Now note that $P_m F_{n_k}$ is a sequence in $H_m$ which converges to $P_m F$, since $P_m$ is continuous, being an orthogonal projection. By Corollary 4.4.16, $D$ is a continuous mapping on $H_m$, so we find that $DP_m F_{n_k}$ converges to $DP_m F$. On the other hand, by Theorem 4.4.21, $DP_m F_{n_k} = P^\Pi_{m-1} DF_{n_k}$. Since $DF_{n_k}$ converges weakly to $\alpha$, $P^\Pi_{m-1} DF_{n_k}$ converges weakly to $P^\Pi_{m-1}\alpha$ by Lemma A.6.23. Since ordinary convergence implies weak convergence by Lemma A.6.20 and weak limits are unique by Lemma A.6.19, we obtain $DP_m F = P^\Pi_{m-1}\alpha$. All in all, we conclude that $DP_m F_{n_k}$ converges to $P^\Pi_{m-1}\alpha$. Therefore, the norms converge as well. Thus, using Lemma A.6.12,
\[ \sum_{m=1}^\infty \lim_k \|DP_m F_{n_k}\|^2 = \sum_{m=1}^\infty \|P^\Pi_{m-1}\alpha\|^2 = \|\alpha\|^2, \]
which is of course finite. Theorem 4.4.22 now yields $F \in D_{1,2}$. Finally, note that $P^\Pi_m DF = DP_{m+1} F = P^\Pi_m \alpha$ for any $m \geq 0$, so by Lemma A.6.12, $DF = \alpha$. This means that $DF_{n_k}$ converges weakly to $DF$.
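The characterisation in Theorem 4.4.22 can be made concrete for variables of the form $F = f(\theta h)$ with $\|h\| = 1$. Writing $f(Z) = \sum_n a_n H_n(Z)$ with $a_n = E f(Z)H_n(Z)/n!$ for a standard normal $Z$, one has $P_nF = a_n H_n(\theta h)$ and $\|P_nF\|^2 = n!\,a_n^2$, while the chain rule gives $DF = f'(\theta h)h$ and hence $\|DF\|^2 = E f'(Z)^2$. Theorem 4.4.22 then reduces to the identity $E f'(Z)^2 = \sum_n n\,n!\,a_n^2$. The Python sketch below is illustrative only: it assumes a smooth $f$ with all relevant moments finite, truncates the expansion at a finite order and estimates everything by simulation, here for $f = \sin$.

```python
import math
import numpy as np

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n(x) via the standard recurrence."""
    h_prev, h = np.ones_like(x), x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

rng = np.random.default_rng(1)
z = rng.standard_normal(2_000_000)

f  = np.sin          # F = f(theta h) with ||h|| = 1, so theta h ~ N(0,1)
df = np.cos          # f', used for the direct computation of ||DF||^2

N = 12               # truncation order of the chaos expansion
coeffs = [np.mean(f(z) * hermite(n, z)) / math.factorial(n) for n in range(N)]

# ||P_n F||^2 = n! * a_n^2, so Theorem 4.4.22 predicts
# ||DF||^2 = sum_n n * n! * a_n^2 = E[f'(Z)^2].
chaos_side = sum(n * math.factorial(n) * a**2 for n, a in enumerate(coeffs))
direct_side = np.mean(df(z) ** 2)
print(chaos_side, direct_side)   # both close to (1 + e^{-2})/2 ~ 0.5677
```

Both estimates agree with the exact value $E\cos^2(Z) = (1 + e^{-2})/2$, illustrating how the norm of the Malliavin derivative is encoded in the chaos coefficients.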


Corollary 4.4.24. Let $F_n$ be a sequence in $D_{1,2}$ converging to $F$ and assume that $DF_n$ is bounded in $L^2(\Pi)$. Then $F \in D_{1,2}$, and $DF_n$ converges weakly to $DF$.

Proof. By Lemma 4.4.23, $F \in D_{1,2}$. We need to show that $DF_n$ converges weakly to $DF$. To this end, it will suffice to show that for any subsequence $DF_{n_k}$, there is a further subsequence $DF_{n_{k_l}}$ converging weakly to $DF$. Therefore, let a subsequence $DF_{n_k}$ be given. Then $F_{n_k}$ converges to $F$ and $DF_{n_k}$ is bounded in $L^2(\Pi)$, so Lemma 4.4.23 shows that there is a further subsequence $DF_{n_{k_l}}$ converging weakly to $DF$. This concludes the proof.

Theorem 4.4.25 (Lipschitz chain rule). Let $F \in D_{1,2}^n$, $F = (F_1, \ldots, F_n)$, and let $\varphi : \mathbb{R}^n \to \mathbb{R}$ be a Lipschitz mapping with Lipschitz constant $K$ with respect to $\|\cdot\|_\infty$. Then $\varphi(F) \in D_{1,2}$, and there is a random variable $G = (G_1, \ldots, G_n)$ with values in $\mathbb{R}^n$ and $\|G_k\|_2 \leq K$ such that $D\varphi(F) = \sum_{k=1}^n G_k DF_k$.

Proof. By Lemma A.4.2, there exists a sequence of mappings $\varphi_n \in C^\infty(\mathbb{R}^n)$ with partial derivatives bounded by $K$ converging uniformly to $\varphi$. By the ordinary chain rule of Theorem 4.3.1, $D\varphi_n(F) = \sum_{k=1}^n \frac{\partial \varphi_n}{\partial x_k}(F)\, DF_k$. Now, since $\varphi_n$ converges uniformly to $\varphi$, $\varphi_n(F)$ converges in $L^2(\mathcal{F}_T)$ to $\varphi(F)$. On the other hand, we have
\[ \left\| \sum_{k=1}^n \frac{\partial \varphi_n}{\partial x_k}(F)\, DF_k \right\|_2 \leq K \sum_{k=1}^n \|DF_k\|_2, \]
so the sequence $\sum_{k=1}^n \frac{\partial \varphi_n}{\partial x_k}(F)\, DF_k$ is bounded in $L^2(\Pi)$. By Corollary 4.4.24 we obtain $\varphi(F) \in D_{1,2}$, and $\sum_{k=1}^n \frac{\partial \varphi_n}{\partial x_k}(F)\, DF_k$ converges weakly to $D\varphi(F)$. If we can identify the weak limit as being of the form stated in the theorem, we are done.


To this end, note that since the sequence $\frac{\partial \varphi_n}{\partial x_k}(F)$ is pointwise bounded by $K$, it is bounded in $L^2(\mathcal{F}_T)$ by $K$ as well. Therefore, by Theorem A.6.21 there exists a subsequence such that for all $k \leq n$, $\frac{\partial \varphi_{n_m}}{\partial x_k}(F)$ converges weakly to some $G_k \in L^2(\mathcal{F}_T)$. From Lemma A.6.22 we know that $\|G_k\|_2 \leq K$, but we would like a pointwise bound instead. To obtain this, let $A \in \mathcal{F}_T$. We then obtain
\[ E 1_A G_k^2 = \langle 1_A G_k, G_k \rangle = \lim_m \left\langle 1_A G_k, \frac{\partial \varphi_{n_m}}{\partial x_k}(F) \right\rangle = \lim_m E\, 1_A G_k \frac{\partial \varphi_{n_m}}{\partial x_k}(F) \leq K\, E 1_A |G_k|. \]
This shows in particular that $G_k^2 \leq K|G_k|$ almost surely, and therefore $|G_k| \leq K$ almost surely. Now, for any bounded element $X$ of $L^2(\Pi)$, we find
\[ \left\langle X, \frac{\partial \varphi_{n_m}}{\partial x_k}(F)\, DF_k - G_k DF_k \right\rangle = \left\langle X, \left( \frac{\partial \varphi_{n_m}}{\partial x_k}(F) - G_k \right) DF_k \right\rangle = E\!\left( \int_0^T X(t)\left( \frac{\partial \varphi_{n_m}}{\partial x_k}(F) - G_k \right) DF_k(t)\, dt \right) = E\!\left( \left( \int_0^T X(t)\, DF_k(t)\, dt \right)\! \left( \frac{\partial \varphi_{n_m}}{\partial x_k}(F) - G_k \right) \right). \]
Since $X$ is bounded and $DF_k \in L^2(\Pi)$, we conclude by Jensen's inequality that $\int_0^T X(t)\, DF_k(t)\, dt \in L^2(\mathcal{F}_T)$. By the weak convergence of $\frac{\partial \varphi_{n_m}}{\partial x_k}(F)$ to $G_k$, the above therefore tends to zero. We may now conclude, still letting $X \in L^2(\Pi)$ be bounded,
\[ \left\langle X, D\varphi(F) - \sum_{k=1}^n G_k DF_k \right\rangle = \lim_m \left\langle X, \sum_{k=1}^n \frac{\partial \varphi_{n_m}}{\partial x_k}(F)\, DF_k - \sum_{k=1}^n \frac{\partial \varphi_{n_m}}{\partial x_k}(F)\, DF_k \right\rangle = 0. \]
Since the bounded elements of $L^2(\Pi)$ are dense, by Lemma A.6.2 and A.6.13 we conclude $D\varphi(F) = \sum_{k=1}^n G_k DF_k$. This proves the theorem.

This concludes our exposition of the Malliavin calculus. In the next section, we give a survey of the further theory of the Malliavin calculus.


4.5 Further theory

Our description of the Malliavin calculus in the preceding sections only covers the basics of the theory, and is in fact quite inadequate for applications. We will now describe some of the further results of the theory and outline the main theoretical areas where the Malliavin calculus finds application. All of these areas are covered in more or less detail in Nualart (2006).

The Skorohod integral. The largest omission in our exposition is clearly that of the Skorohod integral. The Skorohod integral $\delta$ is defined as the adjoint operator of the Malliavin derivative $D$. This is a linear operator defined on a dense subspace $S_{1,2}$ of $L^2(\Pi)$, mapping into $L^2(\mathcal{F}_T)$ and characterised by the duality relationship
\[ \langle F, \delta u \rangle_{\mathcal{F}_T} = \langle DF, u \rangle_\Pi \]
for any $u \in S_{1,2}$. The existence of such an operator follows from the closedness of $D$ combined with Lemma 19.2 and Proposition 19.5 of Meise & Vogt (1997). The Skorohod integral has two extremely important properties which connect it to the theory of stochastic integration. First off, if $u \in L^2(\Pi)$ is progressively measurable, it holds that
\[ \delta(u) = \int_0^T u_s \, dW_s, \]
where the integral on the right is the ordinary Itô stochastic integral. This justifies the name "Skorohod integral" for $\delta$, and shows that the Skorohod integral is an extension of the Itô stochastic integral. Inspired by this fact, we will use the notation $\int_0^T u_s\, \delta W_s$ for $\delta(u)$. The second important property is as follows. Writing $D_t F = (DF)_t$, if $u$ is such that $u_t \in D_{1,2}$ for all $t \leq T$ and such that there is a measurable version of $(t,s) \mapsto D_t u_s$ and a measurable version of $\int_0^T D_t u_s\, \delta W_s$, then
\[ D_t \int_0^T u_s \, \delta W_s = \int_0^T D_t u_s \, \delta W_s + u_t. \]
In the case where $u$ is progressively measurable, this shows how to differentiate a general Itô stochastic integral.

The properties of the Skorohod integral are usually investigated through the Itô-Wiener expansion, which is a series expansion for variables $F \in L^2(\mathcal{F}_T)$. The basic content of the expansion is, with $\Delta_n = \{t \in [0,T]^n \mid t_1 \leq \cdots \leq t_n\}$, that there exist square-integrable deterministic functions $f_n$ on $\Delta_n$ such that
\[ F = \sum_{n=0}^\infty n! \int_0^T \int_0^{t_n} \cdots \int_0^{t_3} \int_0^{t_2} f_n(t) \, dW(t_1) \cdots dW(t_n), \]
where the convergence is in $L^2$. Expanding a process on $[0,T]$ in this manner, one may then identify a necessary and sufficient criterion on the mappings $f_n$ to ensure that the process is adapted. This can be used to prove the relationship between the Skorohod integral and the Itô integral. All of this is described in Section 1.3 of Nualart (2006).
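The statement that $\delta$ restricts to the Itô integral on progressively measurable integrands can be illustrated by simulation. The Python sketch below is illustrative only; it discretises the integral by left-point Riemann sums, which is the approximation underlying the Itô integral, and checks the closed-form identity $\int_0^T W_s\, dW_s = (W_T^2 - T)/2$ from Itô's formula for the adapted integrand $u_s = W_s$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_steps, n_paths = 1.0, 2000, 2000
dt = T / n_steps

dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
W = np.cumsum(dW, axis=1)
W_prev = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])   # left endpoints

ito_sum = np.sum(W_prev * dW, axis=1)       # Riemann sums for int_0^T W_s dW_s
closed_form = 0.5 * (W[:, -1] ** 2 - T)     # Ito's formula: (W_T^2 - T)/2

print(np.mean(np.abs(ito_sum - closed_form)))   # small, shrinks as n_steps grows
```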


Differentiation of SDEs. Being a stochastic calculus, it seems obvious that the Malliavin calculus should yield useful results when applied to the theory of stochastic differential equations. The main result on this topic is described in Section 2.2 of Nualart (2006) and basically states that under suitable regularity conditions, when taking the Malliavin derivative of the solution to a stochastic differential equation, interchange of differentiation and integration is allowed. This result is of fundamental importance to the applications in financial mathematics.

Regularity of densities. The Malliavin calculus can also be used to give abstract criteria for when a random variable possesses a density with respect to the Lebesgue measure, and criteria for when such a density possesses some degree of smoothness. This is shown in Section 2.1 of Nualart (2006). The basic results are Theorem 2.1.1 and Theorem 2.1.4, yielding sufficient criteria in terms of the Malliavin derivatives of a vector variable for the existence and smoothness of a density with respect to the Lebesgue measure. A further important result is found as Theorem 2.3.3, giving a sufficient criterion for the solution of a stochastic differential equation to have a smooth density.

4.6 Notes

We will in this section review our results and compare them to other accounts of the same results in the theory.

Our exposition of the theory is rather limited in its reach, striving primarily for rigor and detail. The theory is based almost entirely on Section 1.1 and Section 2.1 of Nualart (2006). Nualart (2006) does not always provide many details, so the virtue of our results is mostly the added detail. The main difficulties have been to provide details of the chain rule of Theorem 4.3.1 and the relations between the orthogonal decompositions of $L^2(\mathcal{F}_T)$ and $L^2(\Pi)$, culminating in the proof of Theorem 4.4.21. Further work has gone into the extension of the chain rule given in Theorem 4.3.5, which hopefully will be useful in the later development of the theory.


The Malliavin calculus is a relatively young theory, originating in the 1970s, and even though the recent applications in finance such as the results in Fournié et al. (1999) seem to have motivated an increase in the production of material related to the theory, there are still quite few sources for the theory.

The most well-known book on the Malliavin calculus is Nualart (2006), which as noted has been the fundamental source for all of the theory described in this chapter. Other books on the topic are Bell (1987), Bass (1998), Malliavin (1997) and Sanz-Solé (2005). In general, these books are all, to put it bluntly, either not very rigorous or somewhat inaccessible. Nualart (2006) is clearly the best exposition of these, but is still difficult compared to the literature available for, say, stochastic integration. We will now discuss the merits and demerits of each book.

The book by Bell (1987) is very small, and in general either defers proofs to other works or only gives sketches of proofs, making it difficult to use as an introduction. The book Bass (1998) does not have the Malliavin calculus as its main topic, and only spends the final chapter exploring it. Considering this, it cannot be blamed for not developing the theory in detail. Still, it is not useful as an introductory work. The work Sanz-Solé (2005) looks very useful at a first glance, with a nice layout and a seemingly pedagogical introduction. However, it also defers many proofs to other books and often entirely skips proofs. This work cannot be recommended as a good introduction either.

Another work on the Malliavin calculus is Malliavin (1997). This book is not only about the Malliavin calculus, but touches upon a good deal of different topics. In any case, the main problem with the book is that it is so impressively abstract that it seems almost made to be incomprehensible on purpose.

This leaves Nualart (2006). While it, as noted earlier, is not very accessible, it is clearly superior to the other books. It is cast in a modern manner, and does not make the theory unnecessarily abstract. Our account differs from that one by considering a one-dimensional Brownian motion instead of what is known as an isonormal Gaussian process. The approach based on isonormal processes places the theory in the context of Hilbert spaces. This would make the introductory results of Section 4.2 prettier, but would also make the motivation for the definitions more difficult to comprehend.


In general, what makes the book difficult is that it very often omits a great deal of detail. The scope of these omissions can be estimated by noting that the theory which has been developed in 40 pages here, and at least 15 pages more in the appendix, is developed in only 10 pages in Nualart (2006).

Complementing these books, there are also notes available on the internet documenting the Malliavin calculus. Some of these are Zhang (2004), Bally (2003), Friz (2002) and Øksendal (1997). Of these, only Øksendal (1997) is useful.

The notes Bally (2003) and Friz (2002) are very easy to identify as useless. Both engage in what can best be described as some sort of mathematical name-dropping, often invoking Sobolev spaces, distribution theory and other theories, without these invocations ever resulting in any actual proofs. Furthermore, proofs are in general heuristic or omitted. Zhang (2004) at a first glance looks very good. As with Sanz-Solé (2005), the layout is professional and the author quickly obtains an air of having a good overview of the theory. However, on closer inspection most of the proofs have conspicuously similar levels of detail as in Nualart (2006), thereby not really adding anything new.

The final note we discuss is Øksendal (1997). This is by far the most accessible text available on the Malliavin calculus. It develops the theory in a manner very different from Nualart (2006), basing the definition of the Malliavin derivative and the Skorohod integral directly on the Wiener-Itô expansion. The proofs are in general given in detail and are very readable. Unfortunately, the note is short and does not cover so much of the theory, so at some point one has to revert to other works. However, as an introduction, it is most definitely very useful.

Complementing these works is the forthcoming book Di Nunno et al. (2008). It extends the theory of the Malliavin calculus to processes other than Brownian motion, but does also cover the Brownian case in detail. Even though the final work here is based on Nualart (2006), Di Nunno et al. (2008) has been invaluable as a first introduction, both to the Malliavin calculus in general and also for the applications to finance, because of its high level of detail and readability.




Chapter 5

Mathematical Finance

In this chapter, we will develop the theory of arbitrage pricing for a very basic class of models for financial markets. Our goal is to find sufficient criteria for the markets to be free of arbitrage, to apply these criteria to some simple models and to show how to calculate prices and sensitivities in these models.

For ease of notation, we will in general use the shorthand $dX_t = \mu_t\, dt + \sigma_t\, dW_t$ for $X_t = X_0 + \int_0^t \mu_s\, ds + \int_0^t \sigma_s\, dW_s$ for some $X_0$.

5.1 Financial market models

We work in the context of a filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ with an $n$-dimensional $\mathcal{F}_t$ Brownian motion $W$. We assume that the usual conditions hold; in particular, we assume that $\mathcal{F}_t$ is the usual augmentation of the filtration generated by $W$. Furthermore, we assume that $\mathcal{F} = \mathcal{F}_\infty = \sigma(\cup_{t \geq 0} \mathcal{F}_t)$.

We begin by defining the fundamental building blocks of what will follow. We will assume given a one-dimensional stochastic short rate process $r$ and define the risk-free asset price process $B$ as the solution to $dB_t = r_t B_t\, dt$ with $B_0$ some positive constant, equivalent to putting $B_t = B_0 \exp(\int_0^t r_s\, ds)$. We furthermore assume given $m$ financial asset price processes $(S_1, \ldots, S_m)$, which we assume to be nonnegative standard processes.


Together, the short rate, the risk-free asset price process and the financial asset price processes define a financial market model $\mathcal{M}$. In this market, a portfolio strategy is a pair of locally bounded progressive processes $h = (h^0, h^S)$, where $h^0$ is one-dimensional and $h^S$ is $m$-dimensional. The interpretation is that $h^0(t)$ denotes the number of units held at time $t$ in the risk-free asset $B$, and $h^S_k(t)$ denotes the number of units held at time $t$ in the $k$'th risky asset $S_k$. To a given portfolio strategy, we associate the value process $V^h$, defined by $V^h(t) = h^0(t)B(t) + \sum_{k=1}^m h^S_k(t) S_k(t)$. We say that $h$ is admissible if $V^h$ is nonnegative. We say that the portfolio $h$ is self-financing if it holds that
\[ dV^h(t) = h^0(t)\, dB(t) + \sum_{k=1}^m h^S_k(t)\, dS_k(t). \]
Here, the integrals are always well-defined because $h$ is assumed to be locally bounded and progressive. The above essentially means that the change in value of the portfolio only comes from profits and losses in the portfolio; there is no exogenous intake or outtake of wealth. That this is the correct criterion for no exogenous infusions of wealth is not as obvious as it may seem. It is important that we are using the Itô integral and not, say, the Stratonovich integral. For an argument showing that the Itô integral yields the correct interpretation, see Chapter 6 of Björk (2004) and compare with the Riemann approximations for the Itô and Stratonovich integrals of Lemma IV.47.1 and Lemma IV.47.3 in Rogers & Williams (2000b). A small simulation sketch of the self-financing condition is given below, after Comment 5.1.2.

We are now ready to make the central definition of this section, the notion of an arbitrage opportunity.

Definition 5.1.1. We say that an admissible and self-financing portfolio is an arbitrage opportunity at time $T$ if the value process $V^h$ satisfies $V^h_0 = 0$, $P(V^h_T \geq 0) = 1$ and $P(V^h_T > 0) > 0$. If a market $\mathcal{M}$ contains no arbitrage opportunities at time $T$, we say that it is $T$-arbitrage free. If a market contains no arbitrage opportunities at any time, we say that it is arbitrage free.

Comment 5.1.2 The notion of being arbitrage free is the basic requirement for a market to be realistic. It is reasonable to state that most real-world markets are arbitrage free most of the time. Therefore, if we are considering a market which has arbitrage, we are some distance away from anything useful for real-world modeling. The three requirements for $h$ to constitute an arbitrage opportunity can be put in words as:

1. It must be possible to enter the strategy without cost.
2. The strategy should never result in a loss of money.
3. The strategy should have a positive probability of resulting in a profit.

Note that for a portfolio to be an arbitrage opportunity, it must be both self-financing and admissible. Clearly, a portfolio satisfying the three requirements above but which is not self-financing does not say anything about whether our market is realistic; anyone can generate a sure profit if exogenous money infusions are allowed. The reason we also require admissibility is to rule out possibilities for "doubling schemes": borrowing money until a profit turns out. For more on doubling schemes, see Steele (2000), Section 14.5, or Example 1.2.3 of Karatzas & Shreve (1998). ◦
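The self-financing condition can be made concrete in a simple discrete-time simulation. The Python sketch below is illustrative only: it assumes a single risky asset with constant drift, volatility and short rate (the linear market of Definition 5.1.7 below, with hypothetical parameter values) and the constant-proportion strategy $h^S_t = \pi V_t / S_t$, updates the wealth by the discretised self-financing recursion $\Delta V = h^0 \Delta B + h^S \Delta S$, and compares the result with the closed-form solution of $dV = V[(r + \pi(\mu - r))\,dt + \pi\sigma\, dW]$ obtained from that very condition.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, r, pi = 0.08, 0.2, 0.03, 0.6     # hypothetical parameters
T, n_steps = 1.0, 100_000
dt = T / n_steps

S, B, V, W = 100.0, 1.0, 1.0, 0.0
for _ in range(n_steps):
    dW = rng.standard_normal() * np.sqrt(dt)
    hS = pi * V / S                   # units of the risky asset
    h0 = (1.0 - pi) * V / B           # units of the risk-free asset
    dS = S * (mu * dt + sigma * dW)
    dB = B * r * dt
    V += h0 * dB + hS * dS            # self-financing: dV = h0 dB + hS dS
    S, B, W = S + dS, B + dB, W + dW

# Closed form for the same Brownian path, from dV = V[(r + pi(mu-r)) dt + pi sigma dW].
V_closed = np.exp((r + pi * (mu - r) - 0.5 * pi**2 * sigma**2) * T + pi * sigma * W)
print(V, V_closed)    # agree up to discretisation error
```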


We are interested in finding a sufficient criterion for a market to be arbitrage free. To formulate our main result on this, we first need to introduce the concept of a normalized market. Given our financial instrument vector $S$ and the risk-free asset $B$, we define the normalized instruments by $S_k'(t) = \frac{S_k(t)}{B(t)}$. The instruments $S_k'$ can be thought of as the discounted versions of $S_k$.

Lemma 5.1.3. $S_k'$ is a standard process with the dynamics
\[ dS_k'(t) = \frac{1}{B(t)}\, dS_k(t) - S_k'(t) r(t)\, dt. \]

Proof. Since $B$ is positive, we can use Itô's lemma in the form of Corollary 3.6.4 on the $C^2(\mathbb{R}^2 \setminus \{0\})$ mapping $(x,y) \mapsto \frac{x}{y}$ and obtain
\[ dS_k'(t) = \frac{1}{B(t)}\, dS_k(t) - \frac{S_k(t)}{B(t)^2}\, dB(t) - \frac{1}{B(t)^2}\, d[S_k, B]_t + \frac{S_k(t)}{B(t)^3}\, d[B]_t = \frac{1}{B(t)}\, dS_k(t) - \frac{S_k(t) r(t)}{B(t)}\, dt = \frac{1}{B(t)}\, dS_k(t) - S_k'(t) r(t)\, dt, \]
where we have used that $dB(t) = r(t)B(t)\, dt$, so that the covariation terms vanish. This is as desired.

Lemma 5.1.3 shows that the processes $S' = (S_1', \ldots, S_m')$ also describe financial instruments in the sense introduced earlier in this section; the processes are nonnegative standard processes. In particular, we can consider the normalized market consisting of a trivial risk-free asset $B' = 1$ and the normalized instruments $S'$. Our next lemma shows that to check arbitrage for the general market, it will suffice to consider the normalized market.


Lemma 5.1.4. The market with instruments $S$ and risk-free asset $B$ is $T$-arbitrage free if the normalized market with instruments $S'$ and risk-free asset $B' = 1$ is $T$-arbitrage free.

Proof. Assume that the normalized market is free of arbitrage at time $T$. Let the portfolio $h = (h^0, h^S)$ be an arbitrage opportunity at time $T$ for the original market. We will work to obtain a contradiction. Let $V^h_S$ be the value process when considering $h$ as a portfolio strategy for the original market. Let $V^h_{S'}$ be the value process when considering $h$ as a portfolio strategy for the normalized market. Our goal is to show that the portfolio $h$ also is an arbitrage under the normalized market.

We first examine $V^h_{S'}$. Note that since $B' = 1$, we obtain
\[ V^h_{S'}(t) = h^0(t) + \sum_{k=1}^m h^S_k(t) S_k'(t) = \frac{1}{B(t)}\left(h^0(t)B(t) + \sum_{k=1}^m h^S_k(t) S_k(t)\right) = \frac{V^h_S(t)}{B(t)}. \]
In particular, since $B$ is positive, we find that $V^h_{S'}$ is nonnegative, so $h$ is admissible under the normalized market. To show that it is self-financing, we use Itô's lemma and Lemma 5.1.3 to obtain
\[ \begin{aligned}
dV^h_{S'}(t) &= \frac{1}{B(t)}\, dV^h_S(t) - \frac{V^h_S(t)}{B(t)^2}\, dB(t) - \frac{1}{B(t)^2}\, d[V^h_S, B]_t + \frac{V^h_S(t)}{B(t)^3}\, d[B]_t \\
&= \frac{1}{B(t)} h^0(t)\, dB(t) + \frac{1}{B(t)} \sum_{k=1}^m h^S_k(t)\, dS_k(t) - \frac{V^h_{S'}(t)}{B(t)}\, dB(t) \\
&= \frac{1}{B(t)} \sum_{k=1}^m h^S_k(t)\, dS_k(t) - \frac{1}{B(t)}\left(V^h_{S'}(t) - h^0(t)\right) dB(t) \\
&= \frac{1}{B(t)} \sum_{k=1}^m h^S_k(t)\, dS_k(t) - \sum_{k=1}^m h^S_k(t) S_k'(t) r(t)\, dt \\
&= \sum_{k=1}^m h^S_k(t)\, dS_k'(t),
\end{aligned} \]
so $h$ is self-financing under the normalized market. It remains to show that $h$ is an arbitrage opportunity under the normalized market. Since $h$ is an arbitrage for the original market, $V^h_S(0) = 0$, $P(V^h_S(T) \geq 0) = 1$ and $P(V^h_S(T) > 0) > 0$. Now note that since $B(t) > 0$ for all $t \geq 0$, we obtain $V^h_{S'}(0) = 0$, $P(V^h_{S'}(T) \geq 0) = 1$ and $P(V^h_{S'}(T) > 0) > 0$. We conclude that $h$ is an arbitrage opportunity under the normalized market. Since this market was assumed arbitrage free, we have obtained a contradiction and must conclude that the original market is arbitrage free.


Next, we prove our main results for a market to be free of arbitrage. The theorem below is the primary criterion, with its corollaries giving criteria for special cases which are easier to check in practice. We say that a process $M$ is a local martingale on $[0,T]$ if there exists a localising sequence $\tau_n$ such that $M^{\tau_n}$ is a martingale on $[0,T]$ for each $n$.

Theorem 5.1.5. Let $T > 0$. If there exists a probability measure $Q$ equivalent to $P$ such that each instrument of the normalized market is a local martingale on $[0,T]$ under $Q$, then $\mathcal{M}$ is arbitrage free at time $T$.

Proof. By Lemma 5.1.4, it will suffice to show that the normalized market is arbitrage free at time $T$. Therefore, assume that $h$ is an arbitrage opportunity at time $T$ under the normalized market with value process $V^h_{S'}$; we will aim to obtain a contradiction. Let $Q$ be the probability measure equivalent to $P$ which makes $S_k'$ a local martingale on $[0,T]$ for any $k \leq m$. By definition of an arbitrage opportunity, we know $V^h_{S'}(0) = 0$, $P(V^h_{S'}(T) \geq 0) = 1$ and $P(V^h_{S'}(T) > 0) > 0$. Since $P$ and $Q$ are equivalent, we also have $Q(V^h_{S'}(T) \geq 0) = 1$ and $Q(V^h_{S'}(T) > 0) > 0$. In particular, $E^Q V^h_{S'}(0) = 0$ and $E^Q V^h_{S'}(T) > 0$. We will argue that $h$ cannot be an arbitrage opportunity by showing that $V^h_{S'}$ is a supermartingale on $[0,T]$ under $Q$, contradicting that $E^Q V^h_{S'}(0) < E^Q V^h_{S'}(T)$.

Under the normalized market, the price of the risk-free asset is constant, so we find $dV^h_{S'}(t) = \sum_{k=1}^m h^S_k(t)\, dS_k'(t)$. Now, since $P$ and $Q$ are equivalent, $S_k'$ is a standard process under $Q$ according to Theorem 3.8.4. Because $h$ is locally bounded, $h$ is integrable with respect to $S_k'$ both under $P$ and $Q$, and Lemma 3.8.4 then yields that the integrals are the same whether under $P$ or $Q$.

Under $Q$, each $S_k'$ is a continuous local martingale on $[0,T]$. Therefore, the $Q$ stochastic integral process $\sum_{k=1}^m \int_0^t h^S_k(s)\, dS_k'(s)$ is a continuous local martingale on $[0,T]$ under $Q$. Under $P$, the stochastic integral process $\sum_{k=1}^m \int_0^t h^S_k(s)\, dS_k'(s)$ is equal to $V^h_{S'}(t)$. Since the integrals agree, as noted above, we conclude that $V^h_{S'}$ is a continuous local martingale on $[0,T]$ under $Q$. Because $h$ is admissible, $V^h_{S'}$ is nonnegative. Thus, under $Q$, $V^h_{S'}$ is a nonnegative continuous local martingale on $[0,T]$. Using the same argument as in the proof of Lemma 3.7.2, $V^h_{S'}$ is then a $Q$ supermartingale on $[0,T]$. In particular, $E^Q V^h_{S'}(t)$ is decreasing on $[0,T]$, in contradiction with our earlier conclusion that $E^Q V^h_{S'}(0) < E^Q V^h_{S'}(T)$. Therefore, there can exist no arbitrage opportunities in the normalized market. The result follows.


Comment 5.1.6 Note that it was our assumption that $h$ is locally bounded which ensured that the integral of $h$ both under $P$ and $Q$ was well-defined. This is the reason why we in our definitions have chosen only to consider locally bounded portfolio strategies. The equivalent probability measure in the theorem making each instrument of the normalized market a local martingale on $[0,T]$ is called an equivalent local martingale measure for time $T$, or a $T$-EMM. ◦

The next two corollaries provide sufficient criteria for being arbitrage free in the special case of a linear financial market, which we now define.

Definition 5.1.7. A financial market is said to be linear if each $S_k$ has dynamics
\[ dS_k(t) = \mu_k(t)S_k(t)\, dt + \sum_{i=1}^n \sigma_{ki}(t)S_k(t)\, dW^i(t), \]
where $\sigma_{ki} \in L^2(W)$, $\mu_k \in L^1(t)$, and $S_k(0)$ is some positive constant. Here, $\mu$ is known as the drift vector process and $\sigma$ is the volatility matrix process.

By Lemma 3.7.1, the price processes in a linear market have the explicit form
\[ S_k(t) = S_k(0) \exp\left( \int_0^t \mu_k(s)\, ds + \sum_{i=1}^n \int_0^t \sigma_{ki}(s)\, dW^i_s - \frac{1}{2} \sum_{i=1}^n \int_0^t \sigma_{ki}(s)^2\, ds \right), \]
unique up to indistinguishability. In particular, $S$ is nonnegative, so it actually defines a financial market in the sense introduced earlier.

Corollary 5.1.8. Assume that the market is linear and that there exists a process $\lambda$ in $L^2(W)^n$ such that $\mu_k(t) - r(t) = \sum_{i=1}^n \sigma_{ki}(t)\lambda_i(t)$ for any $k \leq m$, $\lambda \otimes P$ almost surely. Assume further that with $M_t = -\sum_{i=1}^n \int_0^t \lambda_i(s)\, dW^i_s$, $\mathcal{E}(M)$ is a martingale. Then the market is free of arbitrage, and there is a $T$-EMM $Q$ such that the $Q$-dynamics on $[0,T]$ are given by $dS_k(t) = r(t)S_k(t)\, dt + \sum_{i=1}^n \sigma_{ki}(t)S_k(t)\, d\tilde{W}^i(t)$, where $\tilde{W}$ is an $\mathcal{F}_t$ Brownian motion under $Q$.


Proof. We will use Theorem 5.1.5. Let $T > 0$ be given. Since $\mathcal{E}(M)$ is a martingale and $\mathcal{E}(M^T) = \mathcal{E}(M)^T$, we obtain $E\mathcal{E}(M^T)_\infty = E\mathcal{E}(M)_T = 1$. Recalling that $\mathcal{F} = \mathcal{F}_\infty$, we can then define the measure $Q$ on $\mathcal{F}$ by $Q(A) = E_P 1_A \mathcal{E}(M^T)_\infty$. Since $\mathcal{E}(M^T)_\infty$ is almost surely positive and has unit mean, $P$ and $Q$ are equivalent. Now, with $Y^i_t = -\lambda_i(t)1_{[0,T]}(t)$, we obtain $M^T = Y \cdot W$. Therefore, by the Girsanov theorem of Theorem 3.8.3, we can define $\tilde{W}$ by $\tilde{W}^k_t = W^k_t + \int_0^t \lambda_k(s)1_{[0,T]}(s)\, ds$ and obtain that $\tilde{W}$ is an $\mathcal{F}_t$ Brownian motion under $Q$.

We claim that under $Q$, $S_k'$ is a local martingale on $[0,T]$. To this end, we note that, using Lemma 5.1.3,
\[ \begin{aligned}
S_k'(t) - S_k'(0) &= \int_0^t (\mu_k(s) - r(s))S_k'(s)\, ds + \sum_{i=1}^n \int_0^t \sigma_{ki}(s)S_k'(s)\, dW^i_s \\
&= \int_0^t \left( \mu_k(s) - r(s) - \sum_{i=1}^n \sigma_{ki}(s)\lambda_i(s)1_{[0,T]}(s) \right) S_k'(s)\, ds + \sum_{i=1}^n \int_0^t \sigma_{ki}(s)S_k'(s)\, d\tilde{W}^i_s \\
&= \int_0^t (\mu_k(s) - r(s))1_{(T,\infty)}(s)S_k'(s)\, ds + \sum_{i=1}^n \int_0^t \sigma_{ki}(s)S_k'(s)\, d\tilde{W}^i_s,
\end{aligned} \]
where we have used that $\sum_{i=1}^n \sigma_{ki}(s)\lambda_i(s)1_{[0,T]}(s) = (\mu_k(s) - r(s))1_{[0,T]}(s)$. Letting $\tau_n$ be a determining sequence making the latter integral above into a martingale, it is then clear that $(S_k')^{\tau_n}$ is a martingale on $[0,T]$ under $Q$. Thus, $S_k'$ is a local martingale on $[0,T]$ under $Q$, so $Q$ is a $T$-EMM. By Theorem 5.1.5, the market is free of arbitrage. Using the proof technique of Lemma 5.1.3, we find that the $Q$-dynamics of $S_k$ on $[0,T]$ are
\[ dS_k(t) = B(t)\, dS_k'(t) + S_k(t)r(t)\, dt = \sum_{i=1}^n \sigma_{ki}(t)S_k(t)\, d\tilde{W}^i_t + S_k(t)r(t)\, dt, \]
as desired.

Comment 5.1.9 In the above, our assumption that $\mathcal{F} = \mathcal{F}_\infty$ became important. Had we not known that $\mathcal{F} = \mathcal{F}_\infty$, we would only have obtained a measure on $\mathcal{F}_\infty$ equivalent to the restriction of $P$ to $\mathcal{F}_\infty$, and not a measure on all of $\mathcal{F}$.

The equations $\mu_k(t) - r(t) = \sum_{i=1}^n \sigma_{ki}(t)\lambda_i(t)$ are called the market price of risk equations, and any solution $\lambda$ is known as a market price of risk specification for the market. In the one-dimensional case, we have a unique solution $\lambda_t = \frac{\mu_t - r_t}{\sigma_t}$, showing that $\lambda$ can be interpreted as the excess return per unit of volatility. Therefore, $\lambda$ can in an intuitive sense be said to measure the price of uncertainty, or risk, in the market. ◦
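The measure change in Corollary 5.1.8 can be illustrated numerically in the simplest setting of a single asset with constant coefficients, where $\lambda = (\mu - r)/\sigma$ and the density on $\mathcal{F}_T$ is $\mathcal{E}(M)_T = \exp(-\lambda W_T - \tfrac{1}{2}\lambda^2 T)$. The Python sketch below is illustrative only, with hypothetical parameter values; it checks by Monte Carlo that weighting with this density makes the discounted asset price a martingale, $E^Q[S_T'] = E^P[\mathcal{E}(M)_T S_T'] = S_0'$.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, r, S0, B0, T = 0.08, 0.2, 0.03, 100.0, 1.0, 1.0   # hypothetical parameters
lam = (mu - r) / sigma                                        # market price of risk

W_T = rng.standard_normal(1_000_000) * np.sqrt(T)
S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)    # P-dynamics of the asset
Z_T = np.exp(-lam * W_T - 0.5 * lam**2 * T)                   # density dQ/dP on F_T

disc_now = S0 / B0
disc_T_under_Q = np.mean(Z_T * S_T / (B0 * np.exp(r * T)))    # E^Q[S'_T] = E^P[Z_T S'_T]
print(disc_now, disc_T_under_Q)   # the discounted price is a Q-martingale
```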


Corollary 5.1.10. Assume that the market is linear and only has one asset. Let $\mu$ denote the drift and let $\sigma$ denote the volatility. Assume that the drift and short rate are constant, and that $\sigma$ is positive $\lambda \otimes P$ almost surely. If there is $k \leq n$ such that $E\exp\big(\frac{(\mu - r)^2}{2}\int_0^t \frac{1}{\sigma_k(s)^2}\, ds\big)$ is finite for any $t \geq 0$, the market is free of arbitrage, and there is a $T$-EMM $Q$ such that the $Q$-dynamics on $[0,T]$ are given by $dS(t) = r(t)S(t)\, dt + \sum_{i=1}^n \sigma_i(t)S(t)\, d\tilde{W}^i(t)$, where $\tilde{W}$ is an $\mathcal{F}_t$ Brownian motion under $Q$.

Proof. We want to solve the market price of risk equations of Corollary 5.1.8. Since we only have one asset, they reduce to
\[ \mu - r = \sum_{i=1}^n \sigma_i(t)\lambda_i(t). \]
Let $k$ be such that $E\exp\big(\frac{(\mu - r)^2}{2}\int_0^t \frac{1}{\sigma_k(s)^2}\, ds\big)$ is finite for any $t \geq 0$. To simplify, we define $\lambda_i = 0$ for $i \neq k$. We are then left with the equation $\mu - r = \sigma_k(t)\lambda_k(t)$.

Let $A$ denote the subset of $[0,\infty) \times \Omega$ where $\sigma_k$ is positive. Since $\sigma$ is progressive, $A \in \Sigma_\pi$. Defining $\lambda_k(t,\omega) = \frac{\mu - r}{\sigma_k(t,\omega)}$ whenever $(t,\omega) \in A$ and zero otherwise, $\lambda$ satisfies the equations in the statement of Corollary 5.1.8. By that same corollary, in order for the market to be arbitrage free, it will then suffice to show that with $M_t = -\sum_{i=1}^n \int_0^t \lambda_i(s)\, dW^i_s$, $\mathcal{E}(M)$ is a martingale, and to do so it will suffice to show that $\mathcal{E}(M^t)$ is a uniformly integrable martingale for any $t \geq 0$. By the Novikov criterion of Theorem 3.8.10, this is the case if $E\exp(\frac{1}{2}[M^t]_\infty)$ is finite for any $t \geq 0$. But we have
\[ E\exp\left(\tfrac{1}{2}[M^t]_\infty\right) = E\exp\left(\tfrac{1}{2}[M]_t\right) = E\exp\left(\tfrac{1}{2}\int_0^t \lambda_k(s)^2\, ds\right) = E\exp\left(\frac{(\mu - r)^2}{2}\int_0^t \frac{1}{\sigma_k(s)^2}\, ds\right), \]
so that the Novikov criterion is satisfied follows directly from our assumptions. The existence of a measure $Q$ with the given dynamics then follows from Corollary 5.1.8.

We are now done with exploring sufficient criteria for no-arbitrage. Next, we consider pricing of claims in markets without arbitrage. For $T > 0$, we define a $T$-claim as an $\mathcal{F}_T$-measurable, integrable variable $X$.


The intuition behind this is that the owner of the claim receives the amount $X$ at time $T$. Our goal is to argue for a reasonable price of this contract at $t \leq T$. We will not be able to do this in full generality, but will only consider markets satisfying the criterion of Theorem 5.1.5. As discussed in the notes, this actually covers all arbitrage-free markets, but we will not be able to show this fact.

Theorem 5.1.11. Let $T > 0$ and consider a market with a $T$-EMM $Q$. Let $X$ be any $T$-claim. Assume that $X \geq 0$ and define
\[ S_{m+1}(t) = E^Q\!\left( \frac{B(t)}{B(T)} X \,\Big|\, \mathcal{F}_t \right) 1_{[0,T]}(t) + \frac{B(t)}{B(T)} X\, 1_{(T,\infty)}(t). \]
The market with instruments $(S_1, \ldots, S_{m+1})$ is free of arbitrage at time $T$.

Proof. We first argue that $(S_1, \ldots, S_{m+1})$ is a well-defined financial market. Under $Q$, $S_{m+1}'$ is obviously a martingale. Since we are working under the augmented Brownian filtration, it has a continuous version. Since $X$ is nonnegative, $S_{m+1}'(t)$ is almost surely nonnegative for any $t \geq 0$, and therefore there is a version of $S_{m+1}'$ which is nonnegative in addition to being continuous. Thus, $S_{m+1}'$ is a nonnegative standard process under $Q$. Therefore, $S_{m+1}$ is also a nonnegative standard process under $Q$, and so it is a standard process under $P$. Finally, we conclude that $(S_1, \ldots, S_{m+1})$ defines a financial market.

Under $Q$, each of the instruments $S_1', \ldots, S_m'$ is a local martingale on $[0,T]$. Since $S_{m+1}'$ trivially is a martingale on $[0,T]$ under $Q$, Theorem 5.1.5 yields that the market with instruments $(S_1, \ldots, S_{m+1})$ is free of arbitrage at time $T$.

Comment 5.1.12 The somewhat cumbersome form of $S_{m+1}$, splitting up into a part on $[0,T]$ and a part on $(T,\infty)$, is an artefact of considering markets with infinite horizon. ◦

What Theorem 5.1.11 shows is that for any nonnegative claim, using $E^Q\big(\frac{B(t)}{B(T)} X \mid \mathcal{F}_t\big)$ as the price for the claim $X$ for $t \leq T$ is consistent with no arbitrage in the market where the claim is traded. And since $E^Q\big(\frac{B(T)}{B(T)} X \mid \mathcal{F}_T\big) = X$, buying the asset with price process $E^Q\big(\frac{B(t)}{B(T)} X \mid \mathcal{F}_t\big)$ actually corresponds to buying the claim, in the sense that at the time of expiry $T$, one owns an asset with value $X$. Assuming that prices are linear, we conclude that a reasonable price for any $T$-claim for $t \leq T$ is $E^Q\big(\frac{B(t)}{B(T)} X \mid \mathcal{F}_t\big)$, where $Q$ is a $T$-EMM.
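In the simplest constant-coefficient setting, this pricing rule can be carried out directly by Monte Carlo. The Python sketch below is illustrative only: it assumes the claim $X = \max(S_T - K, 0)$, a European call, simulates $S$ under the $Q$-dynamics $dS = rS\,dt + \sigma S\,d\tilde{W}$ of Corollary 5.1.8 with hypothetical parameter values, and compares the estimate of $E^Q[e^{-rT}X]$ with the closed-form Black-Scholes price. It anticipates the Monte Carlo discussion of Section 5.3 and the later treatment of the Black-Scholes model.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
S0, K, r, sigma, T = 100.0, 105.0, 0.03, 0.2, 1.0   # hypothetical parameters
n_paths = 2_000_000

W_T = rng.standard_normal(n_paths) * np.sqrt(T)
S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * W_T)   # Q-dynamics
payoff = np.maximum(S_T - K, 0.0)

price_mc = np.exp(-r * T) * payoff.mean()
stderr = np.exp(-r * T) * payoff.std(ddof=1) / np.sqrt(n_paths)

# Closed-form Black-Scholes price for comparison.
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
price_bs = S0 * Phi(d1) - K * math.exp(-r * T) * Phi(d2)

print(f"Monte Carlo: {price_mc:.4f} +/- {2 * stderr:.4f}   Black-Scholes: {price_bs:.4f}")
```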


We are now done with our discussion of general financial markets. Our results are the following. We have in Theorem 5.1.5 obtained general criteria for a market to be free of arbitrage, and we have obtained specialised criteria for the case of linear markets in the corollaries. Afterwards, we have briefly argued, based on the result of Theorem 5.1.11, what a reasonable price of a contingent claim should be modeled as. Note that lack of uniqueness of $T$-EMMs can imply that the price of a $T$-claim is not uniquely determined. This will not bother us particularly, however.

We will now proceed to discuss the practical matter of calculating the prices and sensitivities associated with contingent claims. We will begin with a very short review of some basic notions from the theory of SDEs and afterwards proceed to go through some methods for Monte Carlo evaluation of prices and sensitivities. Having done so, we will apply these techniques to the Black-Scholes and Heston models.

5.2 Stochastic Differential Equations

We will review some fundamental definitions regarding SDEs of the form
$$dX_t^k = \mu_k(X_t)\,dt + \sum_{i=1}^n \sigma_{ki}(X_t)\,dW_t^i,$$
where $X_0$ is constant, $k \le m$, $W$ is an $n$-dimensional Brownian motion and the mappings $\mu: \mathbb{R}^m \to \mathbb{R}^m$ and $\sigma: \mathbb{R}^m \to \mathbb{R}^{m\times n}$ are measurable. We also consider a deterministic initial condition $x \in \mathbb{R}^m$. We write the SDE in the shorthand
$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t,$$
the underlying idea being that the $m$-vector $\mu(X_t)$ is being multiplied by the scalar $dt$, yielding a new $m$-vector, and the $m \times n$ matrix $\sigma(X_t)$ is being multiplied by the $n$-vector $dW_t$, yielding an $m$-vector, so that we formally obtain
$$\begin{pmatrix} dX_t^1 \\ \vdots \\ dX_t^m \end{pmatrix}
= \begin{pmatrix} \mu_1(X_t) \\ \vdots \\ \mu_m(X_t) \end{pmatrix} dt
+ \begin{pmatrix} \sigma_{11}(X_t) & \cdots & \sigma_{1n}(X_t) \\ \vdots & & \vdots \\ \sigma_{m1}(X_t) & \cdots & \sigma_{mn}(X_t) \end{pmatrix}
\begin{pmatrix} dW_t^1 \\ \vdots \\ dW_t^n \end{pmatrix}
= \begin{pmatrix} \mu_1(X_t)\,dt + \sum_{i=1}^n \sigma_{1i}(X_t)\,dW_t^i \\ \vdots \\ \mu_m(X_t)\,dt + \sum_{i=1}^n \sigma_{mi}(X_t)\,dW_t^i \end{pmatrix},$$


in accordance with our first definition of the SDE.

Analogously with the conventions for linear markets in Section 5.1, $\mu$ is called the drift coefficient and $\sigma$ is called the volatility coefficient. We define a set-up as a filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ satisfying the usual conditions and endowed with an $\mathcal{F}_t$ Brownian motion $W$. We say that the SDE has a weak solution if for any set-up there exists an $m$-dimensional standard process $X$ satisfying the SDE. We say that the SDE satisfies uniqueness in law if any two solutions on any two set-ups have the same distribution. We say that the SDE satisfies pathwise uniqueness if any two solutions on the same set-up are indistinguishable.

Furthermore, we say that the SDE has a strong solution if there exists a mapping $F: \mathbb{R}^m \times C([0,\infty), \mathbb{R}^n) \to C([0,\infty), \mathbb{R}^m)$ such that for any set-up with Brownian motion $W$ and any $x \in \mathbb{R}^m$, the process $X^x = F(x, W)$ solves the SDE with initial condition $x$. We call $F$ a strong solution of the SDE. If the SDE satisfies pathwise uniqueness, $F(x,\cdot)$ is almost surely unique when $C([0,\infty), \mathbb{R}^n)$ is endowed with the Wiener measure. When the SDE has a strong solution $F$, we can define the flow of the solution as the mapping $\varphi: \mathbb{R}^m \times C([0,\infty), \mathbb{R}^n) \times [0,\infty) \to \mathbb{R}^m$ given by $\varphi(x, w, t) = F(x, w)_t$. We say that the flow is differentiable if the mapping $\varphi(\cdot, w, t)$ is differentiable for all $w$ and $t$.

This concludes our review of fundamental notions for SDEs.

5.3 Monte Carlo evaluation of expectations

As we saw in the first section of this chapter, an arbitrage-free price of a contingent $T$-claim is given as a discounted expected value under the $Q$-measure. The problem of evaluating prices can therefore be seen as a special case of the problem of evaluating expectations. As we shall see, the same is the case for the problem of evaluating sensitivities of prices to changes in the model parameters. For this reason, we will now review some basic Monte Carlo methods for evaluating expectations. Our basic resource is Glasserman (2003), in particular Chapter 4 of that work.

Throughout the section, consider an integrable stochastic variable $X$. Our goal is to find the expectation of $X$. The basis for the Monte Carlo methods of evaluating expectations is the strong law of large numbers, stating that if $(X_n)$ is a sequence of


independent variables with the same distribution as $X$, then
$$\frac{1}{n}\sum_{k=1}^n X_k \xrightarrow{a.s.} EX.$$
This means that if we can draw a sequence of independent numbers sampled from the distribution of $X$ and take their average, the result should approach the expectation of $X$ as the number of samples tends to infinity. Furthermore, we can measure the speed of convergence by the variance,
$$V\left(\frac{1}{n}\sum_{k=1}^n X_k\right) = \frac{1}{n^2}\sum_{k=1}^n V X_k = \frac{1}{n}V X.$$
In reality, we can of course not truly draw independent random numbers; we are only able to generate "pseudorandom" sequences, numbers which behave randomly in some suitable sense. We will not discuss the implications of this observation, mostly because the pseudorandom number generators available today are of sufficiently high quality to ensure that the topic is of little practical significance. For more information on the fundamentals of generating random numbers, see Chapter 2 of Glasserman (2003).

The methods which we will concentrate upon are how the basic application of the law of large numbers can be modified to speed up the convergence to the expectation, in the sense of reducing the variance calculated above. The methods we will consider are, ordered by increasing potential efficiency improvement:

1. Antithetic sampling.
2. Control variates.
3. Stratified sampling.
4. Importance sampling.

Antithetic sampling. The idea behind antithetic sampling is to even out deviations from the sample mean in the i.i.d. sequence. As before, we let $(X_n)$ be an i.i.d. sequence with the same distribution as $X$. Let $(X'_n)$ be another sequence, and assume that the sequence $(X_n, X'_n)$ is also i.i.d. Assume furthermore that $EX'_n = EX_n$. Thus, the pairs are independent, and each marginal in any pair has the mean $EX$;


it is, however, possible that $X_n$ and $X'_n$ are dependent. In fact, such dependence is the very idea of antithetic sampling. We then find
$$\frac{1}{2n}\left(\sum_{k=1}^n X_k + \sum_{k=1}^n X'_k\right) = \frac{1}{2}\left(\frac{1}{n}\sum_{k=1}^n X_k + \frac{1}{n}\sum_{k=1}^n X'_k\right) \xrightarrow{a.s.} EX.$$
Thus, sampling from each sequence and averaging yields a consistent estimator of the mean. If the samples from $X'_k$ on average are larger than the mean when the samples from $X_k$ are smaller than the mean and vice versa, we could hope that this would even out deviations from the mean and improve convergence. To make this concrete, let $X'$ be such that $(X, X')$ has the common distribution of $(X_n, X'_n)$. We then obtain
$$V\left(\frac{1}{2n}\left(\sum_{k=1}^n X_k + \sum_{k=1}^n X'_k\right)\right) = \frac{1}{n^2}\sum_{k=1}^n V\left(\frac{X_k + X'_k}{2}\right) = \frac{1}{n^2}\sum_{k=1}^n \frac{1}{4}\left(2VX_k + 2\,\mathrm{Cov}(X_k, X'_k)\right) = \frac{1}{2n}VX + \frac{1}{2n}\mathrm{Cov}(X, X').$$
The first term is the usual Monte Carlo estimator variance based on $2n$ samples. Thus, we see that antithetic sampling can reduce variance if $\mathrm{Cov}(X, X')$ is negative, precisely corresponding to the values of $X'_k$ on average being larger than the mean when the values of $X_k$ are smaller than the mean and vice versa.

As a simple example of antithetic sampling, let us assume that we wanted to calculate the second moment of the unit uniform distribution by antithetic sampling. We would need to identify a two-dimensional distribution $(X, X')$ such that $X$ and $X'$ both have the mean of the second moment of the uniform distribution and are negatively correlated. A simple possibility would be to let $U$ be uniformly distributed and define $X = U^2$ and $X' = (1-U)^2$. $X$ and $X'$ are then both distributed as the square of a uniform distribution. We would intuitively expect $X$ and $X'$ to be negatively correlated, since large values of $X$ correspond to large values of $U$, corresponding to small values of $X'$. In this case, we can check that this is actually true, since
$$\mathrm{Cov}(X, X') = \mathrm{Cov}(U^2, (1-U)^2) = E\,U^2(1-U)^2 - EU^2\,E(1-U)^2 = \tfrac{1}{30} - \tfrac{1}{9} = -\tfrac{7}{90}.$$
In practice, we would therefore consider a sequence $(U_n)$ of i.i.d. uniform variables and construct the expectation estimator
$$\frac{1}{2n}\left(\sum_{k=1}^n U_k^2 + \sum_{k=1}^n (1-U_k)^2\right),$$


which would be a consistent estimator of the mean, and its variance would be, since $VU^2 = \frac{4}{45}$ and $\mathrm{Cov}(U^2,(1-U)^2) = -\frac{7}{90}$, equal to $\frac{1}{2n}\left(\frac{4}{45} - \frac{7}{90}\right) = \frac{1}{180n}$. This is an improvement over the ordinary Monte Carlo estimator by a factor of $\frac{4}{45}\left(\frac{4}{45} - \frac{7}{90}\right)^{-1} = 8$. This means that for any $n$, the antithetic estimator would have a standard deviation almost three times smaller than the ordinary estimator. Usually, the efficiency boost of antithetic sampling is not much larger than this; in fact, it is usually somewhat more modest. However, whenever we are considering random variables based on distributions with some kind of symmetry, such as the uniform distribution or the normal distribution, antithetic sampling is easy to implement and almost always provides an improvement. There is therefore rarely any reason not to use antithetic sampling in such situations.

Control variates. Like antithetic sampling, the method of control variates operates by attempting to balance samples which are below or above the true expectation by correlation considerations. Instead of adding an extra sequence of variables with the same mean as the original sequence, however, the method of control variates adds an extra sequence of variables with some known mean which is not necessarily the same as the mean of $X$.

Therefore, let $X'$ be some variable with known mean, possibly correlated with $X$. Suppose that the sequence $(X_n, X'_n)$ is i.i.d. with the common distribution being the same as that of $(X, X')$. With $b \in \mathbb{R}$, we can then consider the estimator
$$\frac{1}{n}\left(\sum_{k=1}^n X_k - b\sum_{k=1}^n (X'_k - EX')\right).$$
The law of large numbers applies to show that this yields a consistent estimator. The idea behind the estimator is that if $X_k$ and $X'_k$ are correlated, we can choose $b$ such that the deviations from the mean of $X_k$ are evened out by the deviations from the mean of $X'_k$. In the case of positive correlation, this would mean choosing a positive $b$, since large values of $X_k$ then would be offset by large values of $X'_k$ and vice versa. Likewise, in the case of negative correlation, we would expect that choosing a negative $b$ would yield the best results. We call $X'_k$ the control variate.

We will now analyze the variance of the control variate estimator. We find
$$V\left(\frac{1}{n}\left(\sum_{k=1}^n X_k - b\sum_{k=1}^n (X'_k - EX')\right)\right) = \frac{1}{n^2}\sum_{k=1}^n V\left(X_k - b(X'_k - EX')\right) = \frac{1}{n}\left(VX_k + b^2 V(X'_k) - 2b\,\mathrm{Cov}(X_k, X'_k)\right) = \frac{1}{n}VX + \frac{1}{n}\left(b^2 VX' - 2b\,\mathrm{Cov}(X, X')\right).$$


The first term is the variance of the ordinary Monte Carlo estimator. If the control variate estimator is to be useful, the second term must be negative, corresponding to $b^2 VX' - 2b\,\mathrm{Cov}(X,X') < 0$. To find out when we can make sure this is the case, we first identify the optimal value of $b$. This is the value minimizing $b^2VX' - 2b\,\mathrm{Cov}(X,X')$ over $b$. The formula for the minimum of a quadratic polynomial yields the optimal value $b^* = \frac{\mathrm{Cov}(X,X')}{VX'}$. The corresponding variance change is
$$(b^*)^2 VX' - 2b^*\,\mathrm{Cov}(X,X') = -\frac{\mathrm{Cov}(X,X')^2}{VX'}.$$
Since this is always negative, we conclude that, disregarding considerations of computing time, control variates can only improve convergence, and there is an improvement whenever $X$ and $X'$ are correlated. The factor of improvement of the control variate estimator over the ordinary estimator is
$$\frac{VX}{VX - \frac{\mathrm{Cov}(X,X')^2}{VX'}} = \frac{1}{1 - \mathrm{Corr}(X,X')^2}.$$
This means that if, for example, the correlation is $\frac12$, the variance of the ordinary Monte Carlo estimator is a factor of $\frac43$ larger than the variance of the control variate estimator. If the correlation is $\frac34$, it is a factor of $\frac{16}{7}$ larger, and so on. The improvement grows drastically as the correlation tends to one, with the limit corresponding to the degenerate case where $X'$ has the distribution of $X$, which is of course absurd since $EX$ then would be known, making the estimation unnecessary.

As an example, we can again consider the problem of numerically evaluating the second moment of the uniform distribution. As we have seen, to find an effective control variate, we need to identify a variable which is highly correlated with the variables we are using to estimate the second moment, and whose mean we know. Thus, let $U$ be uniformly distributed and let $X = U^2$. Assuming that we know that the mean of the uniform distribution is $\frac12$, we can use the control variate $X' = U$. Obviously, $X$ and $X'$ should be quite highly correlated. Note that unlike the method of antithetic sampling, it does not matter whether the correlation is positive or negative; this is taken care of in the choice of coefficient. We find that the correlation is
$$\mathrm{Corr}(X, X') = \frac{\mathrm{Cov}(U^2, U)}{\sqrt{VU^2}\sqrt{VU}} = \sqrt{\frac{45}{48}},$$
which is pretty close to one, leading to a reduction in the variance of the estimator relative to the ordinary estimator by a factor of $\frac{1}{1 - \frac{45}{48}} = 16$. The optimal coefficient is
$$\frac{\mathrm{Cov}(X,X')}{VX'} = \frac{\mathrm{Cov}(U^2, U)}{VU} = 1.$$
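A minimal numerical sketch of this control variate construction, anticipating the estimator written out below (the coefficient $b = 1$ is the optimal value just computed; Python/numpy, the seed and the sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
U = rng.uniform(size=n)

X = U**2            # samples of the quantity whose mean E[U^2] = 1/3 we want
Xc = U              # control variate with known mean E[U] = 1/2
b = 1.0             # optimal coefficient Cov(U^2, U) / V(U) = 1

plain = X
controlled = X - b * (Xc - 0.5)

print("plain     :", plain.mean(), "std err:", plain.std(ddof=1) / np.sqrt(n))
print("controlled:", controlled.mean(), "std err:", controlled.std(ddof=1) / np.sqrt(n))
```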


The corresponding control variate estimator is, letting $(U_n)$ be a sequence of i.i.d. uniforms, $\frac{1}{n}\left(\sum_{k=1}^n U_k^2 - \sum_{k=1}^n (U_k - \frac12)\right)$. As expected, very large values of $U_k$ correspond to shooting over the true mean, and this is offset by the control variate. Since the control variate method in this case does not entail much extra calculation, we can reasonably say that the associated computational time is the same as that of the ordinary Monte Carlo estimator. Therefore, we appreciate that the control variate method provides an estimator with one fourth of the standard deviation of the ordinary estimator with the same amount of computing time.

In practice, the optimal coefficient cannot be calculated explicitly. Instead, one can apply the Monte Carlo simulations to estimate it, using
$$\hat{b}^* = \frac{\sum_{k=1}^n (X_k - \bar{X})(X'_k - \bar{X}')}{\sum_{k=1}^n (X'_k - \bar{X}')^2}.$$
The extra computational effort required for this is usually more than offset by the reduction in standard deviation.

Stratified sampling. Stratified sampling is a sampling method where the sample space of the distribution being sampled is partitioned and the number of samples from each part of the partition is controlled. In this way, the deviation from the mean cannot go completely awry.

We begin by introducing a stratification variable, $Y$. Let $A_1,\ldots,A_m$ be disjoint subsets of $\mathbb{R}$ such that $P(Y \in \cup_{i=1}^m A_i) = 1$ and $P(Y \in A_i) > 0$ for all $i$. We call these sets the strata. The idea is that instead of sampling from the distribution of $X$ to obtain the mean of $X$, we will sample from the distribution of $X$ given $Y \in A_i$.

Let $X_i$ have the distribution of $X$ given $Y \in A_i$. We then find, putting $p_i = P(Y \in A_i)$,
$$EX = \sum_{i=1}^m E\,X 1_{(Y \in A_i)} = \sum_{i=1}^m p_i\, E\left(\frac{X 1_{(Y\in A_i)}}{p_i}\right) = \sum_{i=1}^m p_i\, EX_i,$$
so if we can sample from each of the conditional distributions, we can make Monte Carlo estimates of each term in the sum above and thereby obtain an estimate for $EX$. To make this concrete, let for each $i$, $(X_{ik})$ be an i.i.d. sequence with the common distribution being the same as that of $X_i$. All these variables are assumed independent. Fix some $n$. We are free to choose how many samples to use from each stratum. We will restrict ourselves to what is known as proportional allocation, meaning that we choose to draw a fraction $p_i$ of samples from the $i$'th stratum. Assume for


simplicity that $np_i$ is an integer; we can then put $n_i = np_i$, obtain $n = \sum_{i=1}^m n_i$ and define the stratified Monte Carlo estimator
$$\frac{1}{n}\sum_{i=1}^m \sum_{k=1}^{n_i} X_{ik}.$$
By the law of large numbers,
$$\frac{1}{n}\sum_{i=1}^m \sum_{k=1}^{n_i} X_{ik} = \sum_{i=1}^m p_i \frac{1}{n_i}\sum_{k=1}^{n_i} X_{ik} \xrightarrow{a.s.} \sum_{i=1}^m p_i\, EX_i = EX,$$
so the estimator is consistent. Its variance is
$$V\left(\frac{1}{n}\sum_{i=1}^m\sum_{k=1}^{n_i} X_{ik}\right) = \frac{1}{n^2}\sum_{i=1}^m\sum_{k=1}^{n_i} VX_{ik} = \frac{1}{n}\sum_{i=1}^m p_i\, VX_i.$$
To compare this with the variance of the ordinary Monte Carlo estimator, define $U$ by letting $U = i$ if $Y \in A_i$. Then, the distribution of $X$ given $U = i$ is the distribution of $X_i$, and for the distribution of $U$ we find $P(U = i) = P(Y \in A_i) = p_i$. All in all, we may conclude
$$\frac{1}{n}\sum_{i=1}^m p_i\, VX_i = \frac{1}{n} E\,V(X\mid U) = \frac{1}{n}VX - \frac{1}{n}V\,E(X\mid U),$$
where the first term is the variance of the ordinary Monte Carlo estimator. Since the second term is a negative scalar times a variance, it is always nonpositive. Therefore, disregarding questions of computation time, the convergence of the stratified estimator is always at least as fast as that of the ordinary estimator.

Let us review the components of the stratified sampling technique. We start out with $X$, the distribution whose mean we wish to identify. We then consider a stratification variable $Y$ and strata $A_1,\ldots,A_m$. It is the simultaneous distribution of $(X,Y)$ combined with the strata which determines how our stratified estimator will function. We define i.i.d. sequences $X_{ik}$ such that $X_{ik}$ has the distribution of $X$ given $Y \in A_i$. The estimator $\frac{1}{n}\sum_{i=1}^m\sum_{k=1}^{n_i} X_{ik}$ is then consistent and has superior variance compared to the ordinary Monte Carlo estimator.

In practice, the critical part of this procedure is to certify that it is possible and computationally tractable to sample from the distribution of $X$ given $Y \in A_i$. This is often accomplished by ensuring that $X = t(Y)$ for some transformation $t$. In that case, the distribution of $X$ given $Y \in A_i$ is just the distribution of $Y$ given $Y \in A_i$ transformed with $t$. If $Y$ is a distribution such that it is easy to sample from $Y$


given $Y \in A_i$, we have obtained a simple procedure for sampling from the conditional distribution. This is the case if, say, $Y$ is a distribution with an easily computable quantile function and $A_i$ is an interval.

To see an example of how stratified sampling can be used, we will use it to numerically calculate the second moment of the uniform distribution. Let $U$ be uniformly distributed; we then wish to identify the mean of $U^2$. Fix $m \in \mathbb{N}$; we will use $U$ as our stratification variable with the strata $A_i = (\frac{i-1}{m}, \frac{i}{m}]$. We then find $p_i = \frac{1}{m}$ and $n_i = \frac{n}{m}$. We need to identify the distribution of $U^2$ given $U \in A_i$. We know that the distribution of $U$ given $U \in A_i$ is the uniform distribution on $A_i$. Therefore, $U^2$ given $U \in A_i$ has the same distribution as $(\frac{i-1}{m} + \frac{U}{m})^2$. Letting $(U_{ik})$ be unit uniformly distributed i.i.d. sequences, we can write our stratified estimator as
$$\frac{1}{n}\sum_{i=1}^m\sum_{k=1}^{n_i}\left(\frac{i-1}{m} + \frac{U_{ik}}{m}\right)^2.$$
Its variance is
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^m \frac{1}{m}\, V\left(\left(\frac{i-1}{m} + \frac{U_i}{m}\right)^2\right)
&= \frac{1}{n}\sum_{i=1}^m \frac{1}{m^5}\, V\left((i-1+U_i)^2\right) \\
&= \frac{1}{n}\sum_{i=1}^m \frac{1}{m^5}\, V\left((i-1)^2 + 2(i-1)U_i + U_i^2\right) \\
&= \frac{1}{n}\sum_{i=1}^m \frac{1}{m^5}\left(4(i-1)^2 VU_i + VU_i^2 + 4(i-1)\mathrm{Cov}(U_i, U_i^2)\right) \\
&= \frac{1}{n}\sum_{i=1}^m \frac{1}{m^5}\left(\frac{(i-1)^2}{3} + \frac{4}{45} + \frac{i-1}{3}\right) \\
&= \frac{1}{n}\,\frac{1}{m^5}\left(\frac{(m-1)m(2m-1)}{18} + \frac{4m}{45} + \frac{m(m-1)}{6}\right),
\end{aligned}$$
which is of order $O(m^{-2})$ when $m$ tends to infinity. This basically shows that we can in theory obtain as small a variance as we desire by picking as many strata as possible. In reality, of course, it is insensible to pick more strata than observations. The moral of the example, however, is the correct one: by using a large number of strata, a very dramatic reduction of variance is sometimes possible. An example of how stratified sampling can improve results can be seen in Figure 5.1, where we compare sampling from squared uniforms with and without stratification. We see that the histogram on the right, using stratified sampling, is considerably closer to the true form of the distribution.
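A minimal sketch of the stratified estimator in this example, using proportional allocation over $m$ equiprobable strata (Python/numpy, the seed and the particular $n$ and $m$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 100            # total samples and number of strata; n/m assumed integer
n_i = n // m                  # proportional allocation with p_i = 1/m

# U given U in ((i-1)/m, i/m] is uniform on that interval, so we sample it by
# shifting and scaling unit uniforms stratum by stratum.
i = np.repeat(np.arange(1, m + 1), n_i)
U_strat = (i - 1 + rng.uniform(size=n)) / m
stratified = np.mean(U_strat**2)      # estimator of E[U^2] = 1/3

plain = np.mean(rng.uniform(size=n)**2)
print("plain:", plain, "stratified:", stratified)
```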


Figure 5.1: Comparison of samples from a squared uniform distribution with and without stratified sampling. On the left, 500 samples without stratification. On the right, 500 samples with 100 strata.

The ease of implementation and potentially large improvements make stratified sampling a very useful method for improving convergence.

Importance sampling. Importance sampling is arguably the method described here which has the largest potential impact. Its usefulness varies greatly from situation to situation, and if used carelessly, it can significantly reduce the effectiveness of an estimator. On the other hand, in situations where no other method is applicable, it can sometimes be applied to obtain immense improvements.

The underlying idea of importance sampling is very simple. It is most naturally described in the context where $X = h(X')$ for some variable $X'$ and some mapping $h$ with sufficient integrability conditions. Let $\mu$ be the distribution of $X'$. Let $Y'$ be some other variable with distribution $\nu$ and assume that $\mu \ll \nu$. We can then use the Radon-Nikodym theorem to write
$$Eh(X') = \int h(x)\,d\mu(x) = \int h(x)\frac{d\mu}{d\nu}(x)\,d\nu(x) = E\left(h(Y')\frac{d\mu}{d\nu}(Y')\right).$$
Letting $(Y'_n)$ be an i.i.d. sequence with common distribution being the same as that of $Y'$, we can then define the importance sampling estimator as
$$\frac{1}{n}\sum_{k=1}^n h(Y'_k)\frac{d\mu}{d\nu}(Y'_k).$$
The law of large numbers yields that this is a consistent estimator of $Eh(X')$. Its


variance can be larger or smaller than that of the ordinary Monte Carlo estimator depending on the choice of $\nu$. Results on the effectiveness of the importance sampling estimator as a function of the choice of $\nu$ can only be obtained in specialised situations. Furthermore, the objective of the importance sampling method will in fact often not be to reduce variance, but rather to change the shape of a distribution when it is somehow suboptimal for the purposes of Monte Carlo estimation. It is difficult to describe quantitatively when this is the case; we will instead give a small example.

Let $X'$ have the standard exponential distribution and consider the problem of estimating $P(X' > \alpha)$ for some large $\alpha$. This means considering the transformation $h$ given by $h(x) = 1_{(\alpha,\infty)}(x)$. Let $p = P(X' > \alpha)$; the variance of $h(X')$ is $p(1-p)$, which is pretty small. In theory, then, the ordinary Monte Carlo estimator should yield relatively effective estimates. However, this is not actually the case.

To see why the ordinary estimator often will underestimate the true probability, we cannot rely only on moment considerations; we need to analyze the distribution of the estimator in more detail. To this end, note that $\frac{1}{n}\sum_{k=1}^n h(X'_k)$ is a binomial variable with $n$ trials and success probability $p$, scaled by $\frac{1}{n}$. A fundamental problem with this is that the binomial distribution is discrete; this immediately puts an upper limit on the accuracy of the estimator. Also, the estimator will often underestimate the true probability. To see how this is the case, simply note that the probability of the estimator being zero is $(1-p)^n$. When $p$ is small compared to $\frac{1}{n}$, this can be very large. In the case $\alpha = 10$, for example, we obtain $p = e^{-10}$, $(1-p)^{10000} \approx 63\%$ and $(1-p)^{100000} \approx 1.1\%$. This means that with 10000 samples, our estimator will yield zero in 63% of our trials. Using ten times as many simulations yields zero only about 1.1% of the time, but the distribution will still very often be far from the correct probability, which in this case is $e^{-10} \approx 0.0000454$.

We can use importance sampling to improve the efficiency of our estimator. We let the importance sampling measure $\nu$ be the exponential distribution with scale $\lambda$. We then have
$$\frac{d\mu}{d\nu}(x) = \frac{e^{-x}}{\frac{1}{\lambda}e^{-x/\lambda}} = \lambda\exp\left(-x\left(1 - \frac{1}{\lambda}\right)\right),$$
and with $(Y'_n)$ being an i.i.d. sequence with distribution $\nu$, the importance sampling estimator is
$$\frac{1}{n}\sum_{k=1}^n h(Y'_k)\,\lambda\exp\left(-Y'_k\left(1 - \frac{1}{\lambda}\right)\right).$$
In Figure 5.2, we have demonstrated the effect of using the importance sampling estimator. We use $\alpha = 10$ and compare the histograms of 1000 independent ordinary


Monte Carlo estimators with 1000 independent importance sampling estimators, letting $\lambda = 10$ and basing the estimators on 1000 samples.

Figure 5.2: Comparison of estimation of exponential probabilities with and without importance sampling. From left to right, histogram of 1000 ordinary Monte Carlo estimators, histogram of 1000 importance sampling estimators and finally, close-up histogram of 1000 importance sampling estimators.

Obviously, the ordinary Monte Carlo estimator cannot get very close to the true value of $p$ because of its discreteness. As expected, it hits zero most of the time, with a few samples hitting $\frac{1}{1000}$. On the other hand, the importance sampling estimator can take any value in $[0, nc)$ for some $c > 0$, removing any limits on the accuracy arising from discreteness. We see that even with only the 1000 samples which were completely useless in the ordinary Monte Carlo case, the importance sampling estimator hits the true probability very well most of the time. Furthermore, the computational overhead arising from the importance sampling technique is minimal. We conclude that with very little effort, we have improved the ordinary estimator by an extraordinary factor.

5.4 Methods for evaluating sensitivities

Next, we turn our attention to methods specifically designed to evaluate sensitivities such as the delta or gamma of a claim. We will cast our discussion in a general framework, which will then be specialised to financial cases in the following sections. We assume that we have a family of integrable variables $X(\theta)$ indexed by an open subset $\Theta$ of the real line. Our objective is to understand how to estimate $\Delta_\theta = \frac{d}{d\theta}EX(\theta)$, assuming that the derivative exists. We will consider five methods, being:


1. Finite difference.
2. Pathwise differentiation.
3. Likelihood ratio.
4. Malliavin weights.
5. Localisation.

These methods in general transform the problem of estimating a derivative into estimating one or more expectations. These expectations can then be evaluated by the methods described in Section 5.3. In the following we always assume that the mapping $\theta \mapsto EX(\theta)$ is differentiable.

Finite difference. The finite difference method aims to calculate the sensitivity by approximating the derivative by a finite difference quotient and estimating this by Monte Carlo. Define $\alpha(\theta) = EX(\theta)$. Fixing $\theta$ and letting $h > 0$, our approximation is defined by
$$\hat{\Delta}_\theta = \frac{\alpha(\theta+h) - \alpha(\theta-h)}{2h}.$$
Introducing i.i.d. variables $(X_n(\theta+h), X_n(\theta-h))$ such that $X_n(\theta+h)$ and $X_n(\theta-h)$ have the distributions of $X(\theta+h)$ and $X(\theta-h)$, respectively, but possibly are correlated, we can define the finite difference estimator as
$$\frac{\overline{X}_n(\theta+h) - \overline{X}_n(\theta-h)}{2h}, \qquad \text{where } \overline{X}_n(\theta\pm h) = \frac{1}{n}\sum_{k=1}^n X_k(\theta\pm h).$$
The law of large numbers yields $\frac{\overline{X}_n(\theta+h) - \overline{X}_n(\theta-h)}{2h} \xrightarrow{a.s.} \hat\Delta_\theta$, so the estimator is in general biased. To analyze the bias, note that if $\alpha$ is $C^2$, we can form second-order expansions of $\alpha$ at $\theta$, yielding
$$\alpha(\theta+h) = \alpha(\theta) + \alpha'(\theta)h + \tfrac12\alpha''(\theta)h^2 + o(h^2), \qquad \alpha(\theta-h) = \alpha(\theta) - \alpha'(\theta)h + \tfrac12\alpha''(\theta)h^2 + o(h^2),$$
so that in this case, the bias has the form
$$\hat\Delta_\theta - \alpha'(\theta) = \frac{\alpha(\theta+h) - \alpha(\theta-h)}{2h} - \alpha'(\theta) = o(h).$$
If $\alpha$ is $C^3$, we can obtain the third-order expansions
$$\alpha(\theta+h) = \alpha(\theta) + \alpha'(\theta)h + \tfrac12\alpha''(\theta)h^2 + \tfrac16\alpha'''(\theta)h^3 + o(h^3), \qquad \alpha(\theta-h) = \alpha(\theta) - \alpha'(\theta)h + \tfrac12\alpha''(\theta)h^2 - \tfrac16\alpha'''(\theta)h^3 + o(h^3),$$


which yields the bias $\frac16\alpha'''(\theta)h^2 + o(h^2)$, which is not only $o(h)$ but $O(h^2)$.

Conditional on a level of smoothness of $\alpha$, then, the finite difference estimator is at least asymptotically unbiased. Letting $(X'(\theta+h), X'(\theta-h))$ have the common distribution of $(X_n(\theta+h), X_n(\theta-h))$, the variance of the estimator is
$$V\left(\frac{\overline{X}_n(\theta+h) - \overline{X}_n(\theta-h)}{2h}\right) = \frac{1}{4h^2n^2}\, V\left(\sum_{k=1}^n X_k(\theta+h) - X_k(\theta-h)\right) = \frac{1}{4h^2 n}\, V\left(X'(\theta+h) - X'(\theta-h)\right).$$
We see that the variance is highly dependent both on $h$ and on the dependence structure between $X'(\theta+h)$ and $X'(\theta-h)$. Both of these are under our control, and we can therefore seek to choose them in a manner which reduces the mean square error as much as possible. This will provide a reasonable tradeoff between reducing bias and reducing variance. The context which we are working in at present is too general for us to be able to make any optimal choice of the dependence between $X'(\theta+h)$ and $X'(\theta-h)$, so we will content ourselves with making some assumptions about the asymptotic behaviour of the bias and variance and examining how best to choose $h$ as a function of $n$ from this.

We will assume that
$$\hat\Delta_\theta - \alpha'(\theta) = O(h^2) \qquad\text{and}\qquad V\left(X'(\theta+h) - X'(\theta-h)\right) = O(h^\beta),$$
for some $\beta \ge 0$. This corresponds to situations often encountered in practice. We then define $h_n = cn^{-\gamma}$ and want to identify the rate of decrease $\gamma$ yielding the asymptotically minimal mean square error. The mean square error is
$$\begin{aligned}
\mathrm{MSE}(\hat\Delta_n) &= V(\hat\Delta_n) + \left(E\hat\Delta_n - \alpha'(\theta)\right)^2 \\
&= \frac{1}{4h_n^2 n}V\left(X'(\theta+h_n) - X'(\theta-h_n)\right) + O(h_n^2)^2 \\
&= \frac{1}{4h_n^2 n}O(h_n^\beta) + O(h_n^4) \\
&= \frac{1}{4c^2 n^{1-2\gamma}}O(n^{-\beta\gamma}) + O(n^{-4\gamma}) \\
&= O(n^{2\gamma-1})O(n^{-\beta\gamma}) + O(n^{-4\gamma}) \\
&= O(n^{\gamma(2-\beta)-1}) + O(n^{-4\gamma}).
\end{aligned}$$


This is $O(n^{\delta(\gamma)})$, where $\delta(\gamma) = \max\{\gamma(2-\beta) - 1, -4\gamma\}$. From this we see that with a small positive $\gamma$, we obtain $\delta(\gamma) < 0$. This implies that $n^{-\delta(\gamma)}\,\mathrm{MSE}(\hat\Delta_n)$ is bounded as $n$ tends to infinity. Since $n^{-\delta(\gamma)}$ tends to infinity, we conclude that in this case, $\mathrm{MSE}(\hat\Delta_n)$ tends to zero. In other words, if we decrease $h$ as $n$ tends to infinity, the mean square error tends to zero and we obtain a consistent estimator. This is a very reasonable conclusion.

However, if $\beta < 2$, choosing a $\gamma$ which is too large, corresponding to a very fast decrease in the step size, can end up yielding $\delta(\gamma) > 0$, corresponding to increasing mean square error. We see that choosing $\gamma$ properly is paramount to obtaining effective finite difference estimates. The condition $\beta < 2$ corresponds to the situation where the decrease in $V(X'(\theta+h) - X'(\theta-h))$ cannot keep up with the factor $\frac{1}{h^2}$ in the expression for the total variance of $\hat\Delta_n$, and therefore large $\gamma$ can cause an explosion in variance. The optimal choice of $\gamma$ is found by minimizing $\delta$ over $\gamma$, leading to the fastest decrease of the mean squared error.

To this end, note that a negative $\gamma$ always yields $\delta(\gamma) > 0$, so this is never a good choice. It will therefore suffice to optimize over positive $\gamma$. We split up according to whether $\beta < 2$, $\beta = 2$ or $\beta > 2$. If $\beta < 2$, $\delta(\gamma)$ is the maximum of a line with increasing slope and a line with decreasing slope. The minimum is therefore where the two lines intersect, meaning the solution to $\gamma(2-\beta) - 1 = -4\gamma$, which is $\gamma = \frac{1}{6-\beta}$. If $\beta = 2$, the minimum is obtained for any $\gamma \ge \frac14$. The optimal choice of $\gamma$ would in this case depend on the explicit form of the mean squared error. If $\beta > 2$, $\delta$ is downwards unbounded and $\gamma$ should be chosen as large as possible. This latter case, however, will probably not happen in reality, at least not in any of the practical cases we consider.

Note that our analysis is somewhat different from the one found in Section 7.1 of Glasserman (2003), and the case $\beta > 2$ does not correspond to any of the cases considered there. This is because Glasserman (2003) always assumes that the total variance of the estimator tends to zero as $h$ tends to zero. In a sense, the parametrization of the orders of decrease is different. The results, however, are equivalent.

To apply the finite difference method, then, very little actual implementation work is needed, but it may be necessary to spend some time figuring out the best selection of $h$.

We now consider two examples of application of the finite difference method. First, let $U(\theta)$ have the normal distribution with unit variance and mean $\theta$, and assume that we are


interested in estimating $\frac{d}{d\theta}EU(\theta)^2$. If we let $U'(\theta+h)$ and $U'(\theta-h)$ be independent normal variables with unit variance and means $\theta+h$ and $\theta-h$, respectively, we obtain
$$V\left(U'(\theta+h)^2 - U'(\theta-h)^2\right) = VU'(\theta+h)^2 + VU'(\theta-h)^2 = 2(1 + 2(\theta+h)^2) + 2(1 + 2(\theta-h)^2) = 4 + 4\left((\theta+h)^2 + (\theta-h)^2\right) = 4 + 8\theta^2 + 8h^2,$$
which is $O(1)$ as $h$ tends to zero. We have used that $U'(\theta+h)^2$ has a noncentral $\chi^2$ distribution to obtain the variances. This corresponds to a $\beta$ value of zero, and we would therefore expect that the optimal choice of step size is $h_n = cn^{-\frac16}$ for some $c$. This would yield a total variance of
$$\frac{4 + 8\theta^2 + 8c^2 n^{-\frac13}}{4c^2 n^{1 - \frac13}},$$
which is of order $O(n^{-\frac23})$. To find the corresponding bias of the estimator, note that putting $\alpha(\theta) = EU(\theta)^2 = 1 + \theta^2$, we find $\alpha'(\theta) = 2\theta$. The bias is then
$$\frac{1 + (\theta+h)^2 - (1 + (\theta-h)^2)}{2h} - 2\theta = 0,$$
so the estimator is unbiased. On the other hand, if we let $U$ be a standard normal and define $U'(\theta+h) = U + \theta + h$ and $U'(\theta-h) = U + \theta - h$, we find
$$V\left(U'(\theta+h)^2 - U'(\theta-h)^2\right) = V\left(U^2 + 2(\theta+h)U + (\theta+h)^2 - \left(U^2 + 2(\theta-h)U + (\theta-h)^2\right)\right) = V(4hU + 4\theta h) = V(4hU) = 16h^2,$$
which is $O(h^2)$. Using this simulation scheme then yields $\beta = 2$, and a reasonable choice is $h_n = cn^{-\gamma}$ for any $\gamma \ge \frac14$. The total variance becomes
$$\frac{16c^2 n^{-2\gamma}}{4c^2 n^{1-2\gamma}} = \frac{4}{n},$$
which, as expected, is asymptotically independent of $\gamma$ (in fact, completely independent). If the estimator were biased, we could have used our freedom in choosing $\gamma$ to minimize the bias.

Pathwise differentiation. In the finite difference method, we considered a finite difference approximation to $\frac{d}{d\theta}EX(\theta)$ and let the parameter increment tend to zero at the same time as we let the number of samples tend to infinity. In the method of pathwise differentiation, we instead simply exchange differentiation and mean and use


ordinary Monte Carlo to estimate the result. Thus, the basic operation we desire to make is
$$\frac{d}{d\theta}EX(\theta) = E\frac{d}{d\theta}X(\theta).$$
While the left-hand side only depends on each of the marginal distributions $X(\theta)$, the right-hand side depends on the distribution of the ensemble $(X(\theta))_{\theta\in\Theta}$. For the interchange to make sense at all, we therefore assume that we are working in a context where all of the variables $X(\theta)$ are defined on the same probability space. Furthermore, it is of course necessary that the variable $\frac{d}{d\theta}X(\theta)$ is almost surely well-defined and integrable, so we need to assume that $X(\theta)$ is almost surely differentiable in $\theta$ for any $\theta \in \Theta$. Finally, we need to make sure that the exchange of integration and differentiation is allowed.

If all this is the case, we can let $(X'_n(\theta))$ be an i.i.d. sequence with common distribution being the same as the distribution of $\frac{d}{d\theta}X(\theta)$ and define the pathwise differentiation estimator
$$\frac{1}{n}\sum_{k=1}^n X'_k(\theta),$$
which by the law of large numbers and our assumptions satisfies
$$\frac{1}{n}\sum_{k=1}^n X'_k(\theta) \xrightarrow{a.s.} E\frac{d}{d\theta}X(\theta) = \frac{d}{d\theta}EX(\theta),$$
so the pathwise differentiation estimator, when it exists, is always consistent. However, we have no guarantees that the variance of the pathwise estimator is superior to that of the finite difference estimator. Two definite benefits of the pathwise differentiation method over the finite difference method, though, are its unbiasedness and that we no longer need to worry about the optimal parameter increment in the difference quotient. Other than that, we have no guarantees of improvement. General experience states that the pathwise differentiation estimator rarely is worse than the finite difference estimator, but also rarely yields more than moderate improvements in variance.

We will now consider sufficient conditions for the existence and consistency of the pathwise differentiation estimator. For clarity, we reiterate that we at all times assume:

1. The variables $X(\theta)$ are all defined on the same probability space.
2. For each $\theta$, $X(\theta)$ is almost surely differentiable.
3. The variable $\frac{d}{d\theta}X(\theta)$ is integrable.
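Under these assumptions the estimator itself is simple to implement. As a small illustration, anticipating the worked example below, the following sketch estimates $\frac{d}{d\theta}EU(\theta)^2$ with $U(\theta) = U + \theta$ and pathwise derivative $2(U+\theta)$; Python/numpy, the seed and the value of $\theta$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 100_000, 1.5

U = rng.standard_normal(n)
pathwise = 2.0 * (U + theta)          # d/dtheta (U + theta)^2, sample by sample

estimate = pathwise.mean()            # estimator of d/dtheta E[U(theta)^2] = 2*theta
stderr = pathwise.std(ddof=1) / np.sqrt(n)
print(estimate, "+/-", stderr, "(true value:", 2 * theta, ")")
```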


A sufficient condition for the interchange of differentiation and integration is then that the family $\left(\frac{X(\theta+h) - X(\theta)}{h}\right)$ is uniformly integrable over $h$ in some punctured neighborhood of zero: since the variables converge almost surely to $\frac{d}{d\theta}X(\theta)$, we also have convergence in probability, and this combined with uniform integrability yields convergence in mean and therefore convergence of the means, that is,
$$\frac{d}{d\theta}EX(\theta) = \lim_{h\to 0} E\left(\frac{X(\theta+h) - X(\theta)}{h}\right) = E\left(\lim_{h\to 0}\frac{X(\theta+h) - X(\theta)}{h}\right) = E\frac{d}{d\theta}X(\theta),$$
as desired. We now claim that the following is a sufficient condition for the interchange of differentiation and integration:

• $V(X(\theta+h) - X(\theta)) = O(h^2)$.

To see this, note that under this condition,
$$E\left(X(\theta+h) - X(\theta)\right)^2 = V\left(X(\theta+h) - X(\theta)\right) + E^2\left(X(\theta+h) - X(\theta)\right) = O(h^2) + h^2\left(\frac{EX(\theta+h) - EX(\theta)}{h}\right)^2 = O(h^2),$$
so the family $\frac{X(\theta+h) - X(\theta)}{h}$ is bounded in $L^2$ over a punctured neighborhood of zero. In particular, it is uniformly integrable, so the interchange is allowed by what we have already shown. Another sufficient condition is:

• For $\theta \in \Theta$, there is an integrable variable $\kappa_\theta$ such that $|X(\theta+h) - X(\theta)| \le \kappa_\theta|h|$ for all $h$ in a punctured neighborhood of zero.

To realize this, merely note that in this case, $\kappa_\theta$ is an integrable bound for $\frac{|X(\theta+h) - X(\theta)|}{|h|}$, and therefore the dominated convergence theorem yields the result.

We now turn to a concrete example illustrating the use of the pathwise differentiation method. We consider the same situation as in the subsection on the finite difference method, estimating $\frac{d}{d\theta}EU(\theta)^2$ where $U(\theta)$ is normally distributed with unit variance and mean $\theta$. To use the pathwise method, we first need to make sure that all of the variables are defined on the same probability space. This is easily obtained by letting $U$ be some variable with the standard normal distribution and defining $U(\theta) = U + \theta$. Clearly, then, $\theta \mapsto (U+\theta)^2$ is always everywhere differentiable, and the derivative is


$\frac{d}{d\theta}(U+\theta)^2 = 2(U+\theta)$, which is integrable. By a calculation similar to the earlier one, we find
$$V\left(U(\theta+h)^2 - U(\theta)^2\right) = 4h^2 = O(h^2),$$
so by our earlier results, the interchange of differentiation and integration is allowed, and we conclude
$$\frac{d}{d\theta}EU(\theta)^2 = E\frac{d}{d\theta}U(\theta)^2 = E\,2(U+\theta).$$
Now, we can of course evaluate this expectation directly, but for the sake of comparing with the finite difference method, let us consider what would happen if we used a Monte Carlo estimator. The variance based on $n$ samples would then be $\frac{1}{n}V(2U + 2\theta) = \frac{4}{n}$, which is the same as for the finite difference estimator.

Likelihood ratio. The likelihood ratio method, like the pathwise differentiation method, seeks to evaluate the sensitivity by interchanging differentiation and integration. But instead of differentiating the variables themselves, the likelihood ratio method differentiates their densities. Since densities in most cases are relatively smooth, while the payoffs often, in particular in the case of option valuation, have discontinuities or breaks, this means that the likelihood ratio method in many cases applies when the pathwise differentiation method does not.

To set the stage, assume that $X(\theta) = f(X'(\theta))$ for some $X'(\theta)$ with density $g_\theta$, where $f: \mathbb{R} \to \mathbb{R}$ is some measurable mapping. This is the formulation best suited to later applications. Our goal is then to evaluate $\frac{d}{d\theta}Ef(X'(\theta))$. The basis of the likelihood ratio method is the observation that if the interchange is allowed and the denominators are nonzero, we have, writing $g'_\theta$ for the derivative of $g_\theta(x)$ with respect to $\theta$,
$$\frac{d}{d\theta}Ef(X'(\theta)) = \frac{d}{d\theta}\int f(x)g_\theta(x)\,dx = \int f(x)\frac{d}{d\theta}g_\theta(x)\,dx = \int f(x)\frac{g'_\theta(x)}{g_\theta(x)}g_\theta(x)\,dx = E\,f(X'(\theta))\frac{g'_\theta(X'(\theta))}{g_\theta(X'(\theta))}.$$
While the pathwise method rewrote the sensitivity as an expectation of a somewhat different variable, the likelihood ratio method rewrites the sensitivity as a new transformation of the old variable.

As with the pathwise method, there is no guarantee that the likelihood ratio method yields better results than the finite difference estimator. In fact, it is easy to find


cases where it yields substantially worse results than finite difference, and we shall see examples of this later. As a rule of thumb, if the mapping $f$ has discontinuities, the likelihood ratio method has a good chance of improving convergence; otherwise it may easily worsen convergence. In this sense, the likelihood ratio method complements the pathwise method: the pathwise method is usually useful in the presence of continuity, and the likelihood ratio is usually useful in the presence of discontinuities. A limitation of the likelihood ratio method is that it is necessary to know the density of the variables and its derivative, which is rarely the case in practice. However, the scheme can be extended to cover very general cases by approximation arguments, see Section 7.3 of Glasserman (2003).

There are no criteria in simple terms for when the interchange of differentiation and integration of the likelihood ratio method is allowed. However, general criteria can be expressed in the case where the family $X'(\theta)$ is an exponential family. See for example the comments to Lemma 1.3 of Jensen (1992), page 7.

Malliavin weights. The method of Malliavin weights, much like the likelihood ratio method, computes sensitivities by transforming the derivative of the expectation into another expectation. The method does not work in general cases, but is specifically suited to the context where the sensitivity to be calculated is with respect to a transformation of the solution to an SDE. Specifically, consider an $n$-dimensional Brownian motion $W$ and assume that the SDE
$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$$
has a strong solution $F$, such that for each $x \in \mathbb{R}^n$, $X^x = F(x, W)$ is a solution to the above SDE with initial condition $x \in \mathbb{R}^n$. We further assume that the corresponding flow is differentiable and let $Y^x$ be the Jacobian of $X^x$. Furthermore, let $T > 0$ be some constant and let $f: \mathbb{R}^n \to \mathbb{R}$ be some mapping such that $Ef(X_T^x)^2$ is finite. The basic result which we will need to derive the Malliavin estimator is that, putting $u(x) = Ef(X_T^x)$, it holds under suitable regularity conditions that
$$\nabla u(x) = E\left(f(X_T^x)\,\frac{1}{T}\int_0^T \left(\sigma^{-1}(X_t^x)Y_t^x\right)^t\,dW_t\right).$$
This result is Proposition 3.2 of Fournié et al. (1999). Here, $\sigma^{-1}$ denotes the matrix inverse and not the inverse mapping, and the superscript $t$ denotes transposition. We are not able to prove this with the theory that we have developed. The proof exploits several of the results from Section 4.5, among others the duality formula. It also uses the Diffeomorphism Theorem and certain criteria for uniform integrability from the theory of SDEs. See the notes for further details and references.
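To indicate how such a weight might be computed in practice, the following is a rough one-dimensional sketch: in one dimension, $\sigma^{-1}(X_t)Y_t$ is simply the scalar $Y_t/\sigma(X_t)$, and the SDE, its first variation process and the stochastic integral in the weight are all discretised by an Euler scheme. The Euler discretisation, the numerical differentiation of $\mu$ and $\sigma$, the parameter values and the use of Python/numpy are all assumptions of the sketch and introduce bias not present in the exact formula above.

```python
import numpy as np

def malliavin_delta(f, x0, mu, sigma, T, n_steps=200, n_paths=100_000, seed=0):
    """Crude estimate of d/dx0 E f(X_T) for dX = mu(X)dt + sigma(X)dW via the
    weight (1/T) * int_0^T Y_t / sigma(X_t) dW_t, with Y the first variation,
    both simulated by an Euler scheme (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, float(x0))
    Y = np.ones(n_paths)                 # Y_0 = dX_0/dx0 = 1
    weight = np.zeros(n_paths)
    eps = 1e-6                           # bump for numerical derivatives of mu, sigma
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        weight += Y / sigma(X) * dW      # left-point evaluation of the Ito integral
        mu_x = (mu(X + eps) - mu(X - eps)) / (2 * eps)
        sg_x = (sigma(X + eps) - sigma(X - eps)) / (2 * eps)
        Y = Y + mu_x * Y * dt + sg_x * Y * dW
        X = X + mu(X) * dt + sigma(X) * dW
    return np.mean(f(X) * weight / T)

# Illustrative use: delta of the undiscounted call expectation E(S_T - K)^+ under
# geometric Brownian motion; multiply by exp(-r*T) for the delta of the price.
r, vol, S0, K, T = 0.04, 0.2, 100.0, 104.0, 1.0
delta = malliavin_delta(lambda s: np.maximum(s - K, 0.0), S0,
                        mu=lambda x: r * x, sigma=lambda x: vol * x, T=T)
print(delta)
```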


We will now see how this result specialises to the case of one-dimensional financial markets. First consider the case where the financial instrument is given on the form $dS_t = rS_t\,dt + \sigma(S_t)S_t\,dW_t$ for some positive measurable mapping $\sigma$. Let $f$ be such that $Ef(S_T)^2$ is finite; we will find the delta of the $T$-claim $f(S_T)$. We assume that the short rate is zero; our results will then obviously immediately extend to the case of a deterministic short rate. Assuming that the result from Fournié et al. (1999) can be applied, we can use the fact that $\frac{\partial}{\partial S_0}S_t = \frac{S_t}{S_0}$ to immediately conclude
$$\frac{\partial}{\partial S_0}Ef(S_T) = E\left(f(S_T)\,\frac{1}{T}\int_0^T \frac{1}{\sigma(S_t)S_t}\frac{S_t}{S_0}\,dW_t\right) = E\left(f(S_T)\,\frac{1}{TS_0}\int_0^T \frac{1}{\sigma(S_t)}\,dW_t\right).$$
This is the basis for the Malliavin weight estimator of the delta in the case where the volatility is of the local form. This applies, for example, to the Black-Scholes model.

Next, we consider a market of the form
$$dS_t = rS_t\,dt + \sigma_t S_t\sqrt{1-\rho^2}\,dW_t^1 + \sigma_t S_t\rho\,dW_t^2, \qquad d\sigma_t = \alpha(\sigma_t)\,dt + \beta(\sigma_t)\,dW_t^2,$$
with constant initial conditions $S_0$ and $\sigma_0$, where we assume that $\alpha$ and $\beta$ are differentiable and $\rho \in (-1,1)$. We further assume that $\beta$ is positive. This covers for example the Heston model. As before, consider $f$ such that $Ef(S_T)^2$ is finite; we desire to find the delta of the $T$-claim $f(S_T)$, that is, $\frac{\partial}{\partial S_0}Ef(S_T)$. To do so, we define
$$\mu(x) = \begin{pmatrix} rx_1 \\ \alpha(x_2)\end{pmatrix} \quad\text{and}\quad \sigma(x) = \begin{pmatrix} x_2x_1\sqrt{1-\rho^2} & x_2x_1\rho \\ 0 & \beta(x_2)\end{pmatrix},$$
and put $X_t = (S_t, \sigma_t)$. We then obtain
$$dX_1(t) = \mu_1(X_t)\,dt + \sigma_{11}(X_t)\,dW_t^1 + \sigma_{12}(X_t)\,dW_t^2, \qquad dX_2(t) = \mu_2(X_t)\,dt + \sigma_{21}(X_t)\,dW_t^1 + \sigma_{22}(X_t)\,dW_t^2,$$
with the initial conditions $X_0 = (S_0, \sigma_0)$. Note that $\frac{\partial}{\partial S_0}Ef(S_T) = \frac{\partial}{\partial X_1(0)}Ef(X_1(T))$. This observation leads us to believe that we may be able to use the result from Fournié et al. (1999) to obtain an expression for the sensitivity. Therefore, we assume that there is a strong solution to this SDE, yielding a differentiable flow, and we assume that the results of Fournié et al. (1999) apply to our situation. Let $Y$ be the first


variation process of $X$ under the initial condition $X_0$. We find
$$\sigma^{-1}(X_t)Y_t = \begin{pmatrix} \frac{1}{X_t^2X_t^1\sqrt{1-\rho^2}} & -\frac{\rho}{\sqrt{1-\rho^2}\,\beta(X_t^2)} \\ 0 & \frac{1}{\beta(X_t^2)}\end{pmatrix}\begin{pmatrix} Y_t^{11} & Y_t^{12} \\ Y_t^{21} & Y_t^{22}\end{pmatrix} = \begin{pmatrix} \frac{Y_t^{11}}{X_t^2X_t^1\sqrt{1-\rho^2}} - \frac{\rho Y_t^{21}}{\sqrt{1-\rho^2}\,\beta(X_t^2)} & \frac{Y_t^{12}}{X_t^2X_t^1\sqrt{1-\rho^2}} - \frac{\rho Y_t^{22}}{\sqrt{1-\rho^2}\,\beta(X_t^2)} \\ \frac{Y_t^{21}}{\beta(X_t^2)} & \frac{Y_t^{22}}{\beta(X_t^2)}\end{pmatrix}.$$
Because $X^2$ does not depend on $X_0^1$, $Y_t^{21} = \frac{\partial}{\partial X_0^1}X_t^2 = 0$. And since $X^1$ is given as the solution to an exponential SDE, $Y_t^{11} = \frac{\partial}{\partial X_0^1}X_t^1 = \frac{X_t^1}{X_0^1}$. Therefore, we can reduce the above to
$$\sigma^{-1}(X_t)Y_t = \begin{pmatrix} \frac{1}{X_t^2X_0^1\sqrt{1-\rho^2}} & \frac{Y_t^{12}}{X_t^2X_t^1\sqrt{1-\rho^2}} - \frac{\rho Y_t^{22}}{\sqrt{1-\rho^2}\,\beta(X_t^2)} \\ 0 & \frac{Y_t^{22}}{\beta(X_t^2)}\end{pmatrix}.$$
Transposing this, the result from Fournié et al. (1999) yields
$$\frac{\partial}{\partial X_0^1}Ef(X_T^1) = E\left(f(X_T^1)\,\frac{1}{T}\int_0^T \frac{1}{X_t^2X_0^1\sqrt{1-\rho^2}}\,dW_t^1\right).$$
Collecting our results and substituting $S$ and $\sigma$ for $X^1$ and $X^2$, we may conclude
$$\frac{\partial}{\partial S_0}Ef(S_T) = E\left(f(S_T)\,\frac{1}{S_0T}\int_0^T \frac{1}{\sigma_t\sqrt{1-\rho^2}}\,dW_t^1\right),$$
and this is the Malliavin estimator for the delta. Note that this is not the same estimator as is obtained in Benhamou (2001). We will return to this point later, when considering the practical implementation of the Malliavin estimator in the Heston model.

Localisation. The method of localisation is more of a shrewd observation than an actual method. It can in principle be used in conjunction with any of the Monte Carlo methods we have described, but its primary usefulness for us will be together with the likelihood ratio and Malliavin methods. We show how the method applies to the likelihood ratio method; the application to the Malliavin method is similar. We therefore consider the same situation as in our discussion of the likelihood ratio method, where we put $X(\theta) = f(X'(\theta))$ and desire to evaluate $\frac{d}{d\theta}EX(\theta)$. Letting $g_\theta$ be the density of $X'(\theta)$ and putting $\alpha_\theta(x) = \frac{g'_\theta(x)}{g_\theta(x)}$, the likelihood ratio method yields the equality
$$\frac{d}{d\theta}EX(\theta) = E\,f(X'(\theta))\,\alpha_\theta(X'(\theta)).$$
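To make the factor $\alpha_\theta$ concrete before discussing its variability, a minimal sketch of the likelihood ratio estimator for the running example from the finite difference and pathwise discussions, $\frac{d}{d\theta}E\,X'(\theta)^2$ with $X'(\theta) \sim N(\theta,1)$, in which case $\alpha_\theta(x) = x - \theta$; Python/numpy, the seed and the value of $\theta$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 100_000, 1.5

X = theta + rng.standard_normal(n)    # samples of X'(theta) ~ N(theta, 1)
score = X - theta                     # alpha_theta(x) = g'_theta(x)/g_theta(x) for N(theta, 1)
lr = X**2 * score                     # f(X'(theta)) * alpha_theta(X'(theta)) with f(x) = x^2

print(lr.mean(), "+/-", lr.std(ddof=1) / np.sqrt(n), "(true value:", 2 * theta, ")")
```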


Now, if $\alpha_\theta$ has a large variability and $f$ and $\alpha_\theta$ have a large common support, the factor $\alpha_\theta(X'(\theta))$ can be instrumental in making the variance of estimators based upon the above rather large. Such detrimental effects are actually observed in reality, which means that the removal of the derivative in such cases has a very large price, and ordinary finite difference methods may be more effective.

One way to avoid such enlargements of variance is to make the simple observation that for any sufficiently integrable mapping $h$, we have
$$\frac{d}{d\theta}EX(\theta) = \frac{d}{d\theta}Eh(X'(\theta)) + \frac{d}{d\theta}E(f-h)(X'(\theta)) = E\,h(X'(\theta))\,\alpha_\theta(X'(\theta)) + \frac{d}{d\theta}E(f-h)(X'(\theta)).$$
Here, we have split the mapping $f$ into two parts, $h$ and $f - h$, and have applied the likelihood ratio method only to the first part. The second term can be evaluated using other methods, such as the finite difference method or the pathwise method. If we can choose $h$ such that the variance of $h(X'(\theta))\alpha_\theta(X'(\theta))$ is small, then we have obtained an improvement over the usual likelihood ratio estimator, provided that the second term does not add too much variance. One way of obtaining this reduced variance is to make sure that $h$ has a small support. The choice of $h$, however, should be made such that the usefulness of the likelihood ratio method is preserved and such that the remainder term can be calculated with other methods. In other words, we would like $h$ to have small support and $f - h$ to have reasonable smoothness. One example of this is when working with the digital option, where $f - h$ can be chosen as a smoothened version of the digital payoff, and $h$ is then a kind of small bump function with a jump. This enables one to use the likelihood ratio method to take care of the discontinuity and to use, say, the pathwise method to evaluate the remainder. We shall see examples of how to do this in the next section.

5.5 Estimation in the Black-Scholes model

We now consider the Black-Scholes model, given by the dynamics
$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t,$$
endowed with a risk-free asset $B$ based on a constant short rate $r$ with initial condition $B_0 = 1$. We will prove that the model is free of arbitrage and implement the Monte Carlo methods discussed in the previous sections for the Black-Scholes model. Now,


the main goal of our numerical work is to analyze the performance of the Malliavin estimator, and of the Malliavin estimator combined with ordinary Monte Carlo methods, comparing them to other estimators. As we shall see, in the Black-Scholes case, the Malliavin method coincides with the likelihood ratio method. Therefore, in this case, the Malliavin calculus brings nothing new at all. As a consequence, the Black-Scholes case is mostly a kind of warm-up case for us, allowing us to familiarize ourselves with the various methods in a simple setting before turning to the somewhat more cumbersome case of the Heston model.

Theorem 5.5.1. The Black-Scholes model is free of arbitrage, and there is a $T$-EMM yielding the dynamics $dS_t = rS_t\,dt + \sigma S_t\,dW_t$.

Proof. Both statements of the theorem follow immediately from Corollary 5.1.10.

Comment 5.5.2 The $T$-EMM is in fact in this case unique, corresponding to completeness of the Black-Scholes model. ◦

For the numerical experiments, we consider three problems:

1. The price of a call option.
2. The price of a digital option.
3. The delta of a digital option.

It is easy to check for consistency of our methods, because in all three cases, the correct value has an analytical formula, as the following lemma shows.

Lemma 5.5.3. Define $d_1(s) = \frac{1}{\sigma\sqrt{T}}\left(\log\left(\frac{s}{K}\right) + \left(r + \frac{1}{2}\sigma^2\right)T\right)$ and $d_2(s) = d_1(s) - \sigma\sqrt{T}$.

1. The price of a strike $K$ $T$-call option is $S_0\Phi(d_1(S_0)) - e^{-rT}K\Phi(d_2(S_0))$.
2. The delta of a strike $K$ $T$-call option is $\Phi(d_1(S_0))$.
3. The delta of a strike $K$ $T$-digital option is $\frac{e^{-rT}\varphi(d_2(S_0))}{\sigma\sqrt{T}S_0}$.

Proof. The first result is the Black-Scholes formula, see Proposition 7.10 in Björk (2004). The second is Proposition 9.5 in Björk (2004). To prove the last result, we


first calculate the price of a digital option as
$$\begin{aligned}
e^{-rT}E_Q\left(1_{(S_T\ge K)}\right) &= e^{-rT}Q(S_T \ge K) \\
&= e^{-rT}Q\left(\frac{1}{\sigma\sqrt{T}}\left(\log\frac{S_T}{S_0} - \left(r - \tfrac12\sigma^2\right)T\right) \ge \frac{1}{\sigma\sqrt{T}}\left(\log\frac{K}{S_0} - \left(r - \tfrac12\sigma^2\right)T\right)\right) \\
&= e^{-rT}\left(1 - \Phi\left(\frac{1}{\sigma\sqrt{T}}\left(\log\frac{K}{S_0} - \left(r - \tfrac12\sigma^2\right)T\right)\right)\right) \\
&= e^{-rT}\Phi(d_2(S_0)).
\end{aligned}$$
From this we immediately obtain
$$\frac{d}{dS_0}e^{-rT}E_Q\left(1_{(S_T\ge K)}\right) = e^{-rT}\varphi(d_2(S_0))\frac{1}{\sigma\sqrt{T}S_0}.$$

We now proceed to the numerical experiments. The three problems yield differing degrees of irregularity, and we shall see that the applicable methods are different for all three cases.

The price of a call option. Letting $W$ be a Brownian motion and letting $S$ be the corresponding geometric Brownian motion with drift $r$ and volatility $\sigma$, we want to evaluate $e^{-rT}E(S_T - K)^+$. We consider $r = 4\%$, $\sigma = 0.2$, $S_0 = 100$, $T = 1$ and $K = 104$. We consider a plain Monte Carlo estimator, an antithetic estimator, a stratified estimator, a control variate estimator and finally, an estimator combining all of these.

The plain Monte Carlo estimator is simply implemented by simulating scaled lognormal variables with scale $S_0$, log-mean $(r - \frac12\sigma^2)T$ and log-variance $\sigma^2 T$. The stratified estimator stratifies uniform variables and transforms them with the quantile function of the lognormal distribution. The control variate estimator uses $S_T$ as a control variate, which has the known mean $e^{rT}S_0$ and has a correlation with $(S_T - K)^+$ of around 60%. To analyze the results, we know that all the estimators are unbiased, so it will suffice to compare standard deviations. We base our standard deviation estimates on 80 Monte Carlo simulations for each method with batches from 2000 simulations to 90000 simulations with step sizes of 2000. Since some methods use more computational time per simulation than others, we compare the standard deviations as functions of computing time instead of as functions of the number of simulations. The results can be seen in Figure 5.3. We see that the stratified and the combined estimators have graphs which contain no data before around 0.002 seconds of computing time. This is because we base our results on the same batches of simulations. Since these two methods take significantly longer time per simulation than the other methods, their


Figure 5.3: Comparison of standard deviations of Monte Carlo estimators for the price of a call option in the Black-Scholes model.

The effect is not particularly large here, but the distinction between computing time and number of simulations will become important, for example when considering the digital delta, where the most time-consuming method will spend six times as much computing time per simulation compared to the simplest method. Comparing standard deviations as functions of the number of simulations would therefore have made the more advanced methods seem six times more effective than they really are.

The results show, consistently with the general rules of thumb set forth in Section 4.7 of Glasserman (2003), that stratified sampling yields the greatest benefit, followed by the control variate method and then antithetic sampling. Combining all of these yields the best estimator. This estimator has a standard deviation which is up to 20 times smaller than that of the ordinary Monte Carlo estimator.
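To make the estimators concrete, the following is a minimal R sketch of the plain Monte Carlo, antithetic and control variate estimators for the call price; the stratified and combined estimators are omitted for brevity. The function names are illustrative and not those of the actual thesis code.

```r
# Parameters from the text: r = 4%, sigma = 0.2, S0 = 100, T = 1, K = 104.
r <- 0.04; sigma <- 0.2; S0 <- 100; T <- 1; K <- 104

# Plain Monte Carlo: simulate S_T directly as a scaled lognormal variable.
plain.mc <- function(n) {
  ST <- S0 * exp((r - 0.5 * sigma^2) * T + sigma * sqrt(T) * rnorm(n))
  mean(exp(-r * T) * pmax(ST - K, 0))
}

# Antithetic sampling: each normal draw is used with both signs.
antithetic.mc <- function(n) {
  Z <- rnorm(n / 2)
  ST1 <- S0 * exp((r - 0.5 * sigma^2) * T + sigma * sqrt(T) * Z)
  ST2 <- S0 * exp((r - 0.5 * sigma^2) * T - sigma * sqrt(T) * Z)
  mean(exp(-r * T) * (pmax(ST1 - K, 0) + pmax(ST2 - K, 0)) / 2)
}

# Control variate: S_T itself, with the known mean exp(r * T) * S0.
cv.mc <- function(n) {
  ST <- S0 * exp((r - 0.5 * sigma^2) * T + sigma * sqrt(T) * rnorm(n))
  payoff <- exp(-r * T) * pmax(ST - K, 0)
  beta <- cov(payoff, ST) / var(ST)   # estimated optimal coefficient
  mean(payoff - beta * (ST - exp(r * T) * S0))
}

c(plain.mc(20000), antithetic.mc(20000), cv.mc(20000))
```

All three estimates should be close to the analytical price from Lemma 5.5.3, differing only in their variability.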


Importance sampling for OTM call options. So far, we have neglected the importance sampling method. This is because for options which are not far out of the money, it has little impact. For simplicity, we will in general only consider call and digital options which are not far out of the money. For completeness, however, we now illustrate how importance sampling can be used to facilitate computations for out of the money options.

We consider the same call as before, but change the strike from 104 to 225. Since the initial value of the instrument is 100, this makes the call quite far out of the money. This means that many of the simulated values of $S_T$ will yield a payoff of zero, which makes the estimation of the mean difficult. We will use importance sampling to change the distribution of the underlying normal variable such that samples giving a positive payoff are more likely. To this end, let $d_2(x) = \frac{1}{\sigma\sqrt{T}}\left(\log\frac{x}{S_0} - \big(r-\tfrac{1}{2}\sigma^2\big)T\right)$ and $Z = W_T/\sqrt{T}$. Then $S_T = d_2^{-1}(Z)$, where $d_2^{-1}(z) = S_0\exp\big(\sigma\sqrt{T}z + (r-\tfrac{1}{2}\sigma^2)T\big)$, so that $E(S_T-K)^+ = E\big(d_2^{-1}(Z)-K\big)^+$. Let $\mu$ be the distribution of $Z$; $\mu$ is then the standard normal distribution. Let $\nu$ be the normal distribution with mean $\xi$ and unit variance, and let $f_\mu$ and $f_\nu$ be the corresponding densities. We then have
$$\frac{d\mu}{d\nu}(x) = \frac{f_\mu(x)}{f_\nu(x)} = \frac{\exp\left(-\tfrac{1}{2}x^2\right)}{\exp\left(-\tfrac{1}{2}(x-\xi)^2\right)} = \exp\left(-\tfrac{1}{2}\big(x^2-(x-\xi)^2\big)\right) = \exp\left(-\xi x + \tfrac{1}{2}\xi^2\right).$$
Letting $Y$ be normally distributed with mean $\xi$ and unit variance, we therefore obtain
$$E(S_T-K)^+ = E\left[\big(d_2^{-1}(Y)-K\big)^+\exp\left(-\xi Y + \tfrac{1}{2}\xi^2\right)\right],$$
which is the basis of the importance sampling estimator based on $\nu$. By trial and error experimentation, we find that $\xi = 2$ is a good choice for the problem at hand. Figure 5.4 compares estimated means and standard deviations for the ordinary Monte Carlo estimator, the importance sampling estimator, and the importance sampling estimator with antithetic and stratified sampling and using $Y$ as a control variate. Not surprisingly, we see that the importance sampling estimators are clearly superior. The combined estimator has a standard deviation which is up to 130 times smaller than that of the ordinary estimator.
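As an illustration, a minimal R sketch of the importance sampling estimator described above, using the tilt $\xi = 2$ from the text; the function name is illustrative only.

```r
# Importance sampling for the OTM call (strike 225): sample Y from N(xi, 1)
# and reweight the payoff with the likelihood ratio exp(-xi * Y + xi^2 / 2).
r <- 0.04; sigma <- 0.2; S0 <- 100; T <- 1
is.mc <- function(n, xi = 2, K.otm = 225) {
  Y <- rnorm(n, mean = xi, sd = 1)
  ST <- S0 * exp((r - 0.5 * sigma^2) * T + sigma * sqrt(T) * Y)
  weight <- exp(-xi * Y + 0.5 * xi^2)
  mean(exp(-r * T) * pmax(ST - K.otm, 0) * weight)
}
is.mc(20000)
```

The shift moves most simulated values of $S_T$ into the region where the payoff is positive, which is why the variance falls so dramatically.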


Figure 5.4: Comparison of means and standard deviations of Monte Carlo estimators for an OTM call option in the Black-Scholes model.

The delta of a call option. Next, we consider the delta of a call option. This situation will allow us to use some of the Monte Carlo methods designed specifically for the evaluation of sensitivities. We implement the finite difference method, the pathwise differentiation method and the likelihood ratio method. As we shall see, in this case the pathwise differentiation method seems to be optimal. We also consider an estimator combining the pathwise differentiation method with antithetic and stratified sampling. The methods for estimating sensitivities are not as straightforward to use as those for plain Monte Carlo estimation. We will spend a short time reviewing the analysis and implementation.

As we saw in Section 5.4, to implement the finite difference method, we need to choose how fast to let the denominator in the finite difference approximation tend to zero as the number of simulations increases, and this choice depends on the variance of the numerator of the finite difference approximation. In practice, we also need to choose the constant factor $c$ in $h_n = cn^{-\gamma}$. In our case, the numerator is $f(S_T(S_0+h)) - f(S_T(S_0-h))$, where $S_T(S_0)$ is the solution of the SDE for the financial instrument with initial value $S_0$, and $f$ is the call payoff. Our simulation method is to simulate a geometric Brownian motion with initial value one, multiply by $S_0+h$ and $S_0-h$ and transform with the payoff function. $f(S_T(S_0+h))$ and $f(S_T(S_0-h))$ are then somewhat correlated. This both improves convergence and saves time generating simulations.

When discussing the finite difference method in the last section, we gave arguments for different choices according to different situations. We begin by examining how much of a difference these choices really make. Figure 5.5 shows the results. Basically, the figure shows that as long as our choices are not completely unreasonable, the difference is quite small, in particular for large samples. The behaviour is consistent with a $\beta$ value of 2. We pick $\gamma = 1/6$ and $c = 100$.


Figure 5.5: Comparison of standard deviations of Monte Carlo estimators for the delta of a call option in the Black-Scholes model.

For the pathwise differentiation method, we wish to make the interchange
$$\frac{\partial}{\partial S_0}E(S_T-K)^+ = E\frac{\partial}{\partial S_0}(S_T-K)^+.$$
We need to check the conditions in Section 5.4. Since we have the explicit formula $S_T = S_0\exp\big((r-\tfrac{1}{2}\sigma^2)T + \sigma W_T\big)$, the variables for varying $S_0$ are clearly defined on the same probability space, and $S_T$ is always differentiable as a function of $S_0$. Since $x\mapsto(x-K)^+$ is everywhere differentiable except at $K$, and $S_T = K$ with probability zero for any value of $S_0$, we conclude that $(S_T-K)^+$ is almost surely differentiable with respect to $S_0$, and since $\frac{\partial S_T}{\partial S_0} = \frac{S_T}{S_0}$, a derivative is $1_{[K,\infty)}(S_T)\frac{S_T}{S_0}$, which is integrable. To check that the interchange of differentiation and integration is allowed, we merely note that with $S_T(S_0) = S_0\exp\big((r-\tfrac{1}{2}\sigma^2)T + \sigma W_T\big)$,
$$\big|(S_T(S_0+h)-K)^+ - (S_T(S_0)-K)^+\big| \le |h|\exp\left(\big(r-\tfrac{1}{2}\sigma^2\big)T + \sigma W_T\right),$$
using that $x\mapsto(x-K)^+$ is Lipschitz with constant 1. Since the coefficient of $|h|$ is integrable, by the results of Section 5.4 the interchange of differentiation and integration is allowed, and the corresponding pathwise estimator is based on $E\,1_{[K,\infty)}(S_T)\frac{S_T}{S_0}$.
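A minimal R sketch of the pathwise differentiation estimator derived above (function name illustrative); the discount factor is included so that the result can be compared with the analytical delta $\Phi(d_1(S_0))$ from Lemma 5.5.3.

```r
r <- 0.04; sigma <- 0.2; S0 <- 100; T <- 1; K <- 104

# Pathwise estimator of the call delta: exp(-rT) * E[ 1_{S_T >= K} * S_T / S_0 ].
pathwise.delta <- function(n) {
  ST <- S0 * exp((r - 0.5 * sigma^2) * T + sigma * sqrt(T) * rnorm(n))
  mean(exp(-r * T) * (ST >= K) * ST / S0)
}
pathwise.delta(20000)   # compare with pnorm(d1(S0)) from Lemma 5.5.3
```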


For the likelihood ratio method, we want to make the interchange
$$\frac{\partial}{\partial S_0}E(S_T-K)^+ = E\left[(S_T-K)^+\frac{\frac{\partial}{\partial S_0}g_{S_0}(S_T)}{g_{S_0}(S_T)}\right]$$
to obtain the right-hand side as the basis for the likelihood ratio estimator, where $g_{S_0}$ is the density of $S_T$ under the initial condition $S_0$, which is given by the expression $g_{S_0}(x) = \frac{1}{x\sigma\sqrt{T}}\varphi(d_2(S_0,x))$, with $d_2(S_0,x) = \frac{1}{\sigma\sqrt{T}}\big(\log\frac{x}{S_0} - (r-\tfrac{1}{2}\sigma^2)T\big)$. We will not try to justify the interchange, but merely conclude that numerical experiments show that it in fact yields the correct result. To obtain a simpler expression, we note
$$\frac{\frac{\partial}{\partial S_0}g_{S_0}(x)}{g_{S_0}(x)} = \frac{\frac{1}{x\sigma\sqrt{T}}\,\frac{1}{S_0\sigma\sqrt{T}}\,d_2(S_0,x)\varphi(d_2(S_0,x))}{\frac{1}{x\sigma\sqrt{T}}\varphi(d_2(S_0,x))} = \frac{d_2(S_0,x)}{S_0\sigma\sqrt{T}},$$
which shows, using the representation of $S_T$ in terms of $W_T$,
$$\frac{\frac{\partial}{\partial S_0}g_{S_0}(S_T)}{g_{S_0}(S_T)} = \frac{d_2(S_0,S_T)}{S_0\sigma\sqrt{T}} = \frac{W_T}{S_0\sigma T},$$
so that we need to calculate the mean $E\big[(S_T-K)^+\frac{W_T}{S_0\sigma T}\big]$. We mentioned in the introduction to this section that the likelihood ratio method and the Malliavin weights method are equivalent for the Black-Scholes model. This is quite clear, since the Malliavin weights method yields
$$\frac{\partial}{\partial S_0}E(S_T-K)^+ = E\left[(S_T-K)^+\frac{1}{S_0T}\int_0^T\frac{1}{\sigma}\,dW_t\right] = E\left[(S_T-K)^+\frac{W_T}{S_0\sigma T}\right],$$
which is precisely the same as the likelihood ratio method.

We know that the pathwise differentiation and likelihood ratio methods are unbiased, but the finite difference method has a bias. Our simulation results show that this bias generally is between $10^{-4}$ and $10^{-5}$. This is small enough that we can disregard it, and to compare the estimators, it will then suffice to compare standard deviations. We compare a finite difference estimator, a pathwise estimator, a likelihood ratio estimator and finally a pathwise estimator with antithetic and stratified sampling. As in the previous case, we base our standard deviation estimates on 80 Monte Carlo simulations for each method, with batches from 2000 simulations to 90000 simulations with step sizes of 2000. The results can be seen in Figure 5.6. We see that the finite difference and the pathwise differentiation methods yield very similar results. When combined with antithetic and stratified sampling, a considerable boost in efficiency is obtained. The likelihood ratio method is in this case clearly inferior to the other methods, approximately doubling the standard deviation.
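A minimal R sketch of the likelihood ratio estimator for the call delta, based on the weight $W_T/(S_0\sigma T)$ derived above (function name illustrative).

```r
r <- 0.04; sigma <- 0.2; S0 <- 100; T <- 1; K <- 104

# Likelihood ratio estimator of the call delta: the payoff is multiplied by
# the score W_T / (S0 * sigma * T); no differentiation of the payoff is needed.
lr.delta <- function(n) {
  WT <- sqrt(T) * rnorm(n)
  ST <- S0 * exp((r - 0.5 * sigma^2) * T + sigma * WT)
  mean(exp(-r * T) * pmax(ST - K, 0) * WT / (S0 * sigma * T))
}
lr.delta(20000)
```

The extra factor $W_T$ in the estimator is the source of the inflated variance discussed below.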


Figure 5.6: Comparison of standard deviations of Monte Carlo estimators for the delta of a call option in the Black-Scholes model.

This conclusion supports the common wisdom that the likelihood ratio method is mostly useful when applied in the context of discontinuities, and it is consistent with the results of Benhamou (2001) and Benhamou (2003). To see what goes wrong with the likelihood ratio method, recall that the likelihood ratio estimator is based on the formula
$$\frac{d}{dS_0}E(S_T-K)^+ = E\left[(S_T-K)^+\frac{W_T}{S_0\sigma T}\right].$$
While the weight in the expectation on the right-hand side enables us to remove the derivative, it also introduces extra variance: the variable $(S_T-K)^+W_T$ gets very large when $W_T$ is large. We could use the localisation methods described in Section 5.4 to minimize this effect, but in reality the likelihood ratio method simply is not suited to this particular problem. When considering the delta of a digital option, we will see a situation where the likelihood ratio and localised likelihood ratio methods provide effective results.

The delta of a digital option. Our final numerical experiment for the Black-Scholes model is the evaluation of the delta of a digital option. The methods which are a priori at our disposal are those of finite difference, pathwise differentiation and likelihood ratio. However, in this case the pathwise differentiation method cannot be applied: even though $1_{[K,\infty)}(S_T)$ is almost surely differentiable for any initial condition, the derivative is also almost surely zero, so the pathwise differentiation method yields a delta for the digital option of zero, which is clearly nonsense. We conclude that the interchange of differentiation and integration required for application of the pathwise differentiation method is not justified.


Intuitively, this is because the payoff is not Lipschitz. We are therefore left with the finite difference method and the likelihood ratio method. We implement these, and we also implement a localised version of the likelihood ratio method. Finally, we also consider a localised likelihood ratio estimator with antithetic and stratified sampling.

The finite difference and likelihood ratio methods are implemented as for the call delta. We will discuss the details of the localisation of the likelihood ratio method, where we proceed as described in Section 5.4. Let $f(x) = 1_{[K,\infty)}(x)$ be the digital payoff. Our goal is to make a decomposition of the form $f(x) = h_\varepsilon(x) + (f-h_\varepsilon)(x)$, where $f-h_\varepsilon$ is reasonably smooth and $h_\varepsilon$ has small support and contains the discontinuity of the payoff. Led by Glasserman (2003), Section 7.3, we define $h_\varepsilon$ by the relation
$$f(x) - h_\varepsilon(x) = \min\left\{1, \frac{1}{2\varepsilon}\max\{0, x-K+\varepsilon\}\right\}.$$
We have in this way ensured that $f-h_\varepsilon$ is continuous. We then obtain
$$h_\varepsilon(x) = 1_{[K,\infty)}(x)\frac{(K+\varepsilon-x)^+}{2\varepsilon} - 1_{(-\infty,K)}(x)\frac{(x-K+\varepsilon)^+}{2\varepsilon}.$$
Thus $h_\varepsilon$ is a function with small support containing the discontinuity of the payoff, and $f-h_\varepsilon$ is continuous. See Figure 5.7 for illustrations of the decomposition of the payoff.

Figure 5.7: From left to right: the payoff of a digital option, the smoothed payoff $f-h_\varepsilon$ and the jump part $h_\varepsilon$.
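For concreteness, a minimal R sketch of the payoff decomposition; the function names are illustrative, and the value of $\varepsilon$ is chosen for illustration only.

```r
K <- 104; eps <- 5   # eps chosen only for illustration

# Digital payoff f, the smoothed part f - h_eps (a ramp from 0 at K - eps to
# 1 at K + eps) and the jump part h_eps = f - (f - h_eps).
digital   <- function(x) as.numeric(x >= K)
smoothed  <- function(x) pmin(1, pmax(0, x - K + eps) / (2 * eps))
jump.part <- function(x) digital(x) - smoothed(x)

# The decomposition reproduces the payoff exactly:
x <- seq(80, 120, by = 0.5)
all.equal(digital(x), smoothed(x) + jump.part(x))
```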


The likelihood ratio method applied to the $h_\varepsilon$ term then yields
$$\frac{\partial}{\partial S_0}Ef(S_T) = \frac{\partial}{\partial S_0}Eh_\varepsilon(S_T) + \frac{\partial}{\partial S_0}E(f-h_\varepsilon)(S_T) = E\left[h_\varepsilon(S_T)\frac{W_T}{S_0\sigma T}\right] + \frac{\partial}{\partial S_0}E(f-h_\varepsilon)(S_T).$$
We will use pathwise differentiation on the second term. $f-h_\varepsilon$ is almost surely differentiable for any initial condition with
$$\frac{\partial}{\partial S_0}(f-h_\varepsilon)(S_T) = \frac{1}{2\varepsilon}1_{(|S_T-K|<\varepsilon)}\frac{S_T}{S_0},$$
and combining the two terms gives the localised likelihood ratio estimator.


Method                              Improvement factor

Call price
  Plain MC                          1.0
  Antithetic                        1.2-1.8
  Control variates                  1.7-2.2
  Stratified                        7-12
  Combined                          14-22

OTM call price
  Plain MC                          1.0
  Importance sampling               20-37
  Combined                          69-132

Call delta
  Plain finite difference           1.0
  Pathwise                          0.8-1.5
  Likelihood ratio                  0.3-0.5
  Combined                          9-14

Digital delta
  Plain finite difference           1.0
  Likelihood ratio                  5-10
  Localised likelihood ratio        16-29
  Combined                          122-205

Table 5.1: Comparison of results for estimation of the call price, call delta and digital delta in the Black-Scholes model.

Conclusions. We have now analyzed various Monte Carlo methods for the price of a call, the delta of a call and the delta of a digital option in the context of the Black-Scholes model. The overarching conclusion of our efforts is that, when applied properly, using more than just plain Monte Carlo methods for pricing and plain finite difference for sensitivities can have a very large payoff.

Before proceeding to our numerical experiments for the Heston model, we compare the efficiency improvements for the various methods. Table 5.1 shows the factor of improvement in the standard deviations for the various methods implemented compared to the simplest methods. The factors are calculated as the minimal and maximal factors of improvement for the standard deviation over all computational times.


We see that in all cases, the addition of antithetic and stratified sampling adds considerably to the cost effectiveness of the estimators in terms of standard deviation per computing time. In particular, the combination of antithetic and stratified sampling with the localised likelihood ratio method for the digital delta yields almost unbelievable results. Our results show that this method allows one to calculate the digital delta with far more precision using the combined method with 2000 simulations than using the plain finite difference method with 90000 simulations.

5.6 Estimation in the Heston Model

The Heston model is a model for a single financial asset given by
$$dS_t = \mu S_t\,dt + \sqrt{\sigma_t}\,S_t\sqrt{1-\rho^2}\,dW^1_t + \sqrt{\sigma_t}\,S_t\rho\,dW^2_t$$
$$d\sigma_t = \kappa(\theta-\sigma_t)\,dt + \nu\sqrt{\sigma_t}\,dW^2_t,$$
where $(W^1,W^2)$ is a two-dimensional Brownian motion, $\mu$, $\kappa$, $\theta$ and $\nu$ are positive parameters and $\rho\in(-1,1)$. We call this equation the Heston SDE for $S$ and $\sigma$. We also assume given a risk-free asset $B$ based on a constant short rate $r$. As in the previous section, our mission is to argue that the model is free of arbitrage and to analyze different estimation methods in this model. In contrast to the previous section, it is not obvious that there exists a solution $(S,\sigma)$ to the pair of SDEs describing the model. The question of arbitrage is also considerably more difficult than in the previous section. In order not to get carried away by too much theory, we will without proof apply some results from the theory of SDEs to obtain the results necessary for our purposes.

Theorem 5.6.1. Let $\kappa$, $\theta$ and $\nu$ be positive numbers. Assume that $2\kappa\theta\ge\nu^2$. The SDE given by $d\sigma_t = \kappa(\theta-\sigma_t)\,dt + \nu\sqrt{\sigma_t^+}\,dW_t$ has a weak solution, unique in law, and the solution is strictly positive. In particular, the solution also satisfies the SDE $d\sigma_t = \kappa(\theta-\sigma_t)\,dt + \nu\sqrt{\sigma_t}\,dW_t$.

Comment 5.6.2 The process solving the SDE is known as the Cox-Ingersoll-Ross process, from Cox, Ingersoll & Ross (1985). The criterion $2\kappa\theta\ge\nu^2$ ensures that the mean reversion level $\theta$ is sufficiently large in comparison to the volatility so that the variability of the solution cannot drive the process below zero. ◦

Proof. The existence of a positive weak solution is proved in Example 11.8 of Jacobsen (1989).


To prove uniqueness in law, we use the Yamada-Watanabe uniqueness theorem, Theorem V.40.1 of Rogers & Williams (2000b). This result states that the SDE is unique in law if the drift coefficient is Lipschitz and the volatility coefficient satisfies $(\sigma(x)-\sigma(y))^2\le\rho(|x-y|)$ for some $\rho:[0,\infty)\to[0,\infty)$ with $\int_0^\infty\frac{1}{\rho(x)}\,dx$ infinite. In our case, the drift is $\mu(x) = \kappa(\theta-x)$, which is clearly Lipschitz as it has constant derivative. For the volatility, we have $\sigma(x) = \sqrt{x^+}$. We put $\rho(x) = x$ and note that for $0\le x\le y$,
$$(\sigma(x)-\sigma(y))^2 = x - 2\sqrt{x}\sqrt{y} + y \le y - x = \rho(|x-y|).$$
For $x,y\le 0$, we clearly also have $(\sigma(x)-\sigma(y))^2 = 0 \le \rho(|x-y|)$. Finally, in the case $x\le 0\le y$ we find $(\sigma(x)-\sigma(y))^2 = y \le \rho(|x-y|)$. We conclude that for any $x,y\in\mathbb{R}$, $(\sigma(x)-\sigma(y))^2\le\rho(|x-y|)$. Obviously, $\rho$ satisfies the integrability criterion. Thus, the hypotheses of the Yamada-Watanabe theorem are satisfied, and we have uniqueness in law.

With Theorem 5.6.1 in hand, we can conclude that the volatility process of the Heston model exists and is positive if $2\kappa\theta\ge\nu^2$. By positivity, we can also take its square root. Therefore, the SDE for the financial instrument in the Heston SDE is well-defined, and by Lemma 3.7.1, there exists a process $S$ solving it. Because of these considerations, we shall in the following always assume $2\kappa\theta\ge\nu^2$.

Next, we consider the question of arbitrage. Because the volatility is stochastic, we cannot directly apply the criteria of Section 5.1. With a bit of work, however, we can use the results to obtain the desired conclusion.

Theorem 5.6.3. The Heston model is free of arbitrage, and for any $T>0$ there is a $T$-EMM $Q$ yielding the dynamics on $[0,T]$ given by
$$dS_t = rS_t\,dt + \sqrt{\sigma_t}\,S_t\sqrt{1-\rho^2}\,dW^1_t + \sqrt{\sigma_t}\,S_t\rho\,dW^2_t$$
$$d\sigma_t = \kappa(\theta-\sigma_t)\,dt + \nu\sqrt{\sigma_t}\,dW^2_t,$$
that is, the dynamics of the volatility are preserved, and the drift of the instrument is changed from $\mu$ to $r$.

Proof. We wish to use Corollary 5.1.8 to prove freedom from arbitrage. We therefore need to analyze the equation
$$\mu - r = \sqrt{\sigma_t}\sqrt{1-\rho^2}\,\lambda_1(t) + \sqrt{\sigma_t}\,\rho\,\lambda_2(t).$$


To solve this, we put $\lambda_2 = 0$. The equation reduces to $\mu - r = \sqrt{\sigma_t}\sqrt{1-\rho^2}\,\lambda(t)$, where we for notational simplicity have removed the subscript from the lambda. Since $\sigma$ is positive, this equation has a unique solution given by
$$\lambda_t = \frac{\mu-r}{\sqrt{1-\rho^2}\sqrt{\sigma_t}}.$$
Since $\sigma$ is a standard process, it is clear that $\lambda$ is progressive. By continuity, $\lambda$ is locally bounded and therefore in $L^2(W)$. By Corollary 5.1.8, if we can prove that $\mathcal{E}(M)$ is a martingale with $M_t = -\int_0^t\lambda_s\,dW^1_s$, the market is free of arbitrage. It is then clear that the $Q$-dynamics of $S$ and $\sigma$ are as stated in the theorem.

To this end, it will by Lemma 3.8.6 suffice to show that $E\mathcal{E}(M)_t = 1$ for all $t\ge 0$. Let $t\ge 0$ be given, and let $\tau_n = \inf\{s\ge 0 : |\lambda_s|\ge n\}$. Since $\lambda$ is continuous, $\tau_n$ increases to infinity and $\lambda^{\tau_n}$ is bounded. Then $[M^{\tau_n\wedge t}]_\infty = \int_0^{\tau_n\wedge t}\lambda_s^2\,ds$ is bounded, so the Novikov criterion applies to show that $\mathcal{E}(M^{\tau_n\wedge t})$ is a uniformly integrable martingale for any $t\ge 0$.

Next, note that since $\tau_n$ increases, $\tau_n\ge t$ from a point onwards, and since $\mathcal{E}(M)$ is nonnegative, the variables $\mathcal{E}(M)_t 1_{(\tau_n\ge t)}$ converge upwards to $\mathcal{E}(M)_t$. By monotone convergence, $E\mathcal{E}(M)_t = \lim_n E\mathcal{E}(M)_t 1_{(\tau_n\ge t)}$. But $\mathcal{E}(M)_t 1_{(\tau_n\ge t)} = \mathcal{E}(M^{\tau_n\wedge t})_\infty 1_{(\tau_n\ge t)}$, so letting $Q^n$ be the measure such that $\frac{dQ^n_\infty}{dP_\infty} = \mathcal{E}(M^{\tau_n\wedge t})_\infty$, we may now conclude
$$E\mathcal{E}(M)_t = \lim_n E\mathcal{E}(M)_t 1_{(\tau_n\ge t)} = \lim_n E\mathcal{E}(M^{\tau_n\wedge t})_\infty 1_{(\tau_n\ge t)} = \lim_n Q^n(\tau_n\ge t).$$
We need to show that the latter limit is equal to one. To do so, we first show that $Q^n(\tau_n\ge t) = P(\tau_n\ge t)$. Under $Q^n$, we know from Girsanov's theorem that the process $\big(W^1_t + \int_0^t\lambda_s 1_{[0,\tau_n\wedge t]}(s)\,ds,\ W^2_t\big)$ is an $\mathcal{F}_t$ Brownian motion. In particular, $W^2$ is an $\mathcal{F}_t$ Brownian motion under $Q^n$ for all $n$. Now, we know that under $P$, $\sigma$ satisfies
$$\sigma_t = \sigma_0 + \int_0^t\kappa(\theta-\sigma_s)\,ds + \int_0^t\nu\sqrt{\sigma_s}\,dW^2_s.$$
By Theorem 3.8.4, integrals under $P$ and $Q^n$ are the same. Since $\nu\sqrt{\sigma_s}$ is continuous, it is integrable both under $P$ and $Q^n$, and therefore $\sigma$ also satisfies the above equation under the set-up with the $Q^n$ $\mathcal{F}_t$ Brownian motion $W^2$. Since the SDE satisfies uniqueness in law by Theorem 5.6.1, the $Q^n$ distribution of $\sigma$ must be the same as the $P$ distribution. Since $\lambda$ is a transformation of $\sigma$ and $\tau_n$ is a transformation of $\lambda$, we conclude that the distribution of $\tau_n$ is the same under $Q^n$ and $P$. In particular, $Q^n(\tau_n\ge t) = P(\tau_n\ge t)$. Since $\tau_n$ tends almost surely to infinity, $\lim_n P(\tau_n\ge t) = 1$. Combining this with our earlier findings, we conclude $E\mathcal{E}(M)_t = 1$. Therefore, by Lemma 3.8.6, $\mathcal{E}(M)$ is a martingale and by Corollary 5.1.8, the market is free of arbitrage.


We will now discuss estimating the digital delta in the Heston model. Fix a maturity time $T$. Our first objective is to find out how to simulate values from $S_T$. In the Black-Scholes model, this was a trivial question, as the stock price then followed a scaled lognormal distribution. For the Heston model, there is no simple way to simulate from $S_T$; we need to discretise the SDE. Before considering how to do this, we will rewrite the SDE in a form better suited for discretisation. Define $X_t = \log S_t$. Itô's lemma in the form of Lemma 3.6.4 yields
$$dX_t = \frac{1}{S_t}\,dS_t - \frac{1}{2S_t^2}\,d[S]_t = r\,dt + \sqrt{\sigma_t}\sqrt{1-\rho^2}\,dW^1_t + \sqrt{\sigma_t}\,\rho\,dW^2_t - \tfrac{1}{2}\sigma_t\,dt,$$
with $X_0 = \log S_0$. In other words, in order to simulate values of $S_T$, it will suffice to consider the SDE
$$dX_t = \left(r - \tfrac{1}{2}\sigma_t\right)dt + \sqrt{\sigma_t}\sqrt{1-\rho^2}\,dW^1_t + \sqrt{\sigma_t}\,\rho\,dW^2_t$$
$$d\sigma_t = \kappa(\theta-\sigma_t)\,dt + \nu\sqrt{\sigma_t}\,dW^2_t,$$
find a way to simulate from $X_T$ and then take the exponential of $X_T$ to obtain a simulation from $S_T$. A discretisation of the above set of SDEs will yield a way to simulate from a distribution approximating that of $X_T$. Such a discretisation consists of defining discrete processes $X'_{n\Delta}$ and $\sigma'_{n\Delta}$ for $n\le N$ with $N\Delta = T$ such that the distribution of $X'_T$ approximates that of $X_T$. There are several ways to do this; Andersen (2008) provides a useful overview. We will consider two methods: the full truncation modified Euler scheme and a scheme with exact simulation of the volatility. For brevity, we will denote the full truncation modified Euler scheme as the FT scheme, short for full truncation, and we will call the other scheme the SE scheme, short for semi-exact.

The FT scheme. We first describe the modified Euler scheme. In Lord et al. (2006), several ways of discretising the CIR process are considered, and the FT scheme is found to be the one introducing the least bias. Let $Z^1$ and $Z^2$ be discrete processes on $\{0,\Delta,\dots,N\Delta\}$ of independent standard normal variables. The FT scheme is the discretisation defined by putting $X'_0 = X_0$, $\sigma'_0 = \sigma_0$ and
$$X'_{(n+1)\Delta} = X'_{n\Delta} + \left(r - \tfrac{1}{2}\sigma'^+_{n\Delta}\right)\Delta + \sqrt{\sigma'^+_{n\Delta}}\sqrt{1-\rho^2}\sqrt{\Delta}\,Z^1_{n\Delta} + \sqrt{\sigma'^+_{n\Delta}}\,\rho\sqrt{\Delta}\,Z^2_{n\Delta}$$
$$\sigma'_{(n+1)\Delta} = \sigma'_{n\Delta} + \kappa\big(\theta - \sigma'^+_{n\Delta}\big)\Delta + \nu\sqrt{\sigma'^+_{n\Delta}}\sqrt{\Delta}\,Z^2_{n\Delta},$$
where $x^+ = \max\{x,0\}$. The discretisation means that the volatility process $\sigma'$ can turn negative. In this case, there is an upwards drift of rate $\kappa\theta$. Taking positive parts ensures that the square roots are well-defined.
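As an illustration, a minimal R sketch of the FT scheme; the function name and the vectorised loop layout are ours and are only meant as a sketch of the recursion above, not the actual thesis code.

```r
# One batch of FT-scheme paths: returns simulated values of S_T.
ft.scheme <- function(npaths, N, S0, sigma0, r, kappa, theta, nu, rho, T) {
  Delta <- T / N
  X <- rep(log(S0), npaths)
  v <- rep(sigma0, npaths)
  for (n in 1:N) {
    Z1 <- rnorm(npaths); Z2 <- rnorm(npaths)
    vp <- pmax(v, 0)                      # full truncation: positive part
    X <- X + (r - 0.5 * vp) * Delta +
      sqrt(vp) * sqrt(1 - rho^2) * sqrt(Delta) * Z1 +
      sqrt(vp) * rho * sqrt(Delta) * Z2
    v <- v + kappa * (theta - vp) * Delta + nu * sqrt(vp) * sqrt(Delta) * Z2
  }
  exp(X)
}

# Parameter values as used in the numerical experiments below.
ST <- ft.scheme(20000, 50, S0 = 100, sigma0 = 0.04, r = 0.04,
                kappa = 2, theta = 0.04, nu = 0.4, rho = 0.3, T = 1)
```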


The SE scheme. The SE scheme takes advantage of the fact that the transition probability of the CIR process is a scaled noncentral $\chi^2$ distribution. More explicitly, the distribution of $\sigma_{t+\Delta}$ given $\sigma_t = v$ is a scaled noncentral $\chi^2$ distribution with scale $\frac{\nu^2}{4\kappa}\big(1-e^{-\kappa\Delta}\big)$, non-centrality parameter $\frac{4v\kappa e^{-\kappa\Delta}}{\nu^2(1-e^{-\kappa\Delta})}$ and $\frac{4\theta\kappa}{\nu^2}$ degrees of freedom. That this is the case is a folklore result, although it is quite unclear where to find a proof of the result. In Feller (1951), Lemma 9, the Fokker-Planck equation for the SDE is solved, yielding the density of the noncentral $\chi^2$ distribution. It would then be reasonable to expect that one might obtain a complete proof by using some results on when the solution to the Fokker-Planck equation in fact is the transition density of the solution. In any case, the result allows exact simulation of values of the CIR process. Thus, we can define $\sigma'$ by letting the conditional distribution of $\sigma'_{(n+1)\Delta}$ given $\sigma'_{n\Delta}$ be the noncentral $\chi^2$ distribution defined above. This defines a discrete approximation of $\sigma$. We will use this to obtain a discrete approximation of the price process. To do so, we first note that
$$X_{t+\Delta} - X_t = r\Delta - \frac{1}{2}\int_t^{t+\Delta}\sigma_u\,du + \sqrt{1-\rho^2}\int_t^{t+\Delta}\sqrt{\sigma_u}\,dW^1_u + \rho\int_t^{t+\Delta}\sqrt{\sigma_u}\,dW^2_u.$$
The process $W^2$ is correlated with $\sigma$. Since we desire to simulate directly from the transition probabilities for $\sigma$, our discretisation scheme cannot depend on this correlation. To remove the $dW^2$ integral, we note that
$$\sigma_{t+\Delta} - \sigma_t = \int_t^{t+\Delta}\kappa(\theta-\sigma_u)\,du + \nu\int_t^{t+\Delta}\sqrt{\sigma_u}\,dW^2_u.$$
Isolating the last integral and substituting, we find
$$X_{t+\Delta} - X_t = r\Delta - \int_t^{t+\Delta}\left(\frac{\sigma_u}{2} + \frac{\rho\kappa}{\nu}(\theta-\sigma_u)\right)du + \frac{\rho(\sigma_{t+\Delta}-\sigma_t)}{\nu} + \sqrt{1-\rho^2}\int_t^{t+\Delta}\sqrt{\sigma_u}\,dW^1_u$$
$$= r\Delta - \frac{\rho\kappa\theta}{\nu}\Delta + \frac{\rho(\sigma_{t+\Delta}-\sigma_t)}{\nu} + \left(\frac{\rho\kappa}{\nu} - \frac{1}{2}\right)\int_t^{t+\Delta}\sigma_u\,du + \sqrt{1-\rho^2}\int_t^{t+\Delta}\sqrt{\sigma_u}\,dW^1_u.$$
The right-hand side depends only on the joint distribution of $\sigma$ and $W^1$. Since $\sigma$ and $W^1$ are independent, this is an important simplification compared to our earlier expression, which included an integral with respect to $W^2$. Because of the independence, the conditional distribution of $\int_t^{t+\Delta}\sqrt{\sigma_u}\,dW^1_u$ given $\sigma$ is normal with mean zero and variance $\int_t^{t+\Delta}\sigma_u\,du$.


We now make the approximations
$$\int_t^{t+\Delta}\sigma_u\,du \approx \frac{\Delta}{2}\big(\sigma_{t+\Delta}+\sigma_t\big), \qquad \int_t^{t+\Delta}\sqrt{\sigma_u}\,dW^1_u \approx \sqrt{\frac{\sigma_{t+\Delta}+\sigma_t}{2}}\,\big(W^1_{t+\Delta}-W^1_t\big).$$
Inserting these in our equation for $X_{t+\Delta}$, we obtain
$$X_{t+\Delta} \approx X_t + r\Delta - \frac{\rho\kappa\theta}{\nu}\Delta + \frac{\rho(\sigma_{t+\Delta}-\sigma_t)}{\nu} + \left(\frac{\rho\kappa}{\nu}-\frac{1}{2}\right)\frac{\Delta(\sigma_{t+\Delta}+\sigma_t)}{2} + \sqrt{1-\rho^2}\sqrt{\frac{\sigma_{t+\Delta}+\sigma_t}{2}}\big(W^1_{t+\Delta}-W^1_t\big)$$
$$= X_t + \left(r - \frac{\rho\kappa\theta}{\nu}\right)\Delta + \left(\frac{\Delta}{2}\left(\frac{\rho\kappa}{\nu}-\frac{1}{2}\right) - \frac{\rho}{\nu}\right)\sigma_t + \left(\frac{\Delta}{2}\left(\frac{\rho\kappa}{\nu}-\frac{1}{2}\right) + \frac{\rho}{\nu}\right)\sigma_{t+\Delta} + \sqrt{\frac{(1-\rho^2)(\sigma_{t+\Delta}+\sigma_t)}{2}}\big(W^1_{t+\Delta}-W^1_t\big).$$
We are thus led to defining the discretisation scheme $X'$ for $X$ by
$$X'_{(n+1)\Delta} = X'_{n\Delta} + K_0 + K_1\sigma'_{n\Delta} + K_2\sigma'_{(n+1)\Delta} + \sqrt{K_3\big(\sigma'_{(n+1)\Delta}+\sigma'_{n\Delta}\big)}\,Z_n,$$
where $Z_n$ is a process of independent standard normal variables on $\{0,\Delta,\dots,N\Delta\}$, independent of $\sigma'$, and
$$K_0 = \left(r - \frac{\rho\kappa\theta}{\nu}\right)\Delta,\qquad K_1 = \frac{\Delta}{2}\left(\frac{\rho\kappa}{\nu}-\frac{1}{2}\right) - \frac{\rho}{\nu},\qquad K_2 = \frac{\Delta}{2}\left(\frac{\rho\kappa}{\nu}-\frac{1}{2}\right) + \frac{\rho}{\nu},\qquad K_3 = \frac{\Delta(1-\rho^2)}{2}.$$
Modulo a short rate and some weights in the discretisation of the integrals, this is the same as obtained in Andersen (2008), page 20. To sum up, the SE scheme defines the approximation $\sigma'$ to $\sigma$ by directly defining the conditional distributions of the process using the transition probabilities, and the approximation $X'$ to $X$ is then obtained by the recursion above, driven by the independent standard normal variables $Z_n$. Note that while the FT scheme simulates a discretisation of the two-dimensional process $(W^1,W^2)$, the SE scheme simulates only a discretisation of $W^1$.
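A minimal R sketch of the SE scheme, using R's rchisq with the ncp argument for the exact noncentral $\chi^2$ transition of the CIR process; again, the function name and layout are ours and only illustrate the recursion above.

```r
# One batch of SE-scheme paths: exact CIR transitions plus the X' recursion.
se.scheme <- function(npaths, N, S0, sigma0, r, kappa, theta, nu, rho, T) {
  Delta <- T / N
  scale <- nu^2 / (4 * kappa) * (1 - exp(-kappa * Delta))
  df    <- 4 * theta * kappa / nu^2
  K0 <- (r - rho * kappa * theta / nu) * Delta
  K1 <- Delta / 2 * (rho * kappa / nu - 0.5) - rho / nu
  K2 <- Delta / 2 * (rho * kappa / nu - 0.5) + rho / nu
  K3 <- Delta * (1 - rho^2) / 2
  X <- rep(log(S0), npaths)
  v <- rep(sigma0, npaths)
  for (n in 1:N) {
    ncp <- 4 * v * kappa * exp(-kappa * Delta) / (nu^2 * (1 - exp(-kappa * Delta)))
    vnext <- scale * rchisq(npaths, df = df, ncp = ncp)  # exact CIR transition
    Z <- rnorm(npaths)
    X <- X + K0 + K1 * v + K2 * vnext + sqrt(K3 * (vnext + v)) * Z
    v <- vnext
  }
  exp(X)
}

ST <- se.scheme(20000, 50, S0 = 100, sigma0 = 0.04, r = 0.04,
                kappa = 2, theta = 0.04, nu = 0.4, rho = 0.3, T = 1)
```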


This will be an important point when comparing our Malliavin estimator with the one obtained in Benhamou (2001). Also note that because of the exact simulation of the volatility process in the SE scheme, $\sigma'$ can never reach zero, but in the FT scheme, it is quite probable that $\sigma'$ hits zero. This will be important for the implementation of the Malliavin estimator.

Performance of the schemes. We will now see how well each of these two schemes can approximate the true distribution of $S_T$. As we have no way to obtain exact simulations from the distribution, we must content ourselves with checking convergence and comparing results from each scheme. We will use $S_0 = 100$, $\sigma_0 = 0.04$, $r = 0.04$, $\kappa = 2$, $\theta = 0.04$, $\nu = 0.4$, $\rho = 0.3$ and $T = 1$, and these parameter values will be held constant throughout all of this section unless stated otherwise. Figure 5.9 shows density estimates of $S_T$ for each of the two schemes, using 2, 5, 50 and 100 steps. The figure shows that both schemes seem to converge to very similar distributions. Most of the error in the discretisations seems to come from the top shape of the hump, with the SE scheme converging quicker than the FT scheme. This is not surprising, considering that the SE scheme uses the exact distribution for the volatility, while the FT scheme only has an approximation. Both schemes seem to have achieved stability at least at the 50-step discretisation.

Figure 5.9: Density estimates for the FT and SE schemes based on 80000 simulations.

Next, we compare call pricing under each of the schemes. In Heston (1993), an analytical expression for the Fourier transform of the survival function of $S_T$ is obtained, and it is explained how this can be inverted and used to obtain call prices in the Heston model.


These results greatly contribute to the popularity of the Heston model and its offspring. We use the pricer at http://kluge.in-chemnitz.de/tools/pricer to obtain accurate values for call prices. This pricer seems to be reliable and is also used for the numerical experiments in Alòs & Ewald (2008). Using strike 104 yields a call price of 7.71175.

We want to see how the call prices obtained using Monte Carlo with each of the two schemes compare to this. The purpose of this is both to check how much bias is introduced by the discretisation and to check that our fundamental simulation routines work as they are supposed to. We will at the same time also check the functionality of sampling from each of the schemes using latin hypercube sampling and antithetic sampling. We have explained earlier how antithetic sampling works, but we have not mentioned latin hypercube sampling. We will not explain this method in detail; suffice it to say that it can be thought of as a generalization of stratified sampling for high-dimensional problems, see Glasserman (2003), Section 4.4.

In the left graph of Figure 5.10, we compare the relative errors of call prices for each of the two schemes with and without latin hypercube and antithetic sampling, and we compare the standard deviations. We use discretisation steps from 10 to 100 and consider 300 replications of the estimator, each estimate based on 20000 samples. We see that, as expected, each of the methods yields prices which fit well with the true value. It seems that less than 50 steps yields some inaccuracy in excess of the expected variability, with the 10-step results giving bias which is significantly larger than that of the observations for higher levels of discretisation. However, the relative error never exceeds 1%. In reality, models are inaccurate anyway, so this bias would be unimportant in practical settings. In other words, for practical applications, it would seem that 20 steps would be sufficient for the contract under investigation.

In the right graph, we compare standard deviation times computation time. This is a general measure of efficiency, with low values corresponding to high efficiency. We see that the efficiency decreases as the number of discretisation steps increases. This is not particularly surprising, but an important point nonetheless: the higher level of discretisation means that each sample from $S_T$ takes longer to generate, but since the number of samples is held constant and the change in the distribution is minuscule, the standard deviation does not decrease. Thus, the standard deviation times computational time will increase as the number of discretisation steps increases, lowering performance. Therefore, the choice of the number of discretisation steps boils down to choosing between optimising bias and optimising standard deviation times computational time.


Note that the right graph cannot be used to measure the efficiency of latin hypercube and antithetic sampling, as the computational time is not held constant. However, experimental results show an improvement factor of around 1.4. This is very moderate compared to the improvement factors found for antithetic and stratified sampling in the previous section. The reason for this is probably that the transformation from the driving normal distributions to the final $S_T$ sample is very complex, and the latin hypercube sampling basically just stratifies in a very simple manner. To obtain the same type of efficiency gains for the Heston model as in the Black-Scholes model, it will probably be necessary to implement a more sophisticated form of stratified sampling, such as terminal stratification or one of the other methods discussed in Subsection 4.3.2 of Glasserman (2003).

Figure 5.10: Left: Relative errors for call prices calculated using the FT and SE schemes with and without latin hypercube sampling and antithetic sampling. Right: Standard deviations times computational time.
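Although we do not describe latin hypercube sampling in detail, the following minimal R sketch (function name ours) may help fix ideas: each coordinate of the vector of driving standard normals is stratified into n equiprobable intervals, and the strata are matched across coordinates by independent random permutations.

```r
# Latin hypercube sample of n points from the d-dimensional standard normal:
# coordinate j is stratified into n equiprobable intervals, and the strata
# are assigned to the n points through an independent random permutation.
lhs.normal <- function(n, d) {
  U <- matrix(runif(n * d), n, d)
  P <- apply(matrix(runif(n * d), n, d), 2, order)  # one permutation per column
  qnorm((P - 1 + U) / n)
}

Z <- lhs.normal(20000, 50)   # e.g. driving normals for a 50-step discretisation
```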


Comparison of methods for calculating the digital delta. Being done with the introductory investigations of the Heston model, we now proceed to the main point of this section, which is the comparison of the Malliavin and localised Malliavin methods with the finite difference method. Note that we cannot directly compare these methods with the likelihood ratio and pathwise differentiation methods as in the previous section, as the likelihood ratio method does not apply directly to the Heston model (see, however, the arguments outlined in Glasserman (2003), page 414, or Chen & Glasserman (2006), for application of the likelihood ratio method to the discretisation of the Heston model) and the pathwise differentiation method does not apply to the digital delta. In the remainder of this section, we will always sample from our two schemes using latin hypercube sampling and antithetic sampling.

We consider a digital option with strike 104. Let $f$ denote the corresponding payoff function. As described above, we have six methods to consider: the finite difference method, the Malliavin method and the localised Malliavin method, each combined with one of the two simulation schemes. In Section 5.4, we found the basis of the Malliavin method through the identity
$$\frac{\partial}{\partial S_0}Ef(S_T) = E\left(f(S_T)\frac{1}{S_0T}\int_0^T\frac{1}{\sqrt{\sigma_t}\sqrt{1-\rho^2}}\,dW^1_t\right),$$
where we are using $\sqrt{\sigma_t}$ instead of $\sigma_t$ because in our model formulation, $\sigma_t$ denotes the squared volatility, so that $\sqrt{\sigma_t}$ plays the role of the ordinary volatility. The localised Malliavin method is then obtained by decomposing the payoff in two parts and applying the Malliavin method to one part and the pathwise differentiation method to the other, as was also done in Section 5.5. One problem with this is that we do not know how to sample from the stochastic integral in the expectation above. Instead, we have to make do with an approximation. Based on the theory of Chapter 3, we would expect that a reasonable approximation of an integral of the form $\int_0^T H_s\,dW^1_s$ is
$$\int_0^T H_s\,dW^1_s \approx \sum_{k=1}^n H_{t_{k-1}}\big(W^1_{t_k}-W^1_{t_{k-1}}\big),$$
where it is essential that we are using forward increments and not backward increments in the Riemann sum. With $X'$ and $\sigma'$ denoting the discretised versions of $X$ and $\sigma$ and $Z^1$ denoting the standard normal variables driving the asset price process, it is then natural to consider the approximation
$$\frac{\partial}{\partial S_0}Ef(S_T) \approx E\left(f\big(e^{X'_T}\big)\frac{1}{e^{X'_0}\,T\sqrt{1-\rho^2}}\sum_{k=1}^N\frac{\sqrt{\Delta}\,Z^1_{k\Delta}}{\sqrt{\sigma'_{(k-1)\Delta}}}\right).$$
This works well when using the SE scheme, where we know that $\sigma'$ is positive. However, if we are using the FT scheme, we have no such guarantee, and the above is not well-defined. In this case, then, we resort to substituting $\max\{\sigma'_{(k-1)\Delta},\varepsilon\}$ for $\sigma'_{(k-1)\Delta}$ in the sum, where $\varepsilon$ is some small positive number. Our experimental results show that using an $\varepsilon$ which is too small, such as $10^{-8}$, can lead to considerable numerical instability. We use $\varepsilon = 0.0001$; this leads to stable results and reasonable efficiency.
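For concreteness, a minimal R sketch of the Malliavin estimator under the FT scheme, accumulating the Riemann-sum weight alongside the simulation. The function name is ours; discounting is omitted so that the output matches the displayed identity for $\partial/\partial S_0\,Ef(S_T)$, and the floor $\varepsilon = 0.0001$ is applied as described above.

```r
# FT-scheme simulation of S_T together with the approximate Malliavin weight
# (1 / (S0 * T * sqrt(1 - rho^2))) * sum_k sqrt(Delta) * Z1_k / sqrt(max(sigma', eps)).
ft.malliavin <- function(npaths, N, S0, sigma0, r, kappa, theta, nu, rho, T,
                         K = 104, eps = 1e-4) {
  Delta <- T / N
  X <- rep(log(S0), npaths)
  v <- rep(sigma0, npaths)
  wsum <- rep(0, npaths)
  for (n in 1:N) {
    Z1 <- rnorm(npaths); Z2 <- rnorm(npaths)
    vp <- pmax(v, 0)
    wsum <- wsum + sqrt(Delta) * Z1 / sqrt(pmax(v, eps))  # forward increments
    X <- X + (r - 0.5 * vp) * Delta +
      sqrt(vp) * (sqrt(1 - rho^2) * Z1 + rho * Z2) * sqrt(Delta)
    v <- v + kappa * (theta - vp) * Delta + nu * sqrt(vp) * sqrt(Delta) * Z2
  }
  weight <- wsum / (S0 * T * sqrt(1 - rho^2))
  mean((exp(X) >= K) * weight)   # Malliavin estimate of d/dS0 E f(S_T)
}

ft.malliavin(20000, 50, S0 = 100, sigma0 = 0.04, r = 0.04,
             kappa = 2, theta = 0.04, nu = 0.4, rho = 0.3, T = 1)
```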


We are now ready for the numerical experiments. We begin by investigating the bias of our estimators. We expect two sources of bias. First, there is an inherent bias from the sampling of $S_T$, since we are only sampling from a distribution approximating that of $S_T$. Second, each of the methods involves certain approximations: the finite difference method approximates the derivative by a finite difference, and the Malliavin estimators approximate the Itô integral by a sum.

In Figure 5.11, we have plotted the means for each of the six methods. We use 300 replications of estimators based on samples of size 1000 to 29500 with increments of 1500. We repeat the experiment for a 15-step and a 100-step discretisation. We see that in the 15-step experiment, there is a considerable variation in the means, but this variation is almost gone in the 100-step experiment. Expecting that the finite difference method produces the least bias, we obtain from the experiment an approximate value for the digital delta of 0.02132436; we will use this value as a benchmark. Note that we could also have used the semi-analytical results from Heston (1993) to obtain benchmark values; however, the results from the finite difference method will be sufficient for our needs. Using this benchmark, we find that the Malliavin methods in the 15-step experiment yield average relative errors of -6.8% and 7.7% for the FT and SE schemes, respectively. The localised methods have average relative errors of -3.3% and 3.7%, while the finite difference relative errors are less than 0.1%. In the 100-step experiment, the Malliavin methods have average relative errors of -0.8% and 1.9% for the FT and SE schemes, respectively, while the localised Malliavin methods have average relative errors of -0.3% and 0.9%, respectively. We conclude that it would seem that all of our estimators are asymptotically unbiased.

Figure 5.11: Estimated estimator means for estimating the digital delta. 15-step discretisation on the left, 100-step discretisation on the right.


Next, we consider standard deviation. The results can be found in Figure 5.12. For both the 15-step and 100-step experiments and both the FT and SE schemes, we see that the Malliavin method is superior to the finite difference method and the localised Malliavin method is superior to the ordinary Malliavin method.

For the 15-step experiment, the SE scheme yields the better performance, while the situation is reversed for the 100-step experiment. The explanation for this probably has something to do with the FT scheme being faster than the SE scheme, since it is faster to generate normal distributions than noncentral $\chi^2$ distributions. Thus, for fine discretisations, the resulting distributions of $S_T$ are nearly the same, but the FT scheme is computationally faster.

Figure 5.12: Estimated estimator standard deviations for estimating the digital delta. 15-step discretisation on the left, 100-step discretisation on the right.

The difference between our results and those of Benhamou (2001). We mentioned in Section 5.4 that the expression we have obtained for the Malliavin method is not the same as is obtained in Benhamou (2001). We will now explain what exactly the difference is and investigate how this difference manifests itself in practice.


We will first detail the model description and corresponding Malliavin estimator used by ourselves and by Benhamou (2001). Our description of the Heston model and the corresponding Malliavin estimator is
$$dS_t = \mu S_t\,dt + \sqrt{\sigma_t}\,S_t\sqrt{1-\rho^2}\,dW^1_t + \sqrt{\sigma_t}\,S_t\rho\,dW^2_t$$
$$d\sigma_t = \kappa(\theta-\sigma_t)\,dt + \nu\sqrt{\sigma_t}\,dW^2_t$$
$$\frac{\partial}{\partial S_0}Ef(S_T) = E\left(f(S_T)\frac{1}{S_0T}\int_0^T\frac{1}{\sqrt{\sigma_t}\sqrt{1-\rho^2}}\,dW^1_t\right),$$
where $(W^1,W^2)$ is a two-dimensional Brownian motion. In contrast to this, Benhamou (2001) lists the model description and Malliavin estimator as
$$dS_t = \mu S_t\,dt + \sigma_t S_t\,dB^1_t$$
$$d\sigma_t^2 = \kappa(\theta-\sigma_t^2)\,dt + \nu\sigma_t\,dB^2_t$$
$$\frac{\partial}{\partial S_0}Ef(S_T) = E\left(f(S_T)\frac{1}{S_0T}\int_0^T\frac{1}{\sigma_t}\,dB^1_t\right),$$
where $(B^1,B^2)$ is a two-dimensional Brownian motion with correlation $\rho$. Rewriting this with our model formulation, we obtain
$$dS_t = \mu S_t\,dt + \sqrt{\sigma_t}\,S_t\sqrt{1-\rho^2}\,dW^1_t + \sqrt{\sigma_t}\,S_t\rho\,dW^2_t$$
$$d\sigma_t = \kappa(\theta-\sigma_t)\,dt + \nu\sqrt{\sigma_t}\,dW^2_t$$
$$\frac{\partial}{\partial S_0}Ef(S_T) = E\left(f(S_T)\frac{1}{S_0T}\left(\sqrt{1-\rho^2}\int_0^T\frac{1}{\sqrt{\sigma_t}}\,dW^1_t + \rho\int_0^T\frac{1}{\sqrt{\sigma_t}}\,dW^2_t\right)\right).$$
Obviously, there seems to be some difference here. The two expressions for the Malliavin estimator are equal in the case of zero correlation, but different whenever there is nonzero correlation. We will now see how this difference manifests itself when calculating the digital delta. We will compare the results of the estimator of Benhamou (2001) with our estimators and the finite difference estimator. Since the estimator of Benhamou (2001) involves an integral with respect to both $W^1$ and $W^2$, we cannot use the SE scheme with this method. We therefore restrict ourselves to comparing the Malliavin estimator of Benhamou (2001) under the FT scheme with our Malliavin estimator under both the FT and SE schemes. Finally, we also consider the finite difference method under the SE scheme. Since we have already seen that the finite difference method is quite insensitive to the choice of discretisation scheme, it should be unnecessary to consider also the finite difference method under the FT scheme.

We compare the digital delta estimators when the correlation varies between -0.95 and 0.95. We use 300 replications of estimators based on 20000 simulations each. The results can be seen in the left graph of Figure 5.13.


As expected, the results from Benhamou (2001) fit with our results in the case of zero correlation. However, it is quite obvious that when the correlation is nonzero, in particular when there is very strong positive or negative correlation, the estimator from Benhamou (2001) does not fit with the otherwise very reliable finite difference estimator, whereas our Malliavin estimator works well.

To do a more formal check that there is an actual difference, we have in the right graph of Figure 5.13 plotted density estimates for the finite difference estimator and the Benhamou estimator for the case of correlation -0.95. The thin lines are moment-matched normal densities. We see that the estimators are approximately normal, with the finite difference estimator having some minor fluctuations in the right tail. It would therefore be reasonable to use the Welch t-test for testing equality of means for normal distributions with unequal variances to test whether the two estimators have different means. The p-value of the test comes out less than $10^{-15}$. For comparison, when using the same test to compare our FT scheme based Malliavin estimator with the finite difference estimator, we obtain a p-value of 71%. This strongly suggests that even when taking the bias from the approximation to the Itô integral in the Benhamou estimator into account, the Benhamou estimator does not yield the correct value of the digital delta.

Figure 5.13: Left: Comparison of the estimator from Benhamou (2001) with our Malliavin estimators and a finite difference estimator. Right: Density plots for the finite difference and Benhamou estimators for correlation -0.95. Thick lines are density estimates, thin lines are normal approximations.
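For reference, the Welch test used here is the default behaviour of t.test in R; a minimal sketch on two hypothetical vectors of replicated estimates (names ours):

```r
# fd.estimates and ben.estimates are hypothetical vectors holding the 300
# replicated estimates for the finite difference and Benhamou estimators.
# t.test performs the Welch test by default (var.equal = FALSE).
welch.p.value <- function(fd.estimates, ben.estimates)
  t.test(fd.estimates, ben.estimates, var.equal = FALSE)$p.value
```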


Method                              Improvement factor

15-step digital delta
  FT scheme, finite difference      1.0
  SE scheme, finite difference      0.9-1.1
  FT scheme, Malliavin              1.0-1.4
  SE scheme, Malliavin              1.0-2.5
  FT scheme, localised Malliavin    1.6-2.4
  SE scheme, localised Malliavin    2.6-3.8

100-step digital delta
  FT scheme, finite difference      1.0
  SE scheme, finite difference      0.9-1.1
  FT scheme, Malliavin              2.3-3.1
  SE scheme, Malliavin              1.3-2.4
  FT scheme, localised Malliavin    3.4-5.1
  SE scheme, localised Malliavin    2.7-3.8

Table 5.2: Comparison of results for estimation of the digital delta in the Heston model.

Conclusions. We have considered the digital delta for three types of estimators, each combined with two different discretisation schemes. As in the previous section, we conclude that methods more exotic than just plain Monte Carlo with finite difference can make a considerable difference. Table 5.2 shows the factor of improvement in the standard deviations for the various methods implemented compared to the simplest methods.

We see that the Malliavin and localised Malliavin methods indeed yield an improvement, but the improvement is somewhat disappointing compared to the results we obtained for the Black-Scholes model, where the likelihood ratio and localised likelihood ratio methods yielded improvement factors for the digital delta of respectively 5 to 10 and 16 to 29. It is difficult to say why the improvement factors are so small in this case. The Malliavin method requires no extra simulations, only the calculation of the approximation to the Itô integral, which is merely a simple sum.

We also again note that while the SE scheme is most effective for the 15-step case, the FT scheme is most effective for the 100-step case. Now, the localised Malliavin method for the 15-step case yielded a bias of up to 3.7%, which may be a little much. One might get more satisfactory results using, say, a 30-step discretisation. In this case, the SE scheme is probably still the most effective. We conclude that for practical purposes, the SE scheme should be preferred over the FT scheme.


5.7 Notes

The main references for the material in Section 5.1 are Björk (2004) and Øksendal (2005), Chapter 12. Other useful resources are Karatzas & Shreve (1998) and Steele (2000). Our results are not as elegant as they could be because of the limited theory available to us. In general, the result of Theorem 5.1.5 on the existence of EMMs and freedom from arbitrage goes both ways, in the sense that under certain regularity conditions and appropriate concepts of freedom from arbitrage, a market is free from arbitrage if and only if there exists an EMM. For a precise result, see for example Delbaen & Schachermayer (1997), Theorem 3. The theory associated with the necessity of the existence of EMMs in order to preclude arbitrage is rather difficult and is presented in detail in Delbaen & Schachermayer (2008). The simplest introduction to the field seems to be Levental & Skorohod (1995).

All of the Monte Carlo methods introduced in Section 5.3 and Section 5.4, except the results on the Malliavin method, are described in the excellent Glasserman (2003). Rigorous resources on the Malliavin calculus applied to the calculation of risk numbers are hard to come by. Even today, the paper Fournié et al. (1999) seems to be the best, because of its very clear statement of results. Di Nunno et al. (2008) is also refreshingly explicit. Other papers on the applications of the Malliavin calculus to calculations of risk numbers are Fournié et al. (2001), which extends the investigations started in Fournié et al. (1999), and also Benhamou (2001) and Benhamou (2003). A very interesting paper is Chen & Glasserman (2006), which proves, using results from the theory of weak convergence of SDEs, that the Malliavin method in a reasonably large class of cases is equivalent to the limit of an average of likelihood ratio estimators for Euler discretisations of the SDEs under consideration. Montero & Kohatsu-Higa (2002) provides an informal overview of the role of the Malliavin calculus in the calculation of risk numbers. Nualart (2006), Chapter 6, also provides some resources.

Other applications of the Malliavin calculus in finance, such as the calculation of conditional expectations, insider trading and explicit hedging strategies, are discussed in Schröter (2007), Di Nunno et al. (2008), Nualart (2006) and Malliavin & Thalmaier (2005). Here, Schröter (2007) is very informal, and Malliavin & Thalmaier (2005) is extremely difficult to read.


Much of the SDE theory used in the derivation of the Malliavin estimator in Section 5.4 and Fournié et al. (1999) is described in Karatzas & Shreve (1988) and Rogers & Williams (2000b). Readable results on differentiation of flows are hard to find. The classical resource is Kunita (1990). A seemingly more readable proof of the main result can be found in Ikeda & Watanabe (1989), Proposition 2.2 of Chapter V.

Several of the numerical results of Section 5.5 can be compared with those of Fournié et al. (1999) and Benhamou (2001). Glasserman (2003) is, as usual, a useful general guide to the effectiveness of Monte Carlo methods.

The proof in Section 5.6 that the Heston model does not allow arbitrage is based on the techniques of Cheridito et al. (2005), specifically the proof of Theorem 1 in that paper. Numerical results on the performance of Malliavin methods for the calculation of risk numbers in the Heston model are, puzzlingly, very difficult to find, actually seemingly impossible to find. Also, the only paper giving an explicit form of the Malliavin weight for the Heston model seems to be Benhamou (2001), so it is unclear whether it is generally understood, as we concluded, that the form given is erroneous. There are, however, several other papers on applications of the Malliavin calculus to the Heston model, but these applications are not concerned with the calculation of risk numbers. See for example Alòs & Ewald (2008).

Pertaining to the discretisation of the Heston model, Andersen (2008) seems to be the best resource available, and contains references to many other papers on the subject. For a curious paper giving a semi-analytical formula for the density of the log of the asset price in the Heston model, see Dragulescu & Yakovenko (2002). This could perhaps be useful as a benchmark for the true density.

The methods for calculating risk numbers considered here all have one major advantage over the finite difference method, an advantage which is not seen in our context. In our case, we only consider risk numbers with respect to changes in the initial value $S_0$ of the underlying asset. Since $S_T$ is linear in $S_0$, we can in the finite difference method reuse our simulations for the estimation of both means by scaling our simulations. Therefore, in spite of the finite difference method requiring the calculation of two means instead of just one, as in the other methods we consider, the computational time spent on simulations is not much different from the other methods.


variable is not as straightforward, and in such cases we would have to simulate twice as many variables in the finite difference method as for the other methods for calculating risk numbers. When having to calculate many different risk numbers, which is often the case in practice, this observation can have a major impact on computational time.

All of the numerical work in this chapter is done in R. We also considered using the language OCaml instead, hoping for improvements in both elegance and speed. However, in the end, the mediocrity of the IDE available for OCaml and the strong and efficient statistical library available for R made R the better choice.
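To illustrate the reuse-by-scaling observation made earlier in this section, the following is a minimal sketch in R of a finite difference delta with recycled simulations; it is not the code used for the numerical results of this chapter, and the Black-Scholes parameters, the strike and the call payoff below are chosen purely for illustration. Since $S_T$ is proportional to $S_0$, a single set of simulated terminal values can be rescaled to obtain both means in the central difference.

set.seed(1)
S0 <- 100; K <- 100; r <- 0.05; sigma <- 0.2; mat <- 1; h <- 0.5; n <- 1e5
Z <- rnorm(n)
# One set of simulated terminal values under the risk-neutral measure.
ST <- S0 * exp((r - sigma^2 / 2) * mat + sigma * sqrt(mat) * Z)
# Discounted call payoff, purely for illustration.
payoff <- function(s) exp(-r * mat) * pmax(s - K, 0)
# Rescale the same paths instead of simulating separately for S0 + h and S0 - h.
delta.fd <- mean(payoff(ST * (S0 + h) / S0) - payoff(ST * (S0 - h) / S0)) / (2 * h)

The point is that only the single vector ST needs to be simulated, so the additional cost of the finite difference method relative to the other estimators is limited to evaluating the payoff at the rescaled values.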


240 <strong>Mathematical</strong> <strong>Finance</strong>


Chapter 6

Conclusions

In this chapter, we will discuss the results which we have obtained in all three main parts of the thesis, and we will consider opportunities for extending this work.

6.1 Discussion of results

We will discuss the results obtained for stochastic integrals, the Malliavin calculus and the applications to mathematical finance.

Stochastic integrals. In Chapter 3, we successfully managed to develop the theory of stochastic integration for integrators of the form $A_t + \sum_{k=1}^n \int_0^t Y^k_s\,dW^k_s$, where $A$ is a continuous process of finite variation and $Y^k \in L^2(W)$, and we proved the basic results of the theory without reference to any external results. However, the increase in rigor compared to Øksendal (2005) or Steele (2000) came at some price, resulting in some honestly very tedious work. Furthermore, while we managed to prove Girsanov's theorem without reference to external results, the resulting proof was rather long compared to the modern proofs based on stochastic integration for general continuous local martingales.

We must therefore conclude that the ideal presentation of the theory would be based on general continuous local martingales. And with the proof of the existence of the


quadratic variation of Section C.1, the biggest difficulty of that theory could be conveniently separated from the actual development of the integral.

The Malliavin calculus. We presented in Chapter 4 the basic results on the Malliavin derivative. While we only managed to cover a very small fraction of the results in Nualart (2006), we have managed to develop the proofs in much higher detail, including the development of some results used in the proofs, such as Lemma 4.4.19, which curiously are completely absent from other accounts of the theory. The value of this work is considerable, clarifying the fundamentals of the theory and therefore paving the way for higher levels of detail in further proofs as well.

Also, the extension of the chain rule proved in Theorem 4.3.5 is an important new result. Considering how fundamental the chain rule is to the theory, this extension should make the Malliavin derivative somewhat easier to work with.

Mathematical finance. Chapter 5 contained our work on mathematical finance. We managed to use the theory developed in Chapter 3 to set up a basic class of financial market models and proved sufficient conditions for absence of arbitrage, demonstrating that the theory of stochastic integration developed in Chapter 3 was sufficiently rich to obtain reasonable results. The use of Theorem 3.8.4 in the proof of Theorem 5.1.5 and the Itô representation theorem in the proof of Theorem 5.1.11 were crucial, details which are often skipped in other accounts.

We also made an in-depth investigation of different estimation methods for prices and sensitivities in the Black-Scholes model, documenting the high efficiency of more advanced Monte Carlo methods and confirming results found in the literature. We applied the Malliavin methods to the Heston model, obtaining new results. We saw that the efficiency gains seen in the Black-Scholes case could not be sustained in the Heston model. We also noted a problem with the Malliavin estimator given in Benhamou (2001) and corrected the error.

6.2 Opportunities for further work

All three main parts of the thesis present considerable opportunities for further work. We will now detail some of these.


Stochastic integration for discontinuous FV processes. It is generally accepted that while the theory of stochastic integration for continuous semimartingales is somewhat manageable, the corresponding theory for general semimartingales is very difficult, as can be seen for example in Chapter VI of Rogers & Williams (2000b) or in Protter (2005). Nonetheless, financial models with jumps are becoming increasingly popular, see for example Cont & Tankov (2004). It would therefore be convenient to have a theory of stochastic integration at hand which covers the necessary results. To this end, we observe that the vast majority of financial models with jumps in use only uses the addition of a compound Poisson process as a driving term in, say, the asset price. Such a process is of finite variation. Therefore, if it were possible to develop a theory of stochastic integration for integrators of the form $A + M$, where $A$ is a possibly discontinuous process of finite variation and $M$ is a continuous local martingale, one could potentially obtain a simplification of the general theory which would still be versatile enough to cover most financial applications. A theory akin to this is developed in Chapter 11 of Shreve (2004).

Measurability results. There are several problems relating to measurability properties in the further theory of the Malliavin calculus. As outlined in Section 4.5, if $u$ is in the domain of the Skorohod integral with $u_t \in D^{1,2}$ for all $t \leq T$ and there are measurable versions of $(t,s) \mapsto D_t u_s$ and $\int_0^T D_t u_s\,\delta W_s$, then
$$D_t \int_0^T u_s\,\delta W_s = \int_0^T D_t u_s\,\delta W_s + u_t.$$
It would be useful for the theory if there were general criteria to ensure that all of the measurability properties necessary for this to hold were satisfied. Another problem raises its head when considering the Itô-Wiener expansion. Also described in Section 4.5, this is a series expansion for variables $F \in L^2(\mathcal{F}_T)$, stating that there exist square-integrable deterministic functions $f_n$ on $\Delta_n$ such that we have the $L^2$-summation formula
$$F = \sum_{n=0}^{\infty} n! \int_0^T \int_0^{t_n} \cdots \int_0^{t_3} \int_0^{t_2} f_n(t)\,dW(t_1) \cdots dW(t_n).$$
Here, it is not trivial to realize that the iterated integrals are well-defined. To see the nature of the problem, consider defining the single iterated integral
$$\int_0^T \int_0^{t_2} f(t_1, t_2)\,dW(t_1)\,dW(t_2).$$
Fix $t_2 \leq T$. The process $\int_0^t f(t_1, t_2)\,dW(t_1)$ is well-defined if $t_1 \mapsto f(t_1, t_2)$ is progressively measurable. However, for varying $t_2$, the processes $\int_0^t f(t_1, t_2)\,dW(t_1)$ are


based on different integrands, and there is therefore a priori no guarantee that these can be combined in such a way as to make $t_2 \mapsto \int_0^{t_2} f(t_1, t_2)\,dW(t_1)$ progressively measurable, which is necessary to be able to integrate this process with respect to the Brownian motion. Thus, it is not clear at all how to make the statement of the Itô-Wiener expansion rigorous. A further problem is that in the proof of the expansion based on the Itô representation theorem, it is necessary to know that the integrands in the representations can be chosen in certain measurable ways.

The simplest problems are those involving the Wiener-Itô expansion. As usual, putting $\theta(f) = \int_0^T f(t)\,dW_t$, $\theta$ is a continuous mapping from $L^2[0,T]$ to $L^2(\mathcal{F}_T)$. Furthermore, if we let $R(X)$ denote the Itô representation of a variable $X \in L^2(\mathcal{F}_T)$ in the sense that
$$X = \int_0^T R(X)_t\,dW_t,$$
then $R$ is a continuous linear mapping from $L^2(\mathcal{F}_T)$ to $L^2(\Sigma_\pi[0,T])$. Thus, the problems related to the Wiener-Itô expansion are both in some sense about iterating continuous linear operators over $L^2$ spaces. In contrast to this, the problems related to the Malliavin derivative are in some sense about selecting measurable versions of closed linear operators over subspaces of $L^2$ spaces.

It is not clear how to solve these problems in general. However, a first step would be to show a result which for example would allow us to, given a process $X$ indexed by $[0,T]^2$ which is $\mathcal{B}[0,T]^2 \otimes \Sigma_\pi[0,T]$ measurable, select a version of $(s,t) \mapsto \int_0^t X(s,t)\,dW(t)$ which is $\mathcal{B}[0,T]^2 \otimes \Sigma_\pi[0,T]$ measurable. Applying this several times in an appropriate manner could help in defining the iterated integrals, and it is also a result of this type which is necessary in the Malliavin calculus.

In order to cover all of our cases, let us consider a mapping $A : L^2(E, \mathcal{E}) \to L^2(K, \mathcal{K})$ between two $L^2$ spaces. We endow these $L^2$ spaces with their Borel $\sigma$-algebras $\mathcal{B}(E)$ and $\mathcal{B}(K)$ and assume that $A$ is $\mathcal{B}(E)$-$\mathcal{B}(K)$ measurable. This obviously covers the cases from the Wiener-Itô expansion, since continuous mappings are measurable. And by Corollary 4.4.16, the Malliavin derivative is continuous on $H_n$ for each $n \geq 0$, where $L^2(\mathcal{F}_T) = \oplus_{n=0}^\infty H_n$, leading us to suspect that the Malliavin derivative indeed is measurable as well on its domain.

Let $(D, \mathcal{D})$ be another measurable space. We will outline how to prove the following result: If $X : D \times E \to \mathbb{R}$ is $\mathcal{D} \otimes \mathcal{E}$ measurable such that for each $x \in D$ it holds that $X(x, \cdot) \in L^2(E, \mathcal{E})$, there exists a $\mathcal{D} \otimes \mathcal{K}$ measurable version of $(x,y) \mapsto A(X(x, \cdot))(y)$.


This can be directly applied to obtain measurable versions of stochastic integrands where only one coordinate is integrated.

Let $X : D \times E \to \mathbb{R}$ be given with the properties stated above. We first define $Y : D \to L^2(E, \mathcal{E})$ by putting $Y(x)(y) = X(x,y)$. By a monotone class argument, we prove that $Y$ is $\mathcal{D}$-$\mathcal{B}(E)$ measurable. We can then use that $A$ is measurable to obtain that $x \mapsto A(Y(x))$ is $\mathcal{D}$-$\mathcal{B}(K)$ measurable. Defining $Z(x,z) = A(Y(x))(z)$, $Z$ is a mapping from $D \times K$ to $\mathbb{R}$. By definition, $Z(x, \cdot) = A(Y(x)) = A(X(x, \cdot))$. We are done if we can argue that there exists a measurable version of $Z$. This is the most difficult part of the argument. It is easy in the case where $x \mapsto A(Y(x))$ is simple. For the general case, we use that $L^2$ convergence implies almost sure convergence for a subsequence. Selecting a subsequence for each $x$ in a measurable manner, we can obtain a measurable version. The argument is something like that found in Protter (2005), Theorem IV.63. However, the technique of that proof depends on the linearity of the integral mapping. We need to utilize measurability properties directly, making the proof more cumbersome.

Risk numbers. In our investigation of the Malliavin method for the Heston model, we only considered the digital delta. Furthermore, we only applied a few extra Monte Carlo methods to our routines, namely Latin hypercube sampling and antithetic sampling. There are several other methods available, such as those of Fries & Yoshi (2008) or those detailed in Glasserman (2003), which would be interesting to attempt to apply to the Heston model.

Furthermore, it would be interesting to investigate the efficiency gains possible for path-dependent options, whether of European, Bermudan or American type, and it would be interesting to extend the results to other models. Here, the extension to models with jumps poses an extra challenge, as the Malliavin calculus we have developed only works in the case of Brownian motion. For Malliavin calculus for processes with jumps such as Lévy processes, see Di Nunno et al. (2008).


246 Conclusions


Appendix A

Analysis

In this appendix, we develop the results from pure analysis that we will need. We will give some main theorems from analysis, and we will consider the basic results on the spaces $L^p(\mu)$, $C_c^\infty(\mathbb{R}^n)$ and $C^\infty(\mathbb{R}^n)$. We also consider Hermite polynomials and Hilbert spaces. The appendix is more or less a smorgasbord, containing a large variety of different results. For some results, references are given to proofs. For other results, full proofs are given. The reason for using so much energy on these proofs is basically that there does not seem to be any simple set of sources giving all the necessary results. Our main sources for the real analysis are Carothers (2000), Berg & Madsen (2001), Rudin (1987) and Cohn (1980). For Hilbert space theory, we use Rudin (1987), Hansen (2006) and Meise & Vogt (1997).

A.1 Functions of finite variation

We say that a function $F : [0,\infty) \to \mathbb{R}$ has finite variation if
$$V_F(t) = \sup \sum_{k=1}^n |F(t_k) - F(t_{k-1})|$$
is finite for all $t > 0$, where the supremum is taken over all partitions of $[0,t]$. In this case, we call $V_F(t)$ the variation of $F$ over $[0,t]$. We will give the basic properties of functions of finite variation. This will be used when defining the stochastic integral


for integrators which have paths of finite variation.

Lemma A.1.1. $V_F$ is continuous if and only if $F$ is continuous.

Proof. See Carothers (2000), Theorem 13.9.

Theorem A.1.2. There is a unique measure $\mu_F$ such that $\mu_F(a,b] = F(b) - F(a)$ for any $0 \leq a \leq b < \infty$.

Proof. See Theorem 3.2.6 of Dudley (2002) for the case where $F$ is monotone. The general case follows easily by applying the decomposition of $F$ into a difference of monotone functions.

Lemma A.1.3. If $F$ is continuous, then $\mu_F$ has no atoms.

Proof. This follows directly from the definition of $\mu_F$ given in Theorem A.1.2.

Since functions of finite variation correspond to measures by Theorem A.1.2, we can define the integral with respect to a function of finite variation.

Definition A.1.4. We put $L^1(F) = L^1(\mu_F)$ and define, for any function $x \in L^1(F)$, $\int_0^t x_u\,dF_u = \int_0^t x_u\,d\mu_F(u)$.

Lemma A.1.5. Let $F$ be continuous and of finite variation. If $g$ is continuous, then $g$ is integrable with respect to $F$ over $[0,t]$, and the Riemann sums
$$\sum_{k=1}^n g(t_{k-1})(F(t_k) - F(t_{k-1}))$$
converge to $\int_0^t g(s)\,dF(s)$ as the mesh of the partition tends to zero.

Proof. We have
$$\sum_{k=1}^n g(t_{k-1})(F(t_k) - F(t_{k-1})) = \sum_{k=1}^n g(t_{k-1})\mu_F((t_{k-1}, t_k]) = \int_0^t s_n\,d\mu_F,$$
where $s_n = \sum_{k=1}^n g(t_{k-1})1_{(t_{k-1}, t_k]}$. As the mesh tends to zero, $s_n$ tends to $g$ by the uniform continuity of $g$ on $[0,t]$. Since $g$ is continuous, it is bounded on $[0,t]$. Therefore, by dominated convergence, $\int_0^t s_n\,d\mu_F$ tends to $\int_0^t g(s)\,d\mu_F(s)$, as desired.
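As a simple worked example of these notions (a standard example, not taken from the references above), consider an absolutely continuous function, $F(t) = \int_0^t f(s)\,ds$ for some locally integrable $f : [0,\infty) \to \mathbb{R}$. Then $F$ is continuous and of finite variation with
$$V_F(t) = \int_0^t |f(s)|\,ds, \qquad \mu_F(A) = \int_A f(s)\,ds,$$
and consequently $\int_0^t x_u\,dF_u = \int_0^t x_u f(u)\,du$ for any $x \in L^1(F)$.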


Lemma A.1.6. If $x \in L^1(F)$, then $t \mapsto \int_0^t x_u\,dF(u)$ has finite variation.

Proof. This follows by decomposing $F$ into a difference between monotone functions and applying the linearity of the integral in the integrator.

A.2 Some classical results

In this section, we collect a few assorted results. These will be used in basically every part of the thesis. Gronwall's lemma is used in the proof of Itô's representation theorem, the multidimensional mean value theorem is used in various approximation arguments, the Lebesgue differentiation theorem is used in density arguments for the Itô integral, and the duality result on $L^p$ and $L^q$ is used when extending the Malliavin derivative.

Lemma A.2.1 (Gronwall). Let $g, b : [0,\infty) \to [0,\infty)$ be two measurable mappings and let $a$ be a nonnegative constant. Let $T > 0$ and assume $\int_0^T b(s)\,ds < \infty$. If $g(t) \leq a + \int_0^t g(s)b(s)\,ds$ for $t \leq T$, then $g(t) \leq a \exp(\int_0^t b(s)\,ds)$ for $t \leq T$.

Proof. Define $B(t) = \int_0^t b(s)\,ds$ and $G(t) = \int_0^t g(s)b(s)\,ds$. Then
$$\frac{d}{dt}\, e^{-B(t)} G(t) = e^{-B(t)} b(t) g(t) - e^{-B(t)} G(t) b(t) = e^{-B(t)} b(t)(g(t) - G(t)) \leq a e^{-B(t)} b(t)$$
for $t \leq T$. Integrating this inequality from $0$ to $t$, we find
$$e^{-B(t)} G(t) \leq \int_0^t a e^{-B(s)} b(s)\,ds = a\left(1 - e^{-B(t)}\right).$$
This shows $G(t) \leq a(e^{B(t)} - 1)$ and therefore
$$g(t) \leq a + \int_0^t g(s)b(s)\,ds = a + G(t) \leq a e^{B(t)},$$
as was to be proved.

Lemma A.2.2. Let $\varphi \in C^1(\mathbb{R}^n)$. For any distinct $x, y \in \mathbb{R}^n$, there exists $\xi$ on the line segment between $x$ and $y$ such that
$$\varphi(y) - \varphi(x) = \sum_{k=1}^n \frac{\partial \varphi}{\partial x_k}(\xi)(y_k - x_k).$$


250 <strong>Analysis</strong>Proof. Define g : [0, 1] → R n by g(t) = ty + (1 − t)x. Then, by the ordinary meanvalue theorem <strong>and</strong> the chain rule, <strong>for</strong> some θ ∈ [0, 1],as desired.f(y) − f(x) = (f ◦ g)(1) − (f ◦ g)(0)= (f ◦ g) ′ (θ)n∑ ∂f= (g(θ))g∂xk(θ)′k=k=1n∑k=1∂f∂x k(θy + (1 − θ)x)(y k − x k ),Theorem A.2.3 (Lebesgue’s differentiation theorem). Let f ∈ L 1 (R) <strong>and</strong> putF (x) = ∫ xf(y) dy. Then F is almost everywhere differentiable with derivative f.−∞Proof. See Rudin (1987), Theorem 7.11 <strong>and</strong> Theorem 7.7.Corollary A.2.4. Let f : R → R be bounded. Then it holds thatalmost everywhere <strong>for</strong> t ∈ R.lim 1 1n∫ tt− 1 nf(x) dx = f(t)Proof. Consider any bounded interval (a, b), <strong>and</strong> put g (a,b) = f(x)1 (a,b) (x). Theng ∈ L 1 (R), <strong>and</strong> theorem A.2.3 yields <strong>for</strong> almost all t ∈ (a, b) thatlim 1 1n∫ tt− 1 nf(x) dx = lim 1 1n∫ tt− 1 ng(x) dx = g(t) = f(t).Since a countable union of null sets is again a null set, the conclusion of the lemmafollows.Theorem A.2.5. Let p, q ≥ 1 be dual exponents <strong>and</strong> let (E, E, µ) be a measure space.The dual of L p (E, E, µ) is isometrically isomorphic to L q (E, E, µ), <strong>and</strong> an isometricisomorphism ϕ : L q (E, E, µ) → L p (E, E, µ) ′ is given by∫ϕ(G)(F ) = G(x)F (x) dµ(x).In particular, an element F ∈ L p (E, E, µ) is almost surely zero if <strong>and</strong> only if it holdsthat ∫ G(x)F (x) dµ(x) = 0 <strong>for</strong> all G ∈ L q (E, E, µ).


A.3 Dirac families <strong>and</strong> Urysohn’s lemma 251Proof. For the duality statement <strong>and</strong> the isometric isomorphism, see Proposition 13.13of Meise & Vogt (1997). The last statement of the theorem follows from Proposition6.10 of Meise & Vogt (1997).A.3 Dirac families <strong>and</strong> Urysohn’s lemmaIn this section, we wil develop a version of Urysohn’s Lemma <strong>for</strong> smooth mappings.We first state the original Urysohn’s Lemma.Theorem A.3.1 (Urysohn’s Lemma). Let K be a compact set in R n , <strong>and</strong> let V bean open set. Assume K ⊆ V . There exists f ∈ C c (R n ) such that K ≺ f ≺ V .Proof. This is proven in Rudin (1987), Theorem 2.12.The mappings which exist by Theorem A.3.1 are called bump functions. Our goalis to extend Theorem A.3.1 to show the existence of bump functions which are notonly continuous, but are smooth. To do so, recall that <strong>for</strong> two Lebesgue integrablemappings f <strong>and</strong> g, the convolution f ∗ g is defined by (f ∗ g)(x) = ∫ f(x − y)g(y) dy.Fundamental results on the convolution operation can be found in Chapter 8 of Rudin(1987). The convolution of f with g has an averaging effect. The following two resultswill allow us to use this averaging effect to smooth out nonsmooth functions, allowingus to prove the smooth version of Urysohn’s lemma.Lemma A.3.2. If f : R n → R is Lebesgue integrable <strong>and</strong> g ∈ C ∞ (R n ), then theconvolution f ∗ g is in C ∞ (R n ).Proof. By definition, (f ∗ g)(x) = ∫ f(y)g(x − y) dy. Now, since g is differentiable,g is bounded on bounded sets <strong>and</strong> there<strong>for</strong>e, by Theorem 8.14 in Hansen (2004b),(f ∗g)(x) is differentiable in the k’th direction with ∂∂x k(f ∗g)(x) = ∫ f(y) ∂g∂x k(x−y) dy.Here ∂g∂x kis in C ∞ (R n ), so it follows inductively that f ∗ g ∈ C ∞ (R n ).Theorem A.3.3 (Dirac family). Let µ be a Radon measure on R n <strong>and</strong> let ‖ · ‖ beany norm on R n . There exists a family of mappings ψ ε ∈ C ∞ (R n ) such that ψ ε ≥ 0,suppψ ε ⊆ B ε (0) <strong>and</strong> ∫ ψ ε (x) dµ(x) = 1. Here, B ε (0) is the ε-ball in ‖ · ‖. We call (ψ ε )a Dirac family with respect to ‖ · ‖ <strong>and</strong> µ.


Proof. Since all norms on $\mathbb{R}^n$ are equivalent, it will suffice to prove the result for $\|\cdot\|_2$. Therefore, $B_\varepsilon(0)$ will in the following denote the centered $\varepsilon$-ball in $\|\cdot\|_2$. As seen in Berg & Madsen (2001), page 190, there exists $\chi \in C^\infty(\mathbb{R}^n)$ such that $0 \leq \chi \leq 1$ and $\mathrm{supp}\,\chi \subseteq B_1(0)$. Then $\chi_\varepsilon$ given by $\chi_\varepsilon(x) = \chi(x/\varepsilon)$ is also in $C^\infty(\mathbb{R}^n)$, $0 \leq \chi_\varepsilon \leq 1$ and $\mathrm{supp}\,\chi_\varepsilon \subseteq B_\varepsilon(0)$. Since $\mu$ is Radon, any open ball in $\mathbb{R}^n$ has finite $\mu$-measure, and therefore $\chi_\varepsilon \in L^1(\mu)$. Putting $\psi_\varepsilon = \frac{1}{\|\chi_\varepsilon\|_1}\chi_\varepsilon$, $\psi_\varepsilon$ then has the desired properties.

Theorem A.3.4 (Smooth Urysohn's Lemma). Let $K$ be a compact set in $\mathbb{R}^n$, and let $V$ be an open set. Assume $K \subseteq V$. There exists $f \in C_c^\infty(\mathbb{R}^n)$ such that $K \prec f \prec V$.

Proof. By Theorem A.3.1, there exists $f \in C_c(\mathbb{R}^n)$ such that $K \prec f \prec V$. Let $\varepsilon \in (0, \frac{1}{2})$ and choose $\delta > 0$ corresponding to $\varepsilon$ for the uniform continuity of $f$. Define the sets $K' = f^{-1}([1-\varepsilon, 1])$ and $V' = f^{-1}((\varepsilon, \infty))$.

Since $K \prec f \prec V$, clearly $K \subseteq K' \subseteq V' \subseteq V$. Furthermore, $K'$ is compact and $V'$ is open. If $x$ is such that $\|y - x\| \leq \delta$ for some $y \in K$, then $|f(x) - 1| = |f(x) - f(y)| \leq \varepsilon$ and therefore $f(x) \geq 1 - \varepsilon$, so $x \in K'$. On the other hand, if $x \in V'$ then $f(x) > \varepsilon$, so if $\|y - x\| \leq \delta$, $|f(y) - f(x)| \leq \varepsilon$ and therefore $f(y) > 0$, showing $y \in V$. In other words, $B_\delta(x) \subseteq V$.

Now let $g \in C_c(\mathbb{R}^n)$ with $K' \prec g \prec V'$ and let $h = g * \psi_\delta$. We claim that $h$ satisfies the properties in the theorem. By Lemma A.3.2, $h \in C_c^\infty(\mathbb{R}^n)$. We will show that $h \prec V$. Let $x \notin V$; we need to prove that $h(x) = 0$. We have $h(x) = \int g(y)\psi_\delta(x-y)\,dy$. Now, whenever $\|x - y\| > \delta$, the integrand is zero because $\psi_\delta(x-y)$ is zero. On the other hand, if $\|x - y\| \leq \delta$, we must have $y \notin V'$, because if $y \in V'$, then $B_\delta(y) \subseteq V$ and in particular $x \in V$, a contradiction. Therefore, $g(y)$ is zero and we again find that the integrand is zero. Thus, $h(x)$ is zero and $h \prec V$.

Next, we show that $K \prec h$. Let $x \in K$. Whenever $\|y - x\| \leq \delta$, we have $y \in K'$ and therefore $g(y)$ is one. On the other hand, whenever $\|y - x\| > \delta$, $\psi_\delta(x-y)$ is zero. Thus, $h(x) = \int g(y)\psi_\delta(x-y)\,dy = 1$ and $K \prec h$. We conclude $K \prec h \prec V$, as desired.

We are also going to need a version of Urysohn's lemma for closed sets instead of compact sets.

Lemma A.3.5. Let $F$ be a closed set in $\mathbb{R}^n$, and let $V$ be an open set. Assume $F \subseteq V$. There exists $f \in C^\infty(\mathbb{R}^n)$ such that $F \prec f \prec V$.

Proof. Defining $f(x) = \frac{d(x, V^c)}{d(x, F) + d(x, V^c)}$, we find that $f$ is continuous and $F \prec f \prec V$.


A.4 Approximation <strong>and</strong> density results 253Using the same technique as in the proof of Theorem A.3.4, we extend this to providesmoothness of the bump function.A.4 Approximation <strong>and</strong> density resultsIn this section, we will review an assortment of approximation <strong>and</strong> density resultswhich will be used throughout the thesis. In particular the Malliavin calculus makesgood use of various approximation results.Whenever our approximation results involve smooth functions, we will almost alwaysapply a Dirac family as a mollifier <strong>and</strong> take convolutions. The next three results areapplications of this technique, showing how to approximate respectively continuous,Lipschitz <strong>and</strong> a class of differentiable mappings.Lemma A.4.1. Let f ∈ C(R n ), <strong>and</strong> let ψ ε be a Dirac family. Then f ∗ ψ ε convergesuni<strong>for</strong>mly to f on compacts.Proof. Let M > 0. For any x with ‖x‖ 2 ≤ M, we have|f(x) − (f ∗ ψ ε )(x)| =≤∫∫∣ f(x)ψ ε (x − y) dy − f(y)ψ ε (x − y) dy∣∫|f(x) − f(y)|ψ ε (x − y) dy≤sup |f(x) − f(y)|.y∈B ε (x)We may there<strong>for</strong>e concludesup |f(x) − (f ∗ ψ ε )(x)| ≤‖x‖ 2 ≤Msupsup‖x‖ 2 ≤M y∈B ε (x)|f(x) − f(y)|.Since f is continuous, f is uni<strong>for</strong>mly continuous on compact sets. In particular, f isuni<strong>for</strong>mly continuous on B M+δ (0) <strong>for</strong> any δ > 0, <strong>and</strong> there<strong>for</strong>e the above tends tozero.Lemma A.4.2. Let f : R n → R be a Lipschitz mapping with respect to ‖ · ‖ ∞ suchthat |f(x) − f(y)| ≤ K‖x − y‖ ∞ . There exists a sequence of mappings (g ε ) in C ∞ (R n )with partial derivatives bounded by K such that g ε converges uni<strong>for</strong>mly to f.


254 <strong>Analysis</strong>Proof. Let (ψ ε ) be a Dirac family on R n with respect to ‖ · ‖ ∞ <strong>and</strong> define g ε = f ∗ ψ ε .Then g ε ∈ C ∞ (R n ), <strong>and</strong> we clearly have, with e k the k’th unit vector,|g ε (x + he k ) − g ε (x)| =≤≤∫∫∣ f(x + he k − y)ψ ε (y) dy − f(x − y)ψ ε (y) dy∣∫|f(x + he k − y) − f(x)|ψ ε (y) dyKh.It follows that g ε has partial derivatives bounded by K. It remains to show uni<strong>for</strong>mconvergence of g ε to f, but this follows since|f(x) − g ε (x)| =≤≤≤∫∫∣ f(x)ψ ε (x − y) dy − f(y)ψ ε (x − y) dy∣∫|f(x) − f(y)|ψ ε (x − y) dy∫K ‖x − y‖ ∞ ψ ε (x − y) dyεK.The proof is finished.Lemma A.4.3. Let f ∈ C 1 (R n ), <strong>and</strong> assume that f has bounded partial derivatives.The exists a sequence (g ε ) ⊆ C ∞ p (R n ) with the following properties:1. g ε converges uni<strong>for</strong>mly to f.2. ∂ xk g ε converges pointwise to ∂ xk f <strong>for</strong> k ≤ n.3. ‖∂ xk g‖ ∞ ≤ ‖∂ xk f‖ ∞ <strong>for</strong> k ≤ n.Proof. Let (ψ ε ) be a Dirac family on R n . Define g ε = f ∗ ψ ε . We then have g ε ∈C ∞ (R n ). We claim that g ε satisfies the properties of the lemma.


A.4 Approximation <strong>and</strong> density results 255Step 1: Uni<strong>for</strong>m convergence. We find∫∫|f(x) − g ε (x)| =∣ f(x)ψ ε (x − y) dy − f(y)ψ ε (x − y) dy∣∫≤ |f(x) − f(y)|ψ ε (x − y) dy∫ ∣ n ∣∣∣∑ ∂f≤(ξ k )(x k − y k )∂x k ∣ ψ ε(x − y) dyk=1∫≤ max∂fk≤n∥ (ξ k )∂x k∥ ‖x − y‖ 1 ψ ε (x − y) dy∞≤√ ∫n max∂fk≤n∥ (ξ k )∂x k∥ ‖x − y‖ 2 ψ ε (x − y) dy∞≤ ε √ n max∂fk≤n∥ (ξ k )∂x k∥ ,∞so g ε converges uni<strong>for</strong>mly to f.Step 2: Pointwise convergence of partial derivatives. We have∂g ε(x) =∂ ∫∫∂f(x − y)ψ ε (y) dy = f(x − y)ψ ε (y) dy = (∂ xk f) ∗ ψ ε .∂x k ∂x k ∂x kBy the same calculations as in the first step, we there<strong>for</strong>e find∫|∂ xk g ε (x) − ∂ xk f(x)| ≤ |∂ xk f(x) − ∂ xk f(y)|ψ ε (x − y) dy≤sup |∂ xk f(x) − ∂ xk f(y)|,y∈B ε (x)where B ε (x) denotes the ε-ball around x in ‖·‖ 2 . Continuity of ∂ xk f shows the desiredpointwise convergence.Step 3: Boundedness of partial derivatives. To show the boundedness conditionon the partial derivatives of g ε , we calculate∣ ∫∣∂g ε(x)∂x k∣ =∂∣ f(x − y)ψ ε (y) dy∂x k∣∫ ∣ ∣∣∣ ∂f≤ (x − y)∂x k∣ ψ ε(y) dy∥ ≤∂f ∥∥∥∞∥ ,∂x kshowing the final claim of the lemma. This also shows g ε ∈ C ∞ p (R n ).
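For concreteness, we note a standard explicit example of the bump function $\chi$ underlying the Dirac families used as mollifiers in this section; the particular formula below is the classical one and is not the construction referred to in Berg & Madsen (2001). Define
$$\chi(x) = \begin{cases} \exp\left(-\dfrac{1}{1 - \|x\|_2^2}\right), & \|x\|_2 < 1,\\[1ex] 0, & \|x\|_2 \geq 1.\end{cases}$$
Then $\chi \in C^\infty(\mathbb{R}^n)$, $0 \leq \chi \leq 1$ and the support of $\chi$ is the closed unit ball. With Lebesgue measure as reference measure, $\psi_\varepsilon(x) = \chi(x/\varepsilon) / \int \chi(y/\varepsilon)\,dy$ then yields a Dirac family as in Theorem A.3.3, and the convolutions $f * \psi_\varepsilon$ are exactly the smooth approximations employed in Lemmas A.4.1 through A.4.3.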


We now prove that for any Radon measure $\mu$ on $\mathbb{R}^n$ and any $p \geq 1$, the space $C_c^\infty(\mathbb{R}^n)$ is dense in $L^p(\mu)$.

Lemma A.4.4. $C_c^\infty(\mathbb{R}^n)$ is uniformly dense in $C_c(\mathbb{R}^n)$. Furthermore, for any element $f \in C_c(\mathbb{R}^n)$, the approximating sequence in $C_c^\infty(\mathbb{R}^n)$ can be taken to have a common compact support.

Proof. Let $f \in C_c(\mathbb{R}^n)$, and let $\psi_\varepsilon$ be a Dirac family on $\mathbb{R}^n$. Let $M > 0$ be such that $\mathrm{supp}\,f \subseteq B_M(0)$. Put $f_\varepsilon = f * \psi_\varepsilon$. By Lemma A.3.2, $f_\varepsilon \in C^\infty(\mathbb{R}^n)$. And since $\mathrm{supp}\,\psi_\varepsilon \subseteq B_\varepsilon(0)$, it holds for $x$ with $\|x\|_2 \geq M + \varepsilon$ that
$$f_\varepsilon(x) = \int f(y)\psi_\varepsilon(x-y)\,dy = \int 1_{B_M(0)}(y)\,1_{B_\varepsilon(x)}(y)\,f(y)\psi_\varepsilon(x-y)\,dy = 0,$$
so $f_\varepsilon$ has compact support. Therefore $f_\varepsilon \in C_c^\infty(\mathbb{R}^n)$. By Lemma A.4.1, $f_\varepsilon$ converges uniformly to $f$ on compacts. Since $\mathrm{supp}\,f_\varepsilon \subseteq B_{M+\varepsilon}(0)$, our approximating sequence has a common compact support. In particular, the uniform convergence on compacts is actually true uniform convergence. This shows the claims of the lemma.

Theorem A.4.5. Let $\mu$ be a Radon measure on $\mathbb{R}^n$. Then $C_c^\infty(\mathbb{R}^n)$ is dense in $L^p(\mu)$ for $p \geq 1$.

Proof. First note that by combining Theorem 2.14, Theorem 2.18 and Theorem 3.14 of Rudin (1987), we can conclude that $C_c(\mathbb{R}^n)$ is dense in $L^p(\mu)$. It will therefore suffice to show that we can approximate elements of $C_c(\mathbb{R}^n)$ in $L^p(\mu)$ with elements of $C_c^\infty(\mathbb{R}^n)$. Let $f \in C_c(\mathbb{R}^n)$. By Lemma A.4.4, there exists $f_n \in C_c^\infty(\mathbb{R}^n)$ converging uniformly to $f$, and by the same lemma, we can assume that all the functions $f_n$ and $f$ have the same compact support $K$. Then,
$$\lim_n \int |f(x) - f_n(x)|^p\,d\mu(x) \leq \lim_n \|f - f_n\|_\infty^p\,\mu(K) = 0.$$

A.5 Hermite polynomials

In this section, we present and prove some basic results on Hermite polynomials. A source for results such as these is Szegö (1939), Chapter V, where a good deal of results on Hermite polynomials are gathered. Most results are presented without


proof, however. Also note that what is called Hermite polynomials in Szegö (1939) is a little bit different from what is called Hermite polynomials here.

Definition A.5.1. The $n$'th Hermite polynomial is defined by
$$H_n(x) = (-1)^n e^{\frac{x^2}{2}} \frac{d^n}{dx^n} e^{-\frac{x^2}{2}}.$$
We put $H_0(x) = 1$ and $H_{-1}(x) = 0$.

It is not completely obvious from Definition A.5.1 that the Hermite polynomials are polynomials at all. Before showing that this is actually the case, we will obtain a recursion relation for the Hermite polynomials.

Lemma A.5.2. It holds that
$$\exp\left(tx - \frac{t^2}{2}\right) = \sum_{n=0}^\infty \frac{H_n(x) t^n}{n!},$$
so that for any $x$, $H_n(x)$ is the $n$'th coefficient of the Taylor expansion of $\exp(tx - \frac{t^2}{2})$ as a function of $t$. In particular,
$$H_n(x) = \left.\frac{\partial^n}{\partial t^n} \exp\left(tx - \frac{t^2}{2}\right)\right|_{t=0}.$$

Proof. The mapping $t \mapsto \exp(-\frac{1}{2}(x-t)^2)$ is obviously entire when considered as a complex mapping, and therefore given by its Taylor expansion. We then obtain
$$\begin{aligned}
\exp\left(tx - \frac{t^2}{2}\right) &= \exp\left(\frac{x^2}{2} - \frac{1}{2}(x-t)^2\right)\\
&= e^{\frac{x^2}{2}} \sum_{n=0}^\infty \frac{t^n}{n!} \left.\frac{d^n}{dt^n} \exp\left(-\frac{1}{2}(x-t)^2\right)\right|_{t=0}\\
&= e^{\frac{x^2}{2}} \sum_{n=0}^\infty \frac{t^n}{n!} (-1)^n \left.\frac{d^n}{dt^n} \exp\left(-\frac{1}{2}t^2\right)\right|_{t=x}\\
&= \sum_{n=0}^\infty \frac{t^n}{n!}\, e^{\frac{x^2}{2}} (-1)^n \frac{d^n}{dx^n} \exp\left(-\frac{1}{2}x^2\right)\\
&= \sum_{n=0}^\infty \frac{H_n(x) t^n}{n!}.
\end{aligned}$$

Lemma A.5.3. It holds that for $n \geq 0$, $H_n'(x) = nH_{n-1}(x)$.


258 <strong>Analysis</strong>Proof. We see that∞∑n=0H n(x)′ t n = dn! dx∞∑n=0H n (x)t nn!)= d (txdx exp − t2 2( )= t exp tx − t2 2==∞∑n=0∞∑n=1H n (x)t n+1n!nH n−1 (x)t n .n!Recalling that H −1 (x) = 0, uniqueness of coefficients yields the result.Lemma A.5.4. It holds that <strong>for</strong> n ≥ 0, H n+1 (x) = xH n (x) − nH n−1 (x).Proof. We find, using the product rule <strong>and</strong> Lemma A.5.3,H n+1 (x) = (−1) n+1 e x22as was to be proved.= (−1) n+1 e x22= (−1) n+1 ( ddxd n+1 x2e− 2dxn+1 d d n x2e− 2ndx(dxe x22= −H ′ n(x) + (−1) n xe x22= −nH n−1 (x) + xH n (x),d n x2e− 2dxn d n x2e− 2dxn ) ( ) )d−dx e x2 dnx22 e− 2dxn Lemma A.5.2, Lemma A.5.3 <strong>and</strong> Lemma A.5.4 along with the definition are the basicworkhorses <strong>for</strong> producing results about the Hermite polynomials. In particular, wecan now show that the Hermite polynomials actually are polynomials.Lemma A.5.5. The n’th Hermite polynomial H n is a polynomial of degree n.Proof. We use complete induction. The result is clearly true <strong>for</strong> n = 0. Assume thatit is true <strong>for</strong> all k ≤ n. We know that H n+1 (x) = xH n (x) − nH n−1 (x), by LemmaA.5.4. Here, nH n−1 (x) is a polynomial of degree n − 1 by assumption, <strong>and</strong> H n is a


A.6 Hilbert Spaces 259polynomial of degree n. There<strong>for</strong>e, xH n (x) is a polynomial of degree n + 1, <strong>and</strong> theresult follows.Lemma A.5.6. The span of the n first Hermite polynomials H 0 , . . . , H n−1 is the sameas the span of the n first monomials 1, x, x 2 , . . . , x n−1 .Proof. From Lemma A.5.5, it is clear that the span of the n first monomials includesthe n first Hermite polynomials. To show the reverse inclusion, we use induction.Since H 0 (x) = 1, the result is trivially true <strong>for</strong> n = 1. Assume that it has been proven<strong>for</strong> n. Let H n+1 be the span of the first n + 1 Hermite polynomials. We need to provethat x k ∈ H n+1 <strong>for</strong> k ≤ n. By our induction hypothesis, x k ∈ H n+1 <strong>for</strong> k ≤ n − 1. Toprove that x n ∈ H n+1 , note that by Lemma A.5.5, we know that H n is a polynomialof degree n, that is,n∑H n (x) = a k x k ,where a n ≠ 0. Since x k ∈ H n+1 <strong>for</strong> k ≤ n − 1, we obtain a n x n ∈ H n+1 . Since a n ≠ 0,this implies x n ∈ H n+1 , as desired.k=0A.6 Hilbert SpacesIn this section, we review some results from the theory of Hilbert spaces. Our mainsources are Hansen (2006) <strong>and</strong> Meise & Vogt (1997).Definition A.6.1. A Hilbert Space H is a complete normed linear space over R suchthat the norm of H is induced by an inner product.In the remainer of the section, H will denote a Hilbert space with inner product 〈·, ·〉<strong>and</strong> norm ‖ · ‖. We say that two elements x, y ∈ H are orthogonal <strong>and</strong> write x⊥y if〈x, y〉 = 0. We say that two subsets M <strong>and</strong> N of H are orthogonal <strong>and</strong> write M⊥Nif it holds that <strong>for</strong> each x ∈ M <strong>and</strong> y ∈ N, x⊥y. The elements orthogonal to M aredenoted M ⊥ . We write span M <strong>for</strong> the linear span of M, <strong>and</strong> we write span M <strong>for</strong>the closure of the span of M.Lemma A.6.2. Let M <strong>and</strong> N be subsets of H. If M <strong>and</strong> N are orthogonal, then soare span M <strong>and</strong> span N.


Proof. By the linearity of the inner product, $\mathrm{span}\,M$ and $\mathrm{span}\,N$ are orthogonal. By the continuity of the inner product, $\overline{\mathrm{span}}\,M$ and $\overline{\mathrm{span}}\,N$ are orthogonal as well.

Lemma A.6.3. If $U$ is a subspace of $H$, then $U^\perp$ is a closed subspace of $H$.

Proof. See Rudin (1987), p. 79.

Lemma A.6.4. Let $M$ be a subspace of $H$. $M$ is dense in $H$ if and only if $M^\perp = \{0\}$.

Proof. This follows from Corollary 11.8 of Meise & Vogt (1997).

Theorem A.6.5 (Pythagoras' Theorem). Let $x_1, \ldots, x_n$ be orthogonal elements of $H$. Then
$$\left\|\sum_{k=1}^n x_k\right\|^2 = \sum_{k=1}^n \|x_k\|^2.$$

Proof. See Hansen (2006), Theorem 3.1.14.

Theorem A.6.6. Let $U$ be a closed subspace of $H$. Each element $x \in H$ has a unique decomposition $x = Px + (x - Px)$, where $Px \in U$ and $x - Px \in U^\perp$. The mapping $P : H \to U$ is a linear contraction and is called the orthogonal projection onto $U$.

Proof. The existence of the decomposition and the linearity of $P$ is proved in Rudin (1987), Theorem 4.11. That $P$ is a contraction follows by applying Theorem A.6.5 to obtain
$$\|Px\|^2 \leq \|Px\|^2 + \|x - Px\|^2 = \|Px + x - Px\|^2 = \|x\|^2.$$

Lemma A.6.7. Let $U$ be a closed subspace of $H$. The orthogonal projection $P$ onto $U$ satisfies $P^2 = P$, $R(P) = U$ and $N(P) = U^\perp$. Also, $\|x - Px\| = d(x, U)$, where $d$ denotes the metric induced by $\|\cdot\|$.

Proof. Let $x \in H$. Then $Px \in U$, and therefore $P^2x = Px$, showing the first claim. Since $P$ maps into $U$ and $Px = x$ for $x \in U$, it is clear that $R(P) = U$. To show the claim about the null space, it is clear that $U^\perp \subseteq N(P)$. To show the other inclusion, let $x \in N(P)$. Then $x = Px + (x - Px) = x - Px \in U^\perp$, as desired.


A.6 Hilbert Spaces 261Finally, the <strong>for</strong>mula ‖x − P x‖ = d(x, U) follows by Lemma 11.5 of Meise & Vogt(1997).If M <strong>and</strong> N are orthogonal subspaces of H, we define the orthogonal sum M ⊕ N byM ⊕ N = {x + y|x ∈ M, y ∈ N}.Lemma A.6.8. If M <strong>and</strong> N are closed orthogonal subspaces of H, then M ⊕ N is aclosed subspace of H as well. Letting P <strong>and</strong> Q be the orthogonal projections onto M<strong>and</strong> N, the orthogonal projection onto M ⊕ N is P + Q.Proof. That M ⊕ N is closed is proved in Lemma 3.5.8 of Hansen (2006). Let anelement x ∈ H be given. We write x = (P + Q)x + x − (P + Q)x, <strong>and</strong> if we can arguethat x − (P + Q)x ∈ (M ⊕ N) ⊥ , we are done, since (P + Q)x obviously is in M ⊕ N.To this end, first note that we have x = P x + (x − P x), where x − P x ∈ M ⊥ bythe definition of the orthogonal projection P . Using the definition of the orthogonalprojection Q, we also have x − P x = Q(x − P x) + (x − P x − Q(x − P x)), wherex − P x − Q(x − P x) ∈ N ⊥ . Since R(Q) = N, Q(x − P x) ∈ M ⊥ . We concludethat x − P x − Q(x − P x) is orthogonal to both N <strong>and</strong> M. In particular, then, it isorthogonal to M ⊕ N.Finally, by Lemma A.6.7, N(Q) = N ⊥ ⊇ M = R(P ). Thus, QP = 0 <strong>and</strong> we obtainx − (P + Q)x = x − P x − Q(x − P x) ∈ (M ⊕ N) ⊥ ,as desired.Lemma A.6.9. Letting M <strong>and</strong> N be orthogonal subspaces of H, it holds that M ⊕ Nis the span of M ∪ N.Proof. Clearly, M ⊕ N ⊆ span M ∪ N. On the other h<strong>and</strong>, if x ∈ span M ∪ N, wehave x = ∑ nk=1 λ kx k where x k ∈ M ∪ N. Splitting the sum into parts in M <strong>and</strong> partsin N, we obtain the reverse inclusion.Lemma A.6.10. Let x n be an orthogonal sequence in H. ∑ ∞n=1 x n is convergent if<strong>and</strong> only if ∑ ∞n=1 ‖x n‖ 2 is finite.


262 <strong>Analysis</strong>Proof. Both directions of the equivalence follows from the relation, with n > m,∥ n∑ m∑ ∥∥∥∥2 n∑x∥ k − x k = ‖x k ‖ 2k=1k=1combined with the completeness of H.k=m+1We are also going to need infinite sums of orthogonal subspaces. If (U n ) is a sequenceof orthogonal subspaces of H, we define the orthogonal sum ⊕ ∞ n=1U n of the subspacesby{∞⊕∞∣ }∑ ∣∣∣∣ ∞∑U n = x n x n ∈ U n <strong>and</strong> ‖x n ‖ 2 < ∞n=1n=1This is well-defined, since we know from Lemma A.6.10 that ∑ ∞n=1 x n is convergentwhenever ∑ ∞n=1 ‖x n‖ 2 is finite.Lemma A.6.11. ⊕ ∞ n=1U n is a closed subspace of H, <strong>and</strong> ⊕ ∞ n=1U n = span ∪ ∞ n=1 U n .n=1Proof. We first prove that ⊕ ∞ n=1U n is closed. From the second assertion of the lemma,it will follow that ⊕ ∞ n=1U n is a subspace.Let x n ∈ ⊕ ∞ n=1U n with x n = ∑ ∞k=1 xk n, <strong>and</strong> assume that x n converges to x. We need toprove x ∈ ⊕ ∞ n=1U n . From Lemma A.6.10, ∑ ∞k=1 ‖xk n‖ 2 is finite <strong>for</strong> any n. In particular,by the triangle inequality <strong>and</strong> the relation (x + y) 2 ≤ 2x 2 + 2y 2 , ∑ ∞k=1 ‖xk n − x k m‖ 2 isfinite <strong>for</strong> any n, m. This implies that ∑ ∞k=1 xk n − x k m is convergent, <strong>and</strong> we obtain2‖x n − x m ‖ 2 ∞∑=x k n − x k ∞∑ ∥m= ∥x k n − x k ∥ 2 m .∥∥k=1Now, since x n is convergent, it is in particular a cauchy sequence. From the aboveequality, we then obtain that x k n is a cauchy sequence <strong>for</strong> every k, convergent to a limitx k . Since U k is closed, x k ∈ U k . We want to prove x = ∑ ∞k=1 xk . To this end, defineα n = (‖x k n‖) k≥1 . α n is then a member of l 2 , <strong>and</strong>‖α n − α m ‖ 2 l = ∑∞ ∞∑(‖x k n‖ − ‖x k m‖) 2 ≤ ‖x k 2n − x k m‖ 2 = ‖x n − x m ‖ 2 ,k=1so α n is cauchy in l 2 , there<strong>for</strong>e convergent to a limit α. In particular, αn k convergesto α k . But αn k = ‖x k n‖, <strong>and</strong> ‖x k n‖ converges to ‖x k ‖. There<strong>for</strong>e, α k = ‖x k ‖. We maynow concludelimn∞ ∑k=1k=1k=1(‖x k n‖ − ‖x k ‖) 2 = limn‖α n − α‖ 2 l 2 = 0,


A.6 Hilbert Spaces 263in particular lim n∑ ∞k=1 ‖xk n‖ 2 = ∑ ∞k=1 ‖xk ‖ 2 . We then obtain by orthogonality,∥ n ∥ x − ∑ ∥∥∥∥2x kk=1∥ ∥∥∥∥ ∑∞= lim x k m −mk=1∥n∑ ∥∥∥∥2x kk=1∑n= lim ‖x k m − x k ‖ 2 +mk=1= limm∞∑=∞∑k=n+1k=n+1‖x k ‖ 2 ,‖x k m‖ 2∞∑k=n+1‖x k m‖ 2<strong>and</strong> since ∑ ∞k=1 ‖xk ‖ 2 is finite, it follows that x = ∑ ∞k=1 xk . Then x ∈ ⊕ ∞ n=1U n , <strong>and</strong>we may finally conclude that ⊕ ∞ n=1U n is closed.Next, we need to prove the equality ⊕ ∞ n=1U n = span ∪ ∞ n=1 U n . As in the proof ofLemma A.6.9, it is clear that ⊕ ∞ n=1U n ⊆ span ∪ ∞ n=1 U n . To prove the other inclusion,note that we clearly have span ∪ ∞ n=1 U n ⊆ ⊕ ∞ n=1U n . Since we have seen that ⊕ ∞ n=1U nis closed, it follows that span ∪ ∞ n=1 U n ⊆ ⊕ ∞ n=1U n .Lemma A.6.12. Let (U n ) be a sequence of closed orthogonal subspaces <strong>and</strong> let P n bethe orthogonal projection onto U n . For any x ∈ ⊕ ∞ n=1U n , it holds that x = ∑ ∞n=1 P nx.Proof. Since x ∈ ⊕ ∞ n=1U n , we know that x = ∑ ∞n=1 x n <strong>for</strong> some x n ∈ U n . Since P k iscontinuous <strong>and</strong> the subspaces are orthogonal, we findas desired.P k x =∞∑P k x n = x n ,n=1Lemma A.6.13. For any subset A, it holds that A ∩ A ⊥ = {0}.Proof. Let x ∈ A ∩ A ⊥ . We find ‖x‖ 2 = 〈x, x〉 = 0, showing x = 0.Lemma A.6.14. Let M <strong>and</strong> N be orthogonal closed subspaces. Assume x ∈ M ⊕ N.If x ∈ M ⊥ , then x ∈ N.


264 <strong>Analysis</strong>Proof. Since N <strong>and</strong> M are orthogonal, N ⊆ M ⊥ . Let x = y + z with y ∈ M <strong>and</strong>z ∈ N. Then y = x − z ∈ M ⊥ . Since also y ∈ M, Lemma A.6.13 yields y = 0 <strong>and</strong>there<strong>for</strong>e x = z ∈ N, as desired.Lemma A.6.15. Let U, M <strong>and</strong> N be closed subspaces. Assume that U <strong>and</strong> M areorthogonal, <strong>and</strong> assume that U <strong>and</strong> N are orthogonal. If U ⊕ M = U ⊕ N, thenM = N.Proof. By symmetry, it will suffice to show M ⊆ N. Assume that x ∈ M. Thenx ∈ U ⊕ M, <strong>and</strong> there<strong>for</strong>e x ∈ U ⊕ N. Now, since M ⊆ U ⊥ , we have x ∈ U ⊥ . ByLemma A.6.14, x ∈ N.Lemma A.6.16. Let (U n ) be a sequence of closed orthogonal subspaces. It then holds<strong>for</strong> any n ≥ 1 that(∞⊕n⊕) ( ∞)⊕U k = U k ⊕ U k .k=1k=1k=n+1Proof. It is clear that (⊕ n k=1 U k)⊕(⊕ ∞ k=n+1 U k) ⊆ ⊕ ∞ k=1 U k. To show the other inclusion,let x ∈ ⊕ ∞ k=1 U k. We then have x = ∑ ∞k=1 x k <strong>for</strong> some x k ∈ U k , where ∑ ∞k=1 ‖x k‖ 2 .There<strong>for</strong>e, in particular ∑ ∞k=n+1 ‖x k‖ 2 . Thus, by the decompositionwe obtain the desired result.( n) (∑ ∑ ∞x = x k +k=1k=n+1x k),Lemma A.6.17. Let x, x ′ ∈ H <strong>and</strong> assume that 〈y, x〉 = 〈y, x ′ 〉 <strong>for</strong> all y ∈ H. Thenx = x ′ .Proof. It follows that x − x ′ is orthogonal to H. From Lemma A.6.13, x − x ′ = 0 <strong>and</strong>there<strong>for</strong>e x = x ′ .Lemma A.6.18. Let H be a Hilbert space <strong>and</strong> let U be a closed subspace. Let Pdenote the orthogonal projection onto U. Let x ∈ H <strong>and</strong> let x ′ ∈ U. If it holds <strong>for</strong> ally ∈ U that 〈y, x〉 = 〈y, x ′ 〉, then x ′ = P x.


A.6 Hilbert Spaces 265Proof. Let Q be the orthogonal projection onto U ⊥ . We then have, <strong>for</strong> any y ∈ U,〈y, P x〉 = 〈y, Qx + P x〉= 〈y, x〉= 〈y, x ′ 〉.Since P x <strong>and</strong> x ′ are both elements of U, <strong>and</strong> U is a Hilbert space since it is a closedsubspace of H, we conclude by Lemma A.6.17 that P x = x ′ .We end with a few results on weak convegence. We say that a sequence x n is weaklyconvergent to x if 〈y, x n 〉 tends to 〈y, x〉 <strong>for</strong> all y ∈ H.Lemma A.6.19. If x n has a weak limit, it is unique.Proof. See Lemma 3.6.3 of Hansen (2006).Lemma A.6.20. If x n converges to x, it also converges weakly to x.Proof. This follows immediately by the continuity properties of the inner product.Theorem A.6.21. Let H be a Hilbert space <strong>and</strong> let (x n ) be a bounded sequence in H.Then (x n ) has a weakly convergent subsequence.Proof. See Schechter (2002), Theorem 8.16, where the theorem is proved <strong>for</strong> reflexiveBanach spaces. Since every Hilbert space is reflexive according to Corollary 11.10 ofMeise & Vogt (1997), this proves the claim.Lemma A.6.22. Assume that (x n ) converges weakly to x. Then ‖x‖ ≤ lim inf ‖x n ‖.Proof. See Theorem 3.4.11 of Ash (1972).Lemma A.6.23. Let x n be a sequence in H converging weakly to x. Let A : H → Hbe linear <strong>and</strong> continuous. Then Ax n converges weakly as well.Proof. This follows from Theorem 3.4.11 of Ash (1972).
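A standard example, not needed elsewhere in the thesis, illustrates that weak convergence is strictly weaker than convergence in norm. If $(e_n)$ is an orthonormal sequence in $H$, then for any $y \in H$, Bessel's inequality gives $\sum_{n=1}^\infty \langle y, e_n\rangle^2 \leq \|y\|^2$, so $\langle y, e_n\rangle$ tends to zero and $e_n$ converges weakly to zero. On the other hand, $\|e_n\| = 1$ for all $n$, so $(e_n)$ does not converge in norm, and the inequality of Lemma A.6.22 may be strict.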


266 <strong>Analysis</strong>A.7 Closable operatorsIn this section, we consider the concept of closable <strong>and</strong> closed operators. This will beused when extending the Malliavin derivative. Let X <strong>and</strong> Y be Banach spaces, thatis, complete normed linear spaces. Consider a subspace D(A) of X <strong>and</strong> an operatorA : D(A) → Y. We say that A is closable if it holds that whenever (x n ) is a sequencein D(A) converging to zero <strong>and</strong> Ax n is convergent, then the limit of Ax n is zero. Wesay that A is closed if it holds that whenever (x n ) is a sequence in D(A) such thatboth x n <strong>and</strong> Ax n are convergent, then lim x n ∈ D(A) <strong>and</strong> A(lim x n ) = lim Ax n .Lemma A.7.1. Let A be some operator with domain D(A). A is closed if <strong>and</strong> onlyif the graph of A is closed.Proof. This follows since the graph of A is closed precisely if it holds <strong>for</strong> any convergentsequence (x n , Ax n ) where x n ∈ D(A) that lim x n ∈ D(A) <strong>and</strong> A(lim x n ) = lim Ax n .Lemma A.7.2. Let A be a closable operator. We define the domain D(A) of theclosure of A as the set of x ∈ X such that there exists x n in D(A) converging to x <strong>and</strong>such that Ax n is convergent. The limit of Ax n is independent of the sequence x n <strong>and</strong>D(A) is a linear space.Proof. We first prove the result on the uniqueness of the limit. Assume that x ∈ D(A)<strong>and</strong> consider two different sequences x n <strong>and</strong> y n in D(A) converging to x such thatAx n <strong>and</strong> Ay n are both convergent. We need to prove that the limits are the same.But x n − y n converges to zero <strong>and</strong> A(x n − y n ) is convergent, so since A is closable, itfollows that A(x n − y n ) converges to zero <strong>and</strong> thus lim n Ax n = lim n Ay n , as desired.Next, we prove that D(A) is a linear space. Let x, y ∈ D(A) <strong>and</strong> let λ, µ ∈ R. Let x n<strong>and</strong> y n be sequences in D(A) converging to x <strong>and</strong> y, respectively. Then λx n + µy n isa sequence in D(A) converging to λx + µy, <strong>and</strong> let A(λx n + µy n ) is convergent sinceA is linear. Thus, λx + µy ∈ D(A) <strong>and</strong> D(A) is a linear space.Theorem A.7.3. Assume that A is closable. There is a unique extension A of A asa linear operator from D(A) to D(A), <strong>and</strong> the extension is closed.Proof. Let x ∈ D(A). From Lemma A.7.2, we know that there is x n in D(A) convergingto x <strong>and</strong> such that Ax n is convergent, <strong>and</strong> the limit of Ax n is independent of x n .


We can then define $\overline{A}x$ as the limit of $Ax_n$. We need to prove that this extension is linear and closed.

To prove linearity, let $x, y \in D(\overline{A})$ and let $\lambda, \mu \in \mathbb{R}$. Let $x_n$ and $y_n$ be sequences in $D(A)$ converging to $x$ and $y$, respectively, such that $Ax_n$ and $Ay_n$ converge to $\overline{A}x$ and $\overline{A}y$, respectively. Then $\lambda x_n + \mu y_n$ converges to $\lambda x + \mu y$ and $A(\lambda x_n + \mu y_n)$ is convergent. By definition, the limit is $\overline{A}(\lambda x + \mu y)$, and we conclude
$$\overline{A}(\lambda x + \mu y) = \lim_n A(\lambda x_n + \mu y_n) = \lim_n \lambda Ax_n + \mu Ay_n = \lambda \overline{A}x + \mu \overline{A}y,$$
as desired. To show that the extension is closed, we show that the graph of $\overline{A}$ is the closure of the graph of $A$. Assume that $(x,y)$ is a limit point of the graph of $A$; then there exists $x_n$ in $D(A)$ such that $x = \lim x_n$ and $y = \lim Ax_n$. By definition, we obtain $x \in D(\overline{A})$ and $\overline{A}x = y$. Therefore, $(x,y)$ is in the graph of $\overline{A}$. On the other hand, if $(x,y)$ is in the graph of $\overline{A}$, in particular $x \in D(\overline{A})$ and $\overline{A}x = y$. Then there exists $x_n$ in $D(A)$ converging to $x$ such that $Ax_n$ is convergent as well, and the limit is $\overline{A}x$. Thus, $(x_n, Ax_n)$ converges to $(x, \overline{A}x)$, so $(x,y)$ is a limit point of the graph of $A$. We conclude that the graph of $\overline{A}$ is the closure of the graph of $A$, and by Lemma A.7.1, $\overline{A}$ is closed.
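As a classical illustration of these concepts (the example is not used elsewhere in the thesis, but it mirrors the way the Malliavin derivative is extended in Chapter 4), consider the differentiation operator. Take $X = Y = L^2(\mathbb{R})$, let $D(A) = C_c^\infty(\mathbb{R})$ and let $Af = f'$. If $(f_n) \subseteq C_c^\infty(\mathbb{R})$ with $f_n \to 0$ and $f_n' \to g$ in $L^2(\mathbb{R})$, then for any $\varphi \in C_c^\infty(\mathbb{R})$, integration by parts and the Cauchy-Schwarz inequality yield
$$\int g(x)\varphi(x)\,dx = \lim_n \int f_n'(x)\varphi(x)\,dx = -\lim_n \int f_n(x)\varphi'(x)\,dx = 0.$$
Since $C_c^\infty(\mathbb{R})$ is dense in $L^2(\mathbb{R})$ by Theorem A.4.5, Lemma A.6.4 shows $g = 0$, so $A$ is closable, and Theorem A.7.3 provides its closed extension.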


268 <strong>Analysis</strong>


Appendix BMeasure theoryThis appendix contains some results from abstract measure theory <strong>and</strong> probabilitytheory which we will need. Our main sources are Hansen (2004a), Berg & Madsen(2001), Jacobsen (2003) <strong>and</strong> Rogers & Williams (2000a).B.1 The Monotone Class TheoremIn this section, we state the monotone class theorem <strong>and</strong> develop an important corollary,used when analysing the Brownian motion under the usual conditions.Definition B.1.1. A monotone vector space H on Ω is a subspace of the vector spaceB(Ω) of bounded real functions on Ω, such that H contains all constant functions <strong>and</strong>such that if (f n ) is a sequence of nonnegative functions increasing pointwise towardsa bounded function f, then f ∈ H.Theorem B.1.2 (Monotone Class Theorem). Let K ⊆ B(Ω) be stable undermultiplication, <strong>and</strong> let H ⊆ B(Ω) be a monotone vector space. If K ⊆ H, H containsall σ(K)-measurable bounded functions.Proof. The theorem is stated <strong>and</strong> proved as Theorem I.21 in Dellacherie & Meyer(1975), under the assumption that H is closed under uni<strong>for</strong>m convergence. Since


Sharpe (1988) proves that any monotone vector space is closed under uniform convergence, the claim follows.

By $C([0,\infty), \mathbb{R}^n)$ we denote the continuous mappings from $[0,\infty)$ to $\mathbb{R}^n$. We endow this space with the $\sigma$-algebra $\mathcal{C}([0,\infty), \mathbb{R}^n)$ induced by the coordinate projections $X_t^\circ$, $X_t^\circ(f) = f(t)$ for any $f \in C([0,\infty), \mathbb{R}^n)$.

Lemma B.1.3. Let $F$ be closed in $\mathbb{R}^n$. There exists $(f_n) \subseteq C_c^\infty(\mathbb{R}^n)$ such that $f_n$ converges pointwise to $1_F$.

Proof. By Cohn (1980), Proposition 7.1.4, there exists a sequence $G_n$ of open sets such that $F \subseteq G_n$ and $F = \cap_{n=1}^\infty G_n$. Let $K_n$ be an increasing sequence of compact sets with $\cup_{n=1}^\infty K_n = \mathbb{R}^n$. Then $F \cap K_n$ is compact, so by Theorem A.3.4, there is $f_n$ with $F \cap K_n \prec f_n \prec G_n$. We claim that this sequence of functions fulfills the conditions in the lemma. Let $x \in F$. From a certain point onwards, $x \in F \cap K_n$, and then $f_n(x) = 1$. Assume next $x \notin F$. Then $x \notin G_n$ from a certain point onwards, and then $f_n(x) = 0$. We conclude $\lim f_n = 1_F$.

Corollary B.1.4. Let $H \subseteq B(C([0,\infty), \mathbb{R}^n))$ be a monotone vector space. If $H$ contains all functions of the form $\prod_{k=1}^n f_k(X_{t_k}^\circ)$ with $f_k \in C_c(\mathbb{R}^n)$, then $H$ contains all $\mathcal{C}([0,\infty), \mathbb{R}^n)$ measurable functions.

Proof. Since the compact sets are closed under finite intersections, it is clear that $C_c(\mathbb{R}^n)$ is an algebra. Therefore, putting
$$K = \left\{\left.\prod_{k=1}^n f_k(X_{t_k}^\circ)\,\right|\, 0 \leq t_1 \leq \cdots \leq t_n \text{ and } f_1, \ldots, f_n \in C_c(\mathbb{R}^n)\right\},$$
it is clear that $K$ is an algebra. By Theorem B.1.2, $H$ contains all $\sigma(K)$ measurable mappings. We wish to argue that $\sigma(K) = \mathcal{C}([0,\infty), \mathbb{R}^n)$. It is clear that $\sigma(K) \subseteq \mathcal{C}([0,\infty), \mathbb{R}^n)$, so it will suffice to show the other inclusion. To do so, we only need to show that all coordinate projections are $\sigma(K)$ measurable. Let $t \geq 0$ and let $F$ be closed in $\mathbb{R}^n$. From Lemma B.1.3, there exists a sequence $(f_n) \subseteq C_c^\infty(\mathbb{R}^n)$ converging pointwise to $1_F$. This means that $1_F(X_t^\circ)$ is the pointwise limit of $f_n(X_t^\circ)$, so $1_F(X_t^\circ)$ is $\sigma(K)$-measurable. This shows that $(X_t^\circ \in F) \in \sigma(K)$. Therefore $X_t^\circ$ is $\sigma(K)$ measurable, as desired.


B.2 L p <strong>and</strong> almost sure convergence 271B.2 L p <strong>and</strong> almost sure convergenceThis section contains some of the classical results on L p <strong>and</strong> almost sure convergence.Most of these results are surely well-known, we repeat them here <strong>for</strong> completeness.Lemma B.2.1. Let (E, E, µ) be a measure space. Assume that f n converges to f <strong>and</strong>g n converges to g in L 2 (E). Then f n g n converges to fg in L 1 (E).Proof. By the Cauchy-Schwartz inequality,‖f n g n − fg‖ 1 ≤ ‖f n g n − f n g‖ 1 + ‖f n g − fg‖ 1so since ‖f n ‖ 2 is bounded, the conclusion follows.≤ ‖f n ‖ 2 ‖‖g n − g‖ 2 + ‖f n − f‖ 2 ‖g‖ 2 ,Lemma B.2.2. Let (E, E, µ) be a measure space. Assume that f n converges in L 1 tof, <strong>and</strong> assume that g n converges almost surely to g. If the sequence g n is dominatedby a constant, then f n g n converges in L 1 to fg.Proof. We have‖f n g n − fg‖ 1 ≤ ‖f n g n − fg n ‖ 1 + ‖fg n − fg‖≤ M‖f n − f‖ 1 + ‖fg n − fg‖ 1 .Here, the first term trivially tends to zero. By dominated convergence with bound2Mf, the second term also tends to zero.Lemma B.2.3 (Scheffé). Let (E, E, µ) be a measure space <strong>and</strong> let f n be a sequence ofprobability densities with respect to µ. Let f be another probability density with respectto µ. If f n converges µ-almost surely to f, then f n converges to f in L 1 .Proof. This is Lemma 22.4 of Hansen (2004a).Lemma B.2.4. Let (E, E, µ) be a finite measure space. If 1 ≤ p ≤ r, it holds thatL r (E) ⊆ L p (E) <strong>and</strong> L r convergence implies L p convergence.Proof. See Theorem 7.11 of Berg & Madsen (2001).


272 Measure theoryLemma B.2.5. Let (E, E, µ) be a measure space. If f n converges to f in L p , there isa subsequence f nk converging µ almost surely to f, dominated by a mapping in L p .Proof. This is proved as Corollary 7.20 in Berg & Madsen (2001).B.3 Convergence of normal distributionsIn this section, we prove a result on convergence of normal distributions which will beessential to our development of the Malliavin calculus. We begin with a few lemmas.Lemma B.3.1 (Hölder’s inequality). Let (E, E, µ) be a measure space <strong>and</strong> letf 1 , . . . , f n be real measurable mappings <strong>for</strong> some n ≥ 2. Let p 1 , . . . , p n ∈ (1, ∞) with∑ nk=1 1p k= 1. It then holds that ‖ ∏ nk=1 f k‖ 1 ≤ ∏ nk=1 ‖f k‖ pn . We say that p 1 , . . . , p nare conjugate exponents.Proof. We use induction. The case n = 2 is just the ordinary Hölder’s inequality,proving the induction start. Assume that the results holds in the case of n, we willprove it <strong>for</strong> n + 1. Let q = p n+1p , then q is the conjugate exponent to p n+1−1 n+1, so theordinary Hölder’s inequality yields ‖ ∏ n+1k=1 f k‖ ≤ ‖ ∏ nk=1 f k‖ q‖f n+1 ‖ pn+1 . Next, notethat ∑ nk=1 1p k= 1 − 1p n+1= pn+1−1p n+1= 1 qthe conjugate exponents p 1q, . . . , p nq, we obtain∥ n∏ ∥∥∥∥ f∥ kk=1 q=≤==, so applying the induction hypothesis with(∥ ∥ ∥∥∥∥ ∏n ∥∥∥∥|f k | qk=1 1(∏ nk=1(∏ n (∫k=1) 1q‖|f k | q ‖ p 1q) 1qn∏‖f k ‖ pk ,k=1) q ) 1qp 1|f k | p1 dµwhich, combined with our earlier finding, yields the desired result.


B.3 Convergence of normal distributions 273Lemma B.3.2. Let X be a normal distribution with mean µ <strong>and</strong> variance σ 2 . Letn ∈ N 0 . Then we haveEX n =[n/2]∑k=0(2k)!2 k k! σ2k µ n−2k .Proof. Since the odd central moments of X are zero, we obtainas desired.EX n = E(X − µ + µ) nn∑( k= E(X − µ)n)k µ n−k==k=0[n/2]∑k=0[n/2]∑k=0( 2kn)E(X − µ) 2k µ n−2k(2k)!2 k k! σ2k µ n−2k ,Lemma B.3.3. Let X k be a sequence of n-dimensional normally distributed variables.Assume that <strong>for</strong> any i ≤ n, X i k converges weakly to X. Let f Rn to R have polynomialgrowth. Then f(X k ) is bounded over k in L p <strong>for</strong> any p ≥ 1.Proof. If f has polynomial growth, then so does |f| p <strong>for</strong> any p ≥ 1. There<strong>for</strong>e, itwill suffice to show that f(X k ) is bounded in L 1 <strong>for</strong> any f with polynomial growth.Let p be a polynomial in n variables dominating f. Assume that p has degree m<strong>and</strong> p(x) = ∑ |a|≤m λ ∏ na i=1 xai i , using usual multi-index notation. Obviously, we thenobtain|f(X k )| ≤ |p(X k )|≤∑ n∏|λ a | |(Xk) i a i|≤|a|≤m∑|a|≤m|λ a |i=1n∏(1 + (Xk) i 2 ) ai ).Letting q(x) = ∑ |a|≤m |λ a| ∏ ni=1 (1 + x2 i )ai , q is a polynomial of degree 2m in n variables,<strong>and</strong> we have E|f(X k )| ≤ E|q(X k )|. Furthermore, the exponents in q are alleven. Thus, it will suffice to show the result in the case f is a polynomial with evenexponents.i=1


Therefore, assume that $f$ is a polynomial in $n$ variables of degree $m$, say $f(x) = \sum_{|a| \le m} \lambda_a \prod_{i=1}^n x_i^{a_i}$, where $\lambda_a$ is zero whenever $a$ has an odd element. The generalized Hölder inequality of Lemma B.3.1 then yields
\[
E|f(X_k)| \le \sum_{|a| \le m} |\lambda_a| \, E \prod_{i=1}^n (X_k^i)^{a_i} \le \sum_{|a| \le m} |\lambda_a| \prod_{i=1}^n \big( E (X_k^i)^{n a_i} \big)^{\frac{1}{n}} .
\]
Let $\mu_k^i$ denote the mean of $X_k^i$ and $\sigma_k^i$ the standard deviation of $X_k^i$. Since $X_k^i$ is normally distributed and converges weakly, $\mu_k^i$ and $\sigma_k^i$ are convergent, therefore bounded. Now, by Lemma B.3.2, $E(X_k^i)^n$ is a continuous function of $\mu_k^i$ and $\sigma_k^i$ for fixed $i$ and $n$. Therefore, $k \mapsto E(X_k^i)^n$ is bounded for any $i$ and any $n$. This shows that $E|f(X_k)|$ is bounded over $k$, as was to be proved.

Theorem B.3.4. Let $X_k$ be a sequence of $n$-dimensional variables converging in probability to $X$. Assume that $(X_k, X)$ is normally distributed for any $k$. Let $f \in C_p^1(\mathbb{R}^n)$. Then $f(X_k)$ converges to $f(X)$ in $L^p$ for any $p \ge 1$.

Proof. Let $f \in C_p^1(\mathbb{R}^n)$ be given. Let $p_i$ be a polynomial in $n$ variables dominating $\frac{\partial f}{\partial x_i}$. The mean value theorem yields, for some $\xi_k$ on the line segment between $X_k$ and $X$,
\[
|f(X_k) - f(X)| = \Big| \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\xi_k)(X_k^i - X^i) \Big| \le \sum_{i=1}^n |p_i(\xi_k)| \, |X_k^i - X^i| .
\]
Now, defining $Y_k^i = 2 + (X_k^i)^2 + (X^i)^2$, we have $|\xi_k^i| \le |X_k^i| + |X^i| \le Y_k^i$ and therefore $|p_i(\xi_k)| \le q_i(Y_k)$ for some other polynomial $q_i$. Using Jensen's inequality and the Cauchy-Schwarz inequality, we obtain
\[
E|f(X_k) - f(X)|^p
\le E \Big( \sum_{i=1}^n q_i(Y_k) |X_k^i - X^i| \Big)^p
\le n^p \sum_{i=1}^n E \big( q_i(Y_k) |X_k^i - X^i| \big)^p
= n^p \sum_{i=1}^n E \, q_i(Y_k)^p |X_k^i - X^i|^p
\le n^p \sum_{i=1}^n \| q_i(Y_k)^p \|_2 \, \big\| |X_k^i - X^i|^p \big\|_2 .
\]


Now, $q_i(Y_k)$ is a polynomial transformation of the $2n$-dimensional normally distributed variable $(X_k, X)$. Since $X_k$ converges in probability to $X$, we also have weak convergence. Thus, $(X_k, X)$ converges coordinatewise weakly to $(X, X)$. By Lemma B.3.3, we may then conclude that $\|q_i(Y_k)^p\|_2$ is bounded over $k$ for any $i \le n$. Let $C$ be a bound; we can then conclude
\[
E|f(X_k) - f(X)|^p \le C n^p \sum_{i=1}^n \big\| |X_k^i - X^i|^p \big\|_2 = C n^p \sum_{i=1}^n \big( E|X_k^i - X^i|^{2p} \big)^{\frac12} .
\]
To prove the desired result, it will therefore suffice to prove that $X_k^i$ tends to $X^i$ in $L^p$ for any $p \ge 1$. It will be enough to prove that we have $L^{2n}$ convergence for $n \in \mathbb{N}$. Let $n \in \mathbb{N}$ be given. Since $X_k^i$ converges in probability to $X^i$, $X_k^i - X^i$ converges in probability to zero, so $X_k^i - X^i$ converges in distribution to the point measure at zero. Because we have assumed that $(X_k, X)$ is normally distributed, $X_k^i - X^i$ is normally distributed. Let $\xi_k^i$ be its mean and $\nu_k^i$ its standard deviation. We then conclude that $\xi_k^i$ and $\nu_k^i$ both tend to zero as $k$ tends to infinity. By Lemma B.3.2, we then conclude that $E(X_k^i - X^i)^{2n}$ tends to zero, showing the desired convergence in $L^{2n}$ and implying the result of the theorem.

B.4 Separability and bases for $L^2$-spaces

This section yields some results on $L^2$ spaces which will be necessary for the development of the Hilbert space results of the Malliavin calculus.

Lemma B.4.1. If $(E, \mathcal{E}, \mu)$ is a finite measure space which is countably generated, then $L^2(E)$ is separable as a pseudometric space.

Proof. Let $D$ be a countable generating family for $\mathcal{E}$. We can assume without loss of generality that $E \in D$. Let $H$ denote the family of sets $\cap_{k=1}^n A_k$ where $A_k \in D$ for $k \le n$. Then $H$ is countable, $D \subseteq H$, and $H$ is stable under intersections. Put $S = \{1_A \mid A \in \mathcal{E}\}$ and $S_H = \{1_A \mid A \in H\}$.

Step 1: $\operatorname{span} S_H$ is dense in $L^2(E, \mathcal{E}, \mu)$. We will begin by showing that $\operatorname{span} S_H$ is dense in $L^2(E)$. After doing so, we will identify a dense subset of $\operatorname{span} S_H$. To show that $\operatorname{span} S_H$ is dense, first note that, by Theorem 7.27 of Berg & Madsen (2001),


$\operatorname{span} S$ is dense in $L^2(E)$. Therefore, to show that $\operatorname{span} S_H$ is dense in $L^2(E)$, it will suffice to show that $\operatorname{span} S_H$ is dense in $\operatorname{span} S$. And to do so, it will suffice to show that $S \subseteq \overline{\operatorname{span}}\, S_H$, the closure being taken in $L^2(E)$.

Define $K = \{A \in \mathcal{E} \mid 1_A \in \overline{\operatorname{span}}\, S_H\}$. To show that $S \subseteq \overline{\operatorname{span}}\, S_H$, we must show that $K = \mathcal{E}$. Clearly, $H \subseteq K$. Since $H$ is a generating family for $\mathcal{E}$ stable under intersections, it will suffice to show that $K$ is a Dynkin class.

Clearly, $E \in K$. Assume that $A, B \in K$ with $B \subseteq A$. Then $1_{A \setminus B} = 1_A - 1_B$. Since each of these is in $\overline{\operatorname{span}}\, S_H$, so is $1_A - 1_B$, and therefore $A \setminus B \in K$. Finally, assume that $A_n$ is an increasing sequence in $K$ and put $A = \cup_{n=1}^\infty A_n$. Since $\mu$ is finite, $1_{A_n}$ tends to $1_A$ in $L^2(E)$ by dominated convergence. Since $1_{A_n} \in \overline{\operatorname{span}}\, S_H$ for all $n$ and $\overline{\operatorname{span}}\, S_H$ is closed, $1_A \in \overline{\operatorname{span}}\, S_H$ and therefore $A \in K$. Thus, $K$ is a Dynkin class. We conclude that $K = \mathcal{E}$ and therefore $S \subseteq \overline{\operatorname{span}}\, S_H$, from which we deduce that $\operatorname{span} S_H$ is dense in $L^2(E)$.

Step 2: Identification of a dense countable set. In order to show separability of $L^2(E)$, it will suffice to show that there exists a countable set which is dense in $\operatorname{span} S_H$. Define
\[
V = \Big\{ \sum_{k=1}^n \lambda_k 1_{A_k} \ \Big| \ \lambda_k \in \mathbb{Q},\ A_k \in H,\ k \le n \Big\} .
\]
Since $H$ is countable, $V$ is countable. We will show that $V$ is dense in $\operatorname{span} S_H$. Let $f \in \operatorname{span} S_H$ be given, $f = \sum_{k=1}^n \lambda_k 1_{A_k}$. Let $\lambda_k^j$ tend to $\lambda_k$ through $\mathbb{Q}$ and define $f_j = \sum_{k=1}^n \lambda_k^j 1_{A_k}$. Then $f_j \in V$ and
\[
\|f - f_j\|_2 \le \sum_{k=1}^n |\lambda_k - \lambda_k^j| \, \|1_{A_k}\|_2 ,
\]
showing the desired result.

Lemma B.4.2. Let $(E, \mathcal{E}, \mu)$ and $(K, \mathcal{K}, \nu)$ be two finite measure spaces which are countably generated. Let $([f_i])$ and $([g_j])$ be countable orthonormal bases for $L^2(E)$ and $L^2(K)$, respectively. Then the family $([f_i \otimes g_j])$ is an orthonormal basis for $L^2(E \otimes K)$.

Proof. We first show that the family is orthonormal. By the Tonelli theorem, we find $\|f_i \otimes g_j\|_2 = \|f_i\|_2 \|g_j\|_2 = 1$. To show that the family is orthogonal, consider $f_i \otimes g_j$ and $f_k \otimes g_n$. We then obtain, by the Fubini theorem,
\[
\langle f_i \otimes g_j, f_k \otimes g_n \rangle = \int f_i(s) g_j(t) f_k(s) g_n(t) \, d(\mu \otimes \nu)(s, t) = \langle f_i, f_k \rangle \langle g_j, g_n \rangle ,
\]


showing that if $i \ne k$ or $j \ne n$, $f_i \otimes g_j$ is orthogonal to $f_k \otimes g_n$, as desired. It remains to show that the family is complete. To this end, consider $\psi \in L^2(E \otimes K)$ and assume that $\psi$ is orthogonal to $f_i \otimes g_j$ for any $i$ and $j$. We need to show that $\psi$ is almost surely zero. To this end, we first calculate
\[
\langle \psi, f_i \otimes g_j \rangle = \int \psi(s, t) f_i(s) g_j(t) \, d(\mu \otimes \nu)(s, t)
= \int \Big( \int \psi(s, t) f_i(s) \, d\mu(s) \Big) g_j(t) \, d\nu(t)
= \int \langle \psi(\cdot, t), f_i \rangle g_j(t) \, d\nu(t) .
\]
Put $\varphi_i(t) = \langle \psi(\cdot, t), f_i \rangle$. By the Cauchy-Schwarz inequality and Fubini's theorem, $\varphi_i$ is in $L^2(K)$, and the above shows that $\langle \varphi_i, g_j \rangle = \langle \psi, f_i \otimes g_j \rangle = 0$, so since $([g_j])$ is an orthonormal basis of $L^2(K)$, we conclude that $\varphi_i$ is $\nu$-almost surely zero. Since a countable union of null sets again is a null set, we find that, $\nu$-almost surely, $\varphi_i(t) = 0$ for all $i$. For any such $t$, $\psi(\cdot, t)$ is orthogonal to all $f_i$, showing that $\psi(\cdot, t)$ is zero $\mu$-almost surely. All in all, we can then conclude that $\psi$ is zero $\mu \otimes \nu$-almost surely, as desired.

B.5 Uniform integrability

Let $(X_i)_{i \in I}$ be a family of stochastic variables. We say that $(X_i)$ is uniformly integrable if
\[
\lim_{x \to \infty} \sup_{i \in I} E|X_i| 1_{(|X_i| > x)} = 0 .
\]
We will review some basic results about uniform integrability. We state the results mainly for discrete sequences of variables, but the results extend to families indexed by $[0, \infty)$ as well.

Lemma B.5.1. Let $(X_i)_{i \in I}$ be a family of stochastic variables. If $(X_i)$ is bounded in $L^p$ for some $p > 1$, then $(X_i)$ is uniformly integrable.

Proof. See Lemma 20.5 of Rogers & Williams (2000a).

Lemma B.5.2. Let $(X_i)_{i \in I}$ be uniformly integrable. Then $(X_i)$ is bounded in $L^1$.

Proof. This follows from Lemma 20.7 of Rogers & Williams (2000a).
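The converse of Lemma B.5.2 fails, and the following standard counterexample (not taken from the references above) may be useful to keep in mind. On $([0,1], \mathcal{B}[0,1], \lambda)$, let $X_n = n \, 1_{(0, 1/n)}$. Then
\[
\sup_n E|X_n| = 1 < \infty, \qquad \text{while} \qquad \sup_n E|X_n| 1_{(|X_n| > x)} = 1 \ \text{for every } x > 0 ,
\]
since $E|X_n| 1_{(|X_n| > x)} = 1$ whenever $n > x$. Thus $(X_n)$ is bounded in $L^1$ but not uniformly integrable. Note also that $E X_n^p = n^{p-1}$ is unbounded for every $p > 1$, in accordance with Lemma B.5.1, and that $X_n \to 0$ almost surely while $E X_n = 1$ does not converge to $0$, in accordance with Lemma B.5.4 below.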


Lemma B.5.3. Let $X_n$ be a sequence of stochastic variables, and let $X$ be another variable. $X_n$ converges in $L^1$ to $X$ if and only if $X_n$ is uniformly integrable and converges in probability to $X$.

Proof. This is Theorem 21.2 of Rogers & Williams (2000a).

Lemma B.5.4. Let $X_n$ be a sequence of nonnegative integrable variables, and let $X$ be another integrable variable. Assume that $X_n$ converges almost surely to $X$. Then $(X_n)$ is uniformly integrable if and only if $EX_n$ converges to $EX$.

Proof. First assume that $(X_n)$ is uniformly integrable. Since $X_n$ also converges in probability to $X$, it follows from Lemma B.5.3 that $X_n$ converges in $L^1$ to $X$, and therefore the means converge as well.

Conversely, assume that $EX_n$ converges to $EX$. We have
\[
|X_n - X| = (X_n - X)^+ + (X_n - X)^- = X_n - X + 2(X_n - X)^- .
\]
Now, since $X_n$ converges to $X$ almost surely, $X_n - X$ converges to 0 almost surely. Because $x \mapsto x^-$ is continuous, $(X_n - X)^-$ converges to 0 almost surely. Clearly, $X$ must be nonnegative, so both $X_n$ and $X$ are nonnegative. Since $x \mapsto x^-$ is decreasing, this yields $(X_n - X)^- \le (-X)^- = X$. By dominated convergence, we then obtain
\[
\lim_n E|X_n - X| = \lim_n EX_n - EX + 2 \lim_n E(X_n - X)^- = 2 E \lim_n (X_n - X)^- = 0 .
\]
Since $L^1$ convergence together with convergence in probability yields uniform integrability by Lemma B.5.3, the proof is complete.

B.6 Miscellaneous

This final section of the appendix contains a variety of results which do not really fit anywhere else. We begin with some results on the density of a certain class of stochastic variables in $L^2$ spaces.


Lemma B.6.1. Let $(E, \mathcal{E}, \mu)$ be a measure space and let $f_1, \ldots, f_n$ be real mappings on $E$. If $f_1, \ldots, f_n$ have exponential moments of all orders, then so does $\sum_{k=1}^n \lambda_k f_k$ for any $\lambda \in \mathbb{R}^n$.

Proof. We proceed by induction. For $n = 1$, there is nothing to prove. Assume that we have proved the claim for $n$. We find for $\lambda \in \mathbb{R}$, by the Cauchy-Schwarz inequality, that
\[
\int \exp\Big( \lambda \sum_{k=1}^{n+1} \lambda_k f_k \Big) d\mu
= \int \exp\Big( \lambda \sum_{k=1}^{n} \lambda_k f_k \Big) \exp\big( \lambda \lambda_{n+1} f_{n+1} \big) \, d\mu
\le \Big( \int \exp\Big( 2\lambda \sum_{k=1}^{n} \lambda_k f_k \Big) d\mu \Big)^{\frac12} \Big( \int \exp\big( 2\lambda \lambda_{n+1} f_{n+1} \big) \, d\mu \Big)^{\frac12} ,
\]
which is finite by the induction hypothesis and the assumption on $f_{n+1}$, proving the claim.

Theorem B.6.2. Let $(E, \mathcal{E}, \mu)$ be a measure space, where $\mu$ is a finite measure. Assume that $\mathcal{E}$ is generated by a family of real mappings $K$, where all mappings in $K$ have exponential moments of all orders. Then the variables
\[
\exp\Big( \sum_{k=1}^n \lambda_k f_k \Big) ,
\]
where $f_1, \ldots, f_n \in K$ and $\lambda \in \mathbb{R}^n$, span a dense subspace of $L^2(E)$.

Proof. Let $H$ denote the family of variables of the form $\exp(\sum_{k=1}^n \lambda_k f_k)$. Note that by Lemma B.6.1, all elements of $H$ have moments of all orders, so $H$ is a subset of $L^2(E)$ and the conclusion of the theorem is well-defined.

Now, let $g \in L^2(E)$ be given, orthogonal to $H$. We wish to show that $g$ is $\mu$-almost surely equal to zero. Consider $f_1, \ldots, f_n \in K$. Since $g \in L^2(E)$ and $\mu$ is finite, $g \in L^1(E)$, so $g \cdot \mu$ is a well-defined signed measure. Define $\nu = (f_1, \ldots, f_n)(g \cdot \mu)$; then $\nu$ is also a signed measure, and for any $A \in \mathcal{B}^n$, we have $\nu(A) = \int 1_A(f_1, \ldots, f_n) g \, d\mu$. The Laplace transform of $\nu$ is
\[
\int \exp\Big( \sum_{k=1}^n \lambda_k x_k \Big) d\nu(x) = \int \exp\Big( \sum_{k=1}^n \lambda_k f_k \Big) d(g \cdot \mu) = \int \exp\Big( \sum_{k=1}^n \lambda_k f_k \Big) g \, d\mu ,
\]


and the last expression is equal to zero by assumption. Thus, $\nu$ has Laplace transform identically equal to zero, so by Lemma B.6.6, $\nu$ is the zero measure. That is, $\int 1_A(f_1, \ldots, f_n) g \, d\mu = 0$ for any $A \in \mathcal{B}^n$. Since the sets of the form $((f_1, \ldots, f_n) \in A)$ are stable under intersections and generate $\mathcal{E}$, the Dynkin lemma yields that $\int 1_G \, g \, d\mu = 0$ for any $G \in \mathcal{E}$. This shows that $g$ is $\mu$-almost surely zero.

Next, we review some basic results on measurability on product spaces.

Lemma B.6.3. Let $(E, \mathcal{E})$ and $(K, \mathcal{K})$ be measurable spaces. Let $\mathbb{E}$ and $\mathbb{K}$ be generating systems for $\mathcal{E}$ and $\mathcal{K}$, respectively, closed under intersections. If $\mathbb{E}$ contains $E$ and $\mathbb{K}$ contains $K$, then $\mathcal{E} \otimes \mathcal{K}$ is generated by the sets of the form $A \times B$, where $A \in \mathbb{E}$ and $B \in \mathbb{K}$.

Proof. Let $H$ be the σ-algebra generated by the sets of the form $A \times B$, where $A \in \mathbb{E}$ and $B \in \mathbb{K}$. We need to prove $\mathcal{E} \otimes \mathcal{K} = H$. Clearly, $H \subseteq \mathcal{E} \otimes \mathcal{K}$; we need to prove the other inclusion. To do so, note that, letting $D$ be the family of sets $A \times B$ where $A \in \mathcal{E}$ and $B \in \mathcal{K}$, $\mathcal{E} \otimes \mathcal{K}$ is generated by $D$. It will therefore suffice to prove $D \subseteq H$.


To this end, let $F$ be the family of sets $B \in \mathcal{K}$ such that $E \times B \in H$. Then $F$ is stable under complements and increasing unions, and since $E \in \mathbb{E}$, $\mathbb{K} \subseteq F$. In particular, $K \in F$. We conclude that $F$ is a Dynkin class containing $\mathbb{K}$, and therefore $\mathcal{K} \subseteq F$. This shows that $E \times B \in H$ for any $B \in \mathcal{K}$. Analogously, we can prove that $A \times K \in H$ for any $A \in \mathcal{E}$. Letting $A \in \mathcal{E}$ and $B \in \mathcal{K}$, we then obtain $A \times B = (A \times K) \cap (E \times B) \in H$, as desired. We conclude $D \subseteq H$ and therefore $H = \mathcal{E} \otimes \mathcal{K}$.

Lemma B.6.4. Let $(E, \mathcal{E})$ and $(K, \mathcal{K})$ be measurable spaces. Let $\mathbb{E}$ and $\mathbb{K}$ be classes of real functions on $E$ and $K$, respectively. Assume that there exist sequences of functions in $\mathbb{E}$ and $\mathbb{K}$, respectively, converging pointwise towards a nonzero constant. Assume that $\mathcal{E} = \sigma(\mathbb{E})$ and $\mathcal{K} = \sigma(\mathbb{K})$. The σ-algebra $\mathcal{E} \otimes \mathcal{K}$ is generated by the functions of the form $f \otimes g$, where $f \in \mathbb{E}$ and $g \in \mathbb{K}$.

Proof. Let $H$ be the σ-algebra generated by the functions $f \otimes g$, where $f \in \mathbb{E}$ and $g \in \mathbb{K}$. Clearly, $H \subseteq \mathcal{E} \otimes \mathcal{K}$; we need to show the other inclusion. By definition, $f \otimes g$ is $H$-measurable whenever $f \in \mathbb{E}$ and $g \in \mathbb{K}$. Now, let $f \in \mathbb{E}$ be fixed. By assumption, there is a sequence $g_n$ in $\mathbb{K}$ converging towards a constant, say $c \ne 0$. Then we find that
\[
c f(x) = f(x) \lim_n g_n(y) = \lim_n (f \otimes g_n)(x, y) ,
\]
showing that the function $(x, y) \mapsto c f(x)$ is $H$-measurable, and since $c \ne 0$, the function $(x, y) \mapsto f(x)$ is $H$-measurable for any $f \in \mathbb{E}$. This means that $f^{-1}(A) \times K \in H$ for any $f \in \mathbb{E}$ and $A \in \mathcal{B}$. Now, the sets $\{B \in \mathcal{E} \mid B \times K \in H\}$ form a σ-algebra, and by what we have just shown, this σ-algebra contains the family $\{f^{-1}(A) \mid f \in \mathbb{E}, A \in \mathcal{B}\}$, which generates $\mathcal{E}$. We can therefore conclude that $B \times K \in H$ for any $B \in \mathcal{E}$. In the same manner, we can conclude that $E \times C \in H$ for any $C \in \mathcal{K}$. Therefore, for $B \in \mathcal{E}$ and $C \in \mathcal{K}$, we find
\[
B \times C = (B \times K) \cap (E \times C) \in H .
\]
These sets generate $\mathcal{E} \otimes \mathcal{K}$, and we may therefore conclude that $\mathcal{E} \otimes \mathcal{K} \subseteq H$.

Lemma B.6.5. Let $(E, \mathcal{E}, \mu)$ and $(K, \mathcal{K}, \nu)$ be two measure spaces. Let $f_n$ be a sequence of real mappings on $E \times K$, measurable with respect to $\mathcal{E} \otimes \mathcal{K}$. Let $p \ge 1$ and assume that $f_n$ tends to $f$ in $L^p(E \otimes K)$. Then there is a subsequence such that for $\nu$-almost all $y$, $\int |f_{n_k}(x, y)|^p \, d\mu(x)$ tends to $\int |f(x, y)|^p \, d\mu(x)$.

Proof. Since $\int |f_n - f|^p \, d(\mu \otimes \nu)$ tends to zero, the mapping on $K$ given by $y \mapsto \int |f_n(x, y) - f(x, y)|^p \, d\mu(x)$ tends to zero in $L^1(K)$. By Lemma B.2.5, there is a subsequence converging $\nu$-almost surely. Let $y$ be an element of $K$ such that we have convergence. By the inverse triangle inequality for the norm in $L^p(E)$, we obtain
\[
\Big| \Big( \int |f_{n_k}(x, y)|^p \, d\mu(x) \Big)^{\frac1p} - \Big( \int |f(x, y)|^p \, d\mu(x) \Big)^{\frac1p} \Big|
\le \Big( \int |f_{n_k}(x, y) - f(x, y)|^p \, d\mu(x) \Big)^{\frac1p} ,
\]
showing that $\int |f_{n_k}(x, y)|^p \, d\mu(x)$ tends to $\int |f(x, y)|^p \, d\mu(x)$, as desired.

Finally, a few assorted results.

Lemma B.6.6. Let $\mu$ and $\nu$ be two bounded signed measures on $\mathbb{R}^k$ with Laplace transforms $\psi$ and $\varphi$, respectively. If $\psi = \varphi$, then $\mu = \nu$.

Proof. See Jensen (1992), Theorem C.1.

Lemma B.6.7 (Doob-Dynkin lemma). Let $X$ be a variable on $(\Omega, \mathcal{F}, P)$ taking values in a polish space $S$. Let $Y$ be another variable on the same probability space taking its values in a measurable space $(E, \mathcal{E})$. Then $X$ is $\sigma(Y)$-measurable if and only if there exists a measurable mapping $\psi : E \to S$ such that $X = \psi(Y)$.

Proof. See Dellacherie & Meyer (1975), Theorem 18 of Section 12.1. The original proof can be found in Doob (1953), p. 603.
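To illustrate how the Doob-Dynkin lemma is typically invoked in this thesis (the particular setting below is chosen here only as an illustration), let $W$ be a Brownian motion and let $0 \le t_1 < \cdots < t_n$. Applying the lemma with $Y = (W_{t_1}, \ldots, W_{t_n})$, which takes values in $(\mathbb{R}^n, \mathcal{B}^n)$, shows that a real variable $F$ is $\sigma(W_{t_1}, \ldots, W_{t_n})$-measurable if and only if
\[
F = \psi(W_{t_1}, \ldots, W_{t_n})
\]
for some Borel mapping $\psi : \mathbb{R}^n \to \mathbb{R}$. Representations of this form are the natural starting point for the smooth variables of the Malliavin calculus, where $\psi$ is in addition required to lie in $C_p^\infty(\mathbb{R}^n)$.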


Lemma B.6.8. Let $(E, \mathcal{E}, \mu)$ be a measure space and let $\mathcal{K}$ be a sub-σ-algebra of $\mathcal{E}$. Let $f$ be $\mathcal{K}$-measurable. With $\mu_{|\mathcal{K}}$ denoting the restriction of $\mu$ to $\mathcal{K}$, we have
\[
\int f \, d\mu = \int f \, d\mu_{|\mathcal{K}} .
\]

Proof. If $A \in \mathcal{K}$, we clearly have
\[
\int 1_A \, d\mu = \mu(A) = \mu_{|\mathcal{K}}(A) = \int 1_A \, d\mu_{|\mathcal{K}} .
\]
By linearity, the result extends to simple $\mathcal{K}$-measurable mappings. By monotone convergence, it extends to nonnegative $\mathcal{K}$-measurable mappings. By splitting into positive and negative parts, the proof is finished.

Lemma B.6.9. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\mathcal{N}$ be the null sets of the space and define $\mathcal{G} = \sigma(\mathcal{F}, \mathcal{N})$. Any $\mathcal{G}$-measurable real variable is almost surely equal to some $\mathcal{F}$-measurable variable.

Proof. Let $X$ be $\mathcal{G}$-measurable. By considering $\arctan X$ if necessary, we may assume that $X$ is bounded. Let $\xi$ be a version of $X - E(X \mid \mathcal{F})$. If we can prove that $\xi$ is almost surely zero, we are done. Note that with $\mathcal{A}$ denoting the almost sure sets, that is, the complements of the sets in $\mathcal{N}$, we have $\mathcal{G} = \sigma(\mathcal{F}, \mathcal{A})$. Since both $\mathcal{F}$ and $\mathcal{A}$ contain $\Omega$, the sets of the form $A \cap B$ where $A \in \mathcal{F}$ and $B \in \mathcal{A}$ form a generating system for $\mathcal{G}$, stable under intersections. By the Dynkin lemma, to prove that $\xi$ is almost surely zero, it will suffice to prove that $E 1_{A \cap B}\, \xi = 0$ for $A \in \mathcal{F}$ and $B \in \mathcal{A}$. To this end, we simply rewrite, using $P(B) = 1$,
\[
E 1_{A \cap B}\, \xi = E 1_A\, \xi = E 1_A \big( X - E(X \mid \mathcal{F}) \big) = E 1_A X - E\, E(1_A X \mid \mathcal{F}) = 0 .
\]
The lemma is proved.
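As an illustration of Lemma B.6.9 (the example is not from the thesis itself), take $(\Omega, \mathcal{F}, P) = ([0,1], \mathcal{B}[0,1], \lambda)$ and let $\mathcal{G}$ be the Lebesgue σ-algebra on $[0,1]$, which is generated by $\mathcal{B}[0,1]$ together with the $\lambda$-null sets. The lemma then expresses the familiar fact that every Lebesgue measurable $f : [0,1] \to \mathbb{R}$ satisfies
\[
f = g \quad \lambda\text{-almost surely}
\]
for some Borel measurable $g$. A similar mechanism is used when passing between a raw filtration and its augmentation under the usual conditions.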


Appendix C

Auxiliary results

This appendix contains some results which were developed in connection with the main parts of the thesis, but which eventually turned out not to fit in or were replaced by simpler arguments. Since the results have independent interest, their proofs are presented here.

C.1 A proof for the existence of [M]

In this section, we will discuss the existence of the quadratic variation for a continuous bounded martingale. This existence is proved in several different ways in the literature. Protter (2005) defines the quadratic variation directly through an integration-by-parts formula and derives the properties of the quadratic variation afterwards. Karatzas & Shreve (1988) use the Doob-Meyer Theorem to obtain the quadratic variation as the compensator of the squared martingale. The most direct proof seems to be the one found in Rogers & Williams (2000b). This proof, however, still requires a substantial background, including preliminary results on stochastic integration with respect to elementary processes. Furthermore, even though the proof may seem short, a good deal of detail is omitted, and the full proof is actually very lengthy and complicated, as may be seen in, say, Jacobsen (1989) or Sokol (2005), including a good deal of cumbersome calculations.


We will here present an alternative proof. The proof is a bit longer, but its structure is simpler, using quite elementary theory and requiring few prerequisites. In particular, no development of the stochastic integral is needed for the proof. The virtue of this is that it allows the development of the quadratic variation before the development of the stochastic integral. This yields a separation of concerns and therefore a more transparent structure of the theory. Before beginning the proof, we will outline the main ideas and the progression of the section.

In the remainder of the section, we work in the context of a filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ and a continuous bounded martingale $M$.

Consider a partition $\pi$ of $[0, t]$ and let $s \le t$. By $\pi_s$, we denote the partition of $[0, s]$ given by $\pi_s = \{s\} \cup (\pi \cap [0, s])$. Our plan is to show that the process given by $s \mapsto Q^{\pi_s}(M)$ converges uniformly in $L^2$ on $[0, t]$ as the partition grows finer. We will then define the limit as the quadratic variation of $M$. Demonstrating this convergence will require two main lemmas, the relatively easy Lemma C.1.3 and the somewhat difficult Lemma C.1.4. After having defined the quadratic variation in this manner, we will argue that it is continuous and increasing and show that it can alternatively be characterised as the unique increasing process such that $M^2 - [M]$ is a uniformly integrable martingale. We will show how to extend the definition of the quadratic variation to continuous local martingales and show how to characterise the quadratic variation when $M$ is a square-integrable martingale and when $M$ is a continuous local martingale.

We begin with a few definitions concerning convergence of sequences indexed by partitions. Let $t \ge 0$, and let $(x_\pi)$ be a family of elements of a pseudometric space $(S, d)$, indexed by the partitions of $[0, t]$. We call $(x_\pi)$ a net, inspired by the usual definition of nets as given in, say, Munkres (2000).

We say that $x_\pi$ is convergent to $x$ if it holds that for any $\varepsilon > 0$, there exists a partition $\pi$ such that whenever $\varpi \supseteq \pi$ is another partition, $d(x_\varpi, x) < \varepsilon$. We say that $x_\pi$ is cauchy if it holds that for any $\varepsilon > 0$, there exists a partition $\pi$ such that whenever $\varpi, \vartheta \supseteq \pi$ are two other partitions, $d(x_\varpi, x_\vartheta) < \varepsilon$. Clearly, to show that a net is cauchy, it will suffice to show that for any $\varepsilon > 0$, there exists a partition $\pi$ such that whenever $\varpi \supseteq \pi$, $d(x_\varpi, x_\pi) \le \varepsilon$.

Lemma C.1.1. Let $x_\pi$ be a net in $(S, d)$. If $x_\pi$ has a limit, it is unique. If $(S, d)$ is complete and $x_\pi$ is cauchy, then $x_\pi$ has a limit.


Proof. Assume that $x_\pi$ has limits $x$ and $y$. For any partition $\pi$, it clearly holds that $d(x, y) \le d(x, x_\pi) + d(x_\pi, y)$. From this, it immediately follows that limits are unique.

Now assume that $(S, d)$ is a complete pseudometric space and that $x_\pi$ is cauchy. For each $n$, let $\pi_n$ be a partition such that whenever $\varpi, \vartheta \supseteq \pi_n$, $d(x_\varpi, x_\vartheta) \le \frac{1}{2^n}$. By expanding the partitions, we can assume without loss of generality that $\pi_n$ is increasing. We then obtain $d(x_{\pi_n}, x_{\pi_{n+1}}) \le \frac{1}{2^n}$, and in particular, for any $n \ge m$,
\[
d(x_{\pi_n}, x_{\pi_m}) \le \sum_{k=m+1}^{n} d(x_{\pi_{k-1}}, x_{\pi_k}) \le \sum_{k=m}^{\infty} \frac{1}{2^k} = \frac{1}{2^{m-1}} .
\]
Therefore, $x_{\pi_n}$ is a cauchy sequence, therefore convergent. Let $x$ be the limit. We wish to show that $x$ is the limit of the net $x_\pi$. By continuity of the metric, we obtain $d(x_{\pi_n}, x) \le \frac{1}{2^{n-1}}$. Let $\varepsilon > 0$ be given and let $n$ be such that $\frac{1}{2^{n-1}} < \varepsilon$. Then we have, for any $\varpi \supseteq \pi_n$,
\[
d(x_\varpi, x) \le d(x_\varpi, x_{\pi_n}) + d(x_{\pi_n}, x) \le \frac{1}{2^n} + \frac{1}{2^{n-1}} \le 2\varepsilon .
\]
Since $\varepsilon$ was arbitrary, we have convergence.

Lemma C.1.2. Let $x_\pi$ be convergent with limit $x$. Let $\varepsilon_n$ be a positive sequence tending to zero. For each $n$, let $\pi_n$ be a partition such that whenever $\varpi \supseteq \pi_n$, $d(x_\varpi, x) \le \varepsilon_n$. Then $x_{\pi_n}$ is a sequence converging to $x$.

Proof. This is immediate.

The following two concepts will play important roles in the coming arguments. We define the oscillation $\omega_M[s, t]$ of $M$ over $[s, t]$ by $\omega_M[s, t] = \sup_{s \le u, r \le t} |M_u - M_r|$. With $\pi = (t_0, \ldots, t_n)$ a partition, by $\delta_\pi(M)$ we denote the modulus of continuity over $\pi$, defined as $\delta_\pi(M) = \max_{0 \le i < n} \omega_M[t_i, t_{i+1}]$.
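Before turning to the estimates, it may be instructive to see the convergence we are aiming for numerically. The sketch below is not part of the original development; it uses Brownian motion, which is unbounded, purely as an illustration, and all names in it are chosen here. It simulates a Brownian path on a fine grid and evaluates $Q^\pi(W)$ over successively finer partitions; the values approach $[W]_T = T$, the quadratic variation obtained later via localisation.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    T = 1.0
    N = 2 ** 16                      # fine simulation grid for the Brownian path
    dt = T / N
    dW = rng.normal(0.0, np.sqrt(dt), size=N)
    W = np.concatenate(([0.0], np.cumsum(dW)))   # W_0, W_dt, ..., W_T

    def quadratic_variation(path, step):
        # Q^pi over the partition consisting of every `step`-th grid point.
        increments = np.diff(path[::step])
        return float(np.sum(increments ** 2))

    for step in (2 ** 12, 2 ** 8, 2 ** 4, 1):
        print(f"mesh = {step * dt:.6f}, Q^pi(W) = {quadratic_variation(W, step):.4f}")
    # As the mesh shrinks, Q^pi(W) settles close to [W]_T = T = 1.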


Lemma C.1.3. Let $\pi$ be a partition of $[s, t]$. It holds that
\[
E Q^\pi(M)^2 \le 3 E\, \omega_M^2[s, t]\, Q^\pi(M) .
\]
We also have the weaker bound $E Q^\pi(M)^2 \le 12 \|M^*\|_\infty^2 \, E(M_t - M_s)^2$.

Proof. We first expand the square to obtain
\[
E Q^\pi(M)^2 = E \Big( \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^2 \Big)^2
= E \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^4 + 2 E \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (M_{t_i} - M_{t_{i-1}})^2 (M_{t_j} - M_{t_{j-1}})^2 .
\]
The last term can be computed using Lemma 2.4.12,
\[
2 E \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (M_{t_i} - M_{t_{i-1}})^2 (M_{t_j} - M_{t_{j-1}})^2
= 2 \sum_{i=1}^{n-1} E \Big( (M_{t_i} - M_{t_{i-1}})^2 \, E\Big( \sum_{j=i+1}^{n} (M_{t_j} - M_{t_{j-1}})^2 \,\Big|\, \mathcal{F}_{t_i} \Big) \Big)
= 2 \sum_{i=1}^{n-1} E \big( (M_{t_i} - M_{t_{i-1}})^2 \, E((M_t - M_{t_i})^2 \mid \mathcal{F}_{t_i}) \big)
= 2 \sum_{i=1}^{n-1} E (M_{t_i} - M_{t_{i-1}})^2 (M_t - M_{t_i})^2
\le 2 E\, \omega_M^2[s, t] \sum_{i=1}^{n-1} (M_{t_i} - M_{t_{i-1}})^2
\le 2 E\, \omega_M^2[s, t]\, Q^\pi(M) .
\]
We then find
\[
E Q^\pi(M)^2 \le E \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^4 + 2 E\, \omega_M^2[s, t]\, Q^\pi(M)
\le E\, \omega_M^2[s, t] \sum_{k=1}^n (M_{t_k} - M_{t_{k-1}})^2 + 2 E\, \omega_M^2[s, t]\, Q^\pi(M)
\le 3 E\, \omega_M^2[s, t]\, Q^\pi(M) ,
\]
as desired. Since $\omega_M[s, t] \le 2\|M^*\|_\infty$, we also get
\[
E Q^\pi(M)^2 \le 12 \|M^*\|_\infty^2 \, E Q^\pi(M) = 12 \|M^*\|_\infty^2 \, E(M_t - M_s)^2 .
\]
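The identity $E Q^\pi(M) = E(M_t - M_s)^2$, used in the last equality above, deserves a word of explanation; the following standard computation is included here for completeness and is not part of the original text. Writing $\pi = (t_0, \ldots, t_n)$ with $t_0 = s$ and $t_n = t$, the martingale property gives $E(M_{t_k} M_{t_{k-1}}) = E\big( M_{t_{k-1}} E(M_{t_k} \mid \mathcal{F}_{t_{k-1}}) \big) = E M_{t_{k-1}}^2$, so that
\[
E Q^\pi(M) = \sum_{k=1}^n E (M_{t_k} - M_{t_{k-1}})^2 = \sum_{k=1}^n \big( E M_{t_k}^2 - E M_{t_{k-1}}^2 \big) = E M_t^2 - E M_s^2 = E(M_t - M_s)^2 ,
\]
the last equality following from $E(M_t M_s) = E M_s^2$ in the same way.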


C.1 A proof <strong>for</strong> the existence of [M] 287Lemma C.1.4. Let π be a partition of [s, t]. Then(E sup (Mu − M s ) 2 − Q πu (M) ) 2≤ 17Eω2M [s, t] ( (M t − M s ) 2 + Q π (M) ) .s≤u≤tProof. Considering ((M u −M s ) 2 −Q π u(M)) 2 , we note that both terms in the square arenonnegative. There<strong>for</strong>e, the mixed term in the expansion of the square is nonpositive<strong>and</strong> we obtain the bound(E sup (Mu − M s ) 2 − Q π u(M) ) 2≤ E sup (M u − M s ) 4 + E sup Q π u(M) 2 .s≤u≤ts≤u≤ts≤u≤tThe proof of the lemma will proceed by first making an estimate that allows us to getrid of the suprema above. Then, we will evaluate the resulting expression to obtainthe result of the lemma. Specifically, we will remove the suprema by provingE sup (M u − M s ) 4 + E sup Q π u(M) 2s≤u≤ts≤u≤t≤ 5E(M t − M s ) 4 + 2Eω 2 M [s, t]Q π (M) + 5EQ π (M) 2 .This is a somewhat intricate task. We will split it into two parts, yielding the followingtotal structure of the proof: We first estimate the term E sup s≤u≤t (M u − M s ) 4 , thenestimate the term E sup s≤u≤t Q πu (M) 2 , <strong>and</strong> ultimately, we finalize our estimates <strong>and</strong>conclude.Step 1: Removing the suprema, first term. We begin by considering the termgiven by E sup s≤u≤t (M u − M s ) 4 . Note that whenever 0 ≤ u ≤ r, we have the equalityE(M s+r − M s |F s+u ) = M s+u − M s . There<strong>for</strong>e, M s+u − M s is a martingale in u withrespect to the filtration (F s+u ) u≥0 . Since the dual exponent of 4 is 4 3, the Doob Lpinequality yields( ) 4 4E sup (M u − M s ) 4 ≤ E(M t − M s ) 4 ≤ 5E(M t − M s ) 4 ,s≤u≤t3removing the supremum.Step 2: Removing the suprema, second term. Turning our attention to the


288 Auxiliary resultssecond term, E sup s≤r≤t Q π r(M) 2 , let s ≤ u ≤ t. We know that==Q πu (M) 2(∑ n) 2(M tk ∧u − M tk−1 ∧u) 2k=1n∑(M tk ∧u − M tk−1 ∧u) 4 +k=1i≠jn∑(M ti∧u − M ti−1∧u) 2 (M tj∧u − M tj−1∧u) 2 .There<strong>for</strong>e, we also have≤E sup Q π u(M) 2s≤u≤tn∑E sup (M tk ∧u − M tk−1 ∧u) 4s≤u≤tk=1+ E sups≤u≤tn∑(M ti ∧u − M ti−1 ∧u) 2 (M tj ∧u − M tj−1 ∧u) 2 .i≠jWe need to bound each of the two terms above. We begin by developing a bound <strong>for</strong>the first term. Applying Doob’s L p inequality used with the martingale M t k− M t k−1yieldsE sups≤u≤tk=1n∑n∑(M tk ∧u − M tk−1 ∧u) 4 ≤ E sup (M t ku − M t k−1u ) 4s≤u≤tk=1n∑ 256≤81 E(M t kt − M t k−1t ) 4k=1n∑≤ 5 E(M tk − M tk−1 ) 4 .Next, we consider the second sum, that is,E sups≤u≤tk=1n∑(M ti ∧u − M ti−1 ∧u) 2 (M tj ∧u − M tj−1 ∧u) 2 .i≠jWe start out by fixing s ≤ u ≤ t. Assume that u < t, the case u = t will trivially satisfythe bound we are about to prove. Letting c be the index such that t c−1 ≤ u < t c , wecan separate the terms where one of the indicies is equal to c <strong>and</strong> obtain three typesof terms: Those where i is distinct from c <strong>and</strong> j is equal to c, those where i is equal toc <strong>and</strong> j is distinct from c <strong>and</strong> those where i <strong>and</strong> j are distinct <strong>and</strong> both distinct fromc as well. Note that since we are only considering off-diagonal terms, we do not need


C.1 A proof <strong>for</strong> the existence of [M] 289to consider the case where i <strong>and</strong> j are both equal to c. We obtain≤++n∑(M ti∧u − M ti−1∧u) 2 (M tj∧u − M tj−1∧u) 2i≠jn∑(M ti ∧u − M ti−1 ∧u) 2 (M u − M tc−1 ) 2i≠cn∑(M u − M tc−1 ) 2 (M tj ∧u − M tj−1 ∧u) 2j≠cn∑(M ti∧u − M ti−1∧u) 2 (M tj∧u − M tj−1∧u) 2 .i≠j,i≠c,j≠cNow note that when i < c, t i ∧ u = t i <strong>and</strong> t i−1 ∧ u = t i−1 . And when i > c, t i ∧ u = u<strong>and</strong> t i−1 ∧ u = u. There<strong>for</strong>e,n∑(M ti ∧u − M ti−1 ∧u) 2 (M u − M tc−1 ) 2 =i≠c<strong>and</strong> analogously <strong>for</strong> the second term. Likewise,n∑(M ti − M ti−1 ) 2 (M u − M tc−1 ) 2i


290 Auxiliary resultsSince this bound holds <strong>for</strong> all s ≤ u ≤ t, we finally obtain≤E sups≤u≤tn∑(M ti ∧u − M ti−1 ∧u) 2 (M tj ∧u − M tj−1 ∧u) 2i≠j2Eω 2 M [s, t]Q π (M) + En∑(M ti − M ti−1 ) 2 (M tj − M tj−1 ) 2 .i≠jCombining our results, we finally obtainE sup Q πu (M) 2s≤u≤tn∑≤ 5 E(M tk − M tk−1 ) 4 + 2EωM 2 [s, t]Q π (M) + Ek=1≤ 2Eω 2 M [s, t]Q π (M) + 5EQ π (M) 2 .n∑(M ti − M ti−1 ) 2 (M tj − M tj−1 ) 2We have now developed the bounds announced earlier, <strong>and</strong> may all in all concludeas promised.i≠jE sup (M r − M s ) 4 + E sup Q π r(M) 2s≤r≤ts≤r≤t≤ 5E(M t − M s ) 4 + 2Eω 2 M [s, t]Q π (M) + 5EQ π (M) 2 ,Step 3: Final estimates <strong>and</strong> conclusion. We will now make an estimate <strong>for</strong> thefirst <strong>and</strong> last terms in the expression above, uncovering the result of the lemma. Forthe first term, we easily obtainFor the other term, Lemma C.1.3 yieldsCombining our findings, we conclude that5E(M t − M s ) 4 ≤ 5Eω 2 M [s, t](M t − M s ) 2 .5EQ π (M) 2 ≤ 15Eω 2 M [s, t]Q π (M).(E sup (Mu − M s ) 2 − Q πu (M) ) 2s≤u≤t≤ 5E(M t − M s ) 4 + 2Eω 2 M [s, t]Q π (M) + 5EQ π (M) 2≤5Eω 2 M [s, t](M t − M s ) 2 + 2Eω 2 M [s, t]Q π (M) + 15Eω 2 M [s, t]Q π (M)≤ 17Eω 2 M [s, t] ( (M t − M s ) 2 + Q π (M) ) ,which was precisely the proclaimed bound.


C.1 A proof <strong>for</strong> the existence of [M] 291Lemma C.1.5. Consider the space of processes on [0, t] endowed with the norm givenby ‖| · ‖| = ‖‖ · ‖ ∞ ‖ 2 . We denote convergence in ‖| · ‖| as uni<strong>for</strong>m L 2 convergence.This space is complete.Proof. Assume that X n is a cauchy sequence under ‖| · ‖|. Since a cauchy sequencewith a convergent subsequence is convergent, it will suffice to show that X nk has aconvergent subsequence.We know that <strong>for</strong> any ε > 0, there exists N such that <strong>for</strong> n, m ≥ N, it holds that‖‖X n − X m ‖ ∞ ‖ 2 ≤ ε. In particular, using 1 as our ε, there exists n8 k k such that <strong>for</strong>any n, m ≥ n k ,P(‖X n − X m ‖ ∞ > 1 )2 k ≤ 4 k ‖‖X n − X m ‖ ∞ ‖ 2 ≤ 1 2 k .We can assume without loss of generality that n k is increasing. We then obtain thatP (‖X nk − X nk+1 ‖ ∞ > 1 ) ≤ 1 , so ‖X2 k 2 k nk − X nk+1 ‖ ∞ ≤ 1 from a point onwards2 kalmost surely, by the Borel-Cantelli Lemma. Then we also find <strong>for</strong> i > j‖X ni − X nj ‖ ∞ ≤j∑k=i+1‖X nk−1 − X nk ‖ ∞ ≤ 1 2 k ,so X nk is almost surely uni<strong>for</strong>mly cauchy. There<strong>for</strong>e, it is almost surely uni<strong>for</strong>mlyconvergent. Let X be the limit. We wish to prove that X is the limit of X nk in ‖| · ‖|.Fixing k, we have‖|X nk − X‖| = ‖‖X nk − X‖ ∞ ‖ 2= ‖ lim ‖X nk − X nm ‖ ∞ ‖ 2m( () ) 12= E lim inf ‖X 2nm k− X nm ‖ ∞(= E lim inf (‖X nm k− X nm ‖ ∞ ) 2) 1 2= lim inf ‖‖X nm k− X nm ‖ ∞ ‖ 2 .Now note that X nk also is a cauchy sequence. Let ε > 0 be given <strong>and</strong> select N inaccordance with the cauchy property of X nk . Let k ≥ N, we then obtainlim infm‖‖X n k− X nm ‖ ∞ ‖ 2 = sup inf ‖‖X n k− X nm ‖ ∞ ‖ 2 ≤ ε.m≥ii≥NWe have now shown that <strong>for</strong> k ≥ N, ‖|X nk − X‖| ≤ ε. Thus, X nk converges to X,<strong>and</strong> there<strong>for</strong>e X n converges to X as well.


292 Auxiliary resultsWe are now ready to prove the convergence result needed to argue the existence ofthe quadratic variation. Letting π be a partition of [s, t], we earlier defined π u byπ u = {u} ∪ π ∩ [s, u] <strong>for</strong> s ≤ u ≤ t. We now extend this definition by defining π u beempty <strong>for</strong> u < s <strong>and</strong> letting π u = π <strong>for</strong> u > t. We use the convention that Q ∅ (M) = 0.Theorem C.1.6. Let t ≥ 0, <strong>and</strong> let π denote a generic partition of [0, t]. The mappings ↦→ Q πs (M) is uni<strong>for</strong>mly convergent on [0, t] in L 2 .Proof. Since uni<strong>for</strong>m L 2 convergence is a complete mode of convergence by LemmaC.1.5, it will suffice to show that s ↦→ Q π s(M) is a cauchy net with respect to uni<strong>for</strong>mL 2 convergence. We there<strong>for</strong>e consider the distance between elements u ↦→ Q π u(M)<strong>and</strong> u ↦→ Q ϖu (M), where ϖ ⊇ π. As usual, π = (t 0 , . . . , t n ). We put ϖ = (s 0 , . . . , s m )<strong>and</strong> let j k denote the unique element of ϖ such that s jk = t k . We define the partitionϖ k of [t k−1 , t k ] by ϖ k = (s jk−1 , . . . , s jk −1) <strong>and</strong> obtainQ ϖu (M) − Q πu (M) ===m∑n∑(Ms u k− Ms u k−1) 2 − (Mt u k− Mt u k−1) 2k=1n∑j k∑k=1 i=j k−1 +1k=1n∑(Ms u i− Ms u i−1) 2 − (Mt u k− Mt u k−1) 2k=1n∑∑nQ ϖk u (M) − (Mt u k− Mt u k−1) 2 .k=1k=1This implies that=≤≤≤‖ sup |Q ϖ u(M) − Q π u(M)|‖ 2u≤t∥ supn∑n∣∥ ∑∣∣∣∣ ∥∥∥∥ Q ϖk u (M) − tu≤t ∣k=1k=1(M u k− Mt u k−1) 2 2∥ n∑∣ ∥∥∥∥∣sup ∣Q ϖk u (M) − (Mu∥tk− Mt u k−1) 2 ∣∣u≤tk=12n∑∣∥ ∥ sup ∣∣Q ϖk u (M) − (Mutk− Mt u k−1) 2 ∣∣ ∥∥∥u≤tk=12∥n∑∣ ∣ ∥∥∥∥2 ∣∣Q ϖ supk u (M) − (Mu∥tk− Mt u k−1) 2 ∣∣.t k−1 ≤u≤t kk=1We have been able to reduce the supremum to [t k−1 , t k ] since the expression in thesupremum is constant <strong>for</strong> u ≤ t k−1 <strong>and</strong> <strong>for</strong> u ≥ t k , respectively. Now, using Lemma


C.1 A proof <strong>for</strong> the existence of [M] 293C.1.4, we conclude≤≤∥n∑∣ ∣ ∥∥∥∥2∣∣Q ϖ supk u (M) − (Mu∥tk− Mt u k−1) 2 ∣∣t k−1 ≤u≤t kn∑17EωM 2 [t k−1 , t k ] ( (M tk − M tk−1 ) 2 + Q ϖ k(M) )k=1k=1n∑17EδM 2 [s, t] ( (M tk − M tk−1 ) 2 + Q ϖ k(M) ) .k=1Since the union of the ϖ k is ϖ, we obtain, applying the Cauchy-Schwartz inequality<strong>and</strong> Lemma C.1.3,n∑17EδM 2 [s, t] ( (M tk − M tk−1 ) 2 + Q ϖ k(M) )k=1= 17Eδ 2 M [s, t](Q π (M) + Q ϖ (M))≤ 17‖δ 2 M [s, t]‖ 2 ‖Q π (M) + Q ϖ (M)‖ 2≤ 17‖δ 2 M [s, t]‖ 2 (‖Q π (M)‖ 2 + ‖Q ϖ (M)‖ 2 )≤ 34‖δM 2 ([s, t]‖ 2 12‖M ∗ ‖ 2 ∞E(M t − M s ) 2) 1 2=(34 √ )12‖M ∗ ‖ ∞ ‖M t − M s ‖ 2 ‖δM 2 [s, t]‖ 2 .As the mesh tends to zero, the continuity of M yields that δ π (M) tends to zero.By boundedness of M, Eδ π (M) 4 tends to zero as well. There<strong>for</strong>e, if π is chosensufficiently fine, the above can be made as small as desired. There<strong>for</strong>e, we concludethat s ↦→ Q π s(M) is uni<strong>for</strong>mly L 2 cauchy, <strong>and</strong> so it is convergent.Theorem C.1.7. There exists a unique continuous, increasing <strong>and</strong> adapted process[M] such that <strong>for</strong> any t ≥ 0, [M] t is the uni<strong>for</strong>m L 2 limit in π of the processess ↦→ Q πs (M) on [0, t].Proof. Letting n ≥ 1 <strong>and</strong> letting π be a generic partition of [0, n], by Lemma C.1.6,u ↦→ Q π u(M) is uni<strong>for</strong>mly L 2 convergent. Let X n be the limit. X n is then a processon [0, n]. By Lemma C.1.2, there is a sequence of partitions π n such that u ↦→ Q πn u (M)converges uni<strong>for</strong>mly in L 2 to X n . There<strong>for</strong>e, there is a subsequence converging almostsurely uni<strong>for</strong>mly. Since u ↦→ Q πn u (M) is continuous <strong>and</strong> adapted, X n is continuous<strong>and</strong> adapted as well.Now, X n is also the uni<strong>for</strong>m L 2 limit of u ↦→ Q π u(M), u ≤ t, where π is a genericpartition of [0, n + 1]. By uniqueness of limits, X n <strong>and</strong> X n+1 are indistinguishable


on $[0, n]$. By the Pasting Lemma, there exists a process $[M]$ such that the restriction of $[M]$ to $[0, n]$ is $X^n$ for each $n$. Since $X^n$ is continuous and adapted, $[M]$ is continuous and adapted as well. Since $[M]_t$ is the limit of quadratic variations over larger partitions than those defining $[M]_s$ whenever $s \le t$, we have $[M]_t \ge [M]_s$, so $[M]$ is increasing. And $[M]$ has the property stated in the theorem of being the uniform $L^2$ limit on compacts of quadratic variations over partitions.

Lemma C.1.8. The process $M_t^2 - [M]_t$ is a uniformly integrable martingale.

Proof. We proceed in two steps. We first show that $M_t^2 - [M]_t$ is a martingale, and then proceed to show that it is uniformly integrable.

Step 1: The martingale property. Let $0 \le s \le t$. Let $\pi$ be a partition of $[0, t]$. Define $\varpi = \{s\} \cup (\pi \cap [s, t])$. We then obtain
\[
E(Q^\pi(M) \mid \mathcal{F}_s) = E(Q^{\pi_s}(M) + Q^\varpi(M) \mid \mathcal{F}_s) = Q^{\pi_s}(M) + E((M_t - M_s)^2 \mid \mathcal{F}_s) = Q^{\pi_s}(M) + E(M_t^2 \mid \mathcal{F}_s) - M_s^2 .
\]
This shows that
\[
E(M_t^2 - Q^{\pi_t}(M) \mid \mathcal{F}_s) = E(M_t^2 - Q^\pi(M) \mid \mathcal{F}_s) = M_s^2 - Q^{\pi_s}(M) ,
\]
so the process $M_s^2 - Q^{\pi_s}(M)$ is a martingale on $[s, t]$. Since $Q^{\pi_s}(M)$ converges in $L^2$ to $[M]_s$ as $\pi$ becomes finer, and conditional expectations are $L^2$-continuous, we conclude that, almost surely,
\[
E(M_t^2 - [M]_t \mid \mathcal{F}_s) = \lim_\pi E(M_t^2 - Q^{\pi_t}(M) \mid \mathcal{F}_s) = \lim_\pi \big( M_s^2 - Q^{\pi_s}(M) \big) = M_s^2 - [M]_s ,
\]
showing the martingale property.

Step 2: Uniform integrability. By the continuity of the $L^2$ norm, we can use Lemma C.1.3 to obtain
\[
E [M]_t^2 = \lim_\pi E Q^\pi(M)^2 \le 12 \|M^*\|_\infty^2 \, E M_t^2 \le 12 \|M^*\|_\infty^4 ,
\]
so $[M]$ is bounded in $L^2$, therefore uniformly integrable. Since $M^2$ is bounded, it is clearly uniformly integrable. We can therefore conclude that $M_t^2 - [M]_t$ is a uniformly integrable martingale.
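Taking expectations in Lemma C.1.8 records a simple consequence worth noting (this observation is added here only as an illustration and is immediate from the lemma): since $[M]_0 = 0$,
\[
E [M]_t = E M_t^2 - E M_0^2 \qquad \text{for all } t \ge 0 ,
\]
so for a bounded martingale the expected quadratic variation measures exactly the growth of the second moment. This identity is the basic ingredient behind the Itô isometry for stochastic integrals with respect to $M$.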


Our final task is to localise our results from continuous bounded martingales to continuous local martingales and to investigate how to characterise the quadratic variation in the local cases.

Lemma C.1.9. Let $\pi$ be a partition of $[s, t]$ and let $\tau$ be a stopping time with values in $[s, \infty)$. Then $Q^\pi_\tau(M) = Q^\pi(M^\tau)$.

Proof. This follows immediately from
\[
Q^\pi_\tau(M) = \sum_{k=1}^n (M_{t_k \wedge \tau} - M_{t_{k-1} \wedge \tau})^2 = \sum_{k=1}^n (M^\tau_{t_k} - M^\tau_{t_{k-1}})^2 = Q^\pi(M^\tau) .
\]

Lemma C.1.10. Let $M$ be a continuous bounded martingale, and let $\tau$ be a stopping time. Then $[M]^\tau = [M^\tau]$.

Proof. Let $t \ge 0$. As usual, letting $\pi$ be a generic partition of $[0, t]$, we know that $[M]$ is the uniform $L^2$ limit on $[0, t]$ of $s \mapsto Q^\pi_s(M)$. Using Lemma C.1.9 and the fact that $Q^\pi_\tau(M) = Q^\pi_{\tau \wedge t}(M)$, we find
\[
E\big( [M]^\tau_t - Q^\pi(M^\tau) \big)^2 = E\big( [M]_{\tau \wedge t} - Q^\pi_{\tau \wedge t}(M) \big)^2 \le E \sup_{s \le t} \big| [M]_s - Q^\pi_s(M) \big|^2 ,
\]
which tends to zero. Thus, $[M]^\tau_t$ is the limit of $Q^\pi(M^\tau)$. Since $[M^\tau]_t$ is also the limit of $Q^\pi(M^\tau)$, we must have $[M]^\tau_t = [M^\tau]_t$. Since the processes are continuous, we conclude that $[M]^\tau = [M^\tau]$ up to indistinguishability.

Corollary C.1.11. Let $M$ be a continuous local martingale. There exists a unique continuous, increasing and adapted process $[M]$ such that if $\tau_n$ is any localising sequence such that $M^{\tau_n}$ is a continuous bounded martingale, then $[M]^{\tau_n} = [M^{\tau_n}]$.

Proof. This follows from Theorem 2.5.11.
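The canonical example is Brownian motion; the example is standard and is added here only as an illustration. $W$ is a continuous local martingale but not bounded; with the localising sequence $\tau_n = \inf\{t \ge 0 : |W_t| \ge n\}$, each $W^{\tau_n}$ is a continuous bounded martingale, and the construction above yields the familiar result
\[
[W]_t = t ,
\]
consistent with the fact that $W_t^2 - t$ is a martingale, so that $W^2 - [W]$ is a continuous local martingale as in Corollary C.1.14 below.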


Lemma C.1.12. Let $A$ be any increasing process with $A_0 = 0$. Then $A$ is uniformly integrable if and only if $A_\infty \in L^1$.

Proof. First note that the criterion in the lemma is well-defined, since $A$ is increasing and therefore has a well-defined limit $A_\infty$ in $[0, \infty]$. We show each implication separately. Assume $A_\infty \in L^1$. We then obtain, since $A$ is increasing,
\[
\lim_{x \to \infty} \sup_{t \ge 0} E|A_t| 1_{(|A_t| > x)} \le \lim_{x \to \infty} E|A_\infty| 1_{(|A_\infty| > x)} = 0
\]
by dominated convergence. Therefore, $A$ is uniformly integrable. On the other hand, if $A$ is uniformly integrable, we find by monotone convergence
\[
E|A_\infty| = \lim_t E|A_t| \le \sup_{t \ge 0} E|A_t| ,
\]
which is finite, since $A$ is uniformly integrable and therefore bounded in $L^1$.

Theorem C.1.13. Let $M \in c\mathcal{M}_0^2$. The quadratic variation is the unique process $[M]$ such that $M^2 - [M]$ is a uniformly integrable martingale.

Proof. Let $\tau_n$ be a localising sequence for $M$ such that $M^{\tau_n}$ is a continuous bounded martingale. We will use Lemma 2.4.13 to prove the claim. Therefore, we first prove that $M^2 - [M]$ has an almost surely finite limit. We already know that $M^2$ has a limit; it will therefore suffice to show that this is the case for $[M]$. This follows since
\[
E[M]_\infty = E \lim_n [M]_{\tau_n} = \lim_n E[M]_{\tau_n} = \lim_n E[M^{\tau_n}]_\infty = \lim_n E(M^{\tau_n})_\infty^2 \le E M_\infty^2 ,
\]
which is finite. Now let a stopping time $\tau$ be given. Using that $M \in c\mathcal{M}_0^2$ and applying monotone convergence, we obtain
\[
E\big( M_\tau^2 - [M]_\tau \big) = E M_\tau^2 - E[M]_\tau = \lim_n E M_{\tau_n \wedge \tau}^2 - \lim_n E[M]_{\tau_n \wedge \tau} = \lim_n E\big( (M^{\tau_n})_\tau^2 - [M^{\tau_n}]_\tau \big) = 0 ,
\]
since each $(M^{\tau_n})^2 - [M^{\tau_n}]$ is a uniformly integrable martingale starting at zero.


By Lemma 2.4.13, $M^2 - [M]$ is a uniformly integrable martingale.

Corollary C.1.14. Let $M \in c\mathcal{M}_0^L$. The quadratic variation is the unique increasing process $[M]$ such that $M^2 - [M]$ is a continuous local martingale.

C.2 A Lipschitz version of Urysohn's Lemma

In this section, we prove a version of Urysohn's Lemma giving a Lipschitz condition on the mapping supplied. This result could be used in the proof of Lemma 4.3.3, but the proof presented there seems simpler. However, since the extension of Urysohn's Lemma has independent interest, we present it here.

Lemma C.2.1. Let $\| \cdot \|$ be any norm on $\mathbb{R}^n$ and let $d$ be the induced metric. Let $A$ be a closed set and let $x \in A^c$. Then $d(x, A) = d(x, y)$ for some $y \in \partial A$.

Proof. First consider the case where $A$ is bounded. Then $A$ is compact. The mapping $y \mapsto d(x, y)$ is continuous, and therefore attains its infimum over $A$. Thus, there exists $y \in A$ such that $d(x, A) = d(x, y)$. Assume that $y \in A^\circ$; we will seek a contradiction. Let $\varepsilon > 0$ be such that $B_\varepsilon(y) \subseteq A$. There exists $0 < \lambda \le 1$ with $\lambda x + (1 - \lambda) y \in B_\varepsilon(y)$. We then find
\[
\| \lambda x + (1 - \lambda) y - x \| = (1 - \lambda) \| x - y \| < \| x - y \| = d(x, y) ,
\]
and therefore $d(x, \lambda x + (1 - \lambda) y) < d(x, y)$, which is the desired contradiction, since $\lambda x + (1 - \lambda) y \in A$. Thus, $y \notin A^\circ$, and since $A$ is closed, we obtain $y \in \partial A$.

Next, consider the case where $A$ is unbounded. Then there exists a sequence $z_n$ in $A$ with norms converging to infinity. We define
\[
K_n = \{ y \in A \mid \|y - x\| \le \|z_n - x\| \} .
\]
Since the norm is continuous, $K_n$ is closed. And any $y \in K_n$ satisfies $\|y\| \le \|y - x\| + \|x\| \le \|z_n - x\| + \|x\|$, so $K_n$ is bounded. For any $y \in A \setminus K_n$, we have $d(y, x) > d(x, z_n) \ge d(x, A)$, and therefore $d(x, A) = d(x, K_n)$.

From what we have already shown, there is $y_n \in \partial K_n$ such that $d(x, A) = d(x, K_n) = d(x, y_n)$. We need to prove that $y_n \in \partial A$ for some $n$. Note that with $\alpha_n = \|z_n - x\|$, we have
\[
\partial K_n = \partial\big( A \cap \overline{B}_{\alpha_n}(x) \big) \subseteq \partial A \cup \partial \overline{B}_{\alpha_n}(x) = \partial A \cup \{ y \in \mathbb{R}^n \mid \|y - x\| = \alpha_n \} .
\]


Since $\|y_n - x\| = d(y_n, x) = d(x, A)$, the sequence $\|y_n - x\|$ is constant, in particular bounded. But $\alpha_n$ tends to infinity, since $\|z_n\|$ tends to infinity. Thus, there is $n$ such that $\|y_n - x\| \ne \alpha_n$, and therefore $y_n \in \partial A$, as desired.

Theorem C.2.2 (Urysohn's Lemma with Lipschitz constant). Let $K$ and $V$ be compact, respectively open, subsets of $\mathbb{R}^n$ with $K \subseteq V$. Let $\| \cdot \|$ be any norm on $\mathbb{R}^n$, and let $d$ be the induced metric. Assume that $V$ has compact closure. There exists $f \in C_c(\mathbb{R}^n)$ such that $K \prec f \prec V$ and such that $f$ is $d$-Lipschitz continuous with $d$-Lipschitz constant at most
\[
\frac{4 \sup_{x \in \partial V, y \in \partial K} d(x, y)}{\big( \inf_{x \in V \setminus K^\circ} \big( d(x, K) + d(x, V^c) \big) \big)^2} .
\]

Proof. Clearly, we can assume $K \ne \emptyset$. We define
\[
f(x) = \frac{d(x, V^c)}{d(x, K) + d(x, V^c)} ,
\]
and claim that $f$ satisfies the properties given in the theorem. We first argue that $f$ is well-defined. We need to show that the denominator in the expression for $f$ is finite and nonzero. Since $K \ne \emptyset$, $d(x, K)$ is always finite. Since $V$ has compact closure, $V \ne \mathbb{R}^n$, so $d(x, V^c)$ is also always finite. Thus, the denominator is finite. It is also nonnegative, and if $d(x, K) + d(x, V^c) = 0$, we would have, by the closedness of $K$ and $V^c$, that $x \in K \cap V^c$, an impossibility. This shows that $f$ is well-defined.

It remains to prove that $f \in C_c(\mathbb{R}^n)$, that $K \prec f \prec V$ and that $f$ has the desired Lipschitz properties. We will prove these claims in four steps.

Step 1: $f \in C_c(\mathbb{R}^n)$ with $K \prec f \prec V$. It is clear that $f$ is continuous. If $x \in K$, then $d(x, K) = 0$ and therefore $f(x) = 1$. If $x \in V^c$, then $d(x, V^c) = 0$ and $f(x) = 0$. Thus, $K \prec f \prec V$. Also, since $f$ is zero outside of $V$, the support of $f$ is included in $\overline{V}$. Since $V$ has compact closure, $f$ has compact support.

Step 2: Lipschitz continuity of $f$ on $V \setminus K^\circ$. We begin by showing that $f$ is Lipschitz continuous on the compact set $V \setminus K^\circ$. Letting $x, y \in V \setminus K^\circ$, we find
\[
f(x) - f(y) = \frac{d(x, V^c)}{d(x, K) + d(x, V^c)} - \frac{d(y, V^c)}{d(y, K) + d(y, V^c)}
= \frac{d(x, V^c)\big( d(y, K) + d(y, V^c) \big) - d(y, V^c)\big( d(x, K) + d(x, V^c) \big)}{\big( d(x, K) + d(x, V^c) \big)\big( d(y, K) + d(y, V^c) \big)}
= \frac{d(x, V^c)\, d(y, K) - d(y, V^c)\, d(x, K)}{\big( d(x, K) + d(x, V^c) \big)\big( d(y, K) + d(y, V^c) \big)} .
\]


Since the mapping $x \mapsto d(x, K) + d(x, V^c)$ is positive and continuous and $V \setminus K^\circ$ is compact, we conclude that the infimum of $d(x, K) + d(x, V^c)$ over $x \in V \setminus K^\circ$ is positive. Let $C$ denote this infimum. Furthermore, note that the mappings $x \mapsto d(x, V^c)$ and $x \mapsto d(x, K)$ are continuous, and therefore their suprema over $x$ in $V \setminus K^\circ$ are finite. Let $c_1$ and $c_2$ denote these suprema, respectively. Note that for any $x, y, a, b \in \mathbb{R}$,
\[
xa - yb = (x - y)\frac{a + b}{2} + (a - b)\frac{x + y}{2} ,
\]
so that $|xa - yb| \le |x - y| \frac{|a + b|}{2} + |a - b| \frac{|x + y|}{2}$. This shows that
\[
|f(x) - f(y)| = \frac{\big| d(x, V^c)\, d(y, K) - d(y, V^c)\, d(x, K) \big|}{\big( d(x, K) + d(x, V^c) \big)\big( d(y, K) + d(y, V^c) \big)}
\le \frac{1}{C^2} \big| d(x, V^c)\, d(y, K) - d(y, V^c)\, d(x, K) \big|
\le \frac{1}{C^2} \big( c_2 |d(x, V^c) - d(y, V^c)| + c_1 |d(x, K) - d(y, K)| \big)
\le \frac{1}{C^2} \big( c_2\, d(x, y) + c_1\, d(x, y) \big)
= \frac{c_1 + c_2}{C^2}\, d(x, y) .
\]
We have now shown the Lipschitz property of $f$ for $x, y \in V \setminus K^\circ$, obtaining the Lipschitz constant $c = \frac{c_1 + c_2}{C^2}$. We will now extend the Lipschitz property to all of $\mathbb{R}^n$.

Step 3: Lipschitz property of $f$ on $\mathbb{R}^n$. We first extend the Lipschitz property to $\mathbb{R}^n \setminus K^\circ$. Let $x, y \in \mathbb{R}^n \setminus K^\circ$ be arbitrary. Since we already have the Lipschitz property when $x, y \in V \setminus K^\circ$, we only need to consider the case where one of $x$ and $y$ is in $V^c$. If $x$ and $y$ are both in $V^c$, then $f(x) - f(y) = 0$ and the Lipschitz property trivially holds. It remains to consider the case where, say, $x \in V^c$ and $y \in V \setminus K^\circ$. Now, since $\overline{V}$ is compact, by Lemma C.2.1 there exists $z \in \partial \overline{V}$ such that $d(x, \overline{V}) = d(x, z)$. Since $V$ is open, $\partial \overline{V} \subseteq \partial V = \partial V^c \subseteq V^c$. Since $f$ is zero on $V^c$, we conclude $f(z) = 0$. Since also $f(x) = 0$, we obtain
\[
|f(x) - f(y)| \le |f(x) - f(z)| + |f(z) - f(y)| = |f(y) - f(z)| \le c\, d(y, z) .
\]
Now, $z$ is the element of $\overline{V}$ which minimises the distance to $x$. Since $y \in V$, it follows that $d(x, z) \le d(x, y)$ and therefore $d(y, z) \le d(y, x) + d(x, z) \le 2 d(x, y)$. We conclude $|f(x) - f(y)| \le 2c\, d(x, y)$.


We have now proven that $f$ is Lipschitz on $\mathbb{R}^n \setminus K^\circ$ with Lipschitz constant $2c$.

Finally, we extend the Lipschitz property to all of $\mathbb{R}^n$. Let $x, y \in \mathbb{R}^n$. As before, it will suffice to consider the case where either $x \in K^\circ$ or $y \in K^\circ$. In the case $x, y \in K^\circ$, $f(x) - f(y) = 0$ and the Lipschitz property is trivial.

We therefore consider the case where $x \in K^\circ$ and $y \in \mathbb{R}^n \setminus K^\circ$. By Lemma C.2.1, there is $z$ on the boundary of $\mathbb{R}^n \setminus K^\circ$ such that $d(x, \mathbb{R}^n \setminus K^\circ) = d(x, z)$. Now, the boundary of $\mathbb{R}^n \setminus K^\circ$ is contained in $\partial K$, so $z \in \partial K$. $K$ is closed and therefore $\partial K \subseteq K$, so $f(z) = 1$. Since also $x \in K^\circ \subseteq K$, $f(x) = 1$ and we find
\[
|f(x) - f(y)| \le |f(x) - f(z)| + |f(z) - f(y)| = |f(y) - f(z)| .
\]
We know that $f$ is Lipschitz on $\mathbb{R}^n \setminus K^\circ$ and that $y \in \mathbb{R}^n \setminus K^\circ$. Since $z \in \partial(\mathbb{R}^n \setminus K^\circ)$, there is a sequence $z_n \in \mathbb{R}^n \setminus K^\circ$ such that $z_n$ converges to $z$. This yields
\[
|f(y) - f(z)| = \lim_n |f(y) - f(z_n)| \le \lim_n c\, d(y, z_n) = c\, d(y, z) ,
\]
and we thus recover $|f(x) - f(y)| \le c\, d(y, z)$. As in the previous step, $d(x, z) \le d(x, y)$ and so $d(y, z) \le d(y, x) + d(x, z) \le 2 d(x, y)$, yielding $|f(x) - f(y)| \le 2c\, d(x, y)$. This shows the Lipschitz property.

Step 4: Estimating the Lipschitz constant. We now know that $f$ is Lipschitz continuous with Lipschitz constant
\[
2c = \frac{2c_1 + 2c_2}{C^2} = \frac{2 \sup_{x \in V \setminus K^\circ} d(x, K) + 2 \sup_{x \in V \setminus K^\circ} d(x, V^c)}{\big( \inf_{x \in V \setminus K^\circ} \big( d(x, K) + d(x, V^c) \big) \big)^2} .
\]
This expression is somewhat cumbersome. By estimating it upwards, we obtain a weaker, but more manageable, result. We will leave the denominator untouched and only consider the numerator. Our goal is to show
\[
\sup_{x \in V \setminus K^\circ} d(x, K) \le \sup_{x \in \partial V, y \in \partial K} d(x, y) , \qquad
\sup_{x \in V \setminus K^\circ} d(x, V^c) \le \sup_{x \in \partial V, y \in \partial K} d(x, y) .
\]
Consider the second inequality. Obviously, to prove it, we can assume that $\sup_{x \in V \setminus K^\circ} d(x, V^c)$ is nonzero. First note that since the mapping $x \mapsto d(x, V^c)$ is continuous and $V \setminus K^\circ$ is compact, there is $z \in V \setminus K^\circ$ such that
\[
\sup_{x \in V \setminus K^\circ} d(x, V^c) = d(z, V^c) .
\]


And by Lemma C.2.1, there is $w \in \partial V^c = \partial V$ such that $d(z, V^c) = d(z, w)$. We want to prove $z \in \partial K$. First assume that $z$ is in the interior of $V \setminus K^\circ$; we seek a contradiction. Let $\varepsilon > 0$ be such that $B_\varepsilon(z) \subseteq V \setminus K^\circ$. There is $0 < \lambda \le 1$ such that $\lambda w + (1 - \lambda) z \in B_\varepsilon(z)$, and we then find
\[
d(\lambda w + (1 - \lambda) z, w) = \|(1 - \lambda)(z - w)\| = (1 - \lambda)\|z - w\| < d(z, w) .
\]
But $z$ was the infimum of $d(y, w)$ over $y \in V \setminus K^\circ$. Thus, we have obtained a contradiction and conclude that $z$ is not in the interior of $V \setminus K^\circ$. Since $V \setminus K^\circ$ is closed, we conclude $z \in \partial(V \setminus K^\circ) \subseteq \partial V \cup \partial K$. If $z \in \partial V$, we obtain $d(z, V^c) = 0$, inconsistent with our assumption that $\sup_{x \in V \setminus K^\circ} d(x, V^c)$ is nonzero. Therefore, $z \in \partial K$. We may now conclude
\[
\sup_{x \in V \setminus K^\circ} d(x, V^c) = d(z, w) \le \sup_{x \in \partial V, y \in \partial K} d(x, y) .
\]
We can use exactly the same arguments to prove the other inequality. We can therefore all in all conclude that $f$ is Lipschitz with Lipschitz constant
\[
\frac{4 \sup_{x \in \partial V, y \in \partial K} d(x, y)}{\big( \inf_{x \in V \setminus K^\circ} \big( d(x, K) + d(x, V^c) \big) \big)^2} ,
\]
as desired.

Comment C.2.3 The Lipschitz constant could possibly be improved by instead considering the mapping
\[
f(x) = \min\Big\{ \frac{d(x, V^c)}{d(K, V^c)} , 1 \Big\} . \qquad \circ
\]
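A concrete illustration of Theorem C.2.2 may be helpful; the specific sets below are chosen here and are not part of the original text. Take the Euclidean norm on $\mathbb{R}^n$, $K = \{x : \|x\| \le 1\}$ and $V = \{x : \|x\| < 2\}$. Then $d(x, K) = (\|x\| - 1)^+$ and $d(x, V^c) = (2 - \|x\|)^+$, so the function constructed in the proof is
\[
f(x) = \frac{d(x, V^c)}{d(x, K) + d(x, V^c)} =
\begin{cases}
1 & \|x\| \le 1 , \\
2 - \|x\| & 1 \le \|x\| \le 2 , \\
0 & \|x\| \ge 2 ,
\end{cases}
\]
which satisfies $K \prec f \prec V$ and is Lipschitz with constant $1$. This is comfortably within the bound of the theorem, which here evaluates to $12$, since $\sup_{x \in \partial V, y \in \partial K} d(x, y) = 3$ and $d(x, K) + d(x, V^c) = 1$ on all of $V \setminus K^\circ$. Note also that in this example $d(K, V^c) = 1$, so $f$ coincides with the mapping proposed in Comment C.2.3.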




Appendix DList of SymbolsGeneral probability• (Ω, F, F T , P ) - the background probability space.• F ∞ - The σ-algebra induced by ∪ t≥0 F t .• W - Generic Brownian motion.• τ, σ - Generic stopping times.• Σ A - The σ-algebra of measurability <strong>and</strong> adaptedness.• Σ π - The progressive σ-algebra.• M - Generic martingale.• M ∗ t- The maximum of M over [0, t].• M ∗ - The maximum of M over [0, ∞).• SP - The space of real stochastic processes.• M - The space of martingales.• M 0 - The space of martingales starting at zero.• cM 2 0 - The space of continuous square-integrable martingales starting at zero.


304 List of Symbols• cM L 0 - The space of continuous local martingales starting at zero.• P−→ - Convergence in probability.• −→ Lp- Convergence in L p .• a.s.−→ - Almost sure convergence.• φ - The st<strong>and</strong>ard normal density.• φ n - The n-dimensional st<strong>and</strong>ard normal density.<strong>Stochastic</strong> integration• bE - The space of elementary processes.• L 2 (W ) - The space of progressive processes X with E ∫ ∞0X 2 t dt finite.• L 2 (W ) - The space of processes L 0 -locally in L 2 (W ).• cM L W - The processes ∑ nk=1∫ t0 Y k dW k t , Y ∈ L 2 (W ) n .• µ M - The measure induced by M ∈ cM L W .• L 2 (M) - The L 2 space induced by µ M .• L 2 (M) - The space of processes L 0 -locally in L 2 (M).• cFV - The space of continuous finite variation processes.• cFV 0 - The space of continuous finite variation processes starting at zero.• S - The space of st<strong>and</strong>ard processes.• L(X) - The space of integr<strong>and</strong>s <strong>for</strong> X ∈ S.• [X] - The quadratic variation of a process X ∈ S.• Q π (M) - The quadratic variation of M over the partition π.• E(X) - The Doléans-Dade exponential of X ∈ S.• X, Y - Generic elements of S.• H, K - Generic elements of bE.


The Malliavin calculus

• $D$ - The Malliavin derivative.
• $S$ - The space of smooth variables, $C_p^\infty(\mathbb{R}^n)$ transformations of $W$ coordinates.
• $L^p(\Pi)$ - The space $L^p([0, T] \times \Omega, \mathcal{B}[0, T] \otimes \mathcal{F}_T, \lambda \otimes P)$.
• $\mathbb{D}^{1,p}$ - The domain of the extended Malliavin derivative.
• $F$ - Generic element of $\mathbb{D}^{1,p}$.
• $X$ - Generic element of $L^2(\Pi)$.
• $\theta$ - The Brownian stochastic integral operator on $[0, T]$.
• $(e_n)$ - Generic countable orthonormal basis of $L^2[0, T]$.
• $\langle \cdot, \cdot \rangle_{[0,T]}$ - The inner product on $L^2[0, T]$.
• $\langle \cdot, \cdot \rangle_\Pi$ - The inner product on $L^2(\Pi)$.
• $H_n$ - The subspace of $L^2(\mathcal{F}_T)$ based on the n'th Hermite polynomial.
• $H_n'$, $H_n''$ - Dense subspaces of $H_n$.
• $\mathcal{H}_n$ - Orthonormal basis of $H_n$.
• $H_n(\Pi)$ - The subspace of $L^2(\Pi)$ based on $H_n$.
• $H_n'(\Pi)$, $H_n''(\Pi)$ - Dense subspaces of $H_n(\Pi)$.
• $\mathcal{H}_n(\Pi)$ - Orthonormal basis for $H_n(\Pi)$.
• $P_n$ - Orthogonal projection onto $H_n$.
• $P_n^\Pi$ - Orthogonal projection onto $H_n(\Pi)$.
• $\Phi_a$ - Generic element of $\mathcal{H}_n$.
• $\mathcal{P}_n$ - Polynomials of degree $n$ in any number of variables.
• $P_n$ - Subspace of $L^2(\mathcal{F}_T)$ based on $\mathcal{P}_n$.
• $P_n'$ - Dense subspace of $P_n$.
• $\mathbb{P}_n$ - Total subset of $P_n$.


306 List of Symbols• δ - The Skorohod integral.<strong>Mathematical</strong> finance• Q - A generic T -EMM.• S - Generic asset price process.• r - Generic short rate process.• B - Generic risk-free asset price process.• S ′ - Generic normalized asset price process.• B ′ - Generic normalized risk-free asset price process.• M - Generic financial market model.• (h 0 , h S ) - Generic portfolio strategy.• V h - Value process of the portfolio strategy h.• µ - Generic drift vector process.• σ - Generic volatility matrix process.• ρ - Generic correlation.Measure theory• B - The Borel σ-algebra on R.• B k - The Borel σ-algebra on R k .• B(M) - The Borel σ-algebra on a pseudoemetric space (M, d).• λ - The Lebesgue measure.• [f] - The equivalence class corresponding to f.• C([0, ∞), R n ) - The space of continuous mappings from [0, ∞) to R n .• Xt ◦ - The t’th projection on C([0, ∞), R n ).


• $\mathcal{C}([0, \infty), \mathbb{R}^n)$ - The σ-algebra on $C([0, \infty), \mathbb{R}^n)$ induced by the projections.

Analysis

• $C^1(\mathbb{R}^n)$ - The space of continuously differentiable mappings from $\mathbb{R}^n$ to $\mathbb{R}$.
• $C^\infty(\mathbb{R}^n)$ - The infinitely differentiable elements of $C^1(\mathbb{R}^n)$.
• $C_p^\infty(\mathbb{R}^n)$ - The elements of $C^\infty(\mathbb{R}^n)$ with polynomial growth of the mapping and its partial derivatives.
• $C_c^\infty(\mathbb{R}^n)$ - The elements of $C^\infty(\mathbb{R}^n)$ with compact support.
• $C^2(U)$ - The space of twice continuously differentiable mappings from $U$ to $\mathbb{R}^n$.
• $\| \cdot \|_\infty$ - The uniform norm.
• $H_n$ - The n'th Hermite polynomial.
• $M \oplus N$ - The orthogonal sum of two closed subspaces.
• $V_F$ - Variation of $F$.
• $\mu_F$ - Signed measure induced by a finite variation mapping $F$.
• $f * g$ - Convolution of $f$ and $g$.
• $(\psi_\varepsilon)$ - Dirac family.
• $K$ - Generic compact set.
• $U, V$ - Generic open sets.




BibliographyL. Andersen: Simple <strong>and</strong> efficient simulation of the Heston stochastic volatility model,Journal of Computational <strong>Finance</strong>, Vol. 11, No. 3, Spring 2008.E. Alòs & C. Ewald: Malliavin Differentiability of the Heston Volatility <strong>and</strong> Applicationsto Option Pricing, Advances in Applied Probability 40, p. 144-162.R. B. Ash: Measure, Integration <strong>and</strong> Functional <strong>Analysis</strong>, Academic Press, 1972.T. M. Apostol: Calculus, Volume 1, Blaisdell Publishing Company, 1964.V. Bally: An elementary introduction to Malliavin calculus, INRIA, 2003,http://perso-math.univ-mlv.fr/users/bally.vlad/RR-4718.pdf.R. F. Bass: Diffusions <strong>and</strong> Elliptic Operators, Springer-Verlag, 1998.D. R. Bell: The Malliavin Calculus, Longman Scientific & Technical, 1987.C. Berg & T. G. Madsen: Mål- og integralteori, Matematisk afdeling, KU, 2001.P. Billingsley: Convergence of Probability Measures, 2nd edition, 1999.E. Benhamou: Smart Monte Carlo: Various tricks using Malliavin calculus, 2001,http://www.iop.org/EJ/article/1469-7688/2/5/301/qf2501.pdf.E. Benhamou: Optimal Malliavin Weighting function <strong>for</strong> the computation of thegreeks, <strong>Mathematical</strong> <strong>Finance</strong>, Vol. 13, No. 1, 37-53.T. Björk: Arbitrage Theory in Continuous Time, Ox<strong>for</strong>d University Press, 2nd edition,2004N. L. Carothers: Real <strong>Analysis</strong>, Cambridge University Press, 2000.


N. Chen & P. Glasserman: Malliavin Greeks without Malliavin Calculus, Columbia University, 2006, http://www.cfe.columbia.edu/pdf-files/Glasserman 03.pdf.
P. Cheridito, D. Filipovic & R. L. Kimmel: Market Price of Risk Specifications for Affine Models: Theory and Evidence, 2005, http://ssrn.com/abstract=524724.
K. L. Chung & J. L. Doob: Fields, Optionality and Measurability, American Journal of Mathematics, Vol. 87, pp. 397-424, 1965.
D. L. Cohn: Measure Theory, Birkhäuser, 1980.
R. Cont & P. Tankov: Financial Modeling with Jump Processes, Chapman & Hall, 2004.
J. C. Cox, J. E. Ingersoll & S. A. Ross: A Theory of the Term Structure of Interest Rates, Econometrica, Vol. 53, No. 2, pp. 385-407.
C. Dellacherie & P. A. Meyer: Probabilities and Potential A, North-Holland, 1975.
R. M. Dudley: Real Analysis and Probability, Cambridge University Press, 2002.
F. Delbaen & W. Schachermayer: Non-Arbitrage and the Fundamental Theorem of Asset Pricing: Summary of Main Results, Proceedings of Symposia in Applied Mathematics, 1997, http://www.fam.tuwien.ac.at/ wschach/pubs/preprnts/prpr0094.ps.
F. Delbaen & W. Schachermayer: The Mathematics of Arbitrage, Springer, 2008.
J. L. Doob: Stochastic Processes, Wiley, 1953.
A. A. Dragulescu & V. M. Yakovenko: Probability Distribution of Returns in the Heston Model with Stochastic Volatility, Quantitative Finance, 2002, Vol. 2, pp. 443-453.
R. J. Elliott: Stochastic Calculus and Applications, Springer-Verlag, 1982.
S. N. Ethier & T. G. Kurtz: Markov Processes: Characterization and Convergence, Wiley, 1986.
W. Feller: Two Singular Diffusion Problems, The Annals of Mathematics, Second Series, Vol. 54, No. 1, pp. 173-182.
E. Fournié et al.: Applications of Malliavin Calculus to Monte Carlo methods in finance, Finance and Stochastics, 1999, No. 3, pp. 391-412.
E. Fournié et al.: Applications of Malliavin Calculus to Monte Carlo methods in finance II, Finance and Stochastics, 2002, No. 5, pp. 201-236.


C. P. Fries & M. S. Joshi: Partial Proxy simulation schemes for generic and robust Monte Carlo Greeks, Journal of Computational Finance, Vol. 11, No. 3, 2008.
P. K. Friz: An Introduction to Malliavin Calculus, Courant Institute of Mathematical Sciences, New York University, 2002, http://www.statslab.cam.ac.uk/ peter/malliavin/Malliavin2005/mall.pdf.
P. Glasserman: Monte Carlo Methods in Financial Engineering, Springer-Verlag, 2003.
E. Hansen: Sandsynlighedsregning på Målteoretisk Grundlag, Afdeling for anvendt matematik og statistik, 2004.
E. Hansen: Measure Theory, Afdeling for anvendt matematik og statistik, 2004.
V. L. Hansen: Functional Analysis: Entering Hilbert Space, World Scientific, 2006.
S. L. Heston: A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options, The Review of Financial Studies, Vol. 6, No. 2, pp. 327-343.
N. Ikeda & S. Watanabe: Stochastic Differential Equations and Diffusion Processes, 2nd edition, North-Holland, 1989.
M. Jacobsen: Forelæsningsnoter om stokastiske integraler, 1989.
M. Jacobsen: Markov Processer, 1972.
M. Jacobsen: Videregående Sandsynlighedsregning, Universitetsbogladen København, 3rd edition, 2003.
J. Jacod & A. N. Shiryaev: Limit Theorems for Stochastic Processes, Springer-Verlag, 1987.
S. T. Jensen: Eksponentielle Familier, 2nd edition, Institut for Matematisk Statistik, 1992.
I. Karatzas & S. E. Shreve: Brownian Motion and Stochastic Calculus, Springer-Verlag, 1988.
I. Karatzas & S. E. Shreve: Methods of Mathematical Finance, Springer-Verlag, 1998.
S. Kaden & J. Potthoff: Progressive Stochastic Processes and an Application to the Itô Integral, Stochastic Analysis and Applications, 2004, Vol. 22, No. 4, pp. 843-866.
H. Kunita: Stochastic flows and stochastic differential equations, Cambridge University Press, 1990.


S. Levental & A. V. Skorohod: A Necessary and Sufficient Condition for Absence of Arbitrage with Tame Portfolios, The Annals of Applied Probability, Vol. 5, No. 4, pp. 906-925.
R. Lord, R. Koekkoek & D. van Dijk: A Comparison of Biased Simulation Schemes for Stochastic Volatility Models, Tinbergen Institute Discussion Paper, 2006.
P. Malliavin: Stochastic Analysis, Springer-Verlag, 1997.
P. Malliavin & A. Thalmaier: Stochastic Calculus of Variations in Mathematical Finance, Springer, 2005.
R. Meise & D. Vogt: Introduction to Functional Analysis, Clarendon Press, Oxford, 1997.
J. R. Munkres: Topology, 2nd edition, Prentice Hall, 2000.
M. Montero & A. Kohatsu-Higa: Malliavin Calculus applied to Finance, 2002, http://www.recercat.net/bitstream/2072/817/1/672.pdf.
G. Di Nunno, B. Øksendal & F. Proske: Malliavin Calculus for Lévy Processes with Applications to Finance, Universitext, Springer, 2008 (to appear).
D. Nualart: The Malliavin Calculus and Related Topics, 2nd edition, Springer-Verlag, 2006.
P. Protter: Stochastic Integration and Differential Equations, 2nd edition, Springer-Verlag, 2005.
P. Protter & K. Shimbo: No Arbitrage and General Semimartingales, 2006, http://people.orie.cornell.edu/ protter/WebPapers/na-girsanov8.pdf.
D. Revuz & M. Yor: Continuous Martingales and Brownian Motion, 3rd edition, Springer-Verlag, 1998.
L. C. G. Rogers & D. Williams: Diffusions, Markov Processes and Martingales, Volume 1: Foundations, 2nd edition, Cambridge University Press, 2000.
L. C. G. Rogers & D. Williams: Diffusions, Markov Processes and Martingales, Volume 2: Itô Calculus, 2nd edition, Cambridge University Press, 2000.
W. Rudin: Real and Complex Analysis, McGraw-Hill, 3rd edition, 1987.
M. Sanz-Solé: Malliavin Calculus with applications to stochastic partial differential equations, EPFL Press, 2005.


M. Schechter: Principles of Functional Analysis, 2nd edition, American Mathematical Society, 2002.
T. Schröter: Malliavin Calculus in Finance, Trinity College Special Topic Essay, 2007, http://people.maths.ox.ac.uk/schroter/mall.pdf.
M. Sharpe: General Theory of Markov Processes, Academic Press, 1988.
S. E. Shreve: Stochastic Calculus for Finance II: Continuous-Time Models, Springer, 2004.
A. Sokol: Stokastiske integraler og Differentialligninger, 2005.
A. Sokol: Markov Processer, 2007.
J. M. Steele: Stochastic Calculus and Financial Applications, Springer-Verlag, 2000.
C. M. Stein: Estimation of the Mean of a Multivariate Normal Distribution, The Annals of Statistics, 1981, Vol. 9, No. 6, pp. 1135-1151.
G. Szegö: Orthogonal Polynomials, American Mathematical Society, 1939.
H. Zhang: The Malliavin Calculus, 2004, http://janroman.dhis.org/finance/MalliavinCalculus/Malliavin Calculus.pdf.
H. Zou, T. Hastie & R. Tibshirani: On the “Degrees of Freedom” of the Lasso, The Annals of Statistics, 2007, Vol. 35, No. 5, pp. 2173-2192.
B. Øksendal: Stochastic Differential Equations, 6th edition, Springer-Verlag, 2005.
B. Øksendal: An introduction to Malliavin calculus with applications to economics, 1997.
