
Estimation in Financial Models

RISKLAB

Anja Going
ETH Zurich
Department of Mathematics
CH-8092 Zurich, Switzerland
Telephone: 41-1-632 34 58
Fax: 41-1-632 10 85
E-Mail: goeing@math.ethz.ch

January 1996


Contents

Introduction 3

1 Stochastic Volatility Models 5
  1.1 Continuous time 5
  1.2 Discrete time 7

2 Approximations 8
  2.1 Diffusion models by discrete time models 8
  2.2 Discrete time models by diffusion models 14

3 Parameter Estimation 16
  3.1 Diffusion models 16
    3.1.1 Continuous observations 16
    3.1.2 Discrete observations 22
  3.2 Discrete models 42
    3.2.1 AR models 42
    3.2.2 ARCH and GARCH models 45
    3.2.3 The Bayesian estimation method 50

4 Nonparametric Estimation 56
  4.1 Diffusion models 56
  4.2 Discrete models 58

5 Some Diffusion Models with Explicit Solutions 60
  5.1 Linear stochastic differential equations 60
  5.2 Nonlinear stochastic differential equations 63

A The Kalman-Bucy Filter 70
  A.1 The continuous case 70
    A.1.1 The general filtering problem 70
    A.1.2 The linear filtering problem 71
  A.2 The discrete case 72

B Numerical Methods 73
  B.1 The Euler Scheme 73
  B.2 The Milstein Scheme 75

C The Ito Formula 76
  C.1 The one-dimensional case 76
  C.2 The multi-dimensional case 76

D The Radon-Nikodym Theorem 78

Bibliography 79


Introduction

Over the last few years various new derivative instruments have emerged in financial markets, leading to a demand for versatile estimation methods for the relevant model parameters. Typical examples include volatility, covariances and correlations. In this paper we give a survey of statistical estimation methods for both discrete and continuous time stochastic models.
The text is organized as follows: in Chapter 1 we first motivate a model in which the volatility of a price process is assumed to follow a stochastic process. Out of the variety of continuous time stochastic volatility models introduced in the literature we choose two empirically relevant ones, namely an arithmetic Ornstein-Uhlenbeck process and a square root diffusion model. These two models serve as reference models in some of the later chapters. As for discrete time stochastic volatility models, we concentrate on log-AR(p), ARCH(q) and GARCH(p, q) processes, all of which are discussed in detail in Section 3.2.

Approximations of diffusion models by discrete time models and vice versa are described in Chapter 2, and the convergence result on which these approximations are based is stated. As applications, the diffusion limits of GARCH(1,1)-type processes and of AR(1) E-ARCH processes are derived. In addition, we present strategies for approximating diffusions and briefly compare different discretizations of a diffusion model.

In Section 3.1 estimation of an unknown parameter in a general diffusion process is discussed. For continuously observed processes we develop the classical theory of maximum likelihood estimation, including properties like consistency and asymptotic normality. In the case of discrete observations the maximum likelihood estimator retains all the 'good' properties if the transition densities of the process are known. However, in most cases relevant for finance we do not have explicit expressions for the underlying transition densities, and the use of approximate likelihood functions leads to inconsistent estimators when the time between observations is bounded away from zero. We describe alternative estimation methods, such as martingale estimating functions or methods based on approximating the transition densities; as a result we are able to obtain consistent and asymptotically normal estimators. The methods introduced are tested on some examples. In Section 3.2 discrete AR, ARCH and GARCH models and their asymptotic properties are discussed. Finally, a brief description of Bayesian estimation is given.

Concerning nonparametric estimation in diffusions, in Section 4.1 we discuss estimation of a probability density and estimation of an unknown signal. Section 4.2 presents two different nonparametric techniques for discrete models, namely kernel estimators and Fourier type estimators.

In Chapter 5 we show how to solve linear stochastic differential equations explicitly and give some classes of nonlinear stochastic differential equations that can be reduced to linear ones.

Some necessary background material is briefly introduced in the appendices. An extensive bibliography guides the interested reader to further published material.

Acknowledgements:

In working out the material presented, I was fortunate to have been able to discuss various aspects of the text with a number of people. I am especially grateful to Prof. P. Embrechts for his invaluable advice and constant encouragement. I also take great pleasure in thanking A.N. Shiryaev, A. Dassios, G. Parker and H. Schurz for their much appreciated help. I thank M. Kafetzaki Boulamatsis very much for all the helpful discussions and her support during the preparation of the manuscript. The fruitful discussions within RiskLab also were a constant source of inspiration.


Chapter 1

Stochastic Volatility Models

1.1 Continuous time

The value of a stock price S is supposed to follow the process

dS_t = S_t (μ_t dt + σ_t dW_t),        (1.1)

where W_t is a Wiener process and μ and σ are functions of t. Considering the logarithm of the stock price H ≡ ln S and using Ito's formula, we derive the process followed by H:

dH_t = (μ_t − σ_t²/2) dt + σ_t dW_t.        (1.2)

In the following we consider the process H_t instead of the equivalent process S_t. For many purposes this leads to a more tractable differential equation. Also, from a statistical point of view, the increments of H_t (i.e. the so-called log-returns) behave in a 'nicer' way than the increments of S_t.

In the case μ_t ≡ μ and σ_t ≡ σ, S_t follows a geometric Brownian motion. This is assumed in the Black-Scholes option pricing model, which has been used as an effective tool for the pricing of options for more than two decades. When the option values calculated with the Black-Scholes model are compared with observed option prices, there is usually a difference. Among these biases in model prices the well-known "smile" effect is important: the Black-Scholes option pricing formula tends to underprice out-of-the-money options and to overprice at-the-money options, that means implied volatility changes with the striking price (see Ball [1]). This effect arises from the assumption in the Black-Scholes model that volatility is a known constant ("I sometimes wonder why people still use the Black-Scholes formula, since it is based on such simple assumptions, unrealistically simple assumptions." Black [11]).

In real life volatility is not constant at all. It is non-uniform: on days with major economic events volatility is usually higher than on other days (especially non-trading days). In addition, stock prices sometimes jump, which can be thought of as a sudden large increase in the stock's volatility. Furthermore, many economic time series have a mean-reversion tendency: when the value of a random variable reaches a very high level, it is more likely to go down than up, and vice versa. Put differently, it tends away from extremely high or low values and reverts to some long-term mean. All these observations lead to the assumption that volatility is a random variable, and many stochastic volatility models have recently been introduced in the literature.

Now the question arises in which way stochastic volatility σ_t² = σ²(t, ω) can be modeled. The standard framework assumes the volatility specification

dv = m(v) dt + k(v) dW̃_t,        (1.3)

with v = σ_t² or v = ln σ_t² and W̃_t a Wiener process. Out of the variety of stochastic volatility models we will consider the following two:

I:  dH_t = (μ_t − σ_t²/2) dt + σ_t dW_t
    dv_t = α(β − v_t) dt + γ dW̃_t,  with v_t = ln σ_t²,        (1.4)

II: dH_t = (μ_t − σ_t²/2) dt + σ_t dW_t
    dv_t = b(a − v_t) dt + c √(v_t) dW̃_t,  with v_t = σ_t²,        (1.5)

where α, β, γ, a, b and c are fixed constants, and W_t and W̃_t are independent Wiener processes.

In the model (1.4), volatility, or more exactly ln σ², is governed by an arithmetic Ornstein-Uhlenbeck (or first order autoregressive) process with a mean-reverting tendency. A large and growing literature treats this case as an empirically relevant one (see Bollerslev, Engle and Nelson [14], Stein and Stein [72], Wiggins [75]). In (1.5) a square root diffusion model for stochastic volatility is suggested; for this model see Ball [1] and the literature given there. One further reason why we have restricted our attention to the models (1.4) and (1.5) above is to be able to show explicitly how certain statistical estimation procedures are to be implemented, compared and contrasted in a specific finance context. Most of the techniques introduced apply to much more general set-ups.


1.2 Discrete time

In the previous section we dealt with continuous time stochastic volatility models based on systems of stochastic differential equations. In the discrete time approach we consider time series models, which are systems of stochastic difference equations. In analogy to the continuous time approach we assume the stock price S_n, S_n > 0, to follow the process

ΔS_n = S_n (μ_n + σ_n z_n),        (1.6)

with μ and σ dependent on time and z_n ~ N(0, 1). We consider the logarithm of the stock price H_n ≡ ln S_n. With the notation

H_n = y_1 + … + y_n,

we derive the process

ΔH_n = y_n = (μ_n − σ_n²/2) + σ_n z_n.        (1.7)

We will concentrate on three different discrete stochastic volatility models. First we consider a rather simple model in which the conditional variance of the time series {y_n} follows a logarithmic AutoRegressive (log-AR) process (see Jacquier, Polson and Rossi [42]).

I. log-AR(p) / stochastic volatility:

v_n = α_0 + α_1 v_{n−1} + … + α_p v_{n−p} + γ z̃_n,        (1.8)

with v_n = ln σ_n² and (z_n, z̃_n) independent N(0, 1).

The famous AutoRegressive Conditional Heteroskedastic (ARCH) model proposed by Engle [22] is the second model we look at. The key insight offered by the ARCH model lies in the distinction between the conditional and the unconditional second order moments. While the unconditional covariance matrix for the variables of interest may be time invariant, the conditional variances and covariances often depend non-trivially on the past.

II. ARCH(q):

σ_n² = α_0 + α_1 y_{n−1}² + … + α_q y_{n−q}².        (1.9)

A long lag length q and a large number of parameters are often needed in empirical applications of ARCH(q). To avoid this problem, Bollerslev introduced the Generalized ARCH (GARCH) model [12].

III. GARCH(p, q):

σ_n² = α_0 + α_1 y_{n−1}² + … + α_q y_{n−q}² + β_1 σ_{n−1}² + … + β_p σ_{n−p}².        (1.10)

For a discussion of AR, ARCH and GARCH models see Section 3.2.
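To make the recursions concrete, the following is a minimal simulation sketch of the GARCH(p, q) model (1.9)-(1.10); the helper name simulate_garch and all parameter values are hypothetical choices for illustration only.

```python
# Simulate y_n = sigma_n z_n with the GARCH(p, q) variance recursion (1.10);
# parameter values below are hypothetical illustration choices.
import numpy as np

def simulate_garch(n, alpha0=0.1, alpha=(0.15,), beta=(0.8,), seed=0):
    rng = np.random.default_rng(seed)
    q, p = len(alpha), len(beta)
    m = max(p, q)
    y = np.zeros(n + m)                 # returns, with a zero warm-up period
    s2 = np.full(n + m, alpha0)         # conditional variances sigma_n^2
    for t in range(m, n + m):
        s2[t] = (alpha0
                 + sum(a * y[t - i - 1] ** 2 for i, a in enumerate(alpha))
                 + sum(b * s2[t - j - 1] for j, b in enumerate(beta)))
        y[t] = np.sqrt(s2[t]) * rng.standard_normal()   # z_n ~ N(0, 1)
    return y[m:], s2[m:]

y, s2 = simulate_garch(1000)   # an ARCH(q) path is the special case beta=()
```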



Chapter 2

Approximations

2.1 Diffusion models by discrete time models

The theory of finance is mainly treated in terms of stochastic differential equations, whereas in practice observations can be made only at discrete time intervals. We want to bridge this gap by approximating diffusion models by discrete time models and vice versa.

In this section we deal with discrete time models as diffusion approximations. The advantage of approximating a diffusion model by a sequence of discrete time models lies mainly in estimating and forecasting. Whereas the likelihood function of a discrete time model is easy to compute and to maximize, problems may arise in deriving the likelihood of a discretely observed nonlinear stochastic differential equation system, e.g. when the diffusion coefficient depends on an unknown parameter or when there are unobservable state variables like conditional variance (see [54], [14]). In the case of a diffusion model with continuous observations, see 3.1.1 for the definition of the maximum likelihood estimator, its properties and the mentioned problems like parameter dependence. When the diffusion model is observed at discrete time points and the transition densities are unknown, discretizing the continuous time log-likelihood leads to the problem of inconsistent estimators. This problem is described in 3.1.2, where in addition three different approaches are given to overcome it, the latter also in the case of incomplete observations.

In summary, rather than estimating and forecasting with a diffusion model observed at discrete time points, it may be much easier to use a discrete time model directly, e.g. an ARCH model, as basic approximation.


First of all, as a simple though important example, we approximate a Wiener process using the central limit theorem. Let ξ_1, ξ_2, … be a sequence of iid random variables defined on a probability space (Ω, B, P). Suppose Eξ_n = 0, Var ξ_n = σ² and S_n = Σ_{i=1}^n ξ_i. Via the central limit theorem, S_n/(σ√n) converges in distribution to the standard normal distribution as n tends to infinity. By means of the partial sums S_n we define piecewise linear functions X_n on the interval [0, 1]. For each n and each ω the function X_n(·, ω) is linear on each interval [(i−1)/n, i/n], 1 ≤ i ≤ n, and has the value S_i(ω)/(σ√n) at the point i/n, with starting point S_0(ω) = 0. Hence we construct a function X_n of the form

X_n(t, ω) = S_{i−1}(ω)/(σ√n) + ((t − (i−1)/n)/(1/n)) · ξ_i(ω)/(σ√n),

for t ∈ [(i−1)/n, i/n]. For each ω, X_n(·, ω) is a continuous function on [0, 1], i.e. X_n(·, ω) ∈ C[0, 1]. Denote by P_n the distribution of X_n(·, ω) in C[0, 1]. Then P_n converges weakly to the Wiener measure W,

P_n ⇒ W

(see [10], §2). Note that this convergence result can be viewed in two different ways: on the one hand X_n can be seen as an approximation to a Wiener process, and on the other hand the Wiener process can be seen as an approximation to X_n. We will come back to this point later.
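The following is a small sketch of this construction: rescaled partial sums of iid variables, linearly interpolated on [0, 1], approximate a Wiener path. The uniform step distribution is an arbitrary illustrative choice.

```python
# Piecewise linear interpolation X_n of rescaled partial sums S_i/(sigma sqrt(n)),
# as in the construction above; the step distribution is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
xi = rng.uniform(-1.0, 1.0, size=n)          # iid, E xi = 0, Var xi = 1/3
sigma = np.sqrt(1.0 / 3.0)
S = np.concatenate(([0.0], np.cumsum(xi)))   # partial sums S_0, ..., S_n
X = S / (sigma * np.sqrt(n))                 # X_n at the grid points i/n
t = np.linspace(0.0, 1.0, n + 1)
# (t, X) defines the polygonal path X_n(., omega); e.g. X[-1] is roughly N(0, 1).
```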

The main statement of this chapter is a convergence theorem in which general conditions are given for a sequence of discrete time Markov processes to converge weakly to an Ito process. These conditions were developed by Stroock and Varadhan (see [73], §11.2). As mentioned in the example above, given a convergence result like this we may use it to tackle the approximation problem of continuous time by discrete models and vice versa (see Section 2.2). For a presentation of the convergence theorem and the proof we also refer to Nelson [54].

The Convergence Theorem

For arbitrary h > 0, consider a discrete time Markov process X_0, X_h, X_{2h}, …, X_{kh}, denoted by {X_{kh}}, where X_{kh} takes values in IR^n for all k. Assume that we know the transition probabilities of {X_{kh}} and the distribution of the initial random point X_0. We construct the continuous time process {X_t^{(h)}} from the discrete time process {X_{kh}} by making X_t^{(h)} a step function with jumps at times h, 2h, 3h, … and values X_t^{(h)} = X_{kh} almost surely for kh ≤ t < (k+1)h. We thus deal with three families of processes:

1) {X_{kh}}: the family of discrete time processes {X_{kh}}, depending on h and on the discrete time index kh, k ∈ IN.

2) {X_t^{(h)}}: the family of continuous time processes {X_t^{(h)}} that are step functions constructed from the discrete time processes {X_{kh}} as described above. Observe that {X_t^{(h)}} depends both on h and on the continuous time index t ≥ 0.

3) {X_t}: the limiting process {X_t} to which, under the conditions of the theorem below, the sequence of processes {X_t^{(h)}} converges weakly as h ↓ 0.

Instead of giving the explicit mathematical conditions needed in the following theorem (for details see Nelson [54], pp. 10-15), we briefly describe and interpret them. Functions a_h(x, t) and b_h(x, t) are defined as measures of the second moment and the drift, respectively, and are required to converge uniformly on compact sets to well-behaved continuous functions a(x, t) and b(x, t). Moreover, there shall exist a continuous function σ(x, t) with a(x, t) = σ(x, t)σ(x, t)^T. The sample paths of the limit process X_t are assumed to be continuous with probability one. We require the probability measures of the initial points X_0^{(h)} to converge to a limit measure ν_0 as h ↓ 0, which determines the initial distribution ν_0 of X_t. Finally, certain conditions are needed so that ν_0, a(x, t) and b(x, t) uniquely define the distribution of the limit process X_t.

Theorem 1 Under the assumptions indicated above, the family {X_t^{(h)}} converges weakly as h ↓ 0 to the process {X_t} defined by the stochastic differential equation

X_t = X_0 + ∫_0^t b(X_s, s) ds + ∫_0^t σ(X_s, s) dW_s^{(n)},

where W_t^{(n)} is an n-dimensional Wiener process independent of X_0, and {X_t} has initial distribution ν_0. The process {X_t} exists, is distributionally unique and remains finite in finite time intervals with probability one.

We remark that the distribution of X_t does not depend on the choice of σ (see the assumptions above). Moreover, note that convergence in distribution means convergence of the whole sample path: the probability laws generating the sample paths {X_t^{(h)}} converge to the probability law generating the sample path of {X_t, 0 ≤ t ≤ T} for any 0 ≤ T < ∞. Furthermore, we remark that Nelson [54] shows the same result based on simpler conditions for the continuity of the sample paths of X_t and for the definitions of a_h and b_h. Later we will refer to these conditions as Nelson conditions.

Now, as an application of Theorem 1, Nelson [54] finds and analyzes the diffusion limit of GARCH(1,1)-type processes. The GARCH(1,1)-M process of Engle and Bollerslev (see [13]) is defined as

Y_t = Y_{t−1} + c σ_t² + σ_t ε_t,        (2.1)
σ_{t+1}² = ω + σ_t² [β + α ε_t²],        (2.2)

where {ε_t} ~ N(0, 1) iid.

Now our purpose is to reduce the length of the time intervals more and more. The parameters ω, α and β of the system may depend on h. The drift term in (2.1) and the variance of ε_t are made proportional to h:

Y_{kh} = Y_{(k−1)h} + h c σ_{kh}² + σ_{kh} ε_{kh},        (2.3)
σ_{(k+1)h}² = ω_h + σ_{kh}² [β_h + (α_h/h) ε_{kh}²],        (2.4)

where {ε_{kh}} ~ N(0, h) iid, and for the initial distribution (k = 0) we have

P[(Y_0, σ_0²) ∈ A] = ν_h(A)

for all A ∈ B(IR²). The continuous time processes Y_t^{(h)} and σ_t^{(h)2} are constructed as step functions,

Y_t^{(h)} ≡ Y_{kh} and σ_t^{(h)2} ≡ σ_{kh}² for kh ≤ t < (k+1)h,

and ω_h, α_h and β_h are assumed to behave suitably as h ↓ 0, so that the conditions of the convergence theorem hold.


By means of Theorem 1 we now obtain a diffusion limit of the form

dY_t = c σ_t² dt + σ_t dW_{1,t}
dσ_t² = (ω − θ σ_t²) dt + √2 α σ_t² dW_{2,t},        (2.5)

where W_{1,t} and W_{2,t} are independent Wiener processes, independent of (Y_0, σ_0²), with initial distribution

P[(Y_0, σ_0²) ∈ A] = ν_0(A)

for all A ∈ B(IR²) (see [54], p. 17).

At this point we see that the two Sections 2.1 and 2.2 are closely related to each other. There exists no closed form for the stationary distribution of the system (2.1, 2.2) in discrete time, but in continuous time we are able to derive the stationary distribution of σ_t² in (2.5) by using the results of Wong [76] (see Nelson [54]). The idea is to draw a conclusion concerning the stationary distribution from continuous time to discrete time by means of Theorem 1, that is, we use Theorem 1 in the other direction. This is the main subject of Section 2.2, and therefore, regarding inference from continuous time to discrete time, we refer to the next section, where we also consider another example, a so-called E-ARCH model, proposed by Nelson.

Continuing this section, we come to the subject of strategies for approximating diffusions. Approximation schemes like the standard Euler approximation (see Appendix B.1) or the Milstein scheme (see Appendix B.2) are known strategies for approximating diffusions. As an example we apply the standard Euler approximation scheme to the model (1.4), which is written here again for the reader's convenience:

dY_t = (μ_t − σ_t²/2) dt + σ_t dW_{1,t}
d[ln σ_t²] = α(β − ln σ_t²) dt + γ dW_{2,t}.        (2.6)

The standard Euler approximation scheme for (2.6), respectively (1.4), is given by

y_{t+h} = y_t + (μ_t − σ_t²/2) h + σ_t √h ε_{1,t+h}
ln σ_{t+h}² = ln σ_t² + α(β − ln σ_t²) h + γ √h ε_{2,t+h},        (2.7)

for t = h, 2h, 3h, …, with (y_0, σ_0) fixed and ε_{1,t}, ε_{2,t} independent N(0, 1). The diffusion (2.6), respectively (1.4), does not satisfy global Lipschitz conditions and cannot be transformed in order to satisfy them. Thus the known theorems on strong convergence of the Euler approximation (see Appendix B.1) do not apply. But by means of the convergence theorem above, the weak convergence of the system (2.7) to the diffusion model (2.6) can be verified (for the proof we refer to [14]). Hence, with the standard Euler approximation scheme we arrive at a way to approximate model (1.4).
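As an illustration of how (2.7) is used in practice, here is a minimal simulation sketch; the function name euler_sv and all parameter values (μ, α, β, γ, the step h) are hypothetical choices.

```python
# Minimal Euler scheme (2.7) for the stochastic volatility model (1.4);
# all parameter values are hypothetical illustration choices.
import numpy as np

def euler_sv(n_steps, h=1e-3, mu=0.05, alpha=2.0, beta=-1.0, gamma=0.5,
             y0=0.0, lnsig2_0=-1.0, seed=2):
    rng = np.random.default_rng(seed)
    y, v = np.empty(n_steps + 1), np.empty(n_steps + 1)   # v = ln sigma^2
    y[0], v[0] = y0, lnsig2_0
    for k in range(n_steps):
        sig2 = np.exp(v[k])
        e1, e2 = rng.standard_normal(2)                   # independent N(0, 1)
        y[k + 1] = y[k] + (mu - sig2 / 2) * h + np.sqrt(sig2 * h) * e1
        v[k + 1] = v[k] + alpha * (beta - v[k]) * h + gamma * np.sqrt(h) * e2
    return y, v

y_path, lnsig2_path = euler_sv(10_000)
```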

As a further remark, consider a sequence of processes {X_{n,t}} defined by

dX_{n,t} = b(X_{n,t}, t) dt + σ(X_{n,t}, t) dW_{n,t}.

Wong and Hajek [76], §4.5, show that, under reasonable conditions, as n tends to infinity the processes {X_{n,t}} converge to the process {X_t} given by

dX_t = b(X_t, t) dt + (1/2) σ(X_t, t) σ'(X_t, t) dt + σ(X_t, t) dW_t,

where σ' = ∂σ/∂x. Note the presence of the additional term (1/2)σσ', which results from an application of the Ito formula (see Appendix C). Intuitively one would expect a limit without the term (1/2)σσ', and this is indeed the limit in the case where the diffusion coefficient depends only on time, σ(x, t) ≡ σ(t).

Another interesting problem is to find the, in some sense, best discretization of a continuous time model. In the case of the centered Cox-Ingersoll-Ross model

dr_t = −k r_t dt + σ √(r_t + θ) dW_t        (2.8)

(see also model (1.5)), we refer to Deelstra and Parker [20] for a discussion of two different discretizations. Besides the 'simple' discretization of (2.8),

r_t = φ r_{t−1} + σ_ε √(r_{t−1} + θ) ε_t,        (2.9)

where ε_t ~ N(0, 1) i.i.d. and φ, σ_ε > 0, Deelstra and Parker [20] consider another discrete representation of (2.8),

r_t = φ r_{t−1} + σ_ε √((2φ/(1+φ)) r_{t−1} + θ) ε_t,        (2.10)

where ε_t ~ N(0, 1) i.i.d. and φ, σ_ε > 0. They establish the parametric relations between the continuous time model and the discrete models. Whereas in the simple model (2.9) the mean and the stationary variance are the same as in the continuous time model (2.8), the discretization (2.10) has in addition the same covariance as model (2.8) at all sampling times, that is, both the first and the second moments are equal. Therefore the covariance equivalent discretization (2.10) is suggested in [20] as a discrete representation of the continuous time model (2.8). It is also illustrated in [20] that although model (2.9) looks a lot like model (2.10), the two models can produce significantly different estimates for the parameters.
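To illustrate the difference between the two recursions, the following sketch simulates (2.9) and (2.10) side by side. It relies on the parameter reconstruction used above (φ, σ_ε, θ), and all numerical values are hypothetical.

```python
# Compare paths of the 'simple' discretization (2.9) and the covariance
# equivalent discretization (2.10); all parameter values are hypothetical.
import numpy as np

def simulate(scheme, n, phi=0.95, sig_eps=0.1, theta=1.0, r0=1.0, seed=3):
    rng = np.random.default_rng(seed)
    r = np.empty(n + 1)
    r[0] = r0
    for t in range(n):
        if scheme == "simple":          # discretization (2.9)
            var_term = r[t] + theta
        else:                           # covariance equivalent scheme (2.10)
            var_term = 2 * phi / (1 + phi) * r[t] + theta
        r[t + 1] = phi * r[t] + sig_eps * np.sqrt(max(var_term, 0.0)) \
                   * rng.standard_normal()
    return r

r_simple = simulate("simple", 5000)
r_coveq = simulate("coveq", 5000)
# The sample autocovariances of the two paths differ, which is what drives the
# significantly different parameter estimates reported in [20].
```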



2.2 Discrete time models by diffusion models

Approximation of discrete time models by diffusion models is a way to simplify the analysis of discrete models. For instance, properties of discrete time models such as consistency and asymptotic normality of maximum likelihood estimates are rather difficult to establish, and often distributional results are available for the diffusion limit of a sequence of discrete processes that are not available for the discrete models themselves. In such cases we may be able to use a convergence theorem as in Section 2.1 and approximate discrete time processes, especially ARCH processes, by diffusion processes.

We pick up again the GARCH(1,1) model (2.1, 2.2) dealt with in the previous section, where the diffusion limit (2.5) of the system (2.1, 2.2) was obtained via the convergence theorem (with Nelson conditions). As mentioned, there exists no closed form for the stationary distribution of the system (2.1, 2.2) in discrete time, but we are able to derive the stationary distribution of σ_t² in the diffusion limit (2.5) by using the results of Wong [76] (see Nelson [54]). Nelson shows that the stationary distribution of σ_t² is an inverted gamma and uses this knowledge in continuous time to obtain distributional results for the discrete time model. While the innovation process σ_{kh} ε_{kh} (see (2.3)) is conditionally normally distributed, it is (unconditionally) approximately distributed as a Student t when the time between observations, i.e. h, is small and kh is large (see [54], p. 18f).

Consider the (slightly different) model (1.4)

dY_t = c σ_t² dt + σ_t dW_{1,t}
d[ln σ_t²] = α(β − ln σ_t²) dt + dW_{2,t},        (2.11)

where W_{1,t} and W_{2,t} are Brownian motions with dW_{1,t}·dW_{1,t} = dt, dW_{1,t}·dW_{2,t} = C_{12} dt and dW_{2,t}·dW_{2,t} = C_{22} dt, where C_{22} ≥ C_{12}². As mentioned in the example in the previous section (see p. 12), the system (2.11) does not satisfy global Lipschitz conditions, and hence the standard convergence theorems for the Euler approximation do not apply. Nelson develops a class of discrete time models based on the Exponential ARCH (E-ARCH) model that converge weakly to a diffusion (see [54], pp. 20-23). We will consider such a diffusion approximation for the model (2.11). Since ln(σ_t²) follows a continuous time AR(1) process, in the discrete time model the conditional variance process is also assumed to be an AR(1) process, that means


we assume that it follows an AR(1) E-ARCH process:

ln Y_{kh} = ln Y_{(k−1)h} + h c σ_{kh}² + σ_{kh} ε_{kh},
ln σ_{(k+1)h}² = ln σ_{kh}² + α[β − ln σ_{kh}²] h + C_{12} ε_{kh} + ψ [|ε_{kh}| − (2h/π)^{1/2}],        (2.12)

where ψ ≡ [(C_{22} − C_{12}²)/(1 − 2/π)]^{1/2} and ε_{kh} iid N(0, h).

If ln(σ_0²) is normally distributed, then the ln(σ_t²) process in (2.11) is Gaussian. Otherwise, if α > 0, a Gaussian stationary limit distribution for ln(σ_t²) exists. Thus the conditional variance in continuous time is lognormal, and as in the GARCH(1,1) example above, Nelson [54] infers from this information the distribution of the innovation process in the discrete time model (2.12). He shows that in the discrete time model (with short time intervals) the distribution of the innovations is approximately a normal-lognormal mixture. Hence we have derived the approximate distributions of GARCH(1,1) and AR(1) E-ARCH models for small sampling intervals by using the distributional results available for the diffusion limit.
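A minimal simulation sketch of the recursion (2.12) follows; the constants c, α, β, C_{12} and C_{22} (and the step h) are hypothetical illustration values.

```python
# Simulate the AR(1) E-ARCH recursion (2.12); parameter values are hypothetical.
import numpy as np

def simulate_earch(n, h=0.01, c=0.1, alpha=1.0, beta=-1.0, C12=-0.2, C22=0.5,
                   lnY0=0.0, lnsig2_0=-1.0, seed=4):
    rng = np.random.default_rng(seed)
    psi = np.sqrt((C22 - C12 ** 2) / (1 - 2 / np.pi))   # psi as defined after (2.12)
    lnY, v = np.empty(n + 1), np.empty(n + 1)           # v = ln sigma^2
    lnY[0], v[0] = lnY0, lnsig2_0
    for k in range(n):
        sig = np.exp(v[k] / 2)
        eps = rng.normal(0.0, np.sqrt(h))               # eps_kh ~ N(0, h)
        lnY[k + 1] = lnY[k] + h * c * sig ** 2 + sig * eps
        v[k + 1] = v[k] + alpha * (beta - v[k]) * h + C12 * eps \
                   + psi * (abs(eps) - np.sqrt(2 * h / np.pi))
    return lnY, v

lnY_path, lnsig2_path = simulate_earch(10_000)
```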



Chapter 3

Parameter Estimation

3.1 Diffusion models

Consider the general type of a diffusion process X = (X_t)_{t≥0} defined as the solution to the stochastic differential equation

dX_t = b(t, X_t; θ) dt + σ(t, X_t; θ) dW_t,  X_0 = x_0,  t ≥ 0,        (3.1)

where W is an r-dimensional Wiener process, θ ∈ Θ ⊆ IR^p, b(·, ·; θ) : [0, ∞) × IR^d → IR^d and σ(·, ·; θ) : [0, ∞) × IR^d → M_{d×r} are "nice"¹ functions, where M_{d×r} denotes the set of real d × r matrices.

The situation will be discussed where a realization of the process X is observed, but the parameter θ is unknown to the observer. Hence we have to construct sufficiently good estimators of θ and examine their properties. In deriving estimates of θ, there are two different kinds of observations to distinguish: continuous observations of X, as considered in Section 3.1.1, and discrete observations, as considered in Section 3.1.2.

3.1.1 Continuous observations

Suppose that X satisfies

dX_t = b(θ; X_t) dt + σ(θ; X_t) dW_t,  X_0 = x_0,  t ≥ 0,        (3.2)

where for convenience X and W are one-dimensional processes, b and σ are smooth functions, and the parameter θ ∈ Θ ⊆ IR^p is to be estimated from continuous observations of X. X is a Markov process. P_θ denotes the law of the process X on the canonical space Ω = C(IR_+, IR) with the canonical filtration F_t = σ(X_s | s ≤ t). Denote by P_{θ,t} := P_θ|F_t the restriction of P_θ to F_t.

¹ b and σ are Lipschitz continuous and satisfy a growth condition. Then there exists a unique strong solution of (3.1); see [65], p. 128 and p. 136.

First consider the case where σ(θ; x) does not depend on θ. If σ does not vanish, the measures P_{θ,t} and P_{θ₁,t} are equivalent for any θ, θ₁ ∈ Θ and t < ∞, and the likelihood process

L_t^{θ,θ₁} ≡ dP_{θ,t}/dP_{θ₁,t}        (3.3)

is well defined. Suppose that the drift is linear in the parameter,

dX_t = θ b(X_t) dt + σ(X_t) dW_t,        (3.4)

where θ > 0, and that the process (X_t) is observed in the time interval [0, T]. Then the likelihood process (3.3) at T with θ₁ = 0 equals

L_T^{θ,0} ≡ L_T(θ) = exp[ ∫_0^T (θ b(X_s)/σ²(X_s)) dX_s − (1/2) ∫_0^T (θ² b²(X_s)/σ²(X_s)) ds ],        (3.5)

see e.g. [51], §17, or [43], p. 76, or [39]. In the following we consider a constant diffusion term σ, say σ ≡ 1. For convenience we take the logarithm in (3.5) and obtain the logarithmic likelihood (log-likelihood) function

ln L_T(θ) = θ ∫_0^T b(X_s) dX_s − (θ²/2) ∫_0^T b²(X_s) ds.        (3.6)

Maximizing L_T, or equivalently ln L_T, with respect to θ, we obtain the so-called Maximum Likelihood Estimator (MLE) θ̂_T based on continuous observations of X in the interval 0 ≤ t ≤ T. We remark that, by definition of the likelihood function, the measure P_{θ₁} that dominates the measure P_θ (that is, P_θ ≪ P_{θ₁}) may be chosen arbitrarily; above we took θ₁ = 0. The MLE solves the likelihood equation (∂/∂θ) ln L_T(θ) = 0, which in the case (3.6) has the form

(∂/∂θ)(ln L_T(θ)) = ∫_0^T b(X_s) dX_s − θ ∫_0^T b²(X_s) ds.

Solving (∂/∂θ)(ln L_T(θ)) = 0, we obtain the MLE

θ̂_T = ( ∫_0^T b(X_s) dX_s ) / ( ∫_0^T b²(X_s) ds ),

and hence, using equality (3.4), we have

θ̂_T = θ_0 + ( ∫_0^T b(X_s) dW_s ) / ( ∫_0^T b²(X_s) ds ),        (3.7)

where θ_0 denotes the true value of the parameter.

Now we derive some properties of the MLE θ̂_T.

First we treat the problem of the bias of an estimator. The bias of θ̂_T as an estimator of θ_0 is defined as

b_{θ̂_T}(θ_0) = E_{θ_0}(θ̂_T − θ_0),

where E_{θ_0} denotes the expectation with respect to P_{θ_0}. By (3.7) we have

E_{θ_0} θ̂_T = θ_0 + E_{θ_0} [ ( ∫_0^T b(X_s) dW_s ) / ( ∫_0^T b²(X_s) ds ) ].        (3.8)

Note that in the case of a constant drift coefficient b(X_t) ≡ b,

E_θ [ bW_T / (b²T) ] = (1/(bT)) E_θ W_T = 0,

thus E_{θ_0} θ̂_T = θ_0, so that θ̂_T is unbiased. In all other cases the estimator is biased, and in the case (3.6) we have, under some regularity conditions, the equality

b_{θ̂_T}(θ_0) = (d/dθ_0) E_{θ_0} [ ( ∫_0^T b²(X_t) dt )^{−1} ]

(see [51], §17.2 and §17.3, Theorems 17.2 and 17.3).

Now we turn to the properties of consistency and asymptotic normality. Assuming some natural regularity conditions (see [38], p. 83 and p. 90, or [17], p. 500f, or [48], p. 95), the stochastic differential equation (3.4) has an ergodic solution. For the ergodic theory we refer to e.g. [32], §18. Then the stationary density p̃ of the process solves the (deterministic) stationary Fokker-Planck equation, which in the above case reduces to

(d/dx)[θ_0 b(x) p̃(x)] − (1/2)(d²/dx²) p̃(x) = 0.

Ergodicity implies

lim_{T→∞} (1/T) ∫_0^T b(X_t) dW_t = 0,  P_{θ_0} a.s.,

and

lim_{T→∞} (1/T) ∫_0^T b²(X_t) dt = ∫_{−∞}^{∞} b²(x) p̃(x) dx,  P_{θ_0} a.s.

Hence we conclude that it is possible to achieve arbitrary precision of the MLE θ̂_T in (3.7) by increasing T indefinitely, that means the MLE θ̂_T is weakly consistent:

lim_{T→∞} P_{θ_0} [ |θ̂_T − θ_0| > ε ] = 0        (3.9)

for all ε > 0.
for all ">0.<br />

Furthermore, under these regularity conditions the MLE is asymptotically normal, that means as T → ∞ the difference √T (θ̂_T − θ_0) is asymptotically normal with parameters (0, σ²(θ_0)), where

σ²(θ_0) = ( ∫_{−∞}^{∞} b²(x) p̃(x) dx )^{−1}

(see e.g. [48], p. 95f, or [45], p. 243). We remark that this convergence is uniform for all θ on a compact set K ⊂ Θ (see e.g. [48], p. 94f). Moreover, note that under the same regularity conditions the MLE is also consistent and asymptotically normal if the drift term is nonlinear (again see e.g. [48], p. 94f). Finally, we remark that σ²(θ_0) equals I^{−1}(θ_0), where I denotes the so-called Fisher information

I(θ) = Var[ (∂/∂θ) ln L_T(θ) ] = E[ ( (∂/∂θ) ln L_T(θ) )² ],

which can alternatively be computed by

I(θ) = −E[ (∂²/∂θ²) ln L_T(θ) ]

(see [49], p. 249f). Asymptotic normality can be used to determine confidence intervals for θ and to determine a 'suitable' value of T.

Example 1. a) The regularity conditions are satisfied in the case of observations coming from

dX_t = −θ X_t dt + dW_t,  X_0 = 0,  0 ≤ t ≤ T,        (3.10)

where θ ∈ (α, β) with α > 0, that is, the process is an ergodic Ornstein-Uhlenbeck process, so that the MLE θ̂_T is consistent and asymptotically normal.

b) In the case θ < 0 the process has no stationary distribution. Nevertheless we can show that the MLE θ̂_T is consistent. Furthermore, e^{−θ_0 T}(θ̂_T − θ_0) is asymptotically normal with parameters (0, 4θ_0²) (see [48], p. 100).

c) In the case θ = 0 the MLE θ̂_T is consistent, but not asymptotically normal (see [48], p. 100).

We remark that in the discrete case (Section 3.2) an analogous distinction regarding parameter values is made (see p. 43).
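To illustrate the estimator (3.7) on Example 1, the following sketch simulates the ergodic case of (3.10) on a fine grid and evaluates the MLE, with the stochastic integral replaced by an Ito-type Riemann sum; θ_0, T and the grid size are hypothetical choices.

```python
# Simulate dX = -theta0 X dt + dW on a fine grid and evaluate the MLE (3.7)
# with b(x) = -x; theta0, T and the grid size are hypothetical choices.
import numpy as np

rng = np.random.default_rng(5)
theta0, T, n = 2.0, 50.0, 200_000
dt = T / n
X = np.zeros(n + 1)
for k in range(n):                     # Euler simulation of the OU process (3.10)
    X[k + 1] = X[k] - theta0 * X[k] * dt + np.sqrt(dt) * rng.standard_normal()

dX = np.diff(X)
# MLE (3.7): theta_hat = int b(X) dX / int b^2(X) dt, with b(x) = -x
theta_hat = np.sum(-X[:-1] * dX) / np.sum(X[:-1] ** 2 * dt)
# theta_hat is close to theta0 for large T, illustrating weak consistency (3.9)
```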

After having treated the special case of a drift term linear in θ, we give a short remark on the general case

dX_t = b(θ; X_t) dt + dW_t,

where b(θ; x) may be a nonlinear function of θ. By Taylor expansion in θ we have

b(θ; x) = b(θ_0; x) + (θ − θ_0)(d/dθ) b(θ_0; x) + O((θ − θ_0)²),

where θ_0 again denotes the true value of the parameter θ. Hence b(θ; x) can be approximated by a term linear in θ. Since in practice a 'good guess' of θ_0 is often available, with the above considerations it is sufficient to consider only the case already treated: a drift term linear in θ. This closes our considerations about ML estimation.


In the following we give some remarks on the case where the diffusion term depends on an unknown parameter θ. As mentioned before, in this case we cannot estimate θ by using maximum likelihood theory (see p. 17).

As a first estimation problem we deal with the process

dX_t = σ(θ) dW_t,  0 ≤ t ≤ T.        (3.11)

In order to estimate θ we discretize the process and derive its limit. Consider partitions Δ_n = {t_0^{(n)}, …, t_{m(n)}^{(n)}} of [0, T], constructed in such a way that Σ_{n=1}^∞ sup_k |t_{k+1}^{(n)} − t_k^{(n)}| < ∞. Then for the discretized Wiener process W we have the convergence result

Σ_{k=1}^m |W_{t_k^{(n)}} − W_{t_{k−1}^{(n)}}|² → T,

in probability as n → ∞ (see [15], p. 262f). Hence for (3.11) we obtain

Σ_{k=1}^m |X_{t_k^{(n)}} − X_{t_{k−1}^{(n)}}|² → T σ²(θ),

in probability as n → ∞; thus we are able to give an arbitrarily precise estimate of σ²(θ). Notice that we do not obtain a direct estimate of θ.

Next, if the unknown parameter splits into two parts θ = (θ_1, θ_2) in the following way,

dX_t = b(t, θ_1) dt + σ(θ_2) dW_t,        (3.12)

we want to estimate the part θ_2 in the diffusion coefficient. Denoting ΔX_k^{(n)} ≡ X_{t_k^{(n)}} − X_{t_{k−1}^{(n)}} and ΔW_k^{(n)} ≡ W_{t_k^{(n)}} − W_{t_{k−1}^{(n)}}, we obtain by discretizing (3.12)

Σ_{k=1}^m [ (ΔX_k^{(n)})² − (σ(θ_2) ΔW_k^{(n)})² ] = Σ_{k=1}^m b²(t_k, θ_1)(Δt_k^{(n)})² + 2 Σ_{k=1}^m b(t_k, θ_1) σ(θ_2) ΔW_k^{(n)} Δt_k^{(n)},

which tends to 0 in probability as n → ∞. Thus the drift term is insignificant as n → ∞, and we are able to estimate σ(θ_2) just as in the previous case. Then, by treating θ_2 as known, we can estimate θ_1 via the ML method.

Finally, for the general problem with a diffusion coefficient also depending on X,

dX_t = σ(θ, X_t) dW_t,        (3.13)


a convergence in probability result is given by

Σ_{k=1}^m |X_{t_k^{(n)}} − X_{t_{k−1}^{(n)}}|² → ∫_0^T σ²(θ, X_s) ds,        (3.14)

as n → ∞ (see [24], [51], §4). A hint towards the correctness of (3.14) is given by the well-known equality

E(X_t²) = ∫_0^t E(σ²(θ, X_s)) ds.

Again note that we cannot estimate θ directly, but are only able to estimate ∫_0^T σ²(θ, X_t) dt.
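The following sketch illustrates the quadratic variation estimate behind (3.11)-(3.14) in the simplest case (3.11); the value sigma2_true is a hypothetical choice used only to generate data.

```python
# Realized quadratic variation recovers T * sigma^2(theta) for model (3.11);
# sigma2_true below is a hypothetical value used only to generate data.
import numpy as np

rng = np.random.default_rng(6)
T, m, sigma2_true = 1.0, 100_000, 0.25
dW = np.sqrt(T / m) * rng.standard_normal(m)   # Wiener increments on a fine grid
dX = np.sqrt(sigma2_true) * dW                 # increments of dX = sigma dW
sigma2_hat = np.sum(dX ** 2) / T               # sum of squared increments / T
# sigma2_hat is close to sigma2_true; only sigma^2(theta) is identified, not theta.
```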

3.1.2 Discrete observations

In this section we deal with the (more realistic) situation where the diffusion process X in (3.1) has only been observed at not necessarily equally spaced discrete time points 0 = t_0 < t_1 < … < t_n. In our discussion below we basically follow Kloeden, Schurz, Platen and Sørensen [46], Bibby [4, 5, 6, 7], Bibby and Sørensen [8] and Pedersen [58, 59, 60, 61, 62, 63].

For the estimation of θ from discrete observations of X we have to distinguish two cases: the transition densities of X are known or unknown. If the transition densities p(s, x; t, y; θ) of X are known, e.g. in the case of an Ornstein-Uhlenbeck process (see p. 36, Example 1), an obvious choice of an estimator for θ is the Maximum Likelihood Estimator (MLE) θ̂_n which maximizes the likelihood function

L_n(θ) = Π_{i=1}^n p(t_{i−1}, X_{t_{i−1}}; t_i, X_{t_i}; θ),

or equivalently the log-likelihood function

l_n(θ) ≡ ln L_n(θ) = Σ_{i=1}^n log(p(t_{i−1}, X_{t_{i−1}}; t_i, X_{t_i}; θ))        (3.15)

for θ ∈ Θ (see e.g. [2], p. 14). In the case of time-equidistant observations (t_i = iΔ, i = 0, 1, …, n, for some fixed Δ > 0) Dacunha-Castelle and Florens-Zmirou [19] prove consistency and asymptotic normality of θ̂_n as n → ∞, independent of the value of Δ. Unfortunately, in general the transition densities of X are unknown.


When the transition densities of X are unknown, the usual alternative estimator is defined by approximating the log-likelihood function for θ based on continuous observations of X. For this log-likelihood function to be defined, the diffusion coefficient σ(t, x; θ) = σ(t, x) has to be known (see Section 3.1.1, p. 17). Under some assumptions (see [51], §7), the log-likelihood function for θ based on continuous observations of X in [0, t_n] can be written in terms of integrals,

l^c_{t_n}(θ) = ∫_0^{t_n} b(s, X_s; θ)^T [σ(s, X_s) σ(s, X_s)^T]^{−1} dX_s
  − (1/2) ∫_0^{t_n} b(s, X_s; θ)^T [σ(s, X_s) σ(s, X_s)^T]^{−1} b(s, X_s; θ) ds,        (3.16)

and the usual approximation of these integrals leads to the approximate log-likelihood function for θ based on discrete observations of X,

l̃_n(θ) = Σ_{i=1}^n b(t_{i−1}, X_{t_{i−1}}; θ)^T Λ_{i−1} (X_{t_i} − X_{t_{i−1}})
  − (1/2) Σ_{i=1}^n b(t_{i−1}, X_{t_{i−1}}; θ)^T Λ_{i−1} b(t_{i−1}, X_{t_{i−1}}; θ)(t_i − t_{i−1}),        (3.17)

with the notation

Λ_{i−1} ≡ [σ(t_{i−1}, X_{t_{i−1}}) σ(t_{i−1}, X_{t_{i−1}})^T]^{−1}.

When the diffusion coefficient also depends on an unknown parameter, the question arises how this parameter is to be estimated. If θ divides into two parts θ = (θ_1, θ_2) such that b(·, ·; θ) = b(·, ·; θ_1) and σ(·, ·; θ) is known up to the scalar factor θ_2, that means σ(·, ·; θ) = θ_2 σ̃(·, ·), we may avoid the problem of parameter dependence. In this case θ_2² can be estimated in advance by a quadratic variation-like formula (see Florens-Zmirou [25]), and by inserting this estimate in σ(·, ·; θ) = θ_2 σ̃(·, ·), the diffusion term can be assumed "known". Then the approximate log-likelihood function l̃_n can be used to estimate θ_1.
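As an illustration, the following sketch evaluates the approximate log-likelihood (3.17) for a scalar diffusion; the function name approx_loglik and the drift and diffusion functions used in the example lines are hypothetical choices.

```python
# Approximate log-likelihood (3.17) for a scalar diffusion
# dX = b(t, X; theta) dt + sigma(t, X) dW observed at times t_i.
import numpy as np

def approx_loglik(theta, t, X, b, sigma):
    """tilde-l_n(theta): discretization (3.17) of the integrals in (3.16)."""
    dt = np.diff(t)
    dX = np.diff(X)
    drift = b(t[:-1], X[:-1], theta)          # b(t_{i-1}, X_{t_{i-1}}; theta)
    inv_a = 1.0 / sigma(t[:-1], X[:-1]) ** 2  # Lambda_{i-1} in dimension d = 1
    return np.sum(drift * inv_a * dX) - 0.5 * np.sum(drift ** 2 * inv_a * dt)

# Hypothetical example: an OU-type drift with known constant diffusion term.
b = lambda t, x, theta: -theta * x
sigma = lambda t, x: np.ones_like(x)
# Maximize over a grid, given observation arrays t and X:
# grid = np.linspace(0.1, 5, 200)
# theta_tilde = grid[np.argmax([approx_loglik(g, t, X, b, sigma) for g in grid])]
```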

In the case where the diffusion coefficient σ(·, ·; θ) depends on θ more generally, Hutton and Nelson [37] show that the discretized score function corresponding to l^c_{t_n} can, under certain regularity conditions, still be used to estimate θ. That means in this case we are able to estimate θ based on discrete observations of X as well.

However, estimation methods for discrete observations that arise from the theory of continuous observations have the undesirable property that the estimators are strongly biased unless max_{1≤i≤n} |t_i − t_{i−1}| is "small". If the time between observations is bounded away from zero, especially in the case of time-equidistant observations with Δ fixed, Florens-Zmirou [25] shows that the estimator θ̃_n of θ obtained by maximizing the approximate log-likelihood function l̃_n(θ) is inconsistent.

To overcome the difficulties regarding parameter dependence and the dependence of l̃_n(θ) on max_{1≤i≤n} |t_i − t_{i−1}|, we propose three different estimation approaches. The basic idea of the first two approaches is to find good approximations to the transition densities. In the third approach we construct martingale estimating functions.

Approximation of the transition densities of X by a sequence of transition densities of approximating Markov processes

The following approach is proposed by Pedersen [58, 60]. We shall derive a sequence (l_{n,N}(θ))_{N=1}^∞ of approximations to l_n(θ) that builds a connection between l̃_n(θ) (see (3.17)) and l_n(θ) in the following sense: the approximation l_{n,1}(θ) is a generalization of l̃_n(θ) with no restrictions on σ(·, ·; θ) regarding parameter dependence, each l_{n,N}(θ) for N ≥ 2 is an improvement of l_{n,1}(θ), and as N tends to infinity, l_{n,N}(θ) converges for each θ in probability to l_n(θ).

The crucial point of the approach is to approximate the transition densities p(s, x; t, y; θ) of X by a sequence of transition densities (p_N(s, x; t, y; θ))_{N=1}^∞ of approximating Markov processes which converges to p(s, x; t, y; θ) as N tends to infinity, and then to define the approximate log-likelihood functions

l_{n,N}(θ) = Σ_{i=1}^n log(p_N(t_{i−1}, X_{t_{i−1}}; t_i, X_{t_i}; θ)).        (3.18)

In the following we derive the approximating densities p_N(s, x; t, y; θ). We remark that we may relax the Lipschitz assumption made in (3.1) for b and σ a little, namely we only assume b and σ to be locally Lipschitz continuous, as defined below.

Under the following conditions (A1), (A2) and (A3), which must hold for all θ ∈ Θ, the stochastic differential equation (3.1) has a weak solution for all x_0 and θ and has the pathwise-uniqueness property, which implies uniqueness in law (see [65], p. 132 and p. 151; in this context see also [73], §5-§8).

(A1) b and σ are continuous in t for all x ∈ IR^d.



(A2) b and σ are locally Lipschitz continuous in x: for all R > 0 there exists a constant K_R > 0 such that for all t ≥ 0 and all x, y ∈ IR^d with |x| ≤ R and |y| ≤ R,

|b(t, x; θ) − b(t, y; θ)| + |σ(t, x; θ) − σ(t, y; θ)| ≤ K_R |x − y|.

(A3) b and σ grow at most linearly: there exists a constant C > 0 such that for all t ≥ 0 and x ∈ IR^d,

|b(t, x; θ)| + |σ(t, x; θ)| ≤ C (1 + |x|).


For fixed 0 ≤ s < t, x ∈ IR^d, θ ∈ Θ and N ∈ IN, we consider for k = 0, 1, …, N the Euler approximation (see Appendix B.1):

τ_k = s + k (t − s)/N,
Y_s^{(N)} = x,
Y_{τ_k}^{(N)} = Y_{τ_{k−1}}^{(N)} + ((t − s)/N) b(τ_{k−1}, Y_{τ_{k−1}}^{(N)}; θ) + a(τ_{k−1}, Y_{τ_{k−1}}^{(N)}; θ)^{1/2} (W_{τ_k}^{θ,s} − W_{τ_{k−1}}^{θ,s}),

where a ≡ σσ^T. Under (A1)-(A4) we have

Y_{τ_N}^{(N)} = Y_t^{(N)} → X_t        (3.21)

in L¹(P_{θ,s,x}) as N → ∞ (see [45], §10.2).

Theorem 2 For fixed 0 ≤ s < t, x ∈ IR^d, θ ∈ Θ and N ∈ IN, the distribution of Y_t^{(N)} under P_{θ,s,x} has a density p_N(s, x; t, ·; θ) with respect to the d-dimensional Lebesgue measure λ_d. For N = 1,

p_1(s, x; t, y; θ) = (2π(t − s))^{−d/2} [det(a(s, x; θ))]^{−1/2}
  exp{ −(1/(2(t − s))) [y − x − (t − s) b(s, x; θ)]^T a(s, x; θ)^{−1} [y − x − (t − s) b(s, x; θ)] },        (3.22)

and for N ≥ 2,

p_N(s, x; t, y; θ) = E_{P_{θ,s,x}} [ p_1(τ_{N−1}, Y_{τ_{N−1}}^{(N)}; t, y; θ) ].        (3.23)

For the proof we refer to Pedersen [58].

Our aim is to show that l_{n,N}(θ) will indeed be close to l_n(θ) for large values of N; that is, we give a result on the convergence of the approximating densities p_N(s, x; t, y; θ) to p(s, x; t, y; θ) as N → ∞.

Theorem 3 In addition to (A2) and (A3), assume for all θ ∈ Θ that

(1) b is continuous in t and in x, and

(2) a(t, x; θ) ≡ a(θ) is positive definite.

Then p(s, x; t, y; θ) exists, and for all 0 ≤ s < t and all x, y, the approximating densities p_N(s, x; t, y; θ) converge to p(s, x; t, y; θ) as N → ∞.


After these theoretical considerations we deal with the actual calculation of l_{n,N}(θ) for large values of N. In order to maximize l_{n,N}(θ), numerical algorithms usually require the value of l_{n,N}(θ) at a finite number of points θ. For N = 1, l_{n,N}(θ) is explicitly given, because p_1 has a closed expression, but for N ≥ 2 this is in general not the case.

For calculating l_{n,N}(θ) for N ≥ 2, we have to know how to calculate p_N(s, x; t, y; θ) for all 0 ≤ s < t, x, y ∈ IR^d and θ ∈ Θ. Considering again equation (3.23),

p_N(s, x; t, y; θ) = E_{P_{θ,s,x}} [ p_1(τ_{N−1}, Y_{τ_{N−1}}^{(N)}; t, y; θ) ],

the right hand side. Denote by (Uk m ) N,1;M<br />

k=1;m=1<br />

an i.i.d. sample from the r-<br />

dimensional standard normal distribution. Then (Y m ) M m=1<br />

=(YN,1) m M m=1<br />

given<br />

by the Euler approximation<br />

Y m<br />

0<br />

= x; m =1;:::;M<br />

Yk m = Y m<br />

k,1<br />

+ t , s<br />

N<br />

b( k,1;Yk,1; m )+<br />

s<br />

t , s<br />

N<br />

( k,1;Y m<br />

k,1; ) U m k<br />

for k =1;:::;N, 1 and m =1;:::;M, has the same distribution as an i.i.d.<br />

sample of Y (N)<br />

N,1<br />

under P ;s;x .Thus we are able to approximate the right hand<br />

side of (3.23), while we calculate<br />

1<br />

M<br />

MX<br />

m=1<br />

p 1 ( N,1 ;Y m ;t;y; ) (3.24)<br />

by means of the sequence (U m k ) N,1;M<br />

k=1;m=1, for M chosen suciently large, and<br />

thereby we are able to calculate p N (s; x; t; y; ) with any given accuracy.<br />

From a practical po<strong>in</strong>t of view it is convenient to simulate the sample<br />

(Uk m ) N,1;M<br />

k=1;m=1<br />

once (see Kloeden and Platen [45]) and store it. Then it can<br />

be used to calculate p N (s; x; t; y; ) for all values of 0 s


where X 0 = 0 and t 0, the log-likelihood function l n () is unknown. The<br />

approximate log-likelihood functions (l n;N ()) 1 N=1<br />

can be used to estimate ,<br />

because (3.25) has a weak solution for all x 0 and for all >0 which is unique<br />

<strong>in</strong> law and (A4) is satised for all >0.<br />

With 0 = 5, n = 1000 and = 0:1 the Milste<strong>in</strong>-scheme (see Appendix<br />

B.2) with time-step =1000 is used to simulate (X i ) n i=0. For such a simulation<br />

the approximate log-likelihood functions l n;N () are calculated for<br />

N =1; 25; 50; 1000. The estimates that are obta<strong>in</strong>ed are shown <strong>in</strong> Table 3.1.<br />

N 1 25 50 1000<br />

^ n;N 3.98 4.76 5.20 5.04<br />

Table 3.1: Example 1. The estimates correspond<strong>in</strong>g to the functions l n;N ().<br />

Approximation of the transition densities of X by means of the<br />

Kalman lter<br />

The follow<strong>in</strong>g approach is proposed by Pedersen [59, 63].<br />

We consider the stochastic system<br />

X i = D i X i,1 + S i + " i ; i =1;:::;n; (3.26)<br />

where (X i ) n i=0<br />

are random d 1 vectors, (D i ) n i=0<br />

are non-random d d matrices,<br />

(S i ) n i=1<br />

are non-random d 1vectors, X 0 N d (x 0 ;V 0 ), " i N d (0;V i ),<br />

i = 1;:::;n and X 0 , " 1 ;:::, " n are stochastically <strong>in</strong>dependent. The nonrandom<br />

elements <strong>in</strong> (3.26) <strong>in</strong>clud<strong>in</strong>g x 0 and (V i ) n i=0<br />

are given up to the parameter<br />

2 IR p .<br />

Though this chapter is about diusion models, we nevertheless look here at<br />

stochastic dierence equations of the type (3.26), s<strong>in</strong>ce as a particular case<br />

of (3.26) we will consider later discretely observed diusion processes given<br />

as solutions to l<strong>in</strong>ear stochastic dierential equations <strong>in</strong> the narrow sense.<br />

These are equations of the follow<strong>in</strong>g k<strong>in</strong>d<br />

dX t =(A t X t + a t )dt + B t dW t ; X 0 = x 0 ; t 0; (3.27)<br />

where A :[0; 1) 7! M dd , a :[0; 1) 7! IR d and B :[0; 1) 7! M dm (d m)<br />

are determ<strong>in</strong>istic functions of t and W is an m-dimensional Wiener process.<br />

Why (3.27) is a particular case of (3.26), will be expla<strong>in</strong>ed below.<br />

29


Consider<strong>in</strong>g the system (3.26) we dist<strong>in</strong>guish two cases depend<strong>in</strong>g on whether<br />

X can be observed completely or only partially, i.e. wether all or only a few<br />

coord<strong>in</strong>ates of X can be observed. First, if the complete observations of<br />

X 0 ;X 1 ;:::;X n are given, we can use the log-likelihood function to estimate<br />

the parameter , see (3.15).<br />

However, the case where X 0 ;X 1 ;:::;X n can only be observed partially and<br />

possibly with measurement errors often arises <strong>in</strong> practice. As mentioned, with<br />

'partially observed' we do not mean partially <strong>in</strong> time but <strong>in</strong> coord<strong>in</strong>ates of<br />

X i . We assume the observable quantities are Y 0 ;Y 1 ;:::;Y n given by<br />

Y i = T i X i + U i + e i ; i =0; 1;:::;n; (3.28)<br />

where (T i ) n i=0<br />

are non-random kd matrices (k d), (U i ) n i=0<br />

are non-random<br />

k 1 vectors, e i N k (0;W i ) and X 0 ;" 1 ;:::;" m ;e 0 ;e 1 ;:::;e n are stochastically<br />

<strong>in</strong>dependent, i = 0; 1;:::;n. The matrices (T i ) specify the observable<br />

parts of (X i ), the vectors (U i ) are other <strong>in</strong>puts and the vectors (e i ) are measurement<br />

errors. It is obvious that both the case of complete observations,<br />

Y i = X i , and the case of partial observations without measurement errors are<br />

conta<strong>in</strong>ed <strong>in</strong> the general case (3.28).<br />

As an application to the case of <strong>in</strong>complete observations th<strong>in</strong>k of a stochastic<br />

volatility model that can be seen as a multi-dimensional process with the<br />

volatility process as the unobservable coord<strong>in</strong>ates.<br />

In the case where all non-random elements <strong>in</strong> (3.26) and (3.28) <strong>in</strong>clud<strong>in</strong>g x 0 ,<br />

(V i ) n i=0<br />

and (W i ) n i=0<br />

are known, we want to obta<strong>in</strong> the, <strong>in</strong> some sense, best<br />

predictions of X 0 ;X 1 ;:::;X n from given observations of Y 0 ;Y 1 ;:::;Y n . The<br />

conditional expectations (E(X i jY i )) n i=0, respectively (E(X i jY i,1 )) n i=1, are the<br />

best predictors of X i given Y i (Y T<br />

0<br />

;:::;Yi<br />

T ) T , respectively given Y i,1 <br />

(Y T<br />

0<br />

;:::;Yi,1) T T , <strong>in</strong> the sense of m<strong>in</strong>imal variance (see Appendix A.1). They<br />

can be calculated by means of an iterative procedure called the Kalman-Bucy<br />

lter.<br />

The Kalman-Bucy lter and lter<strong>in</strong>g theory <strong>in</strong> general are well-studied<br />

for cont<strong>in</strong>uous time stochastic processes with cont<strong>in</strong>uous observations (see<br />

Liptser and Shiryaev [51], Kallianpur [43], ksendal [56] and Appendix A).<br />

Here we consider discrete time stochastic processes, and discretely observed<br />

cont<strong>in</strong>uous time stochastic processes <strong>in</strong> particular. Belowwe give the iterative<br />

procedure used for calculat<strong>in</strong>g (E(X i jY i )) and (E(X i jY i,1 )).<br />

The important po<strong>in</strong>t is that the Kalman-Bucy lter gives besides the predictors<br />

(E(X i jY i )) and (E(X i jY i,1 )) the density p iji,1 for the conditional distribution<br />

of Y i given Y i,1 . That means we are able to calculate the transition<br />

30


densities of Y and thus the log-likelihood function for based on observations<br />

of Y 0 ;Y 1 ;:::;Y n , from which an estimation of can be obta<strong>in</strong>ed. As<br />

Pedersen [59] po<strong>in</strong>ts out, the unknown vector x 0 2 IR d should be treated as<br />

a part of and is not to be chosen at random. The <strong>in</strong>dependence of X 0 and<br />

e 0 implies Y 0 N k (T 0 x 0 + U 0 ;T 0 V 0 T T 0<br />

+ W 0 ). Hence we can calculate x 0 by<br />

x 0 = T ,1<br />

0 (Y 0 ,U 0 ) a.s. if and only if k = d; V 0 = W 0 = 0 and T 0 is of full rank.<br />

Furthermore, the start<strong>in</strong>g po<strong>in</strong>ts of the iterative procedure used to calculate<br />

the densities p iji,1 depend on x 0 .<br />

The Kalman-Bucy lter<br />

Under some technical assumptions (see Pedersen [59], pp. 4{5), we have for<br />

given observations y 0 ;y 1 ;:::;y n of Y 0 ;Y 1 ;:::;Y n<br />

X i jY i = y i N d<br />

<br />

i (y i ); i<br />

<br />

; (3.29)<br />

X i jY i,1 = y i,1 N d<br />

<br />

Di i,1 (y i,1 )+S i ;R i<br />

<br />

; (3.30)<br />

Y i jY i,1 = y i,1 N d<br />

<br />

Ti (D i i,1 (y i,1 )+S i )+U i ;T i R i T T<br />

i + W i<br />

<br />

; (3.31)<br />

where R i = D i i,1 D T i<br />

+ V i is positive denite, and where<br />

0 (y 0 ) = x 0 + V 0 T T 0<br />

<br />

T0 V 0 T T 0<br />

+ W 0<br />

,1<br />

(y0 , T 0 x 0 , U 0 ); (3.32)<br />

0 = V 0 , V 0 T T 0<br />

i (y i ) = D i i,1 (y i,1 )+S i + R i T T<br />

i<br />

<br />

T0 V 0 T T ,1<br />

0<br />

+ W 0 T0 V 0 ; (3.33)<br />

<br />

Ti R i Ti T ,1<br />

+ W i<br />

(y i , T i<br />

<br />

Di i,1 (y i,1 )+S i<br />

<br />

, Ui ) ; (3.34)<br />

i = R i , R i T T<br />

i (T i R i T T<br />

i + W i ) ,1 T i R i : (3.35)<br />

By means of the Kalman-Bucy lter we are now able to calculate l n () for<br />

every xed 2 for given observations y 0 ;y 1 ;:::;y n of Y 0 ;Y 1 ;:::;Y n .<br />

The iterative procedure (by means of the Kalman-Bucy lter)<br />

(0) Calculate 0 (y 0 ) and 0 by means of formula (3.32) and (3.33).<br />

(1) Given i,1 (y i,1 ) and i,1 the conditional distribution of Y i given Y i,1<br />

is known, and so we can calculate the transition density p iji,1 (y i jy i,1 ; ).<br />

(2) Calculate i (y i ) and i by means of formula (3.34) and (3.35) and<br />

return to (1).<br />

31


An Application<br />

As an application of the above theory, we consider the l<strong>in</strong>ear stochastic differential<br />

equation (3.27). The functions A, a, and B are assumed to be cont<strong>in</strong>uous<br />

and given up to the unknown parameter 2 IR p . Under these<br />

assumptions the stochastic dierential equation (3.27) has for every xed<br />

x 0 2 IR d and 2 a unique solution given by<br />

X t = t<br />

x 0 +<br />

Z t<br />

0<br />

,1<br />

u a u du +<br />

where is the determ<strong>in</strong>istic d d matrix process solv<strong>in</strong>g<br />

If<br />

for all t>0 then<br />

Z t<br />

0<br />

<br />

,1<br />

u B u dW u ; t 0 ; (3.36)<br />

d t = A t t dt; 0 = I d ; t 0: (3.37)<br />

A t<br />

Z t<br />

0<br />

Z t<br />

<br />

A s ds = A s ds A t<br />

0<br />

Z t<br />

<br />

t = exp<br />

0<br />

A s ds<br />

is the unique solution to (3.37). We want to estimate from possibly <strong>in</strong>complete<br />

discrete observations of X at time-po<strong>in</strong>ts 0 = t 0


We can use the previous results to estimate from possibly <strong>in</strong>complete observations<br />

of fX ti g n i=0<br />

of the type given by (3.28).<br />

Note that <strong>in</strong> many applications, the non-random elements D ti (3.40), S ti<br />

(3.41) and V ti (3.39), can not be calculated exactly. The solution to (3.37)<br />

may be unknown. In that case fX ti g n i=0<br />

can be approximated by the Euler approximation<br />

(see Appendix B.1). But even if the solution to (3.37) is known,<br />

S ti and V ti often can not be calculated exactly, and hence have to be approximated;<br />

this can be done by dierent methods depend<strong>in</strong>g on the concrete<br />

application.<br />

Mart<strong>in</strong>gale estimat<strong>in</strong>g functions<br />

The follow<strong>in</strong>g approach is proposed by Bibby and Srensen [8].<br />

We consider one-dimensional diusion processes dened by the stochastic<br />

dierential equations<br />

dX t = b(X t ; ) dt + (X t ; ) dW t ; (3.42)<br />

where X 0 = x 0 and t 0. Besides the usual assumptions on b and <strong>in</strong> (3.1),<br />

such that (3.42) has a unique solution for all <strong>in</strong> an open subset IR, the<br />

functions b and are supposed to be twice cont<strong>in</strong>uously dierentiable with<br />

respect to both arguments and is assumed to be positive. In contrast to<br />

(3.1), for convenience here we only consider the time-homogeneous case. To<br />

simplify the exposition further, assume that we can observe fX t g at discrete<br />

equidistant time po<strong>in</strong>ts, say ; 2;:::;n. Later we will give an extension<br />

to the case where X and are multi-dimensional.<br />

Our goal is to estimate the parameter from these discrete observations<br />

X ;X 2 ;:::;X n of fX t g. Inference from discrete time observations can<br />

be based on an approximation of the score function of the cont<strong>in</strong>uous loglikelihood<br />

function l t (). Denote this approximation by ~ _ ln (). For the denition<br />

of the cont<strong>in</strong>uous log-likelihood see 3.1.1, p.17. In the case where does<br />

not depend on the cont<strong>in</strong>uous time log-likelihood function is<br />

l t () =<br />

Z t<br />

0<br />

b(X s ; )<br />

2 (X s ) dX s , 1 2<br />

Z t<br />

0<br />

b 2 (X s ; )<br />

2 (X s )<br />

ds; (3.43)<br />

see also (3.5). If we replace the Lebesgue <strong>in</strong>tegrals and the It^o <strong>in</strong>tegrals by<br />

Riemann-It^o sums and dierentiate with respect to we get the approximate<br />

score function<br />

nX _b(X (i,1) ; )<br />

nX<br />

_~l n () =<br />

<br />

i=1<br />

2 (X (i,1) ) (X b(X (i,1) ; ) b(X<br />

i , X (i,1) ) , <br />

_ (i,1) ; )<br />

:<br />

<br />

i=1<br />

2 (X (i,1) )<br />

(3.44)<br />

33


If depends on we use the same estimat<strong>in</strong>g function<br />

nX<br />

_b(X (i,1) ; )<br />

_~l n () =<br />

<br />

i=1<br />

2 (X (i,1) ; ) (X b(X (i,1) ; ) b(X<br />

i , X (i,1) ) , <br />

_ (i,1) ; )<br />

:<br />

<br />

i=1<br />

2 (X (i,1) ; )<br />

(3.45)<br />

By us<strong>in</strong>g this approach for the estimation of the problem of <strong>in</strong>consistency<br />

arises as already mentioned <strong>in</strong> the <strong>in</strong>troduction of this section, p.23. To avoid<br />

this problem we can use mart<strong>in</strong>gale estimat<strong>in</strong>g functions of which we will<br />

construct four dierent types. The idea is to modify the discretized scorefunction<br />

~ _ ln () <strong>in</strong> suchaway that a zero-mean P -mart<strong>in</strong>gale is obta<strong>in</strong>ed. Then<br />

the estimator can be shown to be consistent and asymptotically normal.<br />

(1) Our rst approach is to compensate _ ~ ln , so that a mart<strong>in</strong>gale ~G n is<br />

obta<strong>in</strong>ed.<br />

By substract<strong>in</strong>g from _ ~ ln () its compensator we get a zero-mean P -mart<strong>in</strong>gale<br />

with respect to the ltration dened by F i = (X ;:::;X i ), i = 1; 2;:::.<br />

The compensator is:<br />

nX <br />

E _<br />

<br />

~ li () , ~ _ <br />

li,1 ()jF i,1<br />

i=1<br />

where<br />

=<br />

nX<br />

i=1<br />

,<br />

nX<br />

_b(X (i,1) ; )<br />

2 (X (i,1) ; ) (F (X (i,1); ) , X (i,1) )<br />

nX<br />

i=1<br />

b(X (i,1) ; ) b(X _ (i,1) ; )<br />

; (3.46)<br />

2 (X (i,1) ; )<br />

F (X (i,1) ; ) E (X i jX (i,1) ): (3.47)<br />

Thus we obta<strong>in</strong> a zero-mean mart<strong>in</strong>gale estimat<strong>in</strong>g function of the form<br />

~G n () =<br />

nX<br />

i=1<br />

_b(X (i,1) ; )<br />

2 (X (i,1) ; )<br />

<br />

Xi , F (X (i,1) ; ) : (3.48)<br />

(2) Alternatively we consider the general class of zero-mean P -<br />

mart<strong>in</strong>gale estimat<strong>in</strong>g functions<br />

G n () =<br />

nX<br />

g i,1 (X (i,1) ; ) X i , F (X (i,1) ; ) ; (3.49)<br />

i=1<br />

where for i =1;:::;n, the function g i,1 is F i,1 -measurable and cont<strong>in</strong>uously<br />

dierentiable <strong>in</strong> . The optimal estimat<strong>in</strong>g function with<strong>in</strong> the class (3.49)<br />

<strong>in</strong> the asymptotic sense of Godambe and Heyde [33] is<br />

G n() =<br />

nX<br />

i=1<br />

_ F (X (i,1) ; )<br />

(X (i,1) ; )<br />

34<br />

<br />

Xi , F (X (i,1) ; ) ; (3.50)


where<br />

(X (i,1) ; ) =E <br />

h<br />

(Xi , F (X (i,1) ; )) 2 jX (i,1)<br />

i<br />

; i =1;:::;n: (3.51)<br />

The function G n() is with<strong>in</strong> the class (3.49) <strong>in</strong> some sense "closest" to the<br />

score function based on the usually unknown exact likelihood function.<br />

It should be mentioned here that for small the mart<strong>in</strong>gale estimat<strong>in</strong>g<br />

function ~ G n (), dened <strong>in</strong> (3.48), is a rst order approximation <strong>in</strong> of G n,<br />

that means ~G n () is approximately optimal.<br />

(3) In some cases there are possibly numerical problems <strong>in</strong> calculat<strong>in</strong>g<br />

F _ (x; ) <strong>in</strong> (3.50). One way to solve these problems <strong>in</strong>volves approximat<strong>in</strong>g<br />

F _ (x; ) up to the order O( 2 ). That leads to a third mart<strong>in</strong>gale estimat<strong>in</strong>g<br />

function<br />

G + n () =<br />

nX <br />

_b(X(i,1) ; ) + 1 _<br />

2 2 b(X(i,1) ; )b 0 (X (i,1) ; )<br />

i=1<br />

+b(X (i,1) ; ) _ b 0 (X (i,1) ; )+ 1 2 (_2 (X (i,1) ; )b 00 (X (i,1) ; )<br />

+ 2 (X (i,1) ; ) b _ i 00 (X i , F (X (i,1) ; ))<br />

(X (i,1) ; )) : (3.52)<br />

(X (i,1) ; )<br />

Altogether we have now found expressions for three dierent zero-mean P -<br />

mart<strong>in</strong>gale estimat<strong>in</strong>g functions.<br />

As for the multi-dimensional case, suppose is k-dimensional, fX t g and<br />

b(X t ; ) are d-dimensional, is a dm-dimensional matrix with T positive<br />

denite and the Wiener process fW t g is m-dimensional. Then the k 1-<br />

dimensional mart<strong>in</strong>gale estimat<strong>in</strong>g functions ~G n and G n have the form<br />

and<br />

~G n () =<br />

G n() =<br />

nX<br />

_b(X (i,1) ; ) T (X (i,1) ; )((X (i,1) ; ) T ,1<br />

i=1<br />

X i , F (X (i,1) ; ) <br />

nX<br />

F _<br />

(X (i,1) ; ) T (X (i,1) ; ) ,1 X i , F (X (i,1) ; ) ;<br />

i=1<br />

where is assumed to be positive denite and _ b and _ F denote the d k-<br />

dimensional matrices of partial derivatives with respect to the components<br />

of .<br />

35


The asymptotic properties of the estimator ^ n we obta<strong>in</strong> from the mart<strong>in</strong>gale<br />

estimat<strong>in</strong>g functions (3.48), (3.50) and (3.52), or more generally from the<br />

class of mart<strong>in</strong>gale estimat<strong>in</strong>g functions G n of the form (3.49), are discussed<br />

by Bibby and Srensen [8]. Under natural regularity conditions (see Bibby<br />

and Srensen [8], pp. 7{9) we have<br />

Theorem 4 An estimator ^ n , which solves the equation<br />

G n (^ n )=0;<br />

exists with probability tend<strong>in</strong>g to one as n ,! 1 under P 0 . Moreover, as<br />

n ,! 1,<br />

^ n ,! 0<br />

<strong>in</strong> probability under P 0 and ^ n is asymptotically normal <strong>in</strong> distribution under<br />

P 0 .<br />

For the proof we refer to [8].<br />

As a rst example we consider the Ornste<strong>in</strong>-Uhlenbeck process where the<br />

transition densities are well-known.<br />

Example 1<br />

The Ornste<strong>in</strong>-Uhlenbeck process is the solution of the stochastic dierential<br />

equation<br />

dX t = X t dt + dW t ; (3.53)<br />

with X 0 = x 0 . In this case the drift coecientisb(x; ) =x and the diusion<br />

coecient (x; ) is assumed to be known. The transition probability is<br />

normal with mean F (x; ) =xe and variance () = 2<br />

2 (e2 , 1). Hence<br />

the estimat<strong>in</strong>g function ~G n has the form<br />

~G n () = 1 2 nX<br />

i=1<br />

and we obta<strong>in</strong> ^ n as solution of ~G n () =0:<br />

X (i,1) (X i , X (i,1) e ) ;<br />

^ n = 1 P ni=1<br />

log X (i,1) X i<br />

Pni=1 :<br />

X 2 (i,1)<br />

The estimators ^ n we obta<strong>in</strong> from the mart<strong>in</strong>gale estimat<strong>in</strong>g functions G +<br />

and G are the same because G + and G are proportional to ~G.<br />

In the next example we consider a wider class of stochastic processes.<br />

36


Example 2<br />

The solutions of the stochastic dierential equation<br />

dX t =( + X t ) dt + (X t ) dW t ; (3.54)<br />

where X 0 = x 0 and the function takes positive values <strong>in</strong> IR, are called<br />

mean-revert<strong>in</strong>g processes (see also model (1.5)). The unknown parameters<br />

are and . Our aim is to be able to calculate the mart<strong>in</strong>gale estimat<strong>in</strong>g<br />

functions ~G n and G n.<br />

Lemma 1 The function<br />

f(t) E ; (X t jX 0 )<br />

solves<br />

f 0 (t) = + f(t): (3.55)<br />

Proof: Write (3.54) <strong>in</strong> <strong>in</strong>tegral form<br />

X t = X 0 +<br />

Condition<strong>in</strong>g on X 0 we have<br />

and equivalently<br />

Z t<br />

0<br />

( + X s )ds +<br />

Z t<br />

0<br />

(X s )dW s :<br />

Z t<br />

<br />

E ; (X t jX 0 ) = E ; (X 0 jX 0 )+E ; ( + X s )dsjX 0<br />

0<br />

<br />

(X s )dW s jX 0 ;<br />

0<br />

| {z }<br />

=0<br />

+E ;<br />

Z t<br />

E ; (X t jX 0 )=X 0 + t + <br />

Z t<br />

0<br />

E ; (X s jX 0 )ds:<br />

We conclude<br />

dE ; (X t jX 0 )<br />

= + E ; (X t jX 0 );<br />

dt<br />

and the claim follows. Note that the function f r (t) =E ; (X t jX r ), 0 r t,<br />

also solves (3.55), for the proof stays the same apart from<br />

E ;<br />

Z t<br />

0<br />

(X s )dW s jX r<br />

<br />

=<br />

Z r<br />

0<br />

(X s )dW s ;<br />

which is <strong>in</strong>dependent of t and thus plays no role <strong>in</strong> the derivative. 2<br />

37


Corollary 2 For<br />

we have<br />

F (X (i,1) ; ; ) =E ;<br />

<br />

Xi jX (i,1)<br />

<br />

F (x; ; ) =xe + (e , 1): (3.56)<br />

Proof: The solution of<br />

f 0 (t) = + f(t); f(t 0 )=f 0 ; t t 0 ;<br />

is<br />

f(t) =f 0 e (t,t 0) + (e(t,t 0) , 1):<br />

Hence we have for E ; (X ti jX ti,1 ) f(t i ) with constant t i , t i,1 for<br />

all i<br />

E ; (X ti jX ti,1 ) = E(X ti,1 jX ti,1 ) e + (e , 1)<br />

and the claim follows. 2<br />

= X ti,1 e + (e , 1);<br />

With (3.56) we are now able to calculate ~G n and G n <strong>in</strong> the follow<strong>in</strong>g way<br />

"<br />

X n <br />

1<br />

~G n (; ) = 2<br />

X i , X (i,1) e + <br />

(X (i,1) )<br />

(1 , e ) ;<br />

nX<br />

i=1<br />

nX<br />

i=1<br />

G n(; ) =<br />

i=1<br />

X (i,1)<br />

2<br />

(X (i,1) )<br />

" n X<br />

i=1<br />

<br />

X i , X (i,1) e + (1 , e )# T<br />

;<br />

<br />

e , 1<br />

X i , X (i,1) e + <br />

(X (i,1) ; ; )<br />

(1 , e ) ;<br />

e (X (i,1) + )+ 2 (1 , e )<br />

(X (i,1) ; ; )<br />

Consider<strong>in</strong>g G + is not <strong>in</strong>terest<strong>in</strong>g s<strong>in</strong>ce F is known.<br />

<br />

X i , X (i,1) e + (1 , e )# T<br />

:<br />

The estimation equation ~G n (; ) = 0 can be solved explicitly. Abbreviat<strong>in</strong>g<br />

2<br />

i,1<br />

2 (X (i,1) )we obta<strong>in</strong><br />

e ~ n<br />

=<br />

<br />

Pni=1 X (i,1)<br />

2<br />

i,1<br />

<br />

Pni=1 X i<br />

2<br />

i,1<br />

<br />

Pni=1 X (i,1)<br />

2<br />

i,1<br />

<br />

,<br />

<br />

Pni=1 X (i,1) X i<br />

2<br />

i,1<br />

2<br />

,<br />

<br />

Pni=1 X 2 (i,1)<br />

2<br />

i,1<br />

38<br />

<br />

Pni=1<br />

1<br />

<br />

Pni=1<br />

<br />

1<br />

2<br />

i,1<br />

<br />

2<br />

i,1<br />

(3.57)


and<br />

~ n =<br />

~ n<br />

1 , e ~ n<br />

<br />

Pni=1 X (i,1)<br />

2<br />

i,1<br />

<br />

e ~ n<br />

,<br />

<br />

Pni=1 X i<br />

2<br />

i,1<br />

<br />

Pni=1<br />

1<br />

2<br />

i,1<br />

<br />

: (3.58)<br />

As regard<strong>in</strong>g the mart<strong>in</strong>gale estimat<strong>in</strong>g function G n we are able to nd a<br />

closed expression for only <strong>in</strong> a few cases where the diusion coecient is<br />

rather simple. For <strong>in</strong>stance, if (x) = p x (as <strong>in</strong> the square root diusion<br />

model (1.5), the Cox Ingersoll Ross model) we have with a similar argument<br />

as for F (that is, the conditional second moment solves an ord<strong>in</strong>ary<br />

dierential equation)<br />

(x; ; ) = 2<br />

2 2 <br />

( +2x)e<br />

2<br />

, 2( + x)e + :<br />

As an extension we consider the mean-revert<strong>in</strong>g process where > 0 enters<br />

the diusion coecient<br />

q<br />

dX t = ,X t dt + + Xt 2 dW t :<br />

With the same argument as above we obta<strong>in</strong> the conditional variance<br />

(x; ; ) =x 2 e ,2 (e , 1) +<br />

<br />

2 , 1 (1 , e(1,2) ):<br />

We remark that <strong>in</strong> practical applications the estimat<strong>in</strong>g equations correspond<strong>in</strong>g<br />

to G n can be solved us<strong>in</strong>g a generalization of Newton's method.<br />

Furthermore, if no explicit expressions for the conditional mean F and the<br />

conditional variance are known, then F and can be approximated by the<br />

sample mean and sample variance of a large number of simulated realizations<br />

of the diusion process at the relevant time po<strong>in</strong>t.<br />

(4) As an extension we shall comb<strong>in</strong>e mart<strong>in</strong>gale estimat<strong>in</strong>g functions<br />

<strong>in</strong> an 'optimal' way.<br />

The follow<strong>in</strong>g approach is proposed by Bibby [5].<br />

If the unknown parameter is multi-dimensional and a part of is only found<br />

<strong>in</strong> the diusion coecient, then us<strong>in</strong>g the mart<strong>in</strong>gale estimat<strong>in</strong>g functions<br />

generated by the conditional mean, ~G n and G n, leads to fewer estimation<br />

equations than parameters. That is <strong>in</strong> this case neither G nor ~G can be<br />

used to estimate the part of that only enters the diusion coecient. This<br />

39


disadvantage of ~G n and G n motivates us to consider mart<strong>in</strong>gale estimat<strong>in</strong>g<br />

functions generated by higher order conditional moments for<br />

example by the conditional variance:<br />

H n () =<br />

nX <br />

2<br />

h(X (i,1) ; ) Xi , F (X (i,1) ; ) , (X(i,1) ; )<br />

i=1<br />

; (3.59)<br />

with given by (3.51) and F given by (3.47). In analogy to (3.50) we obta<strong>in</strong><br />

the optimal estimat<strong>in</strong>g function with<strong>in</strong> the class (3.59), <strong>in</strong> the sense of<br />

Godambe and Heyde [33]. This optimal function takes the form<br />

H n() =<br />

nX<br />

i=1<br />

_(X (i,1) ; )<br />

(X (i,1) ; )<br />

<br />

Xi , F (X (i,1) ; ) 2<br />

, (X(i,1) ; )<br />

; (3.60)<br />

where is assumed to be dierentiable <strong>in</strong> and is the fourth conditional<br />

cumulant<br />

(X (i,1) ; ) =E <br />

<br />

Xi , F (X (i,1) ; ) 4<br />

jX(i,1)<br />

<br />

, (X (i,1) ; ) 2 ;<br />

where i =1;:::;n.<br />

Comb<strong>in</strong><strong>in</strong>g the functions G n and H n leads to further mart<strong>in</strong>gale estimat<strong>in</strong>g<br />

functions that may have better properties. Follow<strong>in</strong>g the optimal way of<br />

comb<strong>in</strong><strong>in</strong>g G n and H n described <strong>in</strong> Heyde [35], the function Kn is obta<strong>in</strong>ed:<br />

" nX _()() , F () ()<br />

Kn() =<br />

(X<br />

() () , 2 i , F ())<br />

()<br />

i=1<br />

F<br />

+ _ ()() , ()() _ # (Xi , F ()) 2 , () ; (3.61)<br />

() () , 2 ()<br />

where for abbreviation the rst argument of all functions on the right hand<br />

side, that is X (i,1) , has been left out and where denotes the third conditional<br />

central moment<br />

(X (i,1) ; ) =E <br />

h<br />

(Xi , F (X (i,1) ; )) 3 jX (i,1)<br />

i<br />

; i =1;:::;n:<br />

For K n, as for G n , it can be shown that an estimator obta<strong>in</strong>ed from the estimat<strong>in</strong>g<br />

equation K n() =0exists, is consistent and asymptotically normal.<br />

Under some regularity conditions (see Bibby [5], pp. 4{6) we have<br />

Theorem 5 An estimator ^ n exists for every n, which on a set C n solves<br />

the equation<br />

K n (^ n )=0;<br />

40


where P 0 (C n ) ,! 1 as n ,! 1. Moreover, as n ,! 1,<br />

^ n ,! 0<br />

<strong>in</strong> probability under P 0 and ^ n is asymptotically normal <strong>in</strong> distribution under<br />

P 0 .<br />

For the proof we refer to [5, 8].<br />

41


3.2 Discrete models<br />

In section 1.2 we dealt with three k<strong>in</strong>ds of stochastic volatility models, follow<strong>in</strong>g<br />

an AR(p)-process, an ARCH or even a GARCH process. Now we<br />

discuss Maximum Likelihood (ML) estimation for these models and derive<br />

the properties of the estimators. F<strong>in</strong>ally, we treat the Bayesian analysis.<br />

3.2.1 AR models<br />

In our representation of estimation theory for the AR(1)-process we follow<br />

[67] x5. The parameter estimation problem for l<strong>in</strong>ear time series (i.e. ARMA<br />

processes) is to be found <strong>in</strong> many textbooks, see for <strong>in</strong>stance [16], [34], [64].<br />

Below we discuss a slightly dierent approach which will turn out to be<br />

useful <strong>in</strong> more general models. The AR(1) case should therefore be seen as a<br />

pedagogic example with respect to this more general methodology.<br />

Consider the autoregressive AR(1) model<br />

X n = X n,1 + " n ; n 1; (3.62)<br />

where 2 IR is the parameter to be estimated, X 0 is a random variable<br />

with E(X 0 ) = 0 and f" n ;n 1g is `noise', where " n N (0; 2 ) i.i.d. with<br />

known 2 .<br />

From equation (3.62) we conclude<br />

X n = " n + " n,1 + :::+ n,1 " 1 + n X 0 ;<br />

and thus the probabilistic properties of fX n g essentially depend on the jo<strong>in</strong>t<br />

distribution of X 0 ;" 1 ;" 2 ;:::.We have<br />

and<br />

Var X n = 2 (1 + 2 + :::+ 2(n,1) )+ 2n Var X 0 (3.63)<br />

E X n X n,k = 2 k (1 + 2 + :::+ 2(n,k,1) )+ k 2(n,k) Var X 0 : (3.64)<br />

In the case jj < 1, we obta<strong>in</strong> from (3.63) and (3.64) by choos<strong>in</strong>g X 0<br />

<br />

N (0; 2<br />

)<br />

1, 2<br />

<br />

E X n =0; Var X n = 2<br />

1 , 2 ; E X n X n,k = 2 k<br />

1 , 2 ;<br />

and fX n g has a stationary distribution.<br />

42


In the case jj 1 we have Var X n ,!1as n ,! 1, that is the process<br />

explodes.<br />

In both cases = 1, fX n g reduces to a random walk.<br />

From these considerations we suppose that as for probabilistic properties of<br />

estimators of , e.g. asymptotic behaviour, we have to dist<strong>in</strong>guish between<br />

the cases jj > 1, jj < 1 and jj =1.<br />

In the follow<strong>in</strong>g we assume for the AR(1) model (3.62) X 0 = 0 and 2 = 1<br />

for convenience.<br />

Denot<strong>in</strong>g by p (X 1 ;:::;X n ) the jo<strong>in</strong>t density of X 1 ;:::;X n , the Maximum<br />

Likelihood Estimator (MLE) ^ n is dened to be a value such that the jo<strong>in</strong>t<br />

density p reaches a maximum that is<br />

^ n = argmax p (X 1 ;:::;X n );<br />

(for the denition of the MLE <strong>in</strong> cont<strong>in</strong>uous time see x3.1.1, p.17). We know<br />

the jo<strong>in</strong>t density p of X 1 ;:::;X n<br />

"<br />

p (X 1 ;:::;X n )=(2) , n 2 exp , 1 #<br />

nX<br />

(X i , X i,1 ) 2 ;<br />

2<br />

and hence are able to calculate ^ n by solv<strong>in</strong>g d d p =0<br />

Insert<strong>in</strong>g (3.62) for X i we obta<strong>in</strong><br />

i=1<br />

P ni=1<br />

X i,1 X i<br />

^ n = Pni=1 :<br />

Xi,1<br />

2<br />

^ n = +<br />

P ni=1<br />

X i,1 " i<br />

Pni=1<br />

X 2 i,1<br />

P a:s: (3.65)<br />

Denot<strong>in</strong>g<br />

M n =<br />

nX<br />

i=1<br />

X i,1 " i ;<br />

we see immediately that the process M n is a mart<strong>in</strong>gale with respect to the<br />

ltration F n = (X 1 ;:::;X n ) under P for any . The mart<strong>in</strong>gale M n is<br />

square <strong>in</strong>tegrable, i.e. EMn<br />

2 < 1, n 0, and we know that the stochastic<br />

sequence Mn 2 is a submart<strong>in</strong>gale (see [66] x7.1, p.455). By means of the<br />

Doob decomposition (see e.g. [66], p.454) there is a mart<strong>in</strong>gale m n and a<br />

predictable 2 <strong>in</strong>creas<strong>in</strong>g 3 sequence hMi n such that<br />

M 2 n = m n + hMi n :<br />

2 A process X n is predictable if X n is F n,1 measurable.<br />

3 A process X n is <strong>in</strong>creas<strong>in</strong>g if X 0 =0,X n X n+1 P a.s.<br />

43


The sequence hMi n is called the square characteristic or the quadratic variation<br />

of M n . Here hMi n is<br />

see [66], p.455.<br />

hMi n =<br />

nX<br />

i=1<br />

Hence, equation (3.65) can be written as<br />

^ n , =<br />

X 2 i,1;<br />

M n<br />

hMi n<br />

: (3.66)<br />

Recall the denition of the Fisher <strong>in</strong>formation <strong>in</strong> the cont<strong>in</strong>uous time case<br />

(see 3.1.1, p.19). Here <strong>in</strong> discrete time the Fisher <strong>in</strong>formation equals<br />

Direct calculation shows<br />

hence<br />

"<br />

#<br />

I n () =E , @2 ln p (x 1 ;:::;x n )<br />

:<br />

@ 2<br />

I n () =E <br />

X<br />

X<br />

2<br />

i,1<br />

;<br />

I n () =E hMi n ;<br />

and therefore hMi n is often called the stochastic Fisher <strong>in</strong>formation. Calculations<br />

based on (3.63) show that for large n the Fisher <strong>in</strong>formation is<br />

approximately<br />

n<br />

; jj < 1;<br />

1, 2<br />

S<strong>in</strong>ce<br />

I n () <br />

8<br />

><<br />

>:<br />

; jj =1;<br />

;<br />

( 2 ,1) 2 jj > 1:<br />

n 2<br />

2<br />

2n<br />

hMi n ,!1<br />

P a:s:;<br />

we can apply the `law of large numbers for square <strong>in</strong>tegrable mart<strong>in</strong>gales'<br />

and obta<strong>in</strong><br />

M n<br />

,! 0 P a:s:;<br />

hMi n<br />

see [66], p.487, Theorem 4. Thus, with (3.66) we conclude that the estimator<br />

is strongly consistent, that is<br />

as n tends to <strong>in</strong>nity.<br />

^ n ,! <br />

P a.s.<br />

As for asymptotic behaviour of the estimator we only state the results and<br />

refer to [67] x5 for a detailed treatment.<br />

44


Theorem 6 Depend<strong>in</strong>g on the value of we have the follow<strong>in</strong>g asymptotic<br />

behaviour for the normalized deviation q I n ()(^ n , ):<br />

q<br />

<br />

lim P I<br />

n,!1 n ()(^ n , ) z =<br />

8<br />

><<br />

>:<br />

(z); jj < 1;<br />

H (z); jj =1;<br />

Ch(z); jj > 1;<br />

where (z) is the standard normal distribution, Ch(z) is the Cauchy distribution<br />

and H (z) is the distribution of the random variable<br />

<br />

2 3 2<br />

where W denotes a Wiener process.<br />

W 2 (1) , 1<br />

R 1<br />

0<br />

W 2 (s)ds ;<br />

Hence, <strong>in</strong> the stationary case jj < 1 the normalized deviation q I n ()(^ n ,)<br />

is asymptotically normally distributed, whereas <strong>in</strong> the other cases it has<br />

as limit distribution the Cauchy distribution or the quite unexpected H <br />

distribution. The question arises wether we can reduce the number of limit<br />

distributions by modify<strong>in</strong>g the normaliz<strong>in</strong>g factor q I n (). Indeed, <strong>in</strong>stead of<br />

us<strong>in</strong>g the Fisher <strong>in</strong>formation I n () =EhMi n , choos<strong>in</strong>g the stochastic Fisher<br />

<strong>in</strong>formation hMi n as normaliz<strong>in</strong>g factor we obta<strong>in</strong><br />

Theorem 7<br />

lim<br />

n,!1 P <br />

q<br />

(<br />

(z); jj 6=1;<br />

hMi n (^ n , ) z =<br />

H (z); jj =1:<br />

Summariz<strong>in</strong>g: the MLE ^ n is (strongly) consistent and the normalized deviations<br />

q I n ()(^ n , ) and q hMi n (^ n , ) are asymptotically distributed as<br />

shown <strong>in</strong> both theorems.<br />

3.2.2 ARCH and GARCH models<br />

In the follow<strong>in</strong>g we concentrate on ARCH and GARCH models and especially<br />

on estimation <strong>in</strong> these models. We refer to the fundamental papers by<br />

Engle [22] and Bollerslev [12] and to Bollerslev, Chou and Kroner [13] and<br />

Bollerslev, Engle and Nelson [14].<br />

Conventional econometric time series models assume constantvariance. However,<br />

over a decade ago risk and uncerta<strong>in</strong>ty considerations lead to the development<br />

of new econometric time series models that allow for the model<strong>in</strong>g<br />

45


of time vary<strong>in</strong>g variances and covariances. Engle [22] <strong>in</strong>troduced such a new<br />

class of stochastic processes, called the class of AutoRegressive Conditional<br />

Heteroscedastic (ARCH) processes where the conditional variances and covariances<br />

depend on the past. These are discrete time stochastic processes<br />

f" t g of the form<br />

" t = z t t (3.67)<br />

with z t i. i. d. , E(z t )=0, Var(z t )=1, and with time-vary<strong>in</strong>g positive t which<br />

is time (t , 1) measurable. Assume <strong>in</strong> the follow<strong>in</strong>g z t N(0; 1) and " t is a<br />

scalar process. The extension to the multivariate case is straightforward, see<br />

e.g. [13].<br />

First we focus directly on the process f" t g (3.67) and assume that " t is itself<br />

observable. However, note that <strong>in</strong> many applications " t corresponds to the<br />

<strong>in</strong>novations for some other stochastic process y t , where y t = f(b; x t )+" t with<br />

" t conditionally distributed N (0; 2 ), f a function of x t which is time (t , 1)<br />

measurable and of a parameter vector b. In this context we will consider<br />

later the ARCH regression model, see (3.75), where f(b; x t ) x 0 tb, that is<br />

the conditional mean of y t is given as x 0 tb.<br />

The conditional variance of " t equals t 2 . We will concentrate <strong>in</strong> the follow<strong>in</strong>g<br />

on some frequently used models for t 2 . Engle [22] suggests one possible<br />

parametrization for t<br />

2 as a l<strong>in</strong>ear function of the past q squared values of<br />

the process<br />

2 t = 0 +<br />

qX<br />

i=1<br />

i " 2 t,i; (3.68)<br />

where for the model to be well-dened 0 > 0 and i 0 for i = 1;:::;q.<br />

This model is known as the l<strong>in</strong>ear ARCH(q) model.<br />

Nowwe discuss Maximum Likelihood (ML) estimation <strong>in</strong> the l<strong>in</strong>ear ARCH(q)<br />

model. For a discussion of the ML estimation see section 3.1.1. Apart from<br />

some constants, the log-likelihood of the tth observation is<br />

l t = , 1 2 log 2 t , 1 2 "2 t = 2 t ; (3.69)<br />

where the term , 1 log 2 2 t arises from the transformation from z t to " t . The<br />

log-likelihood function for the full sample " T ;" T ,1 ;:::;" 1 is<br />

L = 1 T<br />

TX<br />

t=1<br />

l t : (3.70)<br />

To estimate the unknown parameters 0 ( 0 ; 1 ;:::; q ) we maximize the<br />

log-likelihood function, that is the rst order derivatives are set equal to zero.<br />

46


Denot<strong>in</strong>g u 0 t (1;" 2 t,1;:::;" 2 t,q) and h t t<br />

2 we abbreviate (3.68) by<br />

h t = u 0 t:<br />

With this notation the rst order derivatives are<br />

@l t<br />

@ = 1 u t<br />

2h t<br />

" 2 t<br />

h t<br />

, 1<br />

!<br />

: (3.71)<br />

In order to obta<strong>in</strong> the limit<strong>in</strong>g distributions for the normalized deviations of<br />

the estimators <strong>in</strong> (3.76) below, we have to estimate the Fisher <strong>in</strong>formation<br />

matrix F, see x3.1.1, p.19, which is the negative expectation of the Hessian<br />

averaged over all observations. Therefore we calculate the Hessian<br />

@ 2 l t<br />

@@ 0 = , 1<br />

2h 2 t<br />

u t u 0 t<br />

! "<br />

" 2 t "<br />

2<br />

+ t<br />

, 1<br />

h t h t<br />

# @<br />

@ 0 1<br />

2h t<br />

u t<br />

<br />

: (3.72)<br />

S<strong>in</strong>ce the conditional expectation of the factor " 2 t =h t is one and of the second<br />

term <strong>in</strong> (3.72) is zero, the Fisher <strong>in</strong>formation matrix is given by<br />

F = X t<br />

and consistently estimated by<br />

^F = 1 T<br />

1<br />

2T E " 1<br />

h 2 t<br />

X<br />

t<br />

" 1<br />

2h 2 t<br />

u t u 0 t<br />

u t u 0 t<br />

Later we will discuss the properties of the estimators, see p.49.<br />

In many applications with the l<strong>in</strong>ear ARCH(q) model a large number of parameters<br />

and a long lag length q are needed. These problems are avoided by an<br />

alternative, more general parametrization of h t <strong>in</strong>troduced by Bollerslev [12].<br />

This more general model is called the Generalized ARCH, or GARCH(p; q),<br />

model<br />

2 t = 0 +<br />

qX<br />

i=1<br />

i " 2 t,i +<br />

pX<br />

i=1<br />

#<br />

#<br />

:<br />

;<br />

i 2 t,i;<br />

where q > 0, p 0, 0 > 0, i 0 for i = 1;:::;q, and i 0 for<br />

i = 1;:::;p. The generalization <strong>in</strong> the GARCH(p; q) model <strong>in</strong> comparison<br />

to the ARCH(q) model is that beside past values of the process, also past<br />

conditional variances enter.<br />

Denote 0 ( 0 ; 1 ;:::; q ; 1 ;:::; p ), h t t<br />

2 and vt 0 (1;" 2 t,1;:::; " 2 t,q;<br />

h t,1 ;:::;h t,p ). As for the ARCH model we estimate by dierentiat<strong>in</strong>g the<br />

log-likelihood with respect to <br />

@l t<br />

@ = 1 !<br />

@h t " 2 t<br />

, 1 :<br />

2h t @ h t<br />

47


The Hessian is<br />

@ 2 l t<br />

@@ 0 = , 1 @h t @h t<br />

2h 2 t @ @ 0<br />

! " # " #<br />

" 2 t "<br />

2<br />

+ t @ 1 @h t<br />

, 1<br />

h t h t @ 0 ; (3.73)<br />

2h t @<br />

where<br />

@h t<br />

@ = v t +<br />

pX<br />

i=1<br />

i<br />

@h t,i<br />

@ : (3.74)<br />

The dierence from the ARCH model is the <strong>in</strong>clusion of the recursive part <strong>in</strong><br />

(3.74). As for the ARCH model, the Fisher <strong>in</strong>formation matrix is consistently<br />

estimated by the sample analogue of the rst term <strong>in</strong> (3.73).<br />

As mentioned earlier we consider a generalization of model (3.67), (3.68), the<br />

ARCH regression model<br />

" t = y t , x 0 tb;<br />

h t = u 0 t; (3.75)<br />

where " t is conditionally distributed as N (0;h t ), b is a parameter vector<br />

and x t is time (t , 1) measurable. As <strong>in</strong> the l<strong>in</strong>ear ARCH(q) model the<br />

conditional mean of " t is zero, but the conditional mean of y t is given as x 0 tb.<br />

The conditional variance for both " t and y t is h t .<br />

As for the l<strong>in</strong>ear ARCH(q) model we consider ML estimation for the ARCH<br />

regression model. In addition to the parameter estimated as <strong>in</strong> the l<strong>in</strong>ear<br />

ARCH(q) model wehave to estimate parameter b. The derivative with respect<br />

to b is given by<br />

The Hessian is<br />

@l t<br />

@b = " tx 0 t<br />

+ 1 @h t<br />

h t 2h t @b<br />

@ 2 l t<br />

@b@b 0 = , x0 tx t<br />

, 1 @h t @h t<br />

h t 2h 2 t @b @b 0<br />

, 2" tx 0 t<br />

h 2 t<br />

@h t<br />

@b +<br />

"2 t<br />

h t<br />

, 1<br />

" 2 t<br />

h t<br />

, 1<br />

!<br />

" 2 t<br />

h<br />

! t<br />

@<br />

!<br />

:<br />

" 1 @h t<br />

@b 0 2h t @b<br />

Tak<strong>in</strong>g conditional expectations, the last two terms of the Hessian vanish<br />

and " 2 t =h t becomes one, so that the part of the Fisher <strong>in</strong>formation matrix<br />

correspond<strong>in</strong>g to b is given by<br />

F bb = 1 T<br />

X<br />

t<br />

E<br />

" x<br />

0<br />

t x t<br />

+ 1 #<br />

@h t @h t<br />

h t 2h 2 t @b @b 0 ;<br />

#<br />

:<br />

48


and by substitut<strong>in</strong>g the l<strong>in</strong>ear variance function, F bb is consistently estimated<br />

by<br />

^F bb = 1 T<br />

2<br />

X<br />

t<br />

4 x0 tx t<br />

h t<br />

+2 X j<br />

2 j<br />

3<br />

" 2 t,j<br />

x 0<br />

h<br />

t,jx 2 t,j<br />

5 :<br />

t<br />

F<strong>in</strong>ally we give a remark to the ML estimation for the GARCH regression<br />

model<br />

" t = y t , x 0 tb;<br />

h t = v 0 t;<br />

with v t and as above. In order to estimate the mean parameters b we<br />

dierentiate with respect to b as shown for the ARCH regression model with<br />

the s<strong>in</strong>gle dierence<br />

@h t<br />

qX<br />

pX<br />

@b = ,2 @h t,j<br />

j x t,j " t,j + j<br />

@b :<br />

j=1<br />

Pay<strong>in</strong>g attention to this dierence, the part of the Fisher <strong>in</strong>formation matrix<br />

correspond<strong>in</strong>g to b is consistently estimated as <strong>in</strong> the ARCH regression model.<br />

We denote by F , respectively F , the part of the Fisher <strong>in</strong>formation matrix<br />

correspond<strong>in</strong>g to , respectively , and the elements <strong>in</strong> the o-diagonal<br />

block of the <strong>in</strong>formation matrix by F b , respectively by F b . The elements<br />

F b , respectively F b , may be shown to be zero. Because of this asymptotic<br />

<strong>in</strong>dependence , respectively , can be estimated without loss of asymptotic<br />

eciency based on a consistent estimate of b and vice versa. Us<strong>in</strong>g this fact<br />

Engle [22], x6, formulates a simple scor<strong>in</strong>g algorithm for the ML estimation<br />

of the parameters and b.<br />

As for the properties of the estimators, Bollerslev [14] remarks that for the<br />

general ARCH class of models the verication of sucient regularity conditions<br />

for the MLE to be consistent and asymptotically normally distributed<br />

is very dicult. A detailed proof is only worked out <strong>in</strong> a few cases. Normally<br />

one assumes that these regularity conditions are satised, such that<br />

the ML estimators ^, respectively ^, and ^b are consistent and asymptotically<br />

normally distributed with limit<strong>in</strong>g distribution<br />

p<br />

T (^ , ) ,! N(0; F<br />

,1<br />

); resp. p T (^ , ) ,! N(0; F ,1 );<br />

j=1<br />

and p T (^b , b) ,! N(0; F ,1<br />

bb ): (3.76)<br />

Clos<strong>in</strong>g our considerations about estimation <strong>in</strong> ARCH/GARCH models we<br />

remark that Geweke [31] developes Bayesian <strong>in</strong>ference procedures for ARCH<br />

models by us<strong>in</strong>g Monte Carlo methods to determ<strong>in</strong>e the a posteriori distribution.<br />

The Bayesian analysis is discussed <strong>in</strong> the next section.<br />

49


3.2.3 The Bayesian estimation method<br />

Suppose we have the data x = (x 1 ;:::;x n ) with distribution p(xj) where<br />

is the unknown parameter we want to estimate. The basic idea of the<br />

Bayesian approach is to treat the parameter as a random variable and to<br />

use a guess or an a priori knowledge of the distribution () of and then<br />

to estimate by calculat<strong>in</strong>g the a posteriori distribution (jx) of . For<br />

details about the Bayesian theory we refer to [3] and [55]. First of all the<br />

Bayesian method will be described <strong>in</strong> the case where the parameter is onedimensional.<br />

Furthermore, the k-dimensional case and the hierarchical model<br />

will be discussed.<br />

The one-dimensional case<br />

In the one-dimensional case the a posteriori distribution (jx) of is calculated<br />

by the so called Bayes formula us<strong>in</strong>g the a priori distribution () as<br />

follows<br />

(jx) = R p(xj)() ; (3.77)<br />

p(xj)()d<br />

where the denom<strong>in</strong>ator is a proportionality constant mak<strong>in</strong>g the total a posteriori<br />

probability equal to one. Now by us<strong>in</strong>g the a posteriori distribution<br />

(jx) the parameter can be estimated for example via the modus of the a<br />

posteriori distribution<br />

^ = arg max (jx)<br />

or by the mean<br />

or by the median<br />

^ =E[(jx)]<br />

^ = median [(jx)] :<br />

For the asymptotic properties of the estimator we refer to [3], x5.3 Asymptotic<br />

Analysis.<br />

Example (One-dimensional case)<br />

Suppose that x =(x 1 ;:::;x n ) and x i j N(; 2 ) i. i. d., where 2 is known.<br />

Choose an a priori distribution of N( 0 ;0). 2 With the Bayes formula<br />

( ) ( )<br />

,(xi , ) 2 ,( , 0 ) 2<br />

(jx i ) / exp<br />

exp<br />

2 2 20<br />

(<br />

2<br />

/ exp , 1 1<br />

2 + 1 !" 2 , 2 x i 2 + #)<br />

0 0 2<br />

+ ::: ;<br />

2 0<br />

2 2 + 0<br />

2<br />

50


where the terms and factors not written out do not <strong>in</strong>volve , we have<br />

!<br />

;<br />

and with<br />

we have<br />

where x = 1 n<br />

P<br />

xi .<br />

jx i N x i 2 0<br />

+ 0 2<br />

;<br />

2 + 0<br />

2<br />

nY<br />

(jx) / p(x i j) ()<br />

/<br />

i=1<br />

( ) ,(x , )<br />

2<br />

exp<br />

exp<br />

2 2 =n<br />

The multi-dimensional case<br />

1<br />

,2 + ,2<br />

0<br />

( ,( , 0 ) 2<br />

2 2 0<br />

!<br />

jx N xn2 0<br />

+ 0 2 1<br />

;<br />

;<br />

n0 2 + 2 n ,2 + ,2<br />

0<br />

In the multi-dimensional case = ( 1 ;:::; k ), the a posteriori distribution<br />

of can be calculated by the Bayes formula as follows<br />

(jx) =<br />

)<br />

;<br />

p(xj)()<br />

R :::<br />

R p(xj)() d1 :::d k<br />

: (3.78)<br />

By us<strong>in</strong>g the marg<strong>in</strong>al distributions ( i jx) of the jo<strong>in</strong>t a posteriori distribution<br />

(jx)<br />

Z Z<br />

( i jx) = ::: (jx) d 1 :::d i,1 d i+1 :::d k ; (3.79)<br />

we are able to estimate by the ways described <strong>in</strong> the one-dimensional case<br />

above.<br />

Usually problems arise <strong>in</strong> calculat<strong>in</strong>g the <strong>in</strong>tegrals <strong>in</strong> (3.79) which require<br />

approximation techniques. We will discuss simulations of distributions us<strong>in</strong>g<br />

so called Markov Cha<strong>in</strong> Monte Carlo (MCMC) methods. The key idea of<br />

the MCMC methods is described <strong>in</strong> the follow<strong>in</strong>g. For details we refer to<br />

[71], [3] p.353 or [55]. Suppose we want to generate a sample from an a<br />

posteriori distribution (jx) for 2 IR k , but cannot directly do this.<br />

However, suppose we are able to construct a Markov cha<strong>in</strong> with state space<br />

and with equilibrium distribution (jx). Then under suitable regularity<br />

conditions asymptotic results exist, show<strong>in</strong>g <strong>in</strong> which sense the sample output<br />

from such acha<strong>in</strong> with equilibrium distribution (jx) can be used to mimic<br />

a random sample from (jx) or to estimate the expected value of a function<br />

51


f() with respect to (jx). If 1 ; 2 ;:::; t ;:::is a realization from a suitable<br />

cha<strong>in</strong> then<br />

t ,! <br />

<strong>in</strong> distribution as t tends to <strong>in</strong>nity, (jx) and<br />

1<br />

t<br />

tX<br />

i=1<br />

f( i ) ,! E jx [f()] a.s.<br />

as t tends to <strong>in</strong>nity. Nowwe need algorithms to construct such cha<strong>in</strong>s with<br />

specied equilibrium distributions. We will discuss two particular forms of<br />

Markov cha<strong>in</strong> schemes, the Gibbs sampl<strong>in</strong>g algorithm and the Metropolis{<br />

Hast<strong>in</strong>gs algorithm.<br />

The Gibbs sampl<strong>in</strong>g algorithm<br />

Suppose = ( 1 ;:::; k ) is the vector of unknown quantities. We want to<br />

simulate ( i j x) for i = 1;:::;k. Denote by (j x) = ( 1 ;:::; k j x) the<br />

jo<strong>in</strong>t density and by ( i j x; j ;j 6= i) the so called <strong>in</strong>duced full conditional<br />

densities for each of the components i given values of the other components<br />

j , j 6= i, for i =1;:::;k. Suppose that we are able to sample from each of<br />

these one-dimensional distributions.<br />

The Gibbs sampl<strong>in</strong>g algorithm (see Geman and Geman [29]):<br />

1) choose arbitrary start<strong>in</strong>g po<strong>in</strong>ts 0 =( 0 1;:::; 0 k).<br />

2) make random draw<strong>in</strong>gs from the full conditional distribution as follows<br />

1 1<br />

from ( 1 jx; 0 j ;j 6= 1)<br />

1 2<br />

from ( 2 jx; 1 1; 0 3;:::; 0 k)<br />

1 3<br />

from ( 3 jx; 1 1; 1 2; 0 4;:::; 0 k)<br />

:::<br />

1 k from ( k jx; i j;j 6= k)<br />

This completes a transition from 0 =( 0 1;:::; 0 k) to 1 =( 1 1;:::; 1 k).<br />

3) iterat<strong>in</strong>g step 2) produces a sequence 0 ; 1 ;:::; t ;::: which is a realization<br />

of a Markov cha<strong>in</strong> with transition probability K from t to<br />

t+1 kY<br />

K( t ; t+1 )=<br />

l=1<br />

t+1<br />

l<br />

x; t j;j >l; t+1<br />

j ;j


The important property of the Gibbs sampl<strong>in</strong>g algorithm is that we only sample<br />

from the full conditional distributions. As t tends to <strong>in</strong>nity, ( t 1;:::; t k)<br />

tends <strong>in</strong> distribution to a random vector with jo<strong>in</strong>t density (jx), see e.g.<br />

[29]. In particular, t i tends <strong>in</strong> distribution to a random quantity with density<br />

( i jx). The above algorithm is assumed to be replicated m times <strong>in</strong>dependently,<br />

that means we have m replicates of t =( t 1;:::; t k). Then for large t<br />

the replicates ( t i1;:::; t im) are approximately a random sample from ( i jx).<br />

For m suciently large we obta<strong>in</strong> an estimate ^( i jx) for ( i jx) from<br />

^( i jx) = 1 m<br />

mX<br />

l=1<br />

For more details we refer to [55], p.226.<br />

The Metropolis{Hast<strong>in</strong>gs algorithm<br />

( i jx; t jl;j 6= i):<br />

In order to construct a Markov cha<strong>in</strong> 1 ;:::; t ;::: with state space and<br />

equilibrium distribution (jx) we nd the transition probability from t <br />

to the next state t+1 via the Metropolis{Hast<strong>in</strong>gs algorithm as follows:<br />

Denote by q(; 0 ) a transition probability function such that <strong>in</strong> the case<br />

t = , 0 drawn from q(; 0 ) is considered as a proposed possible value for<br />

t+1 .However, a further randomization takes place. With probability (; 0 )<br />

we accept t+1 = 0 , otherwise we reject the value generated from q(; 0 ) and<br />

set t+1 = .<br />

This construction denes a Markov cha<strong>in</strong> with transition probability given<br />

by<br />

(<br />

(; 0 q(; <br />

)=<br />

0 )(; 0 ); if 0 6= ;<br />

1 , P 00 q(; 00 )(; 00 ); if 0 = :<br />

With the denition<br />

(; 0 )=<br />

8<br />

><<br />

>:<br />

( )<br />

( 0 jx)q( 0 ;)<br />

m<strong>in</strong><br />

(jx)q(; 0 ) ; 1 ; if (jx) q(; 0 ) > 0;<br />

1; if (jx) q(; 0 )=0;<br />

provided that q(; 0 ) is chosen to be irreducible and aperiodic on a suitable<br />

state space, we have that (jx) is the equilibrium distribution of the<br />

constructed cha<strong>in</strong>. For more details see [71].<br />

Hierarchical models

A hierarchical model forms a structure in the following way: the distribution of the data $x$ is written conditionally on parameters $\theta_1$ as $p(x \mid \theta_1)$, the distribution of $\theta_1$ is written conditionally on 'hyperparameters' $\theta_2$ as $p(\theta_1 \mid \theta_2)$, and we have the a priori distribution $\pi(\theta_2)$ of $\theta_2$. This is the so-called three-stage hierarchical model which is considered in many applications. The three stages are $x$, $\theta_1$ and $\theta_2$ as above; that means a three-stage model has the structure $p(x \mid \theta_1)$, $p(\theta_1 \mid \theta_2)$, $\pi(\theta_2)$. We could continue this process and write the distribution of $\theta_2$ conditionally on other hyperparameters $\theta_3$ as $p(\theta_2 \mid \theta_3)$ and so on. We obtain the $k$-stage hierarchical model $p(x \mid \theta_1)$, $p(\theta_1 \mid \theta_2)$, $p(\theta_2 \mid \theta_3),\ldots,p(\theta_{k-2} \mid \theta_{k-1})$, $\pi(\theta_{k-1})$.

The model includes the assumption of conditional independence: conditioning on $\theta_j$, the parameters $x, \theta_1, \theta_2,\ldots,\theta_{j-1}$ and the parameters $\theta_{j+1},\ldots,\theta_{k-1}$ are independent. As a special case consider the three-stage model with $p(x \mid \theta_1)$, $p(\theta_1 \mid \theta_2)$, $\pi(\theta_2)$. The distribution $p(x \mid \theta_1)$ is formally the distribution of $x$ given $\theta_1$ and $\theta_2$. However, if we know $\theta_1$, then knowing $\theta_2$ would not add any information about $x$; this means $x$ and $\theta_2$ are independent given $\theta_1$.

In the Bayesian analysis of such a three-stage hierarchical model the special structure allows us to write the a posteriori distribution $\pi(\theta_1, \theta_2 \mid x)$ in the form
$$\pi(\theta_1, \theta_2 \mid x) \propto p(x \mid \theta_1, \theta_2)\,p(\theta_1, \theta_2) \propto p(x \mid \theta_1)\,p(\theta_1 \mid \theta_2)\,\pi(\theta_2),$$
and inference for $\theta_2$ is given by its marginal distribution
$$\pi(\theta_2 \mid x) \propto p(x \mid \theta_2)\,\pi(\theta_2).$$
The marginal distributions to be used for inference for the parameters can be calculated by MCMC methods.

Example (Hierarchical model)

An example of a hierarchical three-stage stochastic model can be found in [42], where the conditional variance follows a log-AR(1) process:
$$y_t = \sqrt{h_t}\,u_t, \qquad \ln h_t = \alpha + \delta\,\ln h_{t-1} + \sigma_v\,v_t,$$
where $(u_t, v_t) \sim N(0,1)$, independent. More generally we can consider the log-AR(p) model
$$y_t = \sqrt{h_t}\,u_t, \qquad \ln h_t = \alpha + \delta_1\,\ln h_{t-1} + \delta_2\,\ln h_{t-2} + \cdots + \delta_p\,\ln h_{t-p} + \sigma_v\,v_t,$$
where $(u_t, v_t) \sim N(0,1)$, independent. In this model the time series of the data $y$ is generated from a probability model $p(y \mid h)$, where $h$ denotes the vector of volatilities. The volatilities $h$ are unobserved and are assumed to be generated by $p(h \mid \theta)$ with $\theta^T = (\alpha, \delta, \sigma_v)$. Denote $\theta_1 = h$, $\theta_2 = \theta$, $\delta^T = (\delta_1,\ldots,\delta_p)$. Via the Bayes formula we have
$$\pi(\theta_1, \theta_2 \mid y) \propto p(y \mid \theta_1)\,p(\theta_1 \mid \theta_2)\,\pi(\theta_2).$$
The marginal distributions $\pi(\theta_1 \mid y)$, $\pi(\theta_2 \mid y)$ can be calculated by using MCMC methods such as Gibbs sampling or the Metropolis-Hastings algorithm.
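For illustration, the following Python sketch simulates data from the log-AR(1) model above; the function name and the parameter values $\alpha = -0.7$, $\delta = 0.9$, $\sigma_v = 0.4$ are our own illustrative choices (with $|\delta| < 1$ so that $\ln h_t$ is stationary). Simulated paths of this kind are the natural test bed for the MCMC schemes just described.

```python
import numpy as np

def simulate_log_ar1_sv(n, alpha=-0.7, delta=0.9, sigma_v=0.4, seed=1):
    """Simulate the log-AR(1) stochastic volatility model
       y_t = sqrt(h_t) u_t,  ln h_t = alpha + delta ln h_{t-1} + sigma_v v_t,
    with (u_t, v_t) independent standard normal and |delta| < 1."""
    rng = np.random.default_rng(seed)
    log_h = np.empty(n)
    # start the log-volatility in its stationary distribution
    log_h[0] = rng.normal(alpha / (1 - delta),
                          sigma_v / np.sqrt(1 - delta**2))
    for t in range(1, n):
        log_h[t] = alpha + delta * log_h[t - 1] + sigma_v * rng.standard_normal()
    h = np.exp(log_h)
    y = np.sqrt(h) * rng.standard_normal(n)
    return y, h

y, h = simulate_log_ar1_sv(1_000)
```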


Chapter 4

Nonparametric Estimation

4.1 Diffusion models

Nonparametric estimation in general deals with estimating functionals in situations where the latter are not determined by a finite number of parameters, e.g. estimating a probability density at some fixed point, derivatives of a density, and others. For a thorough treatment of nonparametric estimation we refer to Ibragimov and Has'minskii [38], §4 and §7.

We consider estimation of a probability density at a point based on observations in $\mathbb{R}$. Suppose $X_1,\ldots,X_n$ is a sample from a population random variable $X$ taking values in $\mathbb{R}$ with unknown density $f(x)$. If we only know that $f(x)$ belongs to a class $\mathcal{F}$ of functions, estimating $f(x)$ typically is an infinite-dimensional nonparametric problem. Using the function
$$c(x) = \begin{cases} 0, & x < 0,\\ 1, & x \geq 0, \end{cases}$$
one forms the empirical distribution function
$$F_n(x) = \frac{1}{n}\sum_{k=1}^{n} c(x - X_k), \qquad (4.1)$$
which is a consistent estimator of the distribution function
$$F(x) = \int_{-\infty}^{x} f(y)\,dy.$$
The latter statement can be made precise for instance through the Borel-Cantelli Lemma and its various refinements (see [68]).

In order to construct a density estimator $f_n(x)$ starting from the empirical distribution function $F_n(x)$ in (4.1), we have to smooth $F_n$. One possibility is by using so-called kernel density estimators
$$f_n(x) = \frac{1}{n h_n}\sum_{k=1}^{n} V\!\left(\frac{x - X_k}{h_n}\right),$$
where the kernel $V: \mathbb{R} \to \mathbb{R}_+$ satisfies
$$\int_{-\infty}^{\infty} V(x)\,dx = 1,$$
and where the so-called bandwidth sequence $(h_n)$ is such that
$$h_n \to 0, \qquad n h_n \to \infty.$$
This class of estimators is referred to as the Parzen-Rosenblatt estimators. Under certain restrictions on $f(x)$ the sequence $f_n(x)$ converges in some probabilistic sense to $f(x)$. See [38], §4.4, and [69] for a detailed discussion.
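A direct Python transcription of the Parzen-Rosenblatt estimator might look as follows; the Gaussian choice of $V$ and the bandwidth rule $h_n = \hat\sigma\,n^{-1/5}$ (which satisfies $h_n \to 0$ and $n h_n \to \infty$) are common illustrative choices, not prescribed above.

```python
import numpy as np

def parzen_rosenblatt(x, sample, bandwidth=None):
    """Kernel density estimate f_n(x) = (1/(n h)) sum_k V((x - X_k)/h),
    here with a Gaussian kernel V (which integrates to 1)."""
    sample = np.asarray(sample)
    n = sample.size
    if bandwidth is None:
        # h_n ~ n^(-1/5): tends to 0 while n*h_n tends to infinity
        bandwidth = sample.std(ddof=1) * n ** (-1 / 5)
    u = (np.atleast_1d(x)[:, None] - sample[None, :]) / bandwidth
    V = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return V.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(2)
data = rng.standard_normal(500)
grid = np.linspace(-3, 3, 7)
print(parzen_rosenblatt(grid, data))  # close to the N(0,1) density on the grid
```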

Ibragimov and Has'minskii [38], §7, present some interesting approaches to and examples of nonparametric estimators of a signal $S(t)$ belonging to a certain functional space. Two models for signals are considered. A first one has the form
$$dX(t) = S(t)\,dt + \varepsilon\,db(t), \qquad (4.2)$$
with $0 \leq t \leq 1$, $(b(t))$ a Wiener process, $\varepsilon$ small (typically $\varepsilon \downarrow 0$) and $X(t)$ an observed signal on $[0,1]$. A second model is of the form
$$dX(t) = S(t)\,dt + db(t), \qquad (4.3)$$
with $0 \leq t \leq n$, $S(t)$ a one-periodic function and $X(t)$ observed over $n$ time periods.

As a basic example consider estimation of the functional $F$ of $S(t)$,
$$F(S) = \int_0^1 f(t)\,S(t)\,dt,$$
with $S, f \in L^2(0,1)$. The functional $F$ has to be estimated based on observations of (4.2) and (4.3), respectively. The estimator
$$\hat F_1 = \int_0^1 f(t)\,dX(t)$$
for model (4.2) and the estimator
$$\hat F_2 = \frac{1}{n}\int_0^n f(t)\,dX(t)$$
for model (4.3) can both be shown to have good properties. Both estimators are normally distributed with mean $F(S)$ and variance $\varepsilon^2\,\|f\|^2$ and $\|f\|^2/n$, respectively. We refer to [38], §7.4 and §7.5, for more details.
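Both distributional statements are easy to check by simulation. The following Python sketch (our illustration; the choices $f(t) = S(t) = \cos(2\pi t)$, $\varepsilon = 0.1$ and the grid size are arbitrary) approximates $\hat F_1$ by an Itô-Riemann sum over increments of (4.2) and compares its Monte Carlo mean and variance with $F(S) = 1/2$ and $\varepsilon^2\|f\|^2 = \varepsilon^2/2$:

```python
import numpy as np

# Monte Carlo check that F1_hat = int_0^1 f(t) dX(t) has mean F(S) and
# variance eps^2 * ||f||^2 under model (4.2). With f = S = cos(2*pi*t)
# we have F(S) = 1/2 and ||f||^2 = 1/2.
rng = np.random.default_rng(3)
eps, N, n_rep = 0.1, 1_000, 5_000
t = np.linspace(0.0, 1.0, N, endpoint=False)
dt = 1.0 / N
f = np.cos(2 * np.pi * t)
S = np.cos(2 * np.pi * t)

dX = S * dt + eps * np.sqrt(dt) * rng.standard_normal((n_rep, N))  # increments of (4.2)
F1_hat = dX @ f                                                    # Riemann-Ito sum

print(F1_hat.mean())  # approx. F(S) = 0.5
print(F1_hat.var())   # approx. eps^2 / 2 = 0.005
```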

4.2 Discrete models

In Section 3.2 we dealt with parametrizations of volatility, e.g. with the ARCH(q) process $\{\varepsilon_t\}$ (3.67), where $\varepsilon_t \mid \mathcal{F}_{t-1} \sim N(0, \sigma_t^2)$, $\mathcal{F}_t$ being the information $\sigma$-algebra at time $t$, and
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i\,\varepsilon_{t-i}^2. \qquad (4.4)$$
Instead of considering a parametric approach as in (4.4) we now discuss nonparametric techniques for the estimation of the volatility $\sigma_t^2$.

Basically there are two different nonparametric approaches: kernel estimators and Fourier-type estimators. For details we refer to Pagan and Schwert [57]. In order to estimate $\sigma_s^2 = E(\varepsilon_s^2 \mid \mathcal{F}_{s-1})$, kernel methods essentially use a weighted sum over $\varepsilon_j^2$, $j = 1,\ldots,T$, $j \neq s$:
$$\hat\sigma_s^2 = \sum_{\substack{j=1\\ j \neq s}}^{T} w_j^{(s)}\,\varepsilon_j^2, \qquad (4.5)$$
where the sum over all $(T-1)$ weights $w_j^{(s)}$, $j = 1,\ldots,T$, $j \neq s$, equals one. The aim is to obtain an estimate of $E(\varepsilon_s^2 \mid \varepsilon_{s-1}, \varepsilon_{s-2},\ldots,\varepsilon_{s-m})$ for $m$ suitably chosen. If the values preceding $\varepsilon_j$, that is $\varepsilon_{j-1}, \varepsilon_{j-2},\ldots,\varepsilon_{j-m}$, are similar to the values preceding $\varepsilon_s$, that is $\varepsilon_{s-1}, \varepsilon_{s-2},\ldots,\varepsilon_{s-m}$, then $\varepsilon_j^2$ is expected to give useful information about $E(\varepsilon_s^2 \mid \varepsilon_{s-1},\ldots,\varepsilon_{s-m})$; in this case the weight $w_j^{(s)}$ is large. If the values that preceded $\varepsilon_j$ differ 'substantially' from the values preceding $\varepsilon_s$, then $\varepsilon_j^2$ is expected to give only little information about $E(\varepsilon_s^2 \mid \varepsilon_{s-1},\ldots,\varepsilon_{s-m})$, and therefore $w_j^{(s)}$ is close to zero. We remark that in the sum (4.5) the term $w_s^{(s)}\varepsilon_s^2$ is left out to avoid the situation where 'outliers' in the data lead to a weight $w_s^{(s)}$ close to unity while all the other $w_j^{(s)}$ are close to zero, so that only $\varepsilon_s^2$ determines $\hat\sigma_s^2$.


There are many possible weighting schemes. A popular one uses a Gaussian kernel. With the notation $z_j' = (\varepsilon_{j-1}, \varepsilon_{j-2},\ldots,\varepsilon_{j-m})$ we have
$$w_j^{(s)} = \frac{K(z_s - z_j)}{\sum_{r=1}^{T} K(z_r - z_s)},$$
where $K(\cdot)$ is the Gaussian kernel
$$K(z_s - z_j) = \frac{1}{\sqrt{2\pi\,|H|}}\,\exp\!\left( -\tfrac{1}{2}\,(z_s - z_j)'\,H^{-1}\,(z_s - z_j) \right),$$
with $H = \mathrm{diag}(h_1,\ldots,h_m)$ containing the bandwidths, which were set to $\hat\sigma_k\,T^{-1/(4+m)}$, where $\hat\sigma_k$ is the sample standard deviation of $\varepsilon_{s-k}$, the $k$th component of $z_s$, $k = 1,\ldots,m$.
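The following Python sketch (our illustration; the function name is ours) implements the estimator (4.5) with the Gaussian-kernel weights above, using the bandwidth rule $\hat\sigma_k\,T^{-1/(4+m)}$; the normalizing constant of the kernel cancels in the weights, and the term $j = s$ is left out as discussed.

```python
import numpy as np

def kernel_volatility(eps, m=2):
    """Nonparametric estimate of sigma_s^2 = E(eps_s^2 | past) as in (4.5),
    with Gaussian-kernel weights built from the m preceding values."""
    T = eps.size
    # row r of z holds z_j = (eps_{j-1}, ..., eps_{j-m}) for j = r + m
    z = np.column_stack([eps[m - k:T - k] for k in range(1, m + 1)])
    h = z.std(axis=0, ddof=1) * T ** (-1.0 / (4 + m))  # bandwidths h_1..h_m
    sig2 = np.full(T, np.nan)
    for s in range(m, T):
        d = (z - z[s - m]) / h                   # scaled differences z_j - z_s
        K = np.exp(-0.5 * np.sum(d**2, axis=1))  # Gaussian kernel, constants cancel
        K[s - m] = 0.0                           # leave out the term j = s
        w = K / K.sum()
        sig2[s] = w @ eps[m:] ** 2
    return sig2
```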

Another approach to nonparametric estimation of volatility is an approximation using series expansion. The most frequently used series expansion in economics is the Flexible Fourier Form (FFF) introduced by Gallant [28], which leads to a volatility estimate of the form
$$\hat\sigma_t^2 = \alpha_0 + \sum_{j=1}^{m}\left\{ \gamma_j\,\varepsilon_{t-j} + \delta_j\,\varepsilon_{t-j}^2 + \sum_{k=1}^{2}\big( \gamma_{jk}\cos(k\,\varepsilon_{t-j}) + \delta_{jk}\sin(k\,\varepsilon_{t-j}) \big) \right\},$$
that means $\sigma_t^2$ is estimated by a sum of a low-order polynomial and trigonometric terms based on $\varepsilon_{t-j}$, $j = 1,\ldots,m$. Note the disadvantage of the FFF that the estimates of $\sigma_t^2$ may be negative. We refer to [57] for more details.
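Since the FFF is linear in its coefficients, they can be estimated by an ordinary least-squares regression of $\varepsilon_t^2$ on the polynomial and trigonometric regressors. The following sketch is our own illustration of this (the function name and the default $m = 1$ are arbitrary); the final comment echoes the negativity problem noted above.

```python
import numpy as np

def fff_volatility(eps, m=1):
    """Fit the Flexible Fourier Form for sigma_t^2 by OLS of eps_t^2 on
    polynomial and trigonometric terms in the m lagged residuals."""
    T = eps.size
    cols = [np.ones(T - m)]
    for j in range(1, m + 1):
        lag = eps[m - j:T - j]
        cols += [lag, lag**2,
                 np.cos(lag), np.sin(lag),
                 np.cos(2 * lag), np.sin(2 * lag)]
    X = np.column_stack(cols)
    y = eps[m:] ** 2
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef  # fitted sigma_t^2, t = m, ..., T-1 (may be negative!)
```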


Chapter 5

Some Diffusion Models with Explicit Solutions

In this chapter our representation follows [45], §4.2-4.4.

5.1 Linear stochastic differential equations

We first consider the class of linear stochastic differential equations. To simplify the exposition we concentrate on the scalar case and afterwards give the extension to the multivariate case.

The general form of a scalar linear stochastic differential equation is
$$dX_t = (a_1(t)X_t + a_2(t))\,dt + (b_1(t)X_t + b_2(t))\,dW_t, \qquad (5.1)$$
where $a_1, a_2, b_1, b_2$ are "nice"¹ functions of time $t$ or constants. As in the case of ordinary differential equations, the method of solution also involves a fundamental solution of an associated homogeneous differential equation. Here the homogeneous linear equation belonging to the general equation (5.1) is
$$dX_t = a_1(t)X_t\,dt + b_1(t)X_t\,dW_t. \qquad (5.2)$$
In order to find its fundamental solution $\Phi_{t,t_0}$, that means the solution of (5.2) satisfying $\Phi_{t_0,t_0} = 1$, we first look at the special case $b_1(t) \equiv 0$, that means at the ordinary differential equation
$$\frac{dX_t}{dt} = a_1(t)X_t. \qquad (5.3)$$

¹ They are Lebesgue measurable and bounded on an interval $0 \leq t \leq T$. Then a unique (strong) solution $X_t$ on $t_0 \leq t \leq T$ exists for each $0 \leq t_0 \leq T$.


We know the fundamental solution of (5.3),
$$\Phi_{t,t_0} = \exp\!\left( \int_{t_0}^{t} a_1(s)\,ds \right), \qquad (5.4)$$
or equivalently
$$d\,[\ln \Phi_{t,t_0}] = a_1(t)\,dt. \qquad (5.5)$$
Equation (5.5) motivates us to consider $\ln \Phi_{t,t_0}$ also in the general case, with $\Phi_{t,t_0}$ solving equation (5.2). Applying the Itô formula² to $\ln \Phi_{t,t_0}$ we obtain
$$d\,[\ln \Phi_{t,t_0}] = \left( a_1(t) - \tfrac{1}{2}\,b_1^2(t) \right) dt + b_1(t)\,dW_t, \qquad (5.6)$$
thus, with $\Phi_{t_0,t_0} = 1$,
$$\ln \Phi_{t,t_0} = \int_{t_0}^{t}\left( a_1(s) - \tfrac{1}{2}\,b_1^2(s) \right) ds + \int_{t_0}^{t} b_1(s)\,dW_s,$$
or
$$\Phi_{t,t_0} = \exp\!\left( \int_{t_0}^{t}\left( a_1(s) - \tfrac{1}{2}\,b_1^2(s) \right) ds + \int_{t_0}^{t} b_1(s)\,dW_s \right). \qquad (5.7)$$

Now that we have derived the solution $\Phi_{t,t_0}$ of the homogeneous linear equation (5.2), we want to obtain the general solution $X_t$ of (5.1). In an analogous way to (5.6) we derive by means of the Itô formula³
$$d\,[\Phi_{t,t_0}^{-1}] = \left( -a_1(t) + b_1^2(t) \right)\Phi_{t,t_0}^{-1}\,dt - b_1(t)\,\Phi_{t,t_0}^{-1}\,dW_t. \qquad (5.8)$$
As we will see later, we have to consider the process $\Phi_{t,t_0}^{-1}X_t$ in order to find an explicit expression for $X_t$. The processes $\Phi_{t,t_0}^{-1}$ and $X_t$ both include the same Wiener process $W_t$, and (5.2) and (5.8) can be seen as a two-dimensional stochastic differential equation. In order to calculate $d(\Phi_{t,t_0}^{-1}X_t)$ we therefore have to use the Itô formula for vector-valued processes⁴ with the transformation $U(X_t, \Phi_{t,t_0}^{-1}) = \Phi_{t,t_0}^{-1}X_t$. We obtain
$$d\,[\Phi_{t,t_0}^{-1}X_t] = \left( a_2(t) - b_1(t)\,b_2(t) \right)\Phi_{t,t_0}^{-1}\,dt + b_2(t)\,\Phi_{t,t_0}^{-1}\,dW_t.$$
Hence with $\Phi_{t_0,t_0} = 1$ we get by integration
$$\Phi_{t,t_0}^{-1}X_t = X_{t_0} + \int_{t_0}^{t}\left( a_2(s) - b_1(s)\,b_2(s) \right)\Phi_{s,t_0}^{-1}\,ds + \int_{t_0}^{t} b_2(s)\,\Phi_{s,t_0}^{-1}\,dW_s,$$

² see Appendix C.1
³ see Appendix C.1
⁴ see Appendix C.2


and thus
$$X_t = \Phi_{t,t_0}\left( X_{t_0} + \int_{t_0}^{t}\left( a_2(s) - b_1(s)\,b_2(s) \right)\Phi_{s,t_0}^{-1}\,ds + \int_{t_0}^{t} b_2(s)\,\Phi_{s,t_0}^{-1}\,dW_s \right). \qquad (5.9)$$
Summarizing: the solution $X_t$ of the general scalar linear stochastic differential equation
$$dX_t = (a_1(t)X_t + a_2(t))\,dt + (b_1(t)X_t + b_2(t))\,dW_t \qquad (5.10)$$
is
$$X_t = \Phi_{t,t_0}\left( X_{t_0} + \int_{t_0}^{t}\left( a_2(s) - b_1(s)\,b_2(s) \right)\Phi_{s,t_0}^{-1}\,ds + \int_{t_0}^{t} b_2(s)\,\Phi_{s,t_0}^{-1}\,dW_s \right), \qquad (5.11)$$
with the fundamental solution
$$\Phi_{t,t_0} = \exp\!\left( \int_{t_0}^{t}\left( a_1(s) - \tfrac{1}{2}\,b_1^2(s) \right) ds + \int_{t_0}^{t} b_1(s)\,dW_s \right). \qquad (5.12)$$
All the special cases (only additive or only multiplicative noise, homogeneous or inhomogeneous equations, constant or variable coefficients) are contained in the general case (5.10).
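As an illustration of (5.10)-(5.12) (a standard specialization, added here for concreteness), take constant coefficients $a_1(t) \equiv \mu$, $b_1(t) \equiv \sigma$ and $a_2 \equiv b_2 \equiv 0$; this is the geometric Brownian motion used as stock price model in the Black-Scholes setting. Both integrals in (5.11) then vanish, and (5.12) gives directly
$$X_t = \Phi_{t,t_0}\,X_{t_0} = X_{t_0}\,\exp\!\left( \left( \mu - \tfrac{1}{2}\sigma^2 \right)(t - t_0) + \sigma\,(W_t - W_{t_0}) \right).$$
For $\mu = \frac{1}{2}a^2$ and $\sigma = a$ this coincides with the solution (5.43) given in Section 5.2 below.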

At this point we give a short extension of the scalar to the multidimensional case. The general form of a $d$-dimensional linear stochastic differential equation is
$$dX_t = (A(t)X_t + a(t))\,dt + \sum_{i=1}^{m}\left( B^i(t)X_t + b^i(t) \right) dW_t^i, \qquad (5.13)$$
where $A(t), B^1(t), B^2(t),\ldots,B^m(t)$ are $d \times d$-matrix functions, $a(t), b^1(t), b^2(t),\ldots,b^m(t)$ are $d$-dimensional vector functions and $W = \{W_t,\ t \geq 0\}$ is an $m$-dimensional Wiener process with components $W_t^1, W_t^2,\ldots,W_t^m$, which are independent scalar Wiener processes. Using the same arguments as for the scalar case above, we find the solution of (5.13),
$$X_t = \Phi_{t,t_0}\left[ X_{t_0} + \int_{t_0}^{t} \Phi_{s,t_0}^{-1}\left( a(s) - \sum_{i=1}^{m} B^i(s)\,b^i(s) \right) ds + \sum_{i=1}^{m}\int_{t_0}^{t} \Phi_{s,t_0}^{-1}\,b^i(s)\,dW_s^i \right], \qquad (5.14)$$
where $\Phi_{t,t_0}$ is the $d \times d$ fundamental matrix with $\Phi_{t_0,t_0} = I$, satisfying the homogeneous matrix stochastic differential equation
$$d\Phi_{t,t_0} = A(t)\,\Phi_{t,t_0}\,dt + \sum_{i=1}^{m} B^i(t)\,\Phi_{t,t_0}\,dW_t^i, \qquad (5.15)$$
which can be seen column vector by column vector as $d$ vector stochastic differential equations. We remark that, unlike in the case of scalar homogeneous linear equations, we cannot in general solve (5.15) for $\Phi_{t,t_0}$ explicitly.


5.2 Nonlinear stochastic differential equations

In the previous section we saw how to solve a scalar linear stochastic differential equation (5.10) explicitly. Under special conditions some classes of nonlinear stochastic differential equations can also be solved explicitly, by a reduction to a linear equation. In the following we will examine those classes and their reductions in the scalar case. We remark that the two models (1.4) and (1.5) considered in Chapter 1 do not belong to these classes.

Certain nonlinear stochastic differential equations
$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t, \qquad (5.16)$$
with $a(t, X_t)$, $b(t, X_t)$ differentiable functions, can be reduced under special conditions to linear stochastic differential equations
$$dY_t = (a_1(t)Y_t + a_2(t))\,dt + (b_1(t)Y_t + b_2(t))\,dW_t \qquad (5.17)$$
by a transformation $U(t, X_t) = Y_t$. We know by means of the Inverse Function Theorem that in case $\frac{\partial U}{\partial x}(t, x) \neq 0$ a local inverse $x = V(t, y)$ of $y = U(t, x)$ exists.

Our purpose is to determine conditions on $a$ and $b$ in (5.16) so that such a suitable transformation $U(t, x) = y$ (with $\frac{\partial U}{\partial x}(t, x) \neq 0$) and coefficient functions $a_1(t), a_2(t), b_1(t)$ and $b_2(t)$ can be found. Then we solve the linear equation (5.17) explicitly (by means of (5.11)) and with $X_t = V(t, Y_t)$ we get a solution of (5.16).

In order to determine conditions for the reducibility of (5.16) to (5.17), that means conditions on $a$ and $b$, we apply the Itô formula⁵ to the transformation $U(t, x)$:
$$dU(t, X_t) = \left[ \frac{\partial U}{\partial t} + a\,\frac{\partial U}{\partial x} + \frac{1}{2}\,b^2\,\frac{\partial^2 U}{\partial x^2} \right] dt + b\,\frac{\partial U}{\partial x}\,dW_t. \qquad (5.18)$$
A transformation $U$ exists if equations (5.17) and (5.18) are the same, that means if the following conditions on the coefficients are satisfied:
$$\frac{\partial U}{\partial t}(t, x) + a\,\frac{\partial U}{\partial x}(t, x) + \frac{1}{2}\,b^2(t, x)\,\frac{\partial^2 U}{\partial x^2}(t, x) = a_1(t)\,U(t, x) + a_2(t) \qquad (5.19)$$
and
$$b(t, x)\,\frac{\partial U}{\partial x}(t, x) = b_1(t)\,U(t, x) + b_2(t). \qquad (5.20)$$

⁵ see Appendix C.1


If we do not specialize at this point we are not able to proceed much further, and therefore we restrict ourselves to two cases.

First, we consider the case $a_1(t) \equiv b_1(t) \equiv 0$ and denote $a_2(t) =: \alpha(t)$ and $b_2(t) =: \beta(t)$. In order to obtain conditions on $a$ and $b$ it is useful to differentiate equation (5.19) with respect to $x$ and equation (5.20) with respect to $t$:
$$\frac{\partial^2 U}{\partial t\,\partial x}(t, x) + \frac{\partial}{\partial x}\!\left( a(t, x)\,\frac{\partial U}{\partial x}(t, x) + \frac{1}{2}\,b^2(t, x)\,\frac{\partial^2 U}{\partial x^2}(t, x) \right) = 0 \qquad (5.21)$$
and
$$b(t, x)\,\frac{\partial^2 U}{\partial t\,\partial x}(t, x) + \frac{\partial b}{\partial t}(t, x)\,\frac{\partial U}{\partial x}(t, x) = \beta'(t). \qquad (5.22)$$
By means of (5.21) and (5.22) we obtain
$$\beta'(t) = \beta(t)\left[ \frac{1}{b(t, x)}\,\frac{\partial b}{\partial t}(t, x) - b(t, x)\,\frac{\partial}{\partial x}\!\left( \frac{a(t, x)}{b(t, x)} - \frac{1}{2}\,\frac{\partial b}{\partial x}(t, x) \right) \right]. \qquad (5.23)$$
In (5.21), and hence in (5.23), we assumed that the left-hand side of (5.19) is independent of $x$; in other words, we assumed that $\alpha(t)$ can be determined. Since $\beta$ is independent of $x$, we obtain from equation (5.23) the following condition for the determination of $\beta$:
$$\frac{\partial f}{\partial x}(t, x) = 0, \qquad (5.24)$$
where
$$f(t, x) = \frac{1}{b(t, x)}\,\frac{\partial b}{\partial t}(t, x) - b(t, x)\,\frac{\partial}{\partial x}\!\left( \frac{a(t, x)}{b(t, x)} - \frac{1}{2}\,\frac{\partial b}{\partial x}(t, x) \right). \qquad (5.25)$$
If condition (5.24) is satisfied we are able to determine $\alpha$ and $\beta$. Thus condition (5.24) is sufficient for the reducibility of equation (5.16) to the linear equation
$$dX_t = \alpha(t)\,dt + \beta(t)\,dW_t \qquad (5.26)$$
by means of a transformation $U$. Integrating equations (5.20) and (5.23) we obtain the explicit expression
$$U(t, x) = C\,\exp\!\left( \int_0^t f(s, x)\,ds \right)\int_0^x \frac{dz}{b(t, z)}, \qquad (5.27)$$
with an arbitrary constant $C$.


The second case to consider is the time-independent case. We want to reduce the nonlinear autonomous stochastic differential equation
$$dX_t = a(X_t)\,dt + b(X_t)\,dW_t \qquad (5.28)$$
to the linear stochastic differential equation
$$dY_t = (a_1 Y_t + a_2)\,dt + (b_1 Y_t + b_2)\,dW_t \qquad (5.29)$$
by means of a suitable transformation $Y_t = U(X_t)$.

Equations (5.19) and (5.20) simplify to
$$a(x)\,\frac{dU}{dx}(x) + \frac{1}{2}\,b^2(x)\,\frac{d^2U}{dx^2}(x) = a_1\,U(x) + a_2 \qquad (5.30)$$
and
$$b(x)\,\frac{dU}{dx}(x) = b_1\,U(x) + b_2. \qquad (5.31)$$
We assume $b(x) \not\equiv 0$ and $b_1 \neq 0$ and conclude from (5.31)
$$U(x) = C\,\exp(b_1\,h(x)) - \frac{b_2}{b_1}, \qquad (5.32)$$
with an arbitrary constant $C$ and
$$h(x) = \int_{x_0}^{x} \frac{ds}{b(s)}. \qquad (5.33)$$

As in the previous case we want to find conditions on $a$ and $b$. Therefore we substitute (5.32) in (5.30), differentiate, multiply by
$$\left( \frac{dU}{dx} \right)^{-1} = \frac{b(x)}{b_1}\,\exp(-b_1\,h(x))$$
and obtain an expression with only $a_1$ on the right-hand side. Differentiating again we get the following condition:
$$b_1\,\frac{dA}{dx}(x) + \frac{d}{dx}\!\left( b(x)\,\frac{dA}{dx}(x) \right) = 0 \qquad (5.34)$$
with
$$A(x) = \frac{a(x)}{b(x)} - \frac{1}{2}\,\frac{db}{dx}(x). \qquad (5.35)$$
This condition is satisfied in the trivial case $\frac{dA}{dx} \equiv 0$ for any $b_1$, or in the case
$$\frac{d}{dx}\!\left( \frac{\frac{d}{dx}\!\left( b\,\frac{dA}{dx} \right)}{\frac{dA}{dx}} \right) = 0, \qquad (5.36)$$


where we assume the choice of $b_1$
$$b_1 = -\,\frac{\frac{d}{dx}\!\left( b\,\frac{dA}{dx} \right)}{\frac{dA}{dx}}. \qquad (5.37)$$
If $b_1 \neq 0$ then we may choose $b_2 = 0$ and use the transformation
$$U(x) = C\,\exp(b_1\,h(x)). \qquad (5.38)$$
If $b_1 = 0$ we use
$$U(x) = b_2\,h(x) + C. \qquad (5.39)$$

Now we apply the derived theory to three groups of examples.

Example 1 The stochastic differential equation
$$dX_t = \tfrac{1}{2}\,g(X_t)\,g'(X_t)\,dt + g(X_t)\,dW_t, \qquad (5.40)$$
where $g$ is a given differentiable function, is reducible with the general solution
$$X_t = h^{-1}(W_t + h(X_0)), \qquad (5.41)$$
with
$$h(x) = \int_{x_0}^{x} \frac{ds}{g(s)}. \qquad (5.42)$$
In the following we show how to obtain the general solution (5.41). With the notation of the theory above we see that $A(x) \equiv 0$, and hence (5.34) is satisfied for all $b_1$. We choose $b_1 = 0$, $b_2 = 1$ and obtain
$$U(x) = h(x) + C.$$
Inserting $U$ in (5.30) gives $a_1(h(x) + C) + a_2 = 0$, and with $a_1 = 0$, $a_2 = 0$, (5.29) reduces to $dY_t = dW_t$ with the solution $Y_t = Y_0 + W_t$. With $C = 0$ we have $Y_t = h(X_t)$, especially $Y_0 = h(X_0)$, and hence obtain (5.41).

We remark that in the special case (5.40) we may find a solution in another, more pleasant way. Equation (5.40) is equivalent to the Stratonovich stochastic differential equation (for the Stratonovich integral see e.g. [56], p. 16, or [45], §4.9)
$$dX_t = g(X_t)\circ dW_t.$$
That means, instead of reducing (5.40) to a linear stochastic differential equation, we may integrate the Stratonovich stochastic differential equation directly as well, obtaining (5.41) at once.

Applications of Example 1:
$$dX_t = \tfrac{1}{2}\,a^2 X_t\,dt + a X_t\,dW_t, \qquad X_t = X_0\,\exp(a W_t). \qquad (5.43)$$
$$dX_t = \tfrac{1}{2}\,a(a-1)\,X_t^{1-2/a}\,dt + a\,X_t^{1-1/a}\,dW_t, \qquad X_t = \left( W_t + X_0^{1/a} \right)^{a}. \qquad (5.44)$$
$$dX_t = 1\,dt + 2\sqrt{X_t}\,dW_t, \qquad X_t = \left( W_t + \sqrt{X_0} \right)^{2}. \qquad (5.45)$$
$$dX_t = \tfrac{1}{2}\,a^2 m\,X_t^{2m-1}\,dt + a\,X_t^{m}\,dW_t,\quad m \neq 1, \qquad X_t = \left( X_0^{1-m} - a(m-1)\,W_t \right)^{1/(1-m)}. \qquad (5.46)$$
$$dX_t = \tfrac{1}{2}\,X_t\,dt + \sqrt{X_t^2 + 1}\,dW_t, \qquad X_t = \sinh\!\left( W_t + \operatorname{arcsinh} X_0 \right). \qquad (5.47)$$
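Such explicit solutions are easy to verify numerically. As a cross-check (our own illustration, with arbitrary parameter values), one can drive an Euler discretization of (5.47) (see Appendix B.1) and the closed-form solution with the same Brownian increments; for a small step size the two terminal values agree closely.

```python
import numpy as np

# Pathwise check of (5.47): dX = 0.5*X dt + sqrt(X^2 + 1) dW
# against the explicit solution X_t = sinh(W_t + arcsinh(X_0)).
rng = np.random.default_rng(4)
N, T, x0 = 100_000, 1.0, 0.5
dt = T / N
dW = np.sqrt(dt) * rng.standard_normal(N)

x = x0
for dw in dW:  # Euler scheme, see Appendix B
    x += 0.5 * x * dt + np.sqrt(x**2 + 1.0) * dw

exact = np.sinh(dW.sum() + np.arcsinh(x0))  # W_T is the sum of the increments
print(x, exact)                             # close for small dt
```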

Example 2 The stochastic differential equation
$$dX_t = \left( \alpha\,g(X_t) + \tfrac{1}{2}\,g(X_t)\,g'(X_t) \right) dt + g(X_t)\,dW_t, \qquad (5.48)$$
with a given differentiable function $g$, is reducible with the general solution
$$X_t = h^{-1}(\alpha t + W_t + h(X_0)), \qquad (5.49)$$
where $h$ is given by (5.42).

Again we want to show how (5.49) can be derived. We see $A(x) = \alpha$, and hence $b_1$ can be chosen arbitrarily. We choose $b_1 = 0$ and $b_2 = 1$ and obtain
$$U(x) = h(x) + C.$$
Inserting $U$ in (5.30) gives $\alpha = h(x)\,a_1 + a_2$, and hence $a_1 = 0$, $a_2 = \alpha$. We have $dY_t = \alpha\,dt + dW_t$ with its solution $Y_t = \alpha t + W_t + Y_0$. With $C = 0$ we have $Y_t = h(X_t)$, especially $Y_0 = h(X_0)$, and we obtain (5.49).

As in the previous example we remark that (5.48) is equivalent to the Stratonovich stochastic differential equation
$$dX_t = \alpha\,g(X_t)\,dt + g(X_t)\circ dW_t,$$
which can be integrated directly in order to obtain the general solution (5.49).

The applications of Example 1 we gave can all be modified to represent applications of Example 2. For instance, consider
$$dX_t = \left( \tfrac{1}{2}\,X_t + \alpha\sqrt{X_t^2 + 1} \right) dt + \sqrt{X_t^2 + 1}\,dW_t, \qquad X_t = \sinh\!\left( \alpha t + W_t + \operatorname{arcsinh} X_0 \right). \qquad (5.50)$$

Example 3 The stochastic differential equation
$$dX_t = \left( \alpha\,g(X_t)\,h(X_t) + \tfrac{1}{2}\,g(X_t)\,g'(X_t) \right) dt + g(X_t)\,dW_t, \qquad (5.51)$$
where $g$ is a given differentiable function and $h$ is given by (5.42), can be reduced to the Langevin equation of the form
$$dY_t = -a Y_t\,dt + b\,dW_t,$$
with $a = -\alpha$ and $b = 1$, and has the general solution
$$X_t = h^{-1}\!\left( e^{\alpha t}\,h(X_0) + e^{\alpha t}\int_0^t e^{-\alpha s}\,dW_s \right). \qquad (5.52)$$
Again let us show how we derive the solution (5.52). We see $A(x) = \alpha\,h(x)$, and by (5.37) we obtain $b_1 = 0$. With $b_2 = 1$ we have
$$U(x) = h(x) + C.$$
Inserting $U$ in (5.30) gives $\alpha\,h(x) = a_1\,h(x) + a_2$ and hence $a_1 = \alpha$ and $a_2 = 0$. We have $dY_t = \alpha\,Y_t\,dt + dW_t$ with its solution
$$Y_t = e^{\alpha t}\,Y_0 + e^{\alpha t}\int_0^t e^{-\alpha s}\,dW_s.$$
With $C = 0$ we have $Y_t = h(X_t)$, especially $Y_0 = h(X_0)$, and we obtain (5.52).

The applications of Example 1 we gave can all be modified to represent applications of Example 3. For instance, consider (with $g(x) = 2\sqrt{x}$, $x_0 = 0$, so that $h(x) = \sqrt{x}$)
$$dX_t = \left( 2a X_t + 1 \right) dt + 2\sqrt{X_t}\,dW_t, \qquad X_t = \left( e^{a t}\sqrt{X_0} + e^{a t}\int_0^t e^{-a s}\,dW_s \right)^{2}. \qquad (5.53)$$
We remark that some examples of nonlinear reducible stochastic differential equations that are not included in the preceding three cases are listed in [45], pp. 124-126.

Finally, we again put stress on the fact that the two models (1.4) and (1.5) considered in Chapter 1 are not reducible and cannot be solved explicitly.


Appendix A

The Kalman-Bucy Filter

A.1 The continuous case

For the Kalman-Bucy filter in the continuous case we refer to e.g. Kallianpur [43], §10. In our notation we follow Øksendal [56], §4, pp. 46-68.

A.1.1 The general filtering problem

Consider
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dU_t, \qquad \text{(System)} \qquad (A.1)$$
$$dZ_t = c(t, X_t)\,dt + \gamma(t, X_t)\,dV_t, \qquad \text{(Observations)} \qquad (A.2)$$
where $U$ and $V$ are independent Brownian motions, $U_t$ $p$-dimensional and $V_t$ $r$-dimensional.

The filtering problem is the following: given the observations $Z_s$ satisfying (A.2) for $0 \leq s \leq t$, what is the best estimate $\hat X_t$ of the state $X_t$ of (A.1) based on these observations?

To solve this problem we formulate it mathematically. Let $\mathcal{G}_t$ be the $\sigma$-algebra generated by $\{Z_s(\cdot),\ s \leq t\}$, let $(\Omega, \mathcal{F}, P)$ be the probability space corresponding to the $(p+r)$-dimensional Brownian motion $(U_t, V_t)$ and let
$$\mathcal{K}_t = \left\{ S: \Omega \to \mathbb{R}^n;\ S \in L^2(\Omega, \mathcal{F}, P),\ S\ \mathcal{G}_t\text{-measurable} \right\}.$$
"Based on the observations $Z_s$" means that the estimate $\hat X_t$ is $\mathcal{G}_t$-measurable. $\hat X_t$ shall be "the best estimate based on the observations $Z_s$", which means that
$$E[\,|X_t - \hat X_t|^2\,] = \inf_{S \in \mathcal{K}_t} E[\,|X_t - S|^2\,].$$


Let $P_{\mathcal{K}_t}(X_t)$ denote the orthogonal projection from $L^2(\Omega, \mathcal{F}, P)$ onto the subspace $\mathcal{K}_t$. The following statement holds:
$$\hat X_t = P_{\mathcal{K}_t}(X_t) = E(X_t \mid \mathcal{G}_t).$$
We will concentrate on the linear case, which allows an explicit solution in terms of a stochastic differential equation for $\hat X_t$. This method is called the Kalman-Bucy filter.

A.1.2 The linear filtering problem

To focus on the main ideas consider only the one-dimensional case
$$dX_t = F(t)\,X_t\,dt + C(t)\,dU_t, \qquad \text{(System)} \qquad (A.3)$$
$$dZ_t = G(t)\,X_t\,dt + D(t)\,dV_t, \qquad \text{(Observations)} \qquad (A.4)$$
where the real functions $F(t)$, $G(t)$, $C(t)$, $D(t)$ are bounded on bounded intervals. Furthermore assume that $D(t)$ is bounded away from 0 on bounded intervals, $Z_0 = 0$, $X_0$ is normally distributed and independent of $\{U_t\}$, $\{V_t\}$, and $E[X_0] = 0$ (see Øksendal [56], p. 48f). The ideas and techniques in the one-dimensional case can be extended in an analogous way to the multidimensional case.

By means of some technical transformations of $P_{\mathcal{K}_t}(X_t)$ and calculations we obtain:

The Kalman-Bucy Filter

The solution
$$\hat X_t = P_{\mathcal{K}_t}(X_t) = E(X_t \mid \mathcal{G}_t)$$
of the linear filtering problem (A.3), (A.4) satisfies the stochastic differential equation
$$d\hat X_t = \left[ F(t) - \frac{G^2(t)\,S(t)}{D^2(t)} \right] \hat X_t\,dt + \frac{G(t)\,S(t)}{D^2(t)}\,dZ_t, \qquad \hat X_0 = E[X_0], \qquad (A.5)$$
where $S(t) = E[(X_t - \hat X_t)^2]$ satisfies the "Riccati equation"
$$\frac{dS}{dt} = 2F(t)\,S(t) - \frac{G^2(t)}{D^2(t)}\,S^2(t) + C^2(t), \qquad (A.6)$$
with $S(0) = E[(X_0 - E[X_0])^2]$.


A.2 The discrete case

For reasons of completeness we note here again the Kalman-Bucy filter for the discrete case, as already given in Section 3.1.2.

Consider the stochastic system
$$X_i = D_i X_{i-1} + S_i + \varepsilon_i, \quad i = 1,\ldots,n, \qquad \text{(System)} \qquad (A.7)$$
where $\{X_i\}_{i=0}^{n}$ are random $d \times 1$ vectors, $\{D_i\}_{i=0}^{n}$ are non-random $d \times d$ matrices, $\{S_i\}_{i=1}^{n}$ are non-random $d \times 1$ vectors, $X_0 \sim N_d(x_0, V_0)$, $\varepsilon_i \sim N_d(0, V_i)$, $i = 1,\ldots,n$, and $X_0, \varepsilon_1,\ldots,\varepsilon_n$ are stochastically independent. Assume the observable quantities are $Y_0, Y_1,\ldots,Y_n$ given by
$$Y_i = T_i X_i + U_i + e_i, \quad i = 0, 1,\ldots,n, \qquad \text{(Observations)} \qquad (A.8)$$
where $\{T_i\}_{i=0}^{n}$ are non-random $k \times d$ matrices ($k \leq d$), $\{U_i\}_{i=0}^{n}$ are non-random $k \times 1$ vectors, $e_i \sim N_k(0, W_i)$, and $X_0, \varepsilon_1,\ldots,\varepsilon_n, e_0, e_1,\ldots,e_n$ are stochastically independent, $i = 0, 1,\ldots,n$.

The Kalman-Bucy filter

Under some assumptions (see Pedersen [59], pp. 4-5), we have for given observations $y_0, y_1,\ldots,y_n$ of $Y_0, Y_1,\ldots,Y_n$:
$$X_i \mid Y_i = y_i \ \sim\ N_d\!\left( \mu_i(y_i),\ \Sigma_i \right), \qquad (A.9)$$
$$X_i \mid Y_{i-1} = y_{i-1} \ \sim\ N_d\!\left( D_i\,\mu_{i-1}(y_{i-1}) + S_i,\ R_i \right), \qquad (A.10)$$
$$Y_i \mid Y_{i-1} = y_{i-1} \ \sim\ N_k\!\left( T_i\left( D_i\,\mu_{i-1}(y_{i-1}) + S_i \right) + U_i,\ T_i R_i T_i^T + W_i \right), \qquad (A.11)$$
where $R_i = D_i\,\Sigma_{i-1}\,D_i^T + V_i$ is positive definite, and where
$$\mu_0(y_0) = x_0 + V_0 T_0^T\left( T_0 V_0 T_0^T + W_0 \right)^{-1}\left( y_0 - T_0 x_0 - U_0 \right), \qquad (A.12)$$
$$\Sigma_0 = V_0 - V_0 T_0^T\left( T_0 V_0 T_0^T + W_0 \right)^{-1} T_0 V_0, \qquad (A.13)$$
$$\mu_i(y_i) = D_i\,\mu_{i-1}(y_{i-1}) + S_i + R_i T_i^T\left( T_i R_i T_i^T + W_i \right)^{-1}\left( y_i - T_i\left( D_i\,\mu_{i-1}(y_{i-1}) + S_i \right) - U_i \right), \qquad (A.14)$$
$$\Sigma_i = R_i - R_i T_i^T\left( T_i R_i T_i^T + W_i \right)^{-1} T_i R_i. \qquad (A.15)$$
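A direct Python transcription of the recursions (A.12)-(A.15) might look as follows; the function and variable names are ours, the sequences are indexed so that D[i], S[i], V[i] refer to step $i$ (entry 0 is an unused placeholder), and no numerical safeguards are included.

```python
import numpy as np

def kalman_filter(y, x0, V0, D, S, V, T, U, W):
    """Filter means mu_i and covariances Sigma_i via (A.12)-(A.15).

    y holds the observations y_0, ..., y_n; T, U, W are indexed 0..n and
    D, S, V are indexed so that index i corresponds to step i in (A.7).
    """
    def update(m_pred, R, Ti, Wi, Ui, yi):
        # common gain K = R T' (T R T' + W)^{-1}
        K = R @ Ti.T @ np.linalg.inv(Ti @ R @ Ti.T + Wi)
        mu = m_pred + K @ (yi - Ti @ m_pred - Ui)
        Sigma = R - K @ Ti @ R
        return mu, Sigma

    mu, Sigma = update(x0, V0, T[0], W[0], U[0], y[0])         # (A.12), (A.13)
    means, covs = [mu], [Sigma]
    for i in range(1, len(y)):
        m_pred = D[i] @ mu + S[i]                               # prediction, cf. (A.10)
        R = D[i] @ Sigma @ D[i].T + V[i]
        mu, Sigma = update(m_pred, R, T[i], W[i], U[i], y[i])   # (A.14), (A.15)
        means.append(mu)
        covs.append(Sigma)
    return means, covs
```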


Appendix B

Numerical Methods

In our representation we follow [45], §9 and §10.

B.1 The Euler Scheme

The Euler approximation is a basic discrete time method to approximate an Itô process. Consider an Itô process $X = \{X(t),\ t_0 \leq t \leq T\}$ following the scalar stochastic differential equation
$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t,$$
with $t_0 \leq t \leq T$ and the initial condition $X_{t_0} = X_0$. A discretization $t_0 = \tau_0 < \tau_1 < \cdots < \tau_N = T$ of the time interval $[t_0, T]$ yields the Euler scheme
$$Y_{n+1} = Y_n + a(\tau_n, Y_n)\,(\tau_{n+1} - \tau_n) + b(\tau_n, Y_n)\,(W_{\tau_{n+1}} - W_{\tau_n}), \qquad (B.1)$$
for $n = 0, 1,\ldots,N-1$, with $Y_0 = X_0$. With the abbreviations $\Delta_n = \tau_{n+1} - \tau_n$, $\Delta W_n = W_{\tau_{n+1}} - W_{\tau_n}$, $a = a(\tau_n, Y_n)$ and $b = b(\tau_n, Y_n)$, we can write the Euler scheme (B.1) in the compact form
$$Y_{n+1} = Y_n + a\,\Delta_n + b\,\Delta W_n \qquad (B.2)$$
for $n = 0, 1,\ldots,N-1$.

In order to compute the sequence $\{Y_n,\ n = 0, 1,\ldots,N-1\}$ of values of the Euler approximation we have to generate the random increments $\Delta W_n$, $n = 0, 1,\ldots,N-1$, of the Wiener process $W = \{W_t,\ t \geq 0\}$. These increments are independent Gaussian random variables with $E(\Delta W_n) = 0$ and $\mathrm{Var}(\Delta W_n) = \Delta_n$ and can be generated by a random number generator (see e.g. [45], §1.3).

For the multi-dimensional case of the Euler scheme see e.g. [45], §10.2.

Note that when the diffusion coefficient $b$ is identically zero, the stochastic iterative scheme (B.2) reduces to the well-known deterministic Euler scheme for the ordinary differential equation $x' = a(t, x)$.
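On an equidistant grid with $\Delta_n \equiv \Delta = (T - t_0)/N$, a minimal Python version of (B.2) reads as follows; drift and diffusion are passed as callables, and the example coefficients at the end are those of the linear equation (5.43) with $a = 1$ (an illustrative choice of ours).

```python
import numpy as np

def euler_maruyama(a, b, x0, t0, T, N, rng):
    """Euler scheme (B.2): Y_{n+1} = Y_n + a*Delta + b*Delta W_n,
    on an equidistant grid with N steps of size Delta = (T - t0)/N."""
    t = np.linspace(t0, T, N + 1)
    dt = (T - t0) / N
    Y = np.empty(N + 1)
    Y[0] = x0
    dW = np.sqrt(dt) * rng.standard_normal(N)  # E(dW) = 0, Var(dW) = dt
    for n in range(N):
        Y[n + 1] = Y[n] + a(t[n], Y[n]) * dt + b(t[n], Y[n]) * dW[n]
    return t, Y

# Example: equation (5.43) with a = 1, i.e. dX = 0.5*X dt + X dW.
rng = np.random.default_rng(5)
t, Y = euler_maruyama(lambda t, x: 0.5 * x, lambda t, x: x,
                      1.0, 0.0, 1.0, 1_000, rng)
```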

We introduce the notion of strong convergence.

Definition 1 A time discrete approximation $Y^\delta$ with maximum step size $\delta$ converges strongly to $X$ at time $T$ if
$$\lim_{\delta \downarrow 0} E\big( |X_T - Y^\delta(T)| \big) = 0.$$

The rate of strong convergence is crucial if we want to compare different time discrete approximation methods.

Definition 2 A time discrete approximation $Y^\delta$ converges strongly with order $\gamma > 0$ at time $T$ if there exists a constant $C > 0$, independent of $\delta$, and a $\delta_0 > 0$ such that
$$E\big( |X_T - Y^\delta(T)| \big) \leq C\,\delta^\gamma$$
for each $\delta \in (0, \delta_0)$.

E(jX T , Y (T )j) C <br />

Under some regularity assumptions the Euler scheme converges strongly<br />

with order =0:5 :<br />

Theorem 8 Suppose<br />

E(jX 0 j 2 ) < 1;<br />

E jX 0 + Y <br />

0<br />

j 2 1 2<br />

C 1 1 2 ;<br />

ja(t; x) , a(t; y)j + jb(t; x) , b(t; y)j C 2 jx , yj;<br />

ja(t; x)j + jb(t; x)j C 3 (1 + jxj)<br />

74


and<br />

ja(s; x) , a(t; x)j + jb(s; x) , b(t; x)j C 4 (1 + jxj)js , tj 1 2<br />

for all s; t 2 [0;T] and x; y 2 IR d . Then<br />

for the Euler approximation Y .<br />

E jX T , Y (T )j C 5 1 2<br />

For the proof see [45], x10.2. We will compare the Euler method with the<br />

Milste<strong>in</strong> method described <strong>in</strong> the next section.<br />

B.2 The Milstein Scheme

By adding to the Euler scheme (B.2) the term
$$\tfrac{1}{2}\,b\,b'\,\big[ (\Delta W)^2 - \Delta \big]$$
we obtain the Milstein scheme
$$Y_{n+1} = Y_n + a\,\Delta + b\,\Delta W + \tfrac{1}{2}\,b\,b'\,\big[ (\Delta W)^2 - \Delta \big]. \qquad (B.3)$$
If $a$ and $b$ are sufficiently smooth, the Milstein scheme can be shown to have order of strong convergence $\gamma = 1.0$. In comparison with the Euler method the strong convergence order is increased from $\gamma = 0.5$ to $\gamma = 1.0$, and we conclude that the Milstein method is an improvement on the Euler method.

We remark that in the case of vanishing diffusion term $b = 0$, that is in the deterministic case, the (deterministic) Euler scheme has strong convergence order $\gamma = 1.0$. Therefore, as to the order of strong convergence, the Milstein scheme can be seen as a generalization of the deterministic Euler scheme.
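The following sketch (our own illustration; all parameter values are arbitrary) applies both schemes to equation (5.43) with $a = 1$, whose exact solution $X_T = X_0\,e^{W_T}$ is known, and estimates the strong error $E|X_T - Y^\delta(T)|$ for several step sizes: halving $\delta$ should roughly halve the Milstein error ($\gamma = 1.0$) but reduce the Euler error only by a factor of about $\sqrt{2}$ ($\gamma = 0.5$).

```python
import numpy as np

# Strong-error comparison of Euler and Milstein for dX = 0.5*X dt + X dW,
# i.e. (5.43) with a = 1; here b(x) = x, b'(x) = 1, exact X_T = X_0*exp(W_T).
rng = np.random.default_rng(6)
T, x0, n_paths = 1.0, 1.0, 2_000

for N in (50, 100, 200, 400):
    dt = T / N
    dW = np.sqrt(dt) * rng.standard_normal((n_paths, N))
    ye = np.full(n_paths, x0)  # Euler paths
    ym = np.full(n_paths, x0)  # Milstein paths
    for n in range(N):
        ye += 0.5 * ye * dt + ye * dW[:, n]
        ym += 0.5 * ym * dt + ym * dW[:, n] + 0.5 * ym * (dW[:, n]**2 - dt)
    exact = x0 * np.exp(dW.sum(axis=1))      # X_T along the same Brownian path
    print(N,
          np.mean(np.abs(exact - ye)),       # ~ C * dt^0.5
          np.mean(np.abs(exact - ym)))       # ~ C * dt^1.0
```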


Appendix C

The Itô Formula

C.1 The one-dimensional case

Let the function $U: [0,T] \times \mathbb{R} \to \mathbb{R}$ have continuous partial derivatives $\frac{\partial U}{\partial t}$, $\frac{\partial U}{\partial x}$ and $\frac{\partial^2 U}{\partial x^2}$, and let $X_t$ satisfy the one-dimensional Itô stochastic differential equation
$$dX_t = a(t, \omega)\,dt + b(t, \omega)\,dW_t,$$
where $\sqrt{|a|}$ and $b$ are in the space $L^2$. Define a process $Y_t$ by $Y_t = U(t, X_t)$ for $0 \leq t \leq T$. Then
$$dY_t = \left[ \frac{\partial U}{\partial t}(t, X_t) + a_t\,\frac{\partial U}{\partial x}(t, X_t) + \frac{1}{2}\,b_t^2\,\frac{\partial^2 U}{\partial x^2}(t, X_t) \right] dt + b_t\,\frac{\partial U}{\partial x}(t, X_t)\,dW_t,$$
w. p. 1 for $0 \leq t \leq T$, and with the notation $a_t = a(t, \omega)$, $b_t = b(t, \omega)$.
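As a quick illustration of the formula (a standard textbook example, added here for concreteness), take $X_t = W_t$, i.e. $a \equiv 0$ and $b \equiv 1$, and $U(t, x) = x^2$. Then $\frac{\partial U}{\partial t} = 0$, $\frac{\partial U}{\partial x} = 2x$ and $\frac{\partial^2 U}{\partial x^2} = 2$, and the formula yields
$$d(W_t^2) = dt + 2\,W_t\,dW_t,$$
the extra $dt$ term being exactly what distinguishes the Itô calculus from the classical chain rule.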

C.2 The multi-dimensional case

Let the process $X_t$ satisfy the $d$-dimensional Itô stochastic differential equation
$$dX_t = a(t, \omega)\,dt + B(t, \omega)\,dW_t,$$
where $\{W_t,\ t \geq 0\}$ is an $m$-dimensional Wiener process with independent components, $W_t = (W_t^1, W_t^2,\ldots,W_t^m)$, and $a: [0,T] \times \Omega \to \mathbb{R}^d$, $B: [0,T] \times \Omega \to \mathbb{R}^{d \times m}$, satisfying $\sqrt{|a^k|},\ B^{k,j} \in L^2$ for $k = 1,\ldots,d$, $j = 1,\ldots,m$.

Let the function $U: [0,T] \times \mathbb{R}^d \to \mathbb{R}$ have continuous partial derivatives $\frac{\partial U}{\partial t}$, $\frac{\partial U}{\partial x_k}$ and $\frac{\partial^2 U}{\partial x_k\,\partial x_i}$ for $k, i = 1, 2,\ldots,d$. Define a scalar process $\{Y_t,\ 0 \leq t \leq T\}$ by $Y_t = U(t, X_t) = U(t, X_t^1, X_t^2,\ldots,X_t^d)$ w. p. 1. Then
$$dY_t = \left[ \frac{\partial U}{\partial t} + \sum_{k=1}^{d} a_t^k\,\frac{\partial U}{\partial x_k} + \frac{1}{2}\sum_{j=1}^{m}\sum_{i,k=1}^{d} B_t^{i,j}\,B_t^{k,j}\,\frac{\partial^2 U}{\partial x_i\,\partial x_k} \right] dt + \sum_{j=1}^{m}\sum_{i=1}^{d} B_t^{i,j}\,\frac{\partial U}{\partial x_i}\,dW_t^j,$$
w. p. 1 for $0 \leq t \leq T$, where the partial derivatives are evaluated at $(t, X_t)$ and where we denote $a_t^k = a^k(t, \omega)$, $B_t^{i,j} = B^{i,j}(t, \omega)$.

For an extension of the Itô formula to a wider class of non-smooth functions we refer to the recently published paper by Föllmer, Protter and Shiryayev [26].


Appendix D

The Radon-Nikodym Theorem

In our notation we follow Billingsley [9], §32.

1) A measure $\mu$ on a $\sigma$-field $\mathcal{F}$ in $\Omega$ is called finite if $\mu(\Omega) < \infty$.

2) If $\Omega = A_1 \cup A_2 \cup \ldots$ for some finite or countable sequence of $\mathcal{F}$-sets satisfying $\mu(A_k) < \infty$, then $\mu$ is $\sigma$-finite.

3) A measure $\nu$ is absolutely continuous with respect to $\mu$ if for each $A$ in $\mathcal{F}$
$$\mu(A) = 0 \ \Longrightarrow\ \nu(A) = 0.$$
This relation is indicated by $\nu \ll \mu$.

The Radon-Nikodym Theorem ([9], §32) then states: if $\mu$ is $\sigma$-finite and $\nu \ll \mu$, there exists a nonnegative $\mathcal{F}$-measurable function $f$ such that
$$\nu(A) = \int_A f\,d\mu \qquad \text{for all } A \in \mathcal{F}.$$
The function $f$ is unique up to a set of $\mu$-measure 0 and is called the Radon-Nikodym derivative (or density) $d\nu/d\mu$ of $\nu$ with respect to $\mu$.


Bibliography

[1] Ball, C.A., 1993, "A Review of Stochastic Volatility Models with Application to Option Pricing", Financial Markets, Institutions and Instruments, Vol. 2, No. 5, pp. 55-71.

[2] Basawa, I.V. and B.L.S. Prakasa Rao, 1980, "Statistical Inference for Stochastic Processes", Academic Press Inc.

[3] Bernardo, J.M. and A.F.M. Smith, 1994, "Bayesian Theory", Wiley, New York.

[4] Bibby, B.M., 1994, "A Two-Compartment Model with Additive White Noise", Research Report No. 290, Dept. Theor. Statist., University of Aarhus.

[5] Bibby, B.M., 1994, "Optimal Combination of Martingale Estimating Functions for Discretely Observed Diffusion Processes", Research Report No. 298, Dept. Theor. Statist., University of Aarhus.

[6] Bibby, B.M., 1995, "Analysis of a Tracer Experiment Based on Compartmental Diffusion Models", Research Report No. 303, Dept. Theor. Statist., University of Aarhus.

[7] Bibby, B.M., 1995, "On Estimation in Compartmental Diffusion Models", Research Report No. 305, Dept. Theor. Statist., University of Aarhus.

[8] Bibby, B.M. and M. Sørensen, 1995, "Martingale Estimation Functions for Discretely Observed Diffusion Processes", Bernoulli, Vol. 1, No. 1/2, pp. 17-39.

[9] Billingsley, P., 1986, "Probability and Measure", 2nd ed., Wiley, New York.

[10] Billingsley, P., 1968, "Convergence of Probability Measures", Wiley, New York.

[11] Black, F., 1990, "Living up to the model", Risk Magazine, March 1990.

[12] Bollerslev, T., 1986, "Generalized Autoregressive Conditional Heteroskedasticity", Journal of Econometrics 31, pp. 307-327.

[13] Bollerslev, T., R.Y. Chou and K.F. Kroner, 1992, "ARCH Modeling in Finance", Journal of Econometrics 52, pp. 5-59.

[14] Bollerslev, T., R.F. Engle and D.B. Nelson, 1994, "ARCH Models", The Handbook of Econometrics, Vol. 4.

[15] Breiman, L., 1968, "Probability", Addison-Wesley Publishing Company.

[16] Brockwell, P.J. and R.A. Davis, 1987, "Time Series: Theory and Methods", Springer-Verlag, New York.

[17] Cramér, H., 1946, "Mathematical Methods of Statistics", Princeton University Press.

[18] Dacunha-Castelle, D. and M. Duflo, 1993, "Probabilités et Statistiques", Masson, Paris.

[19] Dacunha-Castelle, D. and D. Florens-Zmirou, 1986, "Estimation of the Coefficients of a Diffusion from Discrete Observations", Stochastics 19, pp. 263-284.

[20] Deelstra, G. and G. Parker, 1995, "A Covariance Equivalent Discretisation of the CIR Model", Proceedings of the 5th AFIR International Colloquium, Vol. 2, pp. 731-747.

[21] Duffie, D. and K.J. Singleton, 1995, "An Econometric Model of the Term Structure of Interest Rate Swap Yields", Working paper, Graduate School of Business, Stanford University.

[22] Engle, R.F., 1982, "Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation", Econometrica 50, pp. 987-1008.

[23] Engle, R.F. and M. Rothschild, 1992, "ARCH Models in Finance", Journal of Econometrics, Annals 1992-1, Vol. 52.

[24] Feigin, P.D., 1976, "Maximum Likelihood Estimation for Continuous-time Stochastic Processes", Adv. Appl. Probab., Vol. 8, pp. 712-736.

[25] Florens-Zmirou, D., 1989, "Approximate Discrete-Time Schemes for Statistics of Diffusion Processes", Statistics 20, pp. 547-557.

[26] Föllmer, H., P. Protter and A.N. Shiryayev, 1995, "Quadratic covariation and an extension of Itô's formula", Bernoulli, Vol. 1, No. 1/2, pp. 149-169.

[27] Fournié, E. and D. Talay, "Application de la Statistique des Diffusions à un Modèle de Taux d'Intérêt", INRIA, forthcoming in Finance.

[28] Gallant, A.R., 1981, "On the Bias in Flexible Functional Forms and an Essentially Unbiased Form: The Fourier Flexible Form", Journal of Econometrics 15, pp. 211-244.

[29] Geman, S. and D. Geman, 1984, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-6, No. 6, pp. 721-741.

[30] Genon-Catalot, V. and D. Picard, 1993, "Éléments de statistique asymptotique", Springer-Verlag France, Paris.

[31] Geweke, J., 1989, "Bayesian Inference in Econometric Models Using Monte Carlo Integration", Econometrica 57, pp. 1317-1339.

[32] Gihman, I.I. and A.V. Skorokhod, 1972, "Stochastic Differential Equations", Springer-Verlag, Berlin.

[33] Godambe, V.P. and C.C. Heyde, 1987, "Quasi-likelihood and Optimal Estimation", Int. Statist. Rev. 55, pp. 231-244.

[34] Hamilton, J.D., 1994, "Time Series Analysis", Princeton University Press.

[35] Heyde, C.C., 1987, "On Combining Quasi-likelihood Estimating Functions", Stochastic Processes and their Applications 25, pp. 281-287.

[36] Heyde, C.C., 1989, "Quasi-Likelihood and Optimality for Estimating Functions: Some Current Unifying Themes", Fisher Lecture, I.S.I., 47th Session, Paris, Aug 29 - Sep 6, 1989.

[37] Hutton, J.E. and P.I. Nelson, 1986, "Quasi-Likelihood Estimation for Semimartingales", Stochastic Processes and their Applications 22, pp. 245-257.

[38] Ibragimov, I.A. and R.Z. Has'minskii, 1981, "Statistical Estimation", Springer-Verlag, New York.

[39] Jacod, J., 1994, "Statistics of Diffusion Processes: Some Elements", Istituto per le Applicazioni della Matematica e dell'Informatica 94.17.

[40] Jacod, J., "La variation quadratique du brownien en présence d'erreurs d'arrondi", Working Paper, Laboratoire de Probabilités, Université Pierre et Marie Curie, Paris.

[41] Jacod, J. and A.N. Shiryaev, 1987, "Limit Theorems for Stochastic Processes", Springer-Verlag, Berlin.

[42] Jacquier, E., N.G. Polson and P.E. Rossi, 1994, "Bayesian Analysis of Stochastic Volatility Models", Journal of Business and Economic Statistics, Oct 1994, Vol. 12, No. 4.

[43] Kallianpur, G., 1980, "Stochastic Filtering Theory", Springer-Verlag, New York.

[44] Kallianpur, G. and R.L. Karandikar, 1988, "White Noise Theory of Prediction, Filtering and Smoothing", Gordon and Breach Science Publishers.

[45] Kloeden, P.E. and E. Platen, 1992, "Numerical Solution of Stochastic Differential Equations", Springer-Verlag, New York.

[46] Kloeden, P.E., H. Schurz, E. Platen and M. Sørensen, 1992, "On Effects of Discretization on Estimators of Drift Parameters for Diffusion Processes", Research Report No. 249, Dept. Theor. Statist., University of Aarhus.

[47] Kutoyants, Yu.A., 1994, "Identification of Dynamical Systems with Small Noise", Kluwer Academic Publishers.

[48] Kutoyants, Yu.A., 1984, "Parameter Estimation for Stochastic Processes", Heldermann-Verlag, Berlin.

[49] Lindgren, B.W., 1976, "Statistical Theory", 3rd ed., Macmillan Publishing Co.

[50] Linton, O., 1993, "Adaptive Estimation in ARCH Models", Econometric Theory 9, pp. 539-569.

[51] Liptser, R.S. and A.N. Shiryaev, 1977, "Statistics of Random Processes I, II", Springer-Verlag, New York.

[52] Luschgy, H. and A.L. Rukhin, 1993, "Asymptotic Properties of Tests for a Class of Diffusion Processes: Optimality and Adaption", Mathematical Methods of Statistics, Allerton Press, Inc., Vol. 2, No. 1, pp. 42-51.

[53] Mills, T.C., 1993, "The Econometric Modelling of Financial Time Series", Cambridge University Press.

[54] Nelson, D.B., 1990, "ARCH Models as Diffusion Approximations", Journal of Econometrics 45, pp. 7-38.

[55] O'Hagan, A., 1994, "Bayesian Inference", Kendall's Advanced Theory of Statistics, Arnold, Vol. 2B.

[56] Øksendal, B., 1989, "Stochastic Differential Equations", 2nd ed., Springer-Verlag, Berlin.

[57] Pagan, A.R. and G.W. Schwert, 1990, "Alternative Models for Conditional Stock Volatility", Journal of Econometrics 45, pp. 267-290.

[58] Pedersen, A.R., 1993, "A New Approach to Maximum Likelihood Estimation for Stochastic Differential Equations Based on Discrete Observations", Research Report No. 264, Dept. Theor. Statist., University of Aarhus.

[59] Pedersen, A.R., 1993, "Maximum Likelihood Estimation Based on Incomplete Observations for a Class of Discrete Time Stochastic Processes by Means of the Kalman Filter", Research Report No. 272, Dept. Theor. Statist., University of Aarhus.

[60] Pedersen, A.R., 1995, "Consistency and Asymptotic Normality of an Approximate Maximum Likelihood Estimator for Discretely Observed Diffusion Processes", Bernoulli, Vol. 1, No. 3, pp. 257-279.

[61] Pedersen, A.R., 1994, "Uniform Residuals for Discretely Observed Diffusion Processes", Research Report No. 290, Dept. Theor. Statist., University of Aarhus.

[62] Pedersen, A.R., 1994, "Quasi-Likelihood Inference for Discretely Observed Diffusion Processes", Research Report No. 295, Dept. Theor. Statist., University of Aarhus.

[63] Pedersen, A.R., 1994, "Statistical Analysis of Gaussian Diffusion Processes Based on Incomplete Discrete Observations", Research Report No. 297, Dept. Theor. Statist., University of Aarhus.

[64] Priestley, M.B., 1981, "Spectral Analysis and Time Series", Vols. 1 and 2, Academic Press.

[65] Rogers, L.C.G. and D. Williams, 1987, "Diffusions, Markov Processes and Martingales, Vol. 2: Itô Calculus", Wiley, Chichester.

[66] Shiryayev, A.N., 1984, "Probability", Springer-Verlag, New York.

[67] Shiryayev, A.N. and V.G. Spokoiny, "Statistical Experiments and Decisions: Asymptotic Theory", forthcoming in Springer-Verlag.

[68] Shorack, G.R. and J.A. Wellner, 1986, "Empirical Processes with Applications to Statistics", Wiley, New York.

[69] Silverman, B.W., 1986, "Density Estimation for Statistics and Data Analysis", Chapman and Hall, London.

[70] Skorokhod, A.V., 1989, "Asymptotic Methods in the Theory of Stochastic Differential Equations", American Mathematical Society, Translations of Mathematical Monographs, Vol. 78.

[71] Smith, A.F.M. and G.O. Roberts, 1993, "Bayesian Computation via the Gibbs Sampler and Related Markov Chain Monte Carlo Methods", J. R. Statist. Soc. B 55, No. 1, pp. 3-23.

[72] Stein, E.M. and J.C. Stein, 1991, "Stock Price Distributions with Stochastic Volatility: An Analytic Approach", The Review of Financial Studies, Vol. 4, No. 4, pp. 727-752.

[73] Stroock, D.W. and S.R.S. Varadhan, 1979, "Multidimensional Diffusion Processes", Springer-Verlag, Berlin.

[74] Taylor, S., 1986, "Modelling Financial Time Series", Wiley, Chichester.

[75] Wiggins, J.B., 1987, "Option Values Under Stochastic Volatility: Theory and Empirical Estimates", Journal of Financial Economics 19, pp. 351-372.

[76] Wong, E. and B. Hajek, 1985, "Stochastic Processes in Engineering Systems", Springer-Verlag, New York.
