Estimation in Financial Models - RiskLab
Estimation in Financial Models

RISKLAB

Anja Going
ETH Zurich
Department of Mathematics
CH-8092 Zurich, Switzerland
Telephone: 41-1-632 34 58
Fax: 41-1-632 10 85
E-Mail: goeing@math.ethz.ch

January 1996
Contents

Introduction 3

1 Stochastic Volatility Models 5
1.1 Continuous time 5
1.2 Discrete time 7

2 Approximations 8
2.1 Diffusion models by discrete time models 8
2.2 Discrete time models by diffusion models 14

3 Parameter Estimation 16
3.1 Diffusion models 16
3.1.1 Continuous observations 16
3.1.2 Discrete observations 22
3.2 Discrete models 42
3.2.1 AR models 42
3.2.2 ARCH and GARCH models 45
3.2.3 The Bayesian estimation method 50

4 Nonparametric Estimation 56
4.1 Diffusion models 56
4.2 Discrete models 58

5 Some Diffusion Models with Explicit Solutions 60
5.1 Linear stochastic differential equations 60
5.2 Nonlinear stochastic differential equations 63

A The Kalman-Bucy Filter 70
A.1 The continuous case 70
A.1.1 The general filtering problem 70
A.1.2 The linear filtering problem 71
A.2 The discrete case 72

B Numerical Methods 73
B.1 The Euler Scheme 73
B.2 The Milstein Scheme 75

C The Itô Formula 76
C.1 The one-dimensional case 76
C.2 The multi-dimensional case 76

D The Radon-Nikodym Theorem 78

Bibliography 79
Introduction

Over the last few years various new derivative instruments have emerged in financial markets, leading to a demand for versatile estimation methods for the relevant model parameters. Typical examples include volatility, covariances and correlations. In this paper we give a survey of statistical estimation methods for both discrete and continuous time stochastic models.

The text is organized as follows: in Chapter 1 we first motivate a model in which the volatility of a price process is assumed to follow a stochastic process. Out of the variety of continuous time stochastic volatility models introduced in the literature we choose two empirically relevant ones, namely an arithmetic Ornstein-Uhlenbeck process and a square root diffusion model. These two models serve as reference models in some of the later chapters. As for discrete time stochastic volatility models, we concentrate on log-AR(p), ARCH(q) and GARCH(p, q) processes, all of which will be discussed in detail in Section 3.2.

Approximations of diffusion models by discrete time models and vice versa are described in Chapter 2. The convergence result on which these approximations are based is stated. As applications, the diffusion limits of GARCH(1,1)-type processes and of AR(1) E-ARCH processes are derived. In addition, we present strategies for approximating diffusions and briefly compare different discretizations of a diffusion model.

In Section 3.1 estimation of an unknown parameter in a general diffusion process is discussed. For continuously observed processes we develop the classical theory of maximum likelihood estimation, including properties like consistency and asymptotic normality. In the case of discrete observations the maximum likelihood estimator retains all the 'good' properties if the transition densities of the process are known. However, in most cases relevant for finance, we do not have explicit expressions for the underlying transition densities, and the use of approximate likelihood functions leads to inconsistent estimators when the time between observations is bounded away from zero. We describe alternative estimation methods, such as martingale estimating functions or methods based on approximating the transition densities. As a result, we are able to obtain consistent and asymptotically normal estimators. The methods introduced will be tested on some examples. In Section 3.2 discrete AR, ARCH and GARCH models and their asymptotic properties are discussed. Finally, a brief description of Bayesian estimation is given.

Concerning nonparametric estimation in diffusions, in Section 4.1 we discuss estimation of a probability density and estimation of an unknown signal. Section 4.2 presents two different nonparametric techniques for discrete models, namely kernel estimators and Fourier type estimators.

In Chapter 5 we show how to solve linear stochastic differential equations explicitly and give some classes of nonlinear stochastic differential equations that can be reduced to linear ones.

Some necessary background material is briefly introduced in the appendices. An extensive bibliography guides the interested reader to further published material.
Acknowledgements:

In working out the material presented, I was fortunate to have been able to discuss various aspects of the text with many people. I am especially grateful to Prof. P. Embrechts for his invaluable advice and constant encouragement. I also take great pleasure in thanking A.N. Shiryaev, A. Dassios, G. Parker and H. Schurz for their much appreciated help. I thank M. Kafetzaki Boulamatsis very much for all the helpful discussions and her support during the preparation of the manuscript. The fruitful discussions within RiskLab also were a constant source of inspiration.
Chapter 1

Stochastic Volatility Models

1.1 Continuous time

The value of a stock price S is supposed to follow the process

dS_t = S_t (\mu_t \, dt + \sigma_t \, dW_t),    (1.1)

where W_t is a Wiener process and \mu and \sigma are functions of t. Considering the logarithm of the stock price H \equiv \ln S and using Itô's formula, we derive the process followed by H:

dH_t = \Big( \mu_t - \frac{\sigma_t^2}{2} \Big) dt + \sigma_t \, dW_t.    (1.2)

In the following we consider the process H_t instead of the equivalent process S_t. For many purposes this leads to a more tractable differential equation. Also, from a statistical point of view, the increments of H_t (i.e. the so-called log-returns) behave in a 'nicer' way than the increments of S_t.
In the case \mu_t \equiv \mu and \sigma_t \equiv \sigma, S_t follows a geometric Brownian motion. This is assumed in the Black-Scholes option pricing model, which has been used as an effective tool for the pricing of options for more than two decades. When comparing option values calculated with the Black-Scholes model with observed option prices there is usually a difference. Among these biases in model prices the well-known "smile" effect is important: the Black-Scholes option pricing formula tends to underprice out-of-the-money options and to overprice at-the-money options, which means that implied volatility changes with the striking price (see Ball [1]). This effect arises from the assumption in the Black-Scholes model that volatility is a known constant ("I sometimes wonder why people still use the Black-Scholes formula, since it is based on such simple assumptions - unrealistically simple assumptions." Black [11]).
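In the constant-coefficient case, equation (1.2) integrates exactly, so the log-returns over a fixed step are iid normal. The following sketch simulates such a path and checks the moments of the increments; all parameter values are illustrative, not taken from the text:

```python
import numpy as np

def gbm_log_returns(n, mu, sigma, dt, s0=100.0, rng=None):
    """Simulate a geometric Brownian motion path S_t under (1.1) with
    constant mu and sigma, and return (path, log-returns).
    By (1.2) the log-returns H_{t+dt} - H_t are iid
    N((mu - sigma^2/2) dt, sigma^2 dt)."""
    rng = np.random.default_rng(3) if rng is None else rng
    dh = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    s = s0 * np.exp(np.cumsum(dh))  # exact solution, no discretization error
    return s, dh

s, dh = gbm_log_returns(100_000, mu=0.08, sigma=0.2, dt=1 / 250)
```

With dt = 1/250 (roughly one trading day) the sample standard deviation of the log-returns should be close to sigma * sqrt(dt), about 0.0126 here.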
In real life volatility is not constant at all. It is non-uniform: on days with major economic events volatility is usually higher than on other days (especially non-trading days). In addition, stock prices sometimes jump, which can be thought of as a sudden large increase in the stock's volatility. Furthermore, many economic time series have a mean-reversion tendency: when the value of a random variable reaches a very high level, it is more likely to go down than up, and vice versa. Put differently, the variable tends away from extremely high or low values and reverts to some long-term mean.

All these observations lead to the assumption that volatility is itself random, and many stochastic volatility models have recently been introduced in the literature.
Now the question arises how stochastic volatility \sigma_t^2 = \sigma^2(t, \omega) can be modeled. The standard framework assumes the volatility specification

dv_t = m(v_t) \, dt + k(v_t) \, d\tilde W_t,    (1.3)

with v = \sigma_t^2 or v = \ln \sigma_t^2 and \tilde W_t a Wiener process. Out of the variety of stochastic volatility models we will consider the following two:

I:   dH_t = (\mu_t - \sigma_t^2/2) \, dt + \sigma_t \, dW_t
     dv_t = \delta (\beta - v_t) \, dt + \gamma \, d\tilde W_t,  with v_t = \ln \sigma_t^2    (1.4)

II:  dH_t = (\mu_t - \sigma_t^2/2) \, dt + \sigma_t \, dW_t
     dv_t = b (a - v_t) \, dt + c \sqrt{v_t} \, d\tilde W_t,  with v_t = \sigma_t^2    (1.5)

where \delta, \beta, \gamma, a, b and c are fixed constants, and W_t and \tilde W_t are independent Wiener processes.

In model (1.4) the volatility, or more exactly \ln \sigma_t^2, is governed by an arithmetic Ornstein-Uhlenbeck (or first order autoregressive) process with a mean-reverting tendency. A large and growing literature treats this case as an empirically relevant one (see Bollerslev, Engle and Nelson [14], Stein and Stein [72], Wiggins [75]). In (1.5) a square root diffusion model for stochastic volatility is suggested; for this model see Ball [1] and the literature given there. One further reason why we have restricted our attention to the models (1.4) and (1.5) above is to be able to show explicitly how certain statistical estimation procedures are to be implemented, compared and contrasted in a specific finance context. Most of the techniques introduced apply to much more general set-ups.
1.2 Discrete time

In the previous section we dealt with continuous time stochastic volatility models based on systems of stochastic differential equations. In the discrete time approach we consider time series models, which are systems of stochastic difference equations. In analogy to the continuous time approach we assume the stock price S_n, S_n > 0, to follow the process

\Delta S_n = S_{n-1} (\mu_n + \sigma_n z_n),    (1.6)

with \mu and \sigma dependent on time and z_n \sim N(0,1). We consider the logarithm of the stock price H_n \equiv \ln S_n. With the notation

H_n = y_1 + \dots + y_n,

we derive the process

\Delta H_n = y_n = \Big( \mu_n - \frac{\sigma_n^2}{2} \Big) + \sigma_n z_n.    (1.7)

We will concentrate on three different discrete stochastic volatility models. First we consider a rather simple model in which the conditional variance of the time series \{y_n\} follows a logarithmic AutoRegressive (log-AR) process (see Jacquier, Polson and Rossi [42]).

I. log-AR(p) stochastic volatility:

v_n = \alpha_0 + \alpha_1 v_{n-1} + \dots + \alpha_p v_{n-p} + \tilde z_n,    (1.8)

with v_n = \ln \sigma_n^2 and (z_n, \tilde z_n) independent N(0,1).

The famous AutoRegressive Conditional Heteroskedastic (ARCH) model proposed by Engle [22] will be the second model we look at. The key insight offered by the ARCH model lies in the distinction between the conditional and the unconditional second order moments. While the unconditional covariance matrix for the variables of interest may be time invariant, the conditional variances and covariances often depend non-trivially on the past.

II. ARCH(q):

\sigma_n^2 = \alpha_0 + \alpha_1 y_{n-1}^2 + \dots + \alpha_q y_{n-q}^2.    (1.9)

A long lag length q and a large number of parameters are often needed in empirical applications of ARCH(q). To avoid this problem Bollerslev introduced the Generalized ARCH (GARCH) model [12].

III. GARCH(p, q):

\sigma_n^2 = \alpha_0 + \alpha_1 y_{n-1}^2 + \dots + \alpha_q y_{n-q}^2 + \beta_1 \sigma_{n-1}^2 + \dots + \beta_p \sigma_{n-p}^2.    (1.10)

For a discussion of AR, ARCH and GARCH models see Section 3.2.
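As an illustration of model III with p = q = 1, the following sketch simulates a GARCH(1,1) return series; the drift terms from (1.7) are dropped for clarity and the parameter values are illustrative, not estimates:

```python
import numpy as np

def simulate_garch11(n, alpha0=0.05, alpha1=0.1, beta1=0.85, rng=None):
    """Simulate log-returns y_n = sigma_n * z_n with GARCH(1,1) conditional
    variance  sigma_n^2 = alpha0 + alpha1 * y_{n-1}^2 + beta1 * sigma_{n-1}^2."""
    rng = np.random.default_rng(42) if rng is None else rng
    # start the recursion at the stationary variance alpha0 / (1 - alpha1 - beta1)
    var = alpha0 / (1.0 - alpha1 - beta1)
    y = np.empty(n)
    for i in range(n):
        y[i] = np.sqrt(var) * rng.standard_normal()
        var = alpha0 + alpha1 * y[i] ** 2 + beta1 * var
    return y

y = simulate_garch11(50_000)
# unconditional sample variance should be near alpha0 / (1 - alpha1 - beta1) = 1
```

The recursion makes the conditional variance depend non-trivially on past returns (volatility clustering), while the unconditional variance stays time invariant, which is exactly the conditional/unconditional distinction noted above.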
Chapter 2

Approximations

2.1 Diffusion models by discrete time models

The theory of finance is mainly treated in terms of stochastic differential equations, whereas in practice observations can be made only at discrete time intervals. We want to bridge this gap by approximating diffusion models by discrete time models and vice versa.

In this section we deal with discrete time models as diffusion approximations. The advantage of approximating a diffusion model by a sequence of discrete time models lies mainly in estimation and forecasting. Whereas the likelihood function of a discrete time model is easy to compute and to maximize, problems may arise in deriving the likelihood of a discretely observed nonlinear stochastic differential equation system, e.g. when the diffusion coefficient depends on an unknown parameter or when there are unobservable state variables like the conditional variance (see [54], [14]). For a diffusion model with continuous observations, see 3.1.1 for the definition of the maximum likelihood estimator, its properties and the mentioned problems like parameter dependence. When the diffusion model is observed at discrete time points and the transition densities are unknown, discretizing the continuous time log-likelihood leads to inconsistent estimators. This problem is described in 3.1.2, where in addition three different approaches are given to overcome it, also in the case of incomplete observations.

In summary, rather than estimating and forecasting with a diffusion model observed at discrete time points, it may be much easier to use a discrete time model directly, e.g. an ARCH model, as basic approximation.
First of all, as a simple though important example, we approximate a Wiener process using the central limit theorem. Let \xi_1, \xi_2, \dots be a sequence of iid random variables defined on a probability space (\Omega, \mathcal B, P). Suppose E\xi_n = 0, \operatorname{Var} \xi_n = \sigma^2 and S_n = \sum_{i=1}^n \xi_i. By the central limit theorem, S_n/(\sigma \sqrt n) converges in distribution to the standard normal distribution as n tends to infinity. By means of the partial sums S_n we define piecewise linear functions X_n on the interval [0,1]. For each n and each \omega the function X_n(\cdot, \omega) is linear on each interval [(i-1)/n, i/n], 1 \le i \le n, and has the value S_i(\omega)/(\sigma \sqrt n) at the point i/n, with starting point S_0(\omega) = 0. Hence we construct a function X_n of the form

X_n(t, \omega) = \frac{1}{\sigma \sqrt n} S_{i-1}(\omega) + \frac{t - (i-1)/n}{1/n} \cdot \frac{1}{\sigma \sqrt n} \, \xi_i(\omega),

for t \in [(i-1)/n, i/n]. For each \omega, X_n(\cdot, \omega) is a continuous function on [0,1], i.e. X_n(\cdot, \omega) \in C[0,1]. Denote by P_n the distribution of X_n(\cdot, \omega) in C[0,1]. Then P_n converges weakly to the Wiener measure W,

P_n \Longrightarrow W

(see [10], §2). Note that this convergence result can be viewed in two different ways: on the one hand X_n can be seen as an approximation to a Wiener process, and on the other hand the Wiener process can be seen as an approximation to X_n. We will come back to this point later.
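The construction above is easy to reproduce numerically; a minimal sketch (using symmetric ±1 steps as the iid sequence, an assumption made here for simplicity):

```python
import numpy as np

def wiener_approximation(n, sigma=1.0, rng=None):
    """Values of the scaled random walk S_i / (sigma * sqrt(n)) at the grid
    points i/n, i = 0..n; linear interpolation between them gives X_n(t)."""
    rng = np.random.default_rng(0) if rng is None else rng
    xi = sigma * rng.choice([-1.0, 1.0], size=n)   # iid, mean 0, variance sigma^2
    s = np.concatenate(([0.0], np.cumsum(xi)))     # partial sums S_0, ..., S_n
    return s / (sigma * np.sqrt(n))

# the terminal value X_n(1) = S_n / (sigma sqrt(n)) is approximately N(0, 1)
samples = np.array([wiener_approximation(500, rng=np.random.default_rng(k))[-1]
                    for k in range(2000)])
```

For large n the sample mean and variance of X_n(1) should be close to 0 and 1, in line with the central limit theorem and the weak convergence P_n => W.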
The main statement of this chapter is a convergence theorem, in which general conditions are given for a sequence of discrete time Markov processes to converge weakly to an Itô process. These conditions were developed by Stroock and Varadhan (see [73], §11.2). As mentioned in the example above, note that given a convergence result like this, we may use it to tackle the approximation problem of continuous time by discrete models and vice versa (see Section 2.2). For a presentation of the convergence theorem and its proof we also refer to Nelson [54].

The Convergence Theorem

For arbitrary h > 0, consider a discrete time Markov process X_0, X_h, X_{2h}, \dots, X_{kh}, denoted by \{X_{kh}\}, where X_{kh} takes values in \mathbb R^n for all k. Assume that we know the transition probabilities of \{X_{kh}\} and the distribution of the initial random point X_0. We construct the continuous time process \{X^{(h)}_t\} from the discrete time process \{X_{kh}\} by making X^{(h)}_t a step function with jumps at times h, 2h, 3h, \dots and values X^{(h)}_t = X_{kh} almost surely for kh \le t < (k+1)h. We thus deal with three families of processes:
1) \{X_{kh}\}: the family of discrete time processes \{X_{kh}\} depending on h and on the discrete time index kh, k \in \mathbb N.

2) \{X^{(h)}_t\}: the family of continuous time processes \{X^{(h)}_t\} that are step functions constructed from the discrete time processes \{X_{kh}\} as described above. Observe that \{X^{(h)}_t\} depends both on h and on the continuous time index t \ge 0.

3) \{X_t\}: the limiting process \{X_t\} to which, under the conditions of the theorem below, the sequence of processes \{X^{(h)}_t\} converges weakly as h \downarrow 0.
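The step-function construction in 2) is mechanical; a minimal sketch with hypothetical values for the chain X_0, X_h, X_{2h}, \dots:

```python
import numpy as np

def step_interpolation(x_kh, h, t):
    """The continuous-time process X^(h)_t built from a discrete-time chain
    X_0, X_h, X_2h, ...: a step function with X^(h)_t = X_kh
    for kh <= t < (k+1)h (the last value is held after the final step)."""
    k = np.minimum((np.asarray(t) / h).astype(int), len(x_kh) - 1)
    return np.asarray(x_kh)[k]

x = [0.0, 1.0, -0.5, 2.0]   # hypothetical X_0, X_h, X_2h, X_3h with h = 0.25
vals = step_interpolation(x, 0.25, [0.0, 0.1, 0.30, 0.74, 0.75])
```

Each query time t is mapped to the index k = floor(t/h), so the path is right-continuous with jumps exactly at h, 2h, 3h, \dots as required.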
Instead of giving the explicit mathematical conditions needed in the following theorem (for details see Nelson [54], pp. 10-15), we briefly describe and interpret them. Functions a_h(x,t) and b_h(x,t) are defined as measures of the second moment and the drift, respectively, and are required to converge uniformly on compact sets to well-behaved continuous functions a(x,t) and b(x,t). Moreover, there must exist a continuous function \sigma(x,t) with a(x,t) = \sigma(x,t)\sigma(x,t)^T. The sample paths of the limit process X_t are assumed to be continuous with probability one. We require the probability measures of the initial points X^{(h)}_0 to converge to a limit measure \nu_0 as h \downarrow 0, and have thus determined the initial distribution \nu_0 of X_t. Finally, certain conditions are needed so that \nu_0, a(x,t) and b(x,t) uniquely determine the distribution of the limit process X_t.
Theorem 1  Under the assumptions indicated above, the family \{X^{(h)}_t\} converges weakly as h \downarrow 0 to the process \{X_t\} defined by the stochastic differential equation

X_t = X_0 + \int_0^t b(X_s, s) \, ds + \int_0^t \sigma(X_s, s) \, dW^{(n)}_s,

where W^{(n)}_t is an n-dimensional Wiener process independent of X_0, and \{X_t\} has initial distribution \nu_0. The process \{X_t\} exists, is distributionally unique and remains finite in finite time intervals with probability one.
We remark that the distribution of X_t does not depend on the choice of \sigma (see the assumptions above). Moreover, note that convergence in distribution means convergence of the whole sample path, i.e. the probability laws generating the sample paths \{X^{(h)}_t\} converge to the probability law generating the sample path of \{X_t, 0 \le t \le T\} for any 0 \le T < \infty. Furthermore, we remark that Nelson [54] shows the same result based on simpler conditions for the continuity of the sample paths of X_t and the definitions of a_h and b_h. Later we will refer to these conditions as Nelson-conditions.

Now, as an application of Theorem 1, Nelson [54] finds and analyzes the diffusion limit of GARCH(1,1)-type processes.
The GARCH(1,1)-M process of Engle and Bollerslev (see [13]) is defined as

Y_t = Y_{t-1} + c \sigma_t^2 + \sigma_t \varepsilon_t,    (2.1)

\sigma_{t+1}^2 = \omega + \sigma_t^2 \big[ \beta + \alpha \varepsilon_t^2 \big],    (2.2)

where \{\varepsilon_t\} \sim N(0,1) iid.
Now our purpose is to reduce the length of the time intervals more and more. The parameters \omega, \alpha and \beta of the system may depend on h. The drift term in (2.1) and the variance of \varepsilon_t are made proportional to h:

Y_{kh} = Y_{(k-1)h} + h c \sigma_{kh}^2 + \sigma_{kh} \varepsilon_{kh},    (2.3)

\sigma_{(k+1)h}^2 = \omega_h + \sigma_{kh}^2 \Big[ \beta_h + \frac{\alpha_h}{h} \varepsilon_{kh}^2 \Big],    (2.4)

where \{\varepsilon_{kh}\} \sim N(0,h) iid, and for the initial distribution (k = 0) we have

P\big[ (Y_0, \sigma_0^2) \in A \big] = \nu_h(A)

for all A \in \mathcal B(\mathbb R^2). The continuous time processes Y^{(h)}_t and \sigma^{(h)2}_t are constructed by

Y^{(h)}_t \equiv Y_{kh}  and  \sigma^{(h)2}_t \equiv \sigma^2_{kh}

for kh \le t < (k+1)h, and the measures \nu_h are assumed to converge weakly to a limit measure \nu_0 as h \downarrow 0.
By means of Theorem 1 we now obtain a diffusion limit of the form

d \begin{pmatrix} Y_t \\ \sigma_t^2 \end{pmatrix} = \begin{pmatrix} c \sigma_t^2 \\ \omega - \theta \sigma_t^2 \end{pmatrix} dt + \begin{pmatrix} \sigma_t & 0 \\ 0 & \sqrt{2 \alpha^2 \sigma_t^4} \end{pmatrix} d \begin{pmatrix} W_{1,t} \\ W_{2,t} \end{pmatrix},    (2.5)

where W_{1,t} and W_{2,t} are independent Wiener processes, independent of (Y_0, \sigma_0^2), with initial distribution

P\big[ (Y_0, \sigma_0^2) \in A \big] = \nu_0(A)

for all A \in \mathcal B(\mathbb R^2); see [54], p. 17.
At this point we see that Sections 2.1 and 2.2 are closely related to each other. There exists no closed form for the stationary distribution of the system (2.1, 2.2) in discrete time, but in continuous time we are able to derive the stationary distribution of \sigma_t^2 in (2.5) by using the results of Wong [76] (see Nelson [54]). The idea is to carry conclusions about the stationary distribution from continuous time back to discrete time by means of Theorem 1, that is, we use Theorem 1 in the other direction. This is the main subject of Section 2.2, so for inference from continuous time to discrete time we refer to the next section, where we also consider another example, a so-called E-ARCH model proposed by Nelson.
Continuing this section, we come to the subject of strategies for approximating diffusions. Approximation schemes like the standard Euler approximation (see Appendix B.1) or the Milstein scheme (see Appendix B.2) are known strategies for approximating diffusions. As an example we apply the standard Euler approximation scheme to the model (1.4), which is written here again for the reader's convenience:

dY_t = \Big( \mu_t - \frac{\sigma_t^2}{2} \Big) dt + \sigma_t \, dW_{1,t}

d\big[ \ln \sigma_t^2 \big] = \delta \big[ \beta - \ln \sigma_t^2 \big] dt + \gamma \, dW_{2,t}.    (2.6)
The standard Euler approximation scheme for (2.6), respectively (1.4), is given by

y_{t+h} = y_t + \Big( \mu_t - \frac{\sigma_t^2}{2} \Big) h + \sigma_t \sqrt h \, \varepsilon_{1,t+h}

\ln \sigma_{t+h}^2 = \ln \sigma_t^2 + \delta h \big[ \beta - \ln \sigma_t^2 \big] + \gamma \sqrt h \, \varepsilon_{2,t+h},    (2.7)

for t = h, 2h, 3h, \dots, with (y_0, \sigma_0) fixed and \varepsilon_{1,t}, \varepsilon_{2,t} independent N(0,1). The diffusion (2.6), respectively (1.4), does not satisfy global Lipschitz conditions and cannot be transformed in order to satisfy them. Thus the known theorems on strong convergence of the Euler approximation (see Appendix B.1) do not apply. But by means of the convergence theorem above, the weak convergence of the system (2.7) to the diffusion model (2.6) can be verified (for the proof we refer to [14]). Hence, with the standard Euler approximation scheme, we arrive at a way to approximate model (1.4).
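A minimal simulation of the scheme (2.7) follows; the values chosen for \mu, \delta, \beta and \gamma are illustrative only, not taken from the text:

```python
import numpy as np

def euler_sv(n_steps, h, mu=0.05, delta=2.0, beta=-2.0, gamma=0.5,
             y0=0.0, v0=-2.0, rng=None):
    """Euler scheme (2.7) for model (1.4), with v_t = ln sigma_t^2:
        y_{t+h} = y_t + (mu - exp(v_t)/2) h + exp(v_t/2) sqrt(h) eps1
        v_{t+h} = v_t + delta (beta - v_t) h + gamma sqrt(h) eps2
    where eps1, eps2 are iid N(0, 1)."""
    rng = np.random.default_rng(7) if rng is None else rng
    y = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    y[0], v[0] = y0, v0
    for k in range(n_steps):
        e1, e2 = rng.standard_normal(2)
        sig2 = np.exp(v[k])
        y[k + 1] = y[k] + (mu - sig2 / 2.0) * h + np.sqrt(sig2 * h) * e1
        v[k + 1] = v[k] + delta * (beta - v[k]) * h + gamma * np.sqrt(h) * e2
    return y, v

# 10000 steps of size h = 0.01: v_t is a discretized Ornstein-Uhlenbeck
# process mean-reverting to beta = -2
y, v = euler_sv(10_000, h=0.01)
```

Even though only weak (not strong) convergence holds here, the simulated log-variance path should fluctuate around its mean-reversion level beta.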
As a further remark, consider a sequence of processes \{X_{n,t}\} defined by

dX_{n,t} = b(X_{n,t}, t) \, dt + \sigma(X_{n,t}, t) \, dW_{n,t}.

Wong and Hajek [76], §4.5, show that, under reasonable conditions, as n tends to infinity the processes \{X_{n,t}\} converge to the process \{X_t\} given by

dX_t = b(X_t, t) \, dt + \tfrac{1}{2} \sigma(X_t, t) \sigma'(X_t, t) \, dt + \sigma(X_t, t) \, dW_t,

where \sigma' = \partial \sigma / \partial x. Note the presence of the additional term \tfrac{1}{2} \sigma \sigma', which results from an application of the Itô formula (see Appendix C). Intuitively, one would expect a limit without the term \tfrac{1}{2} \sigma \sigma'; note that this is indeed the limit in the case where the diffusion coefficient depends only on time, \sigma(x,t) \equiv \sigma(t).
Another interesting problem is to find the, in some sense, best discretization of a continuous time model. In the case of the centered Cox-Ingersoll-Ross model

dr_t = -k r_t \, dt + \sigma \sqrt{r_t + \lambda} \, dW_t,    (2.8)

(see also model (1.5)), we refer to Deelstra and Parker [20] for a discussion of two different discretizations. Besides the 'simple' discretization of (2.8),

r_t = \phi r_{t-1} + \sigma_\varepsilon \sqrt{r_{t-1} + \lambda} \, \varepsilon_t,    (2.9)

where \varepsilon_t \sim N(0,1) i.i.d. and \lambda, \sigma_\varepsilon > 0, Deelstra and Parker [20] consider another discrete representation of (2.8),

r_t = \phi r_{t-1} + \sigma_\varepsilon \sqrt{\frac{2}{1+\phi} \, r_{t-1} + \lambda} \, \varepsilon_t,    (2.10)

where again \varepsilon_t \sim N(0,1) i.i.d. and \lambda, \sigma_\varepsilon > 0. They establish the parametric relations between the continuous time model and the discrete models. Whereas in the simple model (2.9) the mean and the stationary variance are the same as in the continuous time model (2.8), the discretization (2.10) has in addition the same covariance as model (2.8) at all sampling times, that is, both the first and the second moments are equal. Therefore the covariance equivalent discretization (2.10) is suggested in [20] as a discrete representation of the continuous time model (2.8). It is also illustrated in [20] that although model (2.9) looks very similar to model (2.10), the two models can produce significantly different parameter estimates.
2.2 Discrete time models by diffusion models

Approximating discrete time models by diffusion models is a way to simplify the analysis of discrete models. For instance, properties of discrete time models such as consistency and asymptotic normality of maximum likelihood estimates are rather difficult to establish, and often distributional results are available for the diffusion limit of a sequence of discrete processes that are not available for the discrete models themselves. In such cases we may be able to use a convergence theorem as in Section 2.1 and approximate discrete time processes, especially ARCH processes, by diffusion processes.

We pick up again the GARCH(1,1) model (2.1, 2.2) dealt with in the previous section, where the diffusion limit (2.5) of the system (2.1, 2.2) was obtained via the convergence theorem (with Nelson-conditions). As mentioned, there exists no closed form for the stationary distribution of the system (2.1, 2.2) in discrete time, but we are able to derive the stationary distribution of \sigma_t^2 in the diffusion limit (2.5) by using the results of Wong [76] (see Nelson [54]). Nelson shows that the stationary distribution of \sigma_t^2 is an inverted gamma and uses this knowledge in continuous time to obtain distributional results for the discrete time model. While the innovation process \sigma_{kh} \varepsilon_{kh}, see (2.3), is conditionally normally distributed, it is (unconditionally) approximately distributed as a Student t when the time between observations, i.e. h, is small and kh is large (see [54], p. 18f).
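The qualitative content of Nelson's result, conditionally normal innovations whose unconditional law is heavier-tailed than the normal, is easy to check by simulation. The sketch below only illustrates this leptokurtosis for a GARCH(1,1) series with illustrative parameters; it does not reproduce Nelson's exact Student t limit:

```python
import numpy as np

def garch11_kurtosis(n, omega=0.05, alpha=0.1, beta=0.85, seed=0):
    """Simulate y_n = sigma_n * z_n with GARCH(1,1) conditional variance and
    return the sample kurtosis E[y^4] / E[y^2]^2 of the innovations."""
    rng = np.random.default_rng(seed)
    var = omega / (1.0 - alpha - beta)   # start at the stationary variance
    y = np.empty(n)
    for i in range(n):
        y[i] = np.sqrt(var) * rng.standard_normal()
        var = omega + alpha * y[i] ** 2 + beta * var
    return (y ** 4).mean() / (y ** 2).mean() ** 2

k = garch11_kurtosis(200_000)
# each y_n is conditionally N(0, sigma_n^2) with kurtosis 3, yet mixing over
# the random variance pushes the unconditional kurtosis above 3
```

A kurtosis above 3 is exactly what one expects from a normal mixed over a random variance, consistent with the approximate Student t description above.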
Consider a (slightly different) model (1.4):

dY_t = \sigma_t^2 \, dt + \sigma_t \, dW_{1,t}

d\big[ \ln \sigma_t^2 \big] = \delta \big[ \beta - \ln \sigma_t^2 \big] dt + \gamma \, dW_{2,t},    (2.11)

where W_{1,t} and W_{2,t} are Brownian motions with

\begin{pmatrix} dW_{1,t} \\ dW_{2,t} \end{pmatrix} \big( dW_{1,t} \;\; dW_{2,t} \big) = \begin{pmatrix} 1 & C_{12} \\ C_{12} & C_{22} \end{pmatrix} dt,

and C_{22} \ge C_{12}^2. As mentioned in the example in the previous section (see p. 12), the system (2.11) does not satisfy global Lipschitz conditions, and hence the standard convergence theorems for the Euler approximation do not apply. Nelson develops a class of discrete time models based on the Exponential ARCH (E-ARCH) model that converge weakly to a diffusion; see [54], pp. 20-23. We will consider such a diffusion approximation for the model (2.11). Since \ln(\sigma_t^2) follows a continuous time AR(1) process, in the discrete time model the conditional variance process is also assumed to be an AR(1) process, that is,
we assume that it follows an AR(1) E-ARCH process:<br />
h i<br />
ln [Y kh ] = ln Y (k,1)h + hh kh 2 + kh " kh ;<br />
i h h i<br />
= ln khi 2 + , ln <br />
2<br />
kh h + C12 " kh<br />
ln h 2 (k+1)h<br />
+ h j" kh j,(2h=) 1=2i ; (2.12)<br />
where [(C 22 , C 2 12)=(1 , 2=)] 1=2 and " kh iid, N(0;h).<br />
If $\ln(\sigma_0^2)$ is normally distributed, then the $\ln(\sigma_t^2)$ process in (2.11) is Gaussian. Otherwise, if $\beta > 0$, a Gaussian stationary limit distribution for $\ln(\sigma_t^2)$ exists. Thus the conditional variance in continuous time is lognormal, and as in the GARCH(1,1) example above, from this information Nelson [54] infers the distribution of the innovation process in the discrete time model (2.12). He shows that in the discrete time model (with short time intervals) the distribution of the innovations is approximately a normal-lognormal mixture.
Hence we have derived the approximate distributions of the GARCH(1,1) and AR(1) E-ARCH models for small sampling intervals by using the distributional results available for the diffusion limit.
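The recursion (2.12) is straightforward to simulate. The sketch below (with purely illustrative parameter values for $\mu$, $\beta$, $\alpha$, $C_{12}$, $C_{22}$ and the step length $h$, none of them taken from the text) checks that the simulated $\ln \sigma_{kh}^2$ matches the Gaussian stationary behaviour of the diffusion limit (2.11), with mean $\alpha$ and variance approximately $C_{22}/(2\beta)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameters: drift mu, mean reversion beta, level alpha,
# Brownian covariances C12, C22, step length h.
mu, beta, alpha = 0.1, 2.0, -1.0
C12, C22 = -0.3, 0.5
h, n = 1.0 / 250, 50_000

gamma = np.sqrt((C22 - C12**2) / (1 - 2 / np.pi))

eps = rng.normal(0.0, np.sqrt(h), n)     # eps_kh iid N(0, h)
log_sig2 = np.empty(n + 1)
log_sig2[0] = alpha                      # start at the long-run level
log_y = np.zeros(n + 1)

for k in range(n):                       # AR(1) E-ARCH recursion (2.12)
    sig2 = np.exp(log_sig2[k])
    log_y[k + 1] = log_y[k] + mu * h * sig2 + np.sqrt(sig2) * eps[k]
    log_sig2[k + 1] = (log_sig2[k] + beta * (alpha - log_sig2[k]) * h
                       + C12 * eps[k]
                       + gamma * (abs(eps[k]) - np.sqrt(2 * h / np.pi)))
```

Note that the per-step variance of the log-volatility innovation is $C_{12}^2 h + \gamma^2(1 - 2/\pi)h = C_{22}h$, matching the diffusion limit; the innovation sequence $\sigma_{kh}\varepsilon_{kh}$ produced this way can likewise be inspected for the fat tails of the normal-lognormal mixture.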
Chapter 3
Parameter Estimation
3.1 Diffusion models
Consider the general type of a diffusion process $X = (X_t)_{t \ge 0}$ defined as the solution to the stochastic differential equation
\[
dX_t = b(t, X_t; \theta)\,dt + \sigma(t, X_t; \theta)\,dW_t, \qquad X_0 = x_0, \quad t \ge 0, \tag{3.1}
\]
where $W$ is an $r$-dimensional Wiener process, $\theta \in \Theta \subseteq \mathbb{R}^p$, and $b(\cdot, \cdot, \theta) : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma(\cdot, \cdot, \theta) : [0, \infty) \times \mathbb{R}^d \to M^{d \times r}$ are "nice"$^1$ functions, where $M^{d \times r}$ denotes the set of real $d \times r$ matrices.
We discuss the situation where a realization of the process $X$ is observed, but the parameter $\theta$ is unknown to the observer. Hence we have to construct sufficiently good estimators of $\theta$ and examine their properties. For deriving estimates of $\theta$ there are two different kinds of observations to distinguish: continuous observations of $X$, as considered in Section 3.1.1, and discrete observations, which will be considered in Section 3.1.2.
3.1.1 Continuous observations
Suppose that $X$ satisfies
\[
dX_t = b(\theta, X_t)\,dt + \sigma(\theta, X_t)\,dW_t, \qquad X_0 = x_0, \quad t \ge 0, \tag{3.2}
\]
where for convenience $X$ and $W$ are one-dimensional processes, $b$ and $\sigma$ are smooth functions, and the parameter $\theta \in \Theta \subseteq \mathbb{R}^p$ is to be estimated from continuous observations of $X$. $X$ is a Markov process. $P_\theta$ denotes the law of the process $X$ on the canonical space $\Omega = C(\mathbb{R}_+, \mathbb{R})$ with the canonical filtration $\mathcal{F}_t = \sigma(X_s \mid s \le t)$. Denote by $P_{\theta,t} := P_\theta|_{\mathcal{F}_t}$ the restriction of $P_\theta$ to $\mathcal{F}_t$.
$^1$ $b$ and $\sigma$ are Lipschitz continuous and satisfy a growth condition. Then there exists a unique strong solution of (3.1), see [65], p. 128 and p. 136.
First consider the case where $\sigma(\theta, x) \equiv \sigma(x)$ does not depend on $\theta$. If $\sigma$ does not vanish, the measures $P_{\theta,t}$ and $P_{\theta_1,t}$ are equivalent for any $\theta, \theta_1 \in \Theta$, $t \ge 0$, and the likelihood process
\[
L_t^{\theta,\theta_1} = \frac{dP_{\theta,t}}{dP_{\theta_1,t}} \tag{3.3}
\]
is well defined. Suppose that the drift is linear in $\theta$, that is
\[
dX_t = \theta\, b(X_t)\,dt + \sigma(X_t)\,dW_t, \tag{3.4}
\]
and that the process $(X_t)$ is observed in the time interval $[0, T]$. Then the likelihood process (3.3) in $T$ with $\theta_1 = 0$ equals
\[
L_T^{\theta,0} \equiv L_T(\theta) = \exp\left[\, \theta \int_0^T \frac{b(X_s)}{\sigma^2(X_s)}\,dX_s - \frac{\theta^2}{2} \int_0^T \frac{b^2(X_s)}{\sigma^2(X_s)}\,ds \right], \tag{3.5}
\]
see e.g. [51], §17, or [43], p. 76, or [39]. In the following consider a constant diffusion term $\sigma$, say $\sigma \equiv 1$. For convenience we take the logarithm in (3.5) and obtain the logarithmic likelihood (log-likelihood) function
\[
\ln L_T(\theta) = \theta \int_0^T b(X_s)\,dX_s - \frac{\theta^2}{2} \int_0^T b^2(X_s)\,ds. \tag{3.6}
\]
Maximizing $L_T$, or equivalently $\ln L_T$, with respect to $\theta$, we obtain the so-called Maximum Likelihood Estimator (MLE) $\hat\theta_T$ based on continuous observations of $X$ in the interval $0 \le t \le T$. We remark that by definition of the likelihood function the dominating measure $P_{\theta_1}$ (that is, $P_\theta \ll P_{\theta_1}$) may be chosen arbitrarily; here we took $\theta_1 = 0$. The MLE is obtained from the likelihood equation $\frac{\partial}{\partial\theta} \ln L_T(\theta) = 0$,
which in the case (3.6) has the form
\[
\frac{\partial}{\partial\theta}\,\ln L_T(\theta) = \int_0^T b(X_s)\,dX_s - \theta \int_0^T b^2(X_s)\,ds.
\]
Solving $\frac{\partial}{\partial\theta} \ln L_T(\theta) = 0$, we obtain the MLE
\[
\hat\theta_T = \frac{\int_0^T b(X_s)\,dX_s}{\int_0^T b^2(X_s)\,ds},
\]
and hence, using equality (3.4), we have
\[
\hat\theta_T = \theta_0 + \frac{\int_0^T b(X_s)\,dW_s}{\int_0^T b^2(X_s)\,ds}, \tag{3.7}
\]
where $\theta_0$ denotes the true value of the parameter.
Now we derive some properties of the MLE $\hat\theta_T$.
First, we treat the problem of the bias of an estimator. The bias in $\hat\theta_T$ as an estimator of $\theta_0$ is defined as
\[
b_{\hat\theta_T}(\theta_0) = E_{\theta_0}\big(\hat\theta_T - \theta_0\big),
\]
where $E_{\theta_0}$ denotes the expectation with respect to $P_{\theta_0}$. By (3.7) we have
\[
E_{\theta_0}\hat\theta_T = \theta_0 + E_{\theta_0}\left[ \frac{\int_0^T b(X_s)\,dW_s}{\int_0^T b^2(X_s)\,ds} \right]. \tag{3.8}
\]
Note that in the case of a constant drift coefficient $b(X_t) \equiv b$,
\[
E_\theta\left[ \frac{b\,W_T}{b^2 T} \right] = \frac{1}{bT}\,E_\theta W_T = 0,
\]
thus $E_{\theta_0}\hat\theta_T = \theta_0$, so that $\hat\theta_T$ is unbiased. In all other cases the estimator is biased, and in the case (3.6) we have under some regularity conditions the equality
\[
b_{\hat\theta_T}(\theta_0) = \frac{d}{d\theta_0}\,E_{\theta_0}\!\left[ \left( \int_0^T b^2(X_t)\,dt \right)^{-1} \right]
\]
(see [51], §17.2 and §17.3, Theorems 17.2 and 17.3).
Now we turn to the properties of consistency and asymptotic normality. Assuming some natural regularity conditions (see [38], p. 83 and p. 90, or [17], p. 500f, or [48], p. 95), the stochastic differential equation (3.4) has an ergodic solution. For the ergodic theory we refer to e.g. [32], §18. Then the stationary density $\tilde p$ of the process solves the (deterministic) stationary Fokker-Planck equation, which in the above case reduces to
\[
\frac{d}{dx}\big[\theta_0\, b(x)\,\tilde p(x)\big] - \frac{1}{2}\,\frac{d^2}{dx^2}\,\tilde p(x) = 0.
\]
Ergodicity implies
\[
\lim_{T \to \infty} \frac{1}{T} \int_0^T b(X_t)\,dW_t = 0 \qquad P_{\theta_0}\text{-a.s.}
\]
and
\[
\lim_{T \to \infty} \frac{1}{T} \int_0^T b^2(X_t)\,dt = \int_{-\infty}^{\infty} b^2(x)\,\tilde p(x)\,dx \qquad P_{\theta_0}\text{-a.s.}
\]
Hence we conclude that it is possible to achieve ultimately arbitrary precision of the MLE $\hat\theta_T$ in (3.7) by infinitely increasing $T$; that means the MLE $\hat\theta_T$ is weakly consistent:
\[
\lim_{T \to \infty} P_{\theta_0}\big[\, |\hat\theta_T - \theta_0| > \varepsilon \,\big] = 0 \tag{3.9}
\]
for all $\varepsilon > 0$.
Furthermore, under these regularity conditions the MLE is asymptotically normal; that means, as $T \to \infty$, the difference $\sqrt{T}\,(\hat\theta_T - \theta_0)$ is asymptotically normal with parameters $(0, \sigma^2(\theta_0))$, where
\[
\sigma^2(\theta_0) = \left( \int_{-\infty}^{\infty} b^2(x)\,\tilde p(x)\,dx \right)^{-1}
\]
(see e.g. [48], p. 95f, or [45], p. 243).
We remark that this convergence is uniform for all $\theta$ in a compact set $K \subseteq \Theta$ (see e.g. [48], p. 94f). Moreover, note that under the same regularity conditions the MLE is also consistent and asymptotically normal if the drift term is nonlinear in $\theta$ (again see e.g. [48], p. 94f). Finally, we remark that $\sigma^2(\theta_0)$ equals $I^{-1}(\theta_0)$, where $I$ denotes the so-called Fisher information
\[
I(\theta) = \operatorname{Var}\left[ \frac{\partial}{\partial\theta} \ln L_T(\theta) \right] = E\left[ \left( \frac{\partial}{\partial\theta} \ln L_T(\theta) \right)^{\!2}\, \right],
\]
which can alternatively be computed by
\[
I(\theta) = -E\left[ \frac{\partial^2}{\partial\theta^2} \ln L_T(\theta) \right]
\]
(see [49], p. 249f).
Asymptotic normality can be used to determine confidence intervals for $\theta$ and to determine a 'suitable' value of $T$.
Example 1. a) The regularity conditions are satisfied in the case of observations coming from
\[
dX_t = \theta X_t\,dt + dW_t, \qquad X_0 = 0, \quad 0 \le t \le T, \tag{3.10}
\]
where $\theta \in (\alpha, \beta)$ with $\beta < 0$, that is $\theta < 0$ (the ergodic Ornstein-Uhlenbeck case).
b) In the case $\theta_0 > 0$, the process has no stationary distribution. Nevertheless we can show that the MLE $\hat\theta_T$ is consistent. Furthermore, $e^{\theta_0 T}(\hat\theta_T - \theta_0)$ is asymptotically normal with parameters $(0, 4\theta_0^2)$ (see [48], p. 100).
c) In the case $\theta_0 = 0$ the MLE $\hat\theta_T$ is consistent, but not asymptotically normal (see [48], p. 100).
We remark that in the discrete case (Section 3.2) an analogous distinction regarding parameter values is made, see p. 43.
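As a small numerical illustration (not part of the original text), the continuous-record MLE (3.7) can be approximated from a finely sampled path by replacing the integrals with Itô and Riemann sums. The sketch below is a hypothetical experiment for the model (3.10) with $b(x) = x$ and an assumed true value $\theta_0 = -1$:

```python
import numpy as np

rng = np.random.default_rng(1)

theta0 = -1.0            # assumed true parameter (ergodic case, theta < 0)
T, dt = 500.0, 0.01
n = int(T / dt)

# Euler scheme for dX_t = theta0 * X_t dt + dW_t, X_0 = 0.
x = np.empty(n + 1)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), n)
for i in range(n):
    x[i + 1] = x[i] + theta0 * x[i] * dt + dW[i]

# theta_hat = int_0^T b(X) dX / int_0^T b^2(X) ds with b(x) = x,
# the integrals replaced by an Ito sum and a Riemann sum.
num = float(np.sum(x[:-1] * np.diff(x)))
den = float(np.sum(x[:-1] ** 2) * dt)
theta_hat = num / den
```

For this ergodic case the asymptotic standard deviation is roughly $\sqrt{2|\theta_0|/T}$, so with $T = 500$ the estimate should land close to $-1$.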
Having treated the special case of a drift term linear in $\theta$, we give a short remark on the general case
\[
dX_t = b(\theta, X_t)\,dt + dW_t,
\]
where $b(\theta, x)$ may be a nonlinear function of $\theta$. By Taylor expansion in $\theta$ we have
\[
b(\theta, x) = b(\theta_0, x) + (\theta - \theta_0)\,\frac{d}{d\theta}\,b(\theta_0, x) + O\big((\theta - \theta_0)^2\big),
\]
where $\theta_0$ again denotes the true value of the parameter $\theta$. Hence $b(\theta, x)$ can be approximated by a term linear in $\theta$. Since in practice a 'good guess' of $\theta_0$ is often available, by the above considerations it suffices to consider only the case already treated: a drift term linear in $\theta$. This closes our considerations on ML estimation.
In the following we give some remarks on the case where the diffusion term depends on an unknown parameter $\theta$. As mentioned before, in this case we cannot estimate $\theta$ by using maximum likelihood theory (see p. 17).
As a first estimation problem we deal with the process
\[
dX_t = \sigma(\theta)\,dW_t, \qquad 0 \le t \le T. \tag{3.11}
\]
In order to estimate $\theta$ we discretize the process and derive its limit. Consider partitions $\Delta_n = \{t_0^{(n)}, \ldots, t_{m(n)}^{(n)}\}$ of $[0, T]$ constructed in such a way that $\sum_{n=1}^{\infty} \sup_k |t_{k+1}^{(n)} - t_k^{(n)}| < \infty$. Then for the discretized Wiener process $W$ we have the convergence result
\[
\sum_{k=1}^{m} \big|W_{t_k^{(n)}} - W_{t_{k-1}^{(n)}}\big|^2 \longrightarrow T,
\]
in probability as $n \to \infty$ (see [15], p. 262f). Hence for (3.11) we obtain
\[
\sum_{k=1}^{m} \big|X_{t_k^{(n)}} - X_{t_{k-1}^{(n)}}\big|^2 \longrightarrow T\,\sigma^2(\theta),
\]
in probability as $n \to \infty$; thus we are able to give an arbitrarily precise estimate of $\sigma^2(\theta)$. Notice that we do not obtain a direct estimate of $\theta$.
Next, if the unknown parameter $\theta$ splits into two parts $(\theta_1, \theta_2)$ in the following way,
\[
dX_t = b(t, \theta_1)\,dt + \sigma(\theta_2)\,dW_t, \tag{3.12}
\]
we want to estimate the part $\theta_2$ in the diffusion coefficient. Denoting $\Delta X_k^{(n)} \equiv X_{t_k^{(n)}} - X_{t_{k-1}^{(n)}}$, $\Delta W_k^{(n)} \equiv W_{t_k^{(n)}} - W_{t_{k-1}^{(n)}}$ and $\Delta t_k^{(n)} \equiv t_k^{(n)} - t_{k-1}^{(n)}$, we obtain by discretizing (3.12)
\[
\sum_{k=1}^{m} \Big[ \big(\Delta X_k^{(n)}\big)^2 - \big(\sigma(\theta_2)\,\Delta W_k^{(n)}\big)^2 \Big]
= \sum_{k=1}^{m} b^2(t_k, \theta_1)\,\big(\Delta t_k^{(n)}\big)^2
+ 2 \sum_{k=1}^{m} b(t_k, \theta_1)\,\sigma(\theta_2)\,\Delta W_k^{(n)}\,\Delta t_k^{(n)},
\]
which tends to 0 in probability as $n \to \infty$. Thus the drift term is insignificant as $n \to \infty$, and we are able to estimate $\sigma(\theta_2)$ just as in the previous case. Then, treating $\theta_2$ as known, we can estimate $\theta_1$ via the ML method.
Finally, for the general problem with a diffusion coefficient also depending on $X$,
\[
dX_t = \sigma(\theta, X_t)\,dW_t, \tag{3.13}
\]
a convergence in probability result is given by
\[
\sum_{k=1}^{m} \big|X_{t_k^{(n)}} - X_{t_{k-1}^{(n)}}\big|^2 \longrightarrow \int_0^T \sigma^2(\theta, X_s)\,ds, \tag{3.14}
\]
as $n \to \infty$ (see [24], [51], §4). A hint towards the correctness of (3.14) is given by the well-known equality
\[
E(X_t^2) = \int_0^t E\big(\sigma^2(\theta, X_s)\big)\,ds.
\]
Again note that we cannot estimate $\theta$ directly but are only able to estimate $\int_0^T \sigma^2(\theta, X_t)\,dt$.
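The quadratic variation argument can be sketched numerically. In the experiment below the values of $\sigma$ and of a constant drift $b$ are illustrative assumptions; the realized sum of squared increments recovers $T\sigma^2$, and on a fine grid the drift contribution is negligible, as claimed for (3.12):

```python
import numpy as np

rng = np.random.default_rng(2)

sigma, b = 0.7, 1.5          # assumed sigma(theta_2) and constant drift b(t, theta_1)
T, n = 1.0, 100_000
dt = T / n

# Discretized path of dX = b dt + sigma dW on a grid of mesh dt.
dW = rng.normal(0.0, np.sqrt(dt), n)
x = np.concatenate(([0.0], np.cumsum(b * dt + sigma * dW)))

# Realized quadratic variation: sum of squared increments.
qv = float(np.sum(np.diff(x) ** 2))
sigma2_hat = qv / T          # estimate of sigma^2 (the drift bias is O(dt))
```

The systematic drift contribution is $b^2 T\,dt$, here of order $10^{-5}$, which is why refining the grid makes the drift term insignificant.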
3.1.2 Discrete observations
In this section we deal with the (more realistic) situation where the diffusion process $X$ in (3.1) has been observed only at discrete, not necessarily equally spaced, time points $0 = t_0 < t_1 < \ldots < t_n$. In our discussion below we basically follow Kloeden, Schurz, Platen and Sørensen [46], Bibby [4, 5, 6, 7], Bibby and Sørensen [8] and Pedersen [58, 59, 60, 61, 62, 63].
For the estimation of $\theta$ from discrete observations of $X$ we have to distinguish two cases: the transition densities of $X$ are known or unknown. If the transition densities $p(s, x; t, y; \theta)$ of $X$ are known, e.g. in the case of an Ornstein-Uhlenbeck process (see p. 36, Example 1), an obvious choice of an estimator for $\theta$ is the Maximum Likelihood Estimator (MLE) $\hat\theta_n$, which maximizes the likelihood function
\[
L_n(\theta) = \prod_{i=1}^{n} p(t_{i-1}, X_{t_{i-1}}; t_i, X_{t_i}; \theta),
\]
or equivalently the log-likelihood function
\[
l_n(\theta) \equiv \ln L_n(\theta) = \sum_{i=1}^{n} \log\big( p(t_{i-1}, X_{t_{i-1}}; t_i, X_{t_i}; \theta) \big), \tag{3.15}
\]
in $\theta$, see e.g. [2], p. 14. In the case of time-equidistant observations ($t_i = i\Delta$, $i = 0, 1, \ldots, n$, for some fixed $\Delta > 0$) Dacunha-Castelle and Florens-Zmirou [19] prove consistency and asymptotic normality of $\hat\theta_n$ as $n \to \infty$, independently of the value of $\Delta$. Unfortunately, in general the transition densities of $X$ are unknown.
When the transition densities of $X$ are unknown, the usual alternative estimator is defined by approximating the log-likelihood function for $\theta$ based on continuous observations of $X$. For this log-likelihood function to be defined, the diffusion coefficient $\sigma(t, x; \theta) = \sigma(t, x)$ has to be known (see Section 3.1.1, p. 17). Under some assumptions (see [51], §7) the log-likelihood function for $\theta$ based on continuous observations of $X$ in $[0, t_n]$ can be written in terms of integrals,
\[
l^c_{t_n}(\theta) = \int_0^{t_n} b(s, X_s; \theta)^T \big[ \sigma(s, X_s)\,\sigma(s, X_s)^T \big]^{-1} dX_s
- \frac{1}{2} \int_0^{t_n} b(s, X_s; \theta)^T \big[ \sigma(s, X_s)\,\sigma(s, X_s)^T \big]^{-1} b(s, X_s; \theta)\,ds, \tag{3.16}
\]
and the usual approximation of these integrals leads to the approximate log-likelihood function for $\theta$ based on discrete observations of $X$,
\[
\tilde l_n(\theta) = \sum_{i=1}^{n} b(t_{i-1}, X_{t_{i-1}}; \theta)^T\,\Sigma_{i-1}\,(X_{t_i} - X_{t_{i-1}})
- \frac{1}{2} \sum_{i=1}^{n} b(t_{i-1}, X_{t_{i-1}}; \theta)^T\,\Sigma_{i-1}\,b(t_{i-1}, X_{t_{i-1}}; \theta)\,(t_i - t_{i-1}), \tag{3.17}
\]
with the notation
\[
\Sigma_{i-1} \equiv \big[ \sigma(t_{i-1}, X_{t_{i-1}})\,\sigma(t_{i-1}, X_{t_{i-1}})^T \big]^{-1}.
\]
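To make (3.17) concrete, the sketch below evaluates the approximate log-likelihood for the scalar model $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ with $\sigma$ known, an illustrative choice not made in the text. For $b(x; \theta) = -\theta x$ the maximizer of $\tilde l_n$ is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(3)

theta0, sigma = 2.0, 0.5     # assumed true values for the experiment
Delta, n = 0.01, 100_000     # time-equidistant observations, small Delta

# Simulate the observations with an Euler scheme on the observation grid.
x = np.empty(n + 1)
x[0] = 0.0
for i in range(n):
    x[i + 1] = x[i] - theta0 * x[i] * Delta + sigma * rng.normal(0.0, np.sqrt(Delta))

dx = np.diff(x)

def l_tilde(theta):
    """Approximate log-likelihood (3.17), scalar case, b(x) = -theta*x."""
    b = -theta * x[:-1]
    return float(np.sum(b * dx) / sigma**2 - 0.5 * np.sum(b**2) * Delta / sigma**2)

# l_tilde is quadratic and concave in theta, so the maximizer is explicit:
theta_tilde = -float(np.sum(x[:-1] * dx)) / (float(np.sum(x[:-1] ** 2)) * Delta)
```

Since $\tilde l_n$ is quadratic in $\theta$ here, `theta_tilde` is the exact maximizer; for nonlinear parametrizations a numerical optimizer would take the role of this closed form.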
When the diffusion coefficient also depends on an unknown parameter, the question arises how the parameter is to be estimated. If $\theta$ divides into two parts $\theta = (\theta_1, \theta_2)$ such that $b(\cdot, \cdot; \theta) = b(\cdot, \cdot; \theta_1)$ and $\sigma(\cdot, \cdot; \theta)$ is known up to the scalar factor $\theta_2$, that means $\sigma(\cdot, \cdot; \theta) = \theta_2\,\tilde\sigma(\cdot, \cdot)$, we may avoid the problem of parameter dependence. In this case $\theta_2^2$ can be estimated in advance by a quadratic variation-like formula (see Florens-Zmirou [25]), and by inserting this estimate in $\sigma(\cdot, \cdot; \theta) = \theta_2\,\tilde\sigma(\cdot, \cdot)$, the diffusion term can be assumed "known". Then the approximate log-likelihood function $\tilde l_n$ can be used to estimate $\theta_1$.
In the case where the diffusion coefficient $\sigma(\cdot, \cdot; \theta)$ depends on $\theta$ more generally, Hutton and Nelson [37] show that the discretized score function corresponding to $l^c_{t_n}$ can, under certain regularity conditions, still be used to estimate $\theta$. That means in this case we are able to estimate $\theta$ based on discrete observations of $X$ as well.
However, estimation methods for discrete observations that arise from the theory of continuous observations have the undesirable property that the estimators are strongly biased unless $\max_{1 \le i \le n} |t_i - t_{i-1}|$ is "small". If the time between observations is bounded away from zero, in particular in the case of time-equidistant observations with $\Delta$ fixed, Florens-Zmirou [25] shows that the estimator $\tilde\theta_n$ of $\theta$ obtained by maximizing the approximate log-likelihood function $\tilde l_n(\theta)$ is inconsistent.
To overcome the difficulties regarding parameter dependence and the dependence of $\tilde l_n(\theta)$ on $\max_{1 \le i \le n} |t_i - t_{i-1}|$, we will propose three different estimation approaches. The basic idea of the first two approaches is to find good approximations to the transition densities. In the third approach we construct martingale estimating functions.
Approximation to the transition densities of X by a sequence of transition densities of approximating Markov processes
The following approach is proposed by Pedersen [58, 60]. We shall derive a sequence $(l_{n,N}(\theta))_{N=1}^{\infty}$ of approximations to $l_n(\theta)$ that builds a connection between $\tilde l_n(\theta)$ (see (3.17)) and $l_n(\theta)$ in the following sense: the approximation $l_{n,1}(\theta)$ is a generalization of $\tilde l_n(\theta)$ with no restrictions on $\sigma(\cdot, \cdot; \theta)$ regarding parameter dependence; each $l_{n,N}(\theta)$ for $N \ge 2$ is an improvement of $l_{n,1}(\theta)$; and as $N$ tends to infinity, $l_{n,N}(\theta)$ converges for each $\theta$ in probability to $l_n(\theta)$.
The crucial point of the approach is to approximate the transition densities $p(s, x; t, y; \theta)$ of $X$ by a sequence of transition densities $(p_N(s, x; t, y; \theta))_{N=1}^{\infty}$ of approximating Markov processes which converges to $p(s, x; t, y; \theta)$ as $N$ tends to infinity, and then to define the approximate log-likelihood functions
\[
l_{n,N}(\theta) = \sum_{i=1}^{n} \log\big( p_N(t_{i-1}, X_{t_{i-1}}; t_i, X_{t_i}; \theta) \big). \tag{3.18}
\]
In the following we will derive the approximating densities $p_N(s, x; t, y; \theta)$. We remark that we may relax the Lipschitz assumption made in (3.1) for $b$ and $\sigma$ a little: we only assume $b$ and $\sigma$ to be locally Lipschitz continuous, as defined below.
Under the following conditions (A1), (A2) and (A3), which must hold for all $\theta \in \Theta$, the stochastic differential equation (3.1) has a weak solution for all $x_0$ and $\theta$ and has the pathwise-uniqueness property, which implies uniqueness in law (see [65], p. 132 and p. 151; in this context see also [73], §5-§8).
(A1) $b$ and $\sigma$ are continuous in $t$ for all $x \in \mathbb{R}^d$.
(A2) $b$ and $\sigma$ are locally Lipschitz continuous in $x$: for all $0 < R < \infty$ there exists a constant $K_R$ such that $|b(t, x; \theta) - b(t, y; \theta)| + |\sigma(t, x; \theta) - \sigma(t, y; \theta)| \le K_R\,|x - y|$ for all $|x| \le R$, $|y| \le R$ and $t \ge 0$.
For fixed $0 \le s < t$, $x \in \mathbb{R}^d$, $\theta \in \Theta$ and $N \in \mathbb{N}$, we consider for $k = 0, 1, \ldots, N$ the Euler approximation (see Appendix B.1):
\[
\tau_k = s + k\,\frac{t - s}{N}, \qquad Y^{(N)}_{s} = x,
\]
\[
Y^{(N)}_{\tau_k} = Y^{(N)}_{\tau_{k-1}} + \frac{t - s}{N}\,b\big(\tau_{k-1}, Y^{(N)}_{\tau_{k-1}}; \theta\big)
+ a\big(\tau_{k-1}, Y^{(N)}_{\tau_{k-1}}; \theta\big)^{1/2}\,\big(W^{\theta,s}_{\tau_k} - W^{\theta,s}_{\tau_{k-1}}\big).
\]
Under (A1)-(A4) we have
\[
Y^{(N)}_{\tau_N} = Y^{(N)}_t \longrightarrow X_t \tag{3.21}
\]
in $L^1(P_{\theta,s,x})$ as $N \to \infty$ (see [45], §10.2).
Theorem 2. For fixed $0 \le s < t$, $x \in \mathbb{R}^d$, $\theta \in \Theta$ and $N \in \mathbb{N}$, the distribution of $Y^{(N)}_t$ under $P_{\theta,s,x}$ has a density $p_N(s, x; t, \cdot\,; \theta)$ with respect to the $d$-dimensional Lebesgue measure $\lambda_d$. For $N = 1$,
\[
p_1(s, x; t, y; \theta) = \big(2\pi(t - s)\big)^{-d/2}\,\big[\det a(s, x; \theta)\big]^{-1/2}
\exp\Big\{ -\tfrac{1}{2(t - s)}\,\big[y - x - (t - s)b(s, x; \theta)\big]^T\,a(s, x; \theta)^{-1}\,\big[y - x - (t - s)b(s, x; \theta)\big] \Big\}, \tag{3.22}
\]
and for $N \ge 2$,
\[
p_N(s, x; t, y; \theta) = E_{P_{\theta,s,x}}\Big[\, p_1\big(\tau_{N-1}, Y^{(N)}_{\tau_{N-1}}; t, y; \theta\big) \Big]. \tag{3.23}
\]
For the proof we refer to Pedersen [58].
Our aim is to show that $l_{n,N}(\theta)$ will indeed be close to $l_n(\theta)$ for large values of $N$; that is, we give a result on the convergence of the approximating densities $p_N(s, x; t, y; \theta)$ to $p(s, x; t, y; \theta)$ as $N \to \infty$.
Theorem 3. In addition to (A2) and (A3), assume for all $\theta \in \Theta$ that
(1) $b$ is continuous in $t$ and in $x$, and
(2) $a(t, x; \theta) \equiv a(\theta)$ is positive definite.
Then $p(s, x; t, y; \theta)$ exists, and for all $0 \le s < t$ and $x, y \in \mathbb{R}^d$ we have $p_N(s, x; t, y; \theta) \to p(s, x; t, y; \theta)$ as $N \to \infty$.
After these theoretical considerations we deal with the actual calculation of $l_{n,N}(\theta)$ for large values of $N$. In order to maximize $l_{n,N}(\theta)$, numerical algorithms usually require the value of $l_{n,N}(\theta)$ in a finite number of points $\theta$. For $N = 1$, $l_{n,N}(\theta)$ is explicitly given, because $p_1$ has a closed expression, but for $N \ge 2$ this is in general not the case.
For calculating $l_{n,N}(\theta)$ for $N \ge 2$, we have to know how to calculate $p_N(s, x; t, y; \theta)$ for all $0 \le s < t$, $x, y \in \mathbb{R}^d$ and $\theta \in \Theta$. Considering again equation (3.23),
\[
p_N(s, x; t, y; \theta) = E_{P_{\theta,s,x}}\Big[\, p_1\big(\tau_{N-1}, Y^{(N)}_{\tau_{N-1}}; t, y; \theta\big) \Big],
\]
the idea is to calculate $p_N(s, x; t, y; \theta)$ by finding a good approximation to the right-hand side. Denote by $(U^m_k)_{k=1,m=1}^{N-1,M}$ an i.i.d. sample from the $r$-dimensional standard normal distribution. Then $(Y^m)_{m=1}^{M} = (Y^m_{N-1})_{m=1}^{M}$, given by the Euler approximation
\[
Y^m_0 = x, \qquad m = 1, \ldots, M,
\]
\[
Y^m_k = Y^m_{k-1} + \frac{t - s}{N}\,b\big(\tau_{k-1}, Y^m_{k-1}; \theta\big)
+ \sqrt{\frac{t - s}{N}}\;\sigma\big(\tau_{k-1}, Y^m_{k-1}; \theta\big)\,U^m_k
\]
for $k = 1, \ldots, N - 1$ and $m = 1, \ldots, M$, has the same distribution as an i.i.d. sample of $Y^{(N)}_{\tau_{N-1}}$ under $P_{\theta,s,x}$. Thus we are able to approximate the right-hand side of (3.23): we calculate
\[
\frac{1}{M} \sum_{m=1}^{M} p_1\big(\tau_{N-1}, Y^m; t, y; \theta\big) \tag{3.24}
\]
by means of the sequence $(U^m_k)_{k=1,m=1}^{N-1,M}$, for $M$ chosen sufficiently large, and thereby we are able to calculate $p_N(s, x; t, y; \theta)$ with any given accuracy.
From a practical point of view it is convenient to simulate the sample $(U^m_k)_{k=1,m=1}^{N-1,M}$ once (see Kloeden and Platen [45]) and store it. Then it can be used to calculate $p_N(s, x; t, y; \theta)$ for all values of $0 \le s < t$, $x$, $y$ and $\theta$.
be used to calculate p N (s; x; t; y; ) for all values of 0 s
where X 0 = 0 and t 0, the log-likelihood function l n () is unknown. The<br />
approximate log-likelihood functions (l n;N ()) 1 N=1<br />
can be used to estimate ,<br />
because (3.25) has a weak solution for all x 0 and for all >0 which is unique<br />
<strong>in</strong> law and (A4) is satised for all >0.<br />
With 0 = 5, n = 1000 and = 0:1 the Milste<strong>in</strong>-scheme (see Appendix<br />
B.2) with time-step =1000 is used to simulate (X i ) n i=0. For such a simulation<br />
the approximate log-likelihood functions l n;N () are calculated for<br />
N =1; 25; 50; 1000. The estimates that are obta<strong>in</strong>ed are shown <strong>in</strong> Table 3.1.<br />
N 1 25 50 1000<br />
^ n;N 3.98 4.76 5.20 5.04<br />
Table 3.1: Example 1. The estimates correspond<strong>in</strong>g to the functions l n;N ().<br />
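The nested simulation (3.23)-(3.24) can be sketched as follows. Here we use an Ornstein-Uhlenbeck model, an assumption made only because its exact transition density is known and so provides a check; all parameter values are illustrative. $p_N$ is computed by averaging $p_1$ over $M$ Euler paths simulated up to $\tau_{N-1}$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed model: dX_t = -theta*X_t dt + sigma*dW_t (d = r = 1).
theta, sigma = 1.0, 0.8
s, t, x, y = 0.0, 0.5, 0.3, 0.1
N, M = 20, 20_000
h = (t - s) / N

def p1(x0, dt_, y_):
    """One-step Euler (Gaussian) density, cf. (3.22) with d = 1."""
    mean = x0 + dt_ * (-theta * x0)
    var = dt_ * sigma**2
    return np.exp(-(y_ - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Simulate M Euler paths up to tau_{N-1}, then average p1 as in (3.24).
Y = np.full(M, x)
for _ in range(N - 1):
    Y = Y - theta * Y * h + sigma * np.sqrt(h) * rng.normal(size=M)
p_N = float(np.mean(p1(Y, h, y)))

# Exact OU transition density, for comparison only.
mean_ex = x * np.exp(-theta * (t - s))
var_ex = sigma**2 * (1 - np.exp(-2 * theta * (t - s))) / (2 * theta)
p_exact = float(np.exp(-(y - mean_ex) ** 2 / (2 * var_ex))
                / np.sqrt(2 * np.pi * var_ex))
```

Summing $\log p_N$ over the observation pairs then gives $l_{n,N}(\theta)$ as in (3.18); the remaining error is the $O((t-s)/N)$ Euler bias plus the $O(M^{-1/2})$ Monte Carlo error.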
Approximation of the transition densities of X by means of the Kalman filter
The following approach is proposed by Pedersen [59, 63]. We consider the stochastic system
\[
X_i = D_i X_{i-1} + S_i + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{3.26}
\]
where $(X_i)_{i=0}^{n}$ are random $d \times 1$ vectors, $(D_i)_{i=0}^{n}$ are non-random $d \times d$ matrices, $(S_i)_{i=1}^{n}$ are non-random $d \times 1$ vectors, $X_0 \sim N_d(x_0, V_0)$, $\varepsilon_i \sim N_d(0, V_i)$, $i = 1, \ldots, n$, and $X_0, \varepsilon_1, \ldots, \varepsilon_n$ are stochastically independent. The non-random elements in (3.26), including $x_0$ and $(V_i)_{i=0}^{n}$, are given up to the parameter $\theta \in \Theta \subseteq \mathbb{R}^p$.
Though this chapter is about diffusion models, we nevertheless look here at stochastic difference equations of the type (3.26), since as a particular case of (3.26) we will later consider discretely observed diffusion processes given as solutions to linear stochastic differential equations in the narrow sense. These are equations of the following kind:
\[
dX_t = (A_t X_t + a_t)\,dt + B_t\,dW_t, \qquad X_0 = x_0, \quad t \ge 0, \tag{3.27}
\]
where $A : [0, \infty) \to M^{d \times d}$, $a : [0, \infty) \to \mathbb{R}^d$ and $B : [0, \infty) \to M^{d \times m}$ ($d \ge m$) are deterministic functions of $t$, and $W$ is an $m$-dimensional Wiener process. Why (3.27) is a particular case of (3.26) will be explained below.
Considering the system (3.26), we distinguish two cases, depending on whether $X$ can be observed completely or only partially, i.e. whether all or only a few coordinates of $X$ can be observed. First, if complete observations of $X_0, X_1, \ldots, X_n$ are given, we can use the log-likelihood function to estimate the parameter $\theta$, see (3.15).
However, the case where $X_0, X_1, \ldots, X_n$ can be observed only partially, and possibly with measurement errors, often arises in practice. As mentioned, by 'partially observed' we do not mean partially in time but in the coordinates of $X_i$. We assume the observable quantities are $Y_0, Y_1, \ldots, Y_n$ given by
\[
Y_i = T_i X_i + U_i + e_i, \qquad i = 0, 1, \ldots, n, \tag{3.28}
\]
where $(T_i)_{i=0}^{n}$ are non-random $k \times d$ matrices ($k \le d$), $(U_i)_{i=0}^{n}$ are non-random $k \times 1$ vectors, $e_i \sim N_k(0, W_i)$, and $X_0, \varepsilon_1, \ldots, \varepsilon_n, e_0, e_1, \ldots, e_n$ are stochastically independent, $i = 0, 1, \ldots, n$. The matrices $(T_i)$ specify the observable parts of $(X_i)$, the vectors $(U_i)$ are other inputs, and the vectors $(e_i)$ are measurement errors. It is obvious that both the case of complete observations, $Y_i = X_i$, and the case of partial observations without measurement errors are contained in the general case (3.28).
As an application of the case of incomplete observations, think of a stochastic volatility model, which can be seen as a multi-dimensional process with the volatility process as the unobservable coordinates.
In the case where all non-random elements in (3.26) and (3.28), including $x_0$, $(V_i)_{i=0}^{n}$ and $(W_i)_{i=0}^{n}$, are known, we want to obtain the, in some sense, best predictions of $X_0, X_1, \ldots, X_n$ from given observations of $Y_0, Y_1, \ldots, Y_n$. The conditional expectations $(E(X_i \mid \bar Y_i))_{i=0}^{n}$, respectively $(E(X_i \mid \bar Y_{i-1}))_{i=1}^{n}$, are the best predictors of $X_i$ given $\bar Y_i \equiv (Y_0^T, \ldots, Y_i^T)^T$, respectively given $\bar Y_{i-1} \equiv (Y_0^T, \ldots, Y_{i-1}^T)^T$, in the sense of minimal variance (see Appendix A.1). They can be calculated by means of an iterative procedure called the Kalman-Bucy filter.
The Kalman-Bucy filter, and filtering theory in general, are well studied for continuous time stochastic processes with continuous observations (see Liptser and Shiryaev [51], Kallianpur [43], Øksendal [56] and Appendix A). Here we consider discrete time stochastic processes, and discretely observed continuous time stochastic processes in particular. Below we give the iterative procedure used for calculating $(E(X_i \mid \bar Y_i))$ and $(E(X_i \mid \bar Y_{i-1}))$.
The important point is that the Kalman-Bucy filter yields, besides the predictors $(E(X_i \mid \bar Y_i))$ and $(E(X_i \mid \bar Y_{i-1}))$, the density $p_{i|i-1}$ of the conditional distribution of $Y_i$ given $\bar Y_{i-1}$. That means we are able to calculate the transition densities of $Y$, and thus the log-likelihood function for $\theta$ based on observations of $Y_0, Y_1, \ldots, Y_n$, from which an estimate of $\theta$ can be obtained. As Pedersen [59] points out, the unknown vector $x_0 \in \mathbb{R}^d$ should be treated as a part of $\theta$ and is not to be chosen at random. The independence of $X_0$ and $e_0$ implies $Y_0 \sim N_k\big(T_0 x_0 + U_0,\; T_0 V_0 T_0^T + W_0\big)$. Hence we can calculate $x_0$ by $x_0 = T_0^{-1}(Y_0 - U_0)$ a.s. if and only if $k = d$, $V_0 = W_0 = 0$ and $T_0$ is of full rank. Furthermore, the starting points of the iterative procedure used to calculate the densities $p_{i|i-1}$ depend on $x_0$.
The Kalman-Bucy filter
Under some technical assumptions (see Pedersen [59], pp. 4-5), we have for given observations $y_0, y_1, \ldots, y_n$ of $Y_0, Y_1, \ldots, Y_n$:
\[
X_i \mid \bar Y_i = \bar y_i \;\sim\; N_d\big(\mu_i(\bar y_i),\; \Sigma_i\big), \tag{3.29}
\]
\[
X_i \mid \bar Y_{i-1} = \bar y_{i-1} \;\sim\; N_d\big(D_i\,\mu_{i-1}(\bar y_{i-1}) + S_i,\; R_i\big), \tag{3.30}
\]
\[
Y_i \mid \bar Y_{i-1} = \bar y_{i-1} \;\sim\; N_k\big(T_i(D_i\,\mu_{i-1}(\bar y_{i-1}) + S_i) + U_i,\; T_i R_i T_i^T + W_i\big), \tag{3.31}
\]
where $R_i = D_i \Sigma_{i-1} D_i^T + V_i$ is positive definite, and where
\[
\mu_0(y_0) = x_0 + V_0 T_0^T\big(T_0 V_0 T_0^T + W_0\big)^{-1}(y_0 - T_0 x_0 - U_0), \tag{3.32}
\]
\[
\Sigma_0 = V_0 - V_0 T_0^T\big(T_0 V_0 T_0^T + W_0\big)^{-1} T_0 V_0, \tag{3.33}
\]
\[
\mu_i(\bar y_i) = D_i\,\mu_{i-1}(\bar y_{i-1}) + S_i + R_i T_i^T\big(T_i R_i T_i^T + W_i\big)^{-1}\big(y_i - T_i(D_i\,\mu_{i-1}(\bar y_{i-1}) + S_i) - U_i\big), \tag{3.34}
\]
\[
\Sigma_i = R_i - R_i T_i^T\big(T_i R_i T_i^T + W_i\big)^{-1} T_i R_i. \tag{3.35}
\]
By means of the Kalman-Bucy filter we are now able to calculate $l_n(\theta)$ for every fixed $\theta \in \Theta$ for given observations $y_0, y_1, \ldots, y_n$ of $Y_0, Y_1, \ldots, Y_n$.

The iterative procedure (by means of the Kalman-Bucy filter)

(0) Calculate $\mu_0(y_0)$ and $\Sigma_0$ by means of formulas (3.32) and (3.33).

(1) Given $\mu_{i-1}(y_{i-1})$ and $\Sigma_{i-1}$, the conditional distribution of $Y_i$ given $Y_{i-1}$ is known, and so we can calculate the transition density $p_{i|i-1}(y_i \mid y_{i-1}; \theta)$.

(2) Calculate $\mu_i(y_i)$ and $\Sigma_i$ by means of formulas (3.34) and (3.35) and return to (1).
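The recursion can be sketched in the scalar case ($d = k = 1$ with time-constant coefficients and a deterministic $x_0$, which simplifies the start-up in (3.32)-(3.33)). The function name and all numerical values below are our own illustrative assumptions, not from the text:

```python
import numpy as np

def kalman_loglik(y, x0, D, S, V, T, U, W):
    """Log-likelihood of observations y under the scalar linear state-space model
    X_i = D X_{i-1} + S + state noise (variance V),
    Y_i = T X_i + U + observation noise (variance W),
    via the Kalman-Bucy recursion (3.32)-(3.35); X_0 = x0 is taken deterministic."""
    mu, Sigma = x0, 0.0          # step (0), degenerate because V_0 = W_0 = 0 here
    ll = 0.0
    for yi in y[1:]:
        # prediction: X_i | Y_{i-1} ~ N(D mu + S, R), R = D Sigma D + V   (3.30)
        R = D * Sigma * D + V
        m = T * (D * mu + S) + U      # mean of Y_i | Y_{i-1}             (3.31)
        s2 = T * R * T + W            # variance of Y_i | Y_{i-1}         (3.31)
        # step (1): accumulate the Gaussian transition density p_{i|i-1}
        ll += -0.5 * (np.log(2 * np.pi * s2) + (yi - m) ** 2 / s2)
        # step (2): update mu_i and Sigma_i via (3.34)-(3.35)
        K = R * T / s2                # Kalman gain
        mu = D * mu + S + K * (yi - m)
        Sigma = R - K * T * R
    return ll
```

Maximizing this log-likelihood over the unknown components of $\theta$ then gives the estimator described above.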
An Application

As an application of the above theory, we consider the linear stochastic differential equation (3.27). The functions $A$, $a$, and $B$ are assumed to be continuous and given up to the unknown parameter $\theta \in \Theta \subseteq \mathbb{R}^p$. Under these assumptions the stochastic differential equation (3.27) has for every fixed $x_0 \in \mathbb{R}^d$ and $\theta \in \Theta$ a unique solution given by

$$X_t = \Phi_t \Big( x_0 + \int_0^t \Phi_u^{-1} a_u \, du + \int_0^t \Phi_u^{-1} B_u \, dW_u \Big), \quad t \ge 0, \qquad (3.36)$$

where $\Phi$ is the deterministic $d \times d$ matrix process solving

$$d\Phi_t = A_t \Phi_t \, dt, \quad \Phi_0 = I_d, \quad t \ge 0. \qquad (3.37)$$

If

$$A_t \int_0^t A_s \, ds = \Big( \int_0^t A_s \, ds \Big) A_t$$

for all $t > 0$, then

$$\Phi_t = \exp\Big( \int_0^t A_s \, ds \Big)$$

is the unique solution to (3.37). We want to estimate $\theta$ from possibly incomplete discrete observations of $X$ at time-points $0 = t_0 < t_1 < \cdots < t_n$.
We can use the previous results to estimate $\theta$ from possibly incomplete observations of $\{X_{t_i}\}_{i=0}^n$ of the type given by (3.28). Note that in many applications the non-random elements $D_{t_i}$ (3.40), $S_{t_i}$ (3.41) and $V_{t_i}$ (3.39) cannot be calculated exactly. The solution to (3.37) may be unknown. In that case $\{X_{t_i}\}_{i=0}^n$ can be approximated by the Euler approximation (see Appendix B.1). But even if the solution to (3.37) is known, $S_{t_i}$ and $V_{t_i}$ often cannot be calculated exactly and hence have to be approximated; this can be done by different methods depending on the concrete application.
Martingale estimating functions

The following approach is proposed by Bibby and Sørensen [8]. We consider one-dimensional diffusion processes defined by the stochastic differential equations

$$dX_t = b(X_t; \theta)\, dt + \sigma(X_t; \theta)\, dW_t, \qquad (3.42)$$

where $X_0 = x_0$ and $t \ge 0$. Besides the usual assumptions on $b$ and $\sigma$ in (3.1), such that (3.42) has a unique solution for all $\theta$ in an open subset $\Theta \subseteq \mathbb{R}$, the functions $b$ and $\sigma$ are supposed to be twice continuously differentiable with respect to both arguments, and $\sigma$ is assumed to be positive. In contrast to (3.1), for convenience here we only consider the time-homogeneous case. To simplify the exposition further, assume that we can observe $\{X_t\}$ at discrete equidistant time points, say $\Delta, 2\Delta, \ldots, n\Delta$. Later we will give an extension to the case where $X$ and $\theta$ are multi-dimensional.

Our goal is to estimate the parameter $\theta$ from these discrete observations $X_\Delta, X_{2\Delta}, \ldots, X_{n\Delta}$ of $\{X_t\}$. Inference from discrete-time observations can be based on an approximation of the score function of the continuous log-likelihood function $l_t(\theta)$. Denote this approximation by $\dot{\tilde l}_n(\theta)$. For the definition of the continuous log-likelihood see 3.1.1, p. 17. In the case where $\sigma$ does not depend on $\theta$, the continuous-time log-likelihood function is
$$l_t(\theta) = \int_0^t \frac{b(X_s; \theta)}{\sigma^2(X_s)}\, dX_s - \frac{1}{2} \int_0^t \frac{b^2(X_s; \theta)}{\sigma^2(X_s)}\, ds, \qquad (3.43)$$

see also (3.5). If we replace the Lebesgue integrals and the Itô integrals by Riemann-Itô sums and differentiate with respect to $\theta$, we get the approximate score function

$$\dot{\tilde l}_n(\theta) = \sum_{i=1}^n \frac{\dot b(X_{(i-1)\Delta}; \theta)}{\sigma^2(X_{(i-1)\Delta})} \big(X_{i\Delta} - X_{(i-1)\Delta}\big) - \Delta \sum_{i=1}^n \frac{b(X_{(i-1)\Delta}; \theta)\, \dot b(X_{(i-1)\Delta}; \theta)}{\sigma^2(X_{(i-1)\Delta})}. \qquad (3.44)$$
If $\sigma$ depends on $\theta$ we use the same estimating function

$$\dot{\tilde l}_n(\theta) = \sum_{i=1}^n \frac{\dot b(X_{(i-1)\Delta}; \theta)}{\sigma^2(X_{(i-1)\Delta}; \theta)} \big(X_{i\Delta} - X_{(i-1)\Delta}\big) - \Delta \sum_{i=1}^n \frac{b(X_{(i-1)\Delta}; \theta)\, \dot b(X_{(i-1)\Delta}; \theta)}{\sigma^2(X_{(i-1)\Delta}; \theta)}. \qquad (3.45)$$
Using this approach for the estimation of $\theta$, the problem of inconsistency arises, as already mentioned in the introduction of this section, p. 23. To avoid this problem we can use martingale estimating functions, of which we will construct four different types. The idea is to modify the discretized score function $\dot{\tilde l}_n(\theta)$ in such a way that a zero-mean $P_\theta$-martingale is obtained. Then the estimator can be shown to be consistent and asymptotically normal.

(1) Our first approach is to compensate $\dot{\tilde l}_n$, so that a martingale $\tilde G_n$ is obtained.

By subtracting from $\dot{\tilde l}_n(\theta)$ its compensator we get a zero-mean $P_\theta$-martingale with respect to the filtration defined by $\mathcal{F}_i = \sigma(X_\Delta, \ldots, X_{i\Delta})$, $i = 1, 2, \ldots$. The compensator is:
$$\sum_{i=1}^n E_\theta\Big( \dot{\tilde l}_i(\theta) - \dot{\tilde l}_{i-1}(\theta) \,\Big|\, \mathcal{F}_{i-1} \Big) = \sum_{i=1}^n \frac{\dot b(X_{(i-1)\Delta}; \theta)}{\sigma^2(X_{(i-1)\Delta}; \theta)} \big(F(X_{(i-1)\Delta}; \theta) - X_{(i-1)\Delta}\big) - \Delta \sum_{i=1}^n \frac{b(X_{(i-1)\Delta}; \theta)\, \dot b(X_{(i-1)\Delta}; \theta)}{\sigma^2(X_{(i-1)\Delta}; \theta)}, \qquad (3.46)$$

where

$$F(X_{(i-1)\Delta}; \theta) \equiv E_\theta\big(X_{i\Delta} \mid X_{(i-1)\Delta}\big). \qquad (3.47)$$
Thus we obtain a zero-mean martingale estimating function of the form

$$\tilde G_n(\theta) = \sum_{i=1}^n \frac{\dot b(X_{(i-1)\Delta}; \theta)}{\sigma^2(X_{(i-1)\Delta}; \theta)} \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big). \qquad (3.48)$$
(2) Alternatively we consider the general class of zero-mean $P_\theta$-martingale estimating functions

$$G_n(\theta) = \sum_{i=1}^n g_{i-1}(X_{(i-1)\Delta}; \theta)\, \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big), \qquad (3.49)$$

where for $i = 1, \ldots, n$ the function $g_{i-1}$ is $\mathcal{F}_{i-1}$-measurable and continuously differentiable in $\theta$. The optimal estimating function within the class (3.49), in the asymptotic sense of Godambe and Heyde [33], is

$$G_n^*(\theta) = \sum_{i=1}^n \frac{\dot F(X_{(i-1)\Delta}; \theta)}{\phi(X_{(i-1)\Delta}; \theta)} \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big), \qquad (3.50)$$
where

$$\phi(X_{(i-1)\Delta}; \theta) = E_\theta\Big[ \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big)^2 \,\Big|\, X_{(i-1)\Delta} \Big], \quad i = 1, \ldots, n. \qquad (3.51)$$

The function $G_n^*(\theta)$ is, within the class (3.49), in some sense "closest" to the score function based on the usually unknown exact likelihood function. It should be mentioned here that for small $\Delta$ the martingale estimating function $\tilde G_n(\theta)$, defined in (3.48), is a first-order approximation in $\Delta$ of $G_n^*$; that means $\tilde G_n(\theta)$ is approximately optimal.
(3) In some cases there are possibly numerical problems in calculating $\dot F(x; \theta)$ in (3.50). One way to solve these problems involves approximating $\dot F(x; \theta)$ up to the order $O(\Delta^2)$. That leads to a third martingale estimating function

$$G_n^+(\theta) = \sum_{i=1}^n \Big[ \dot b + \tfrac{1}{2}\Delta \Big( \dot b\, b' + b\, \dot b' + \tfrac{1}{2} \big( \dot{(\sigma^2)}\, b'' + \sigma^2\, \dot b'' \big) \Big) \Big] \frac{X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)}{\phi(X_{(i-1)\Delta}; \theta)}, \qquad (3.52)$$

where all functions in the square bracket are evaluated at $(X_{(i-1)\Delta}; \theta)$, primes denote derivatives with respect to $x$, and dots derivatives with respect to $\theta$. Altogether we have now found expressions for three different zero-mean $P_\theta$-martingale estimating functions.
As for the multi-dimensional case, suppose $\theta$ is $k$-dimensional, $\{X_t\}$ and $b(X_t; \theta)$ are $d$-dimensional, $\sigma$ is a $d \times m$-dimensional matrix with $\sigma \sigma^T$ positive definite, and the Wiener process $\{W_t\}$ is $m$-dimensional. Then the $k \times 1$-dimensional martingale estimating functions $\tilde G_n$ and $G_n^*$ have the form

$$\tilde G_n(\theta) = \sum_{i=1}^n \dot b(X_{(i-1)\Delta}; \theta)^T \big( \sigma(X_{(i-1)\Delta}; \theta)\, \sigma(X_{(i-1)\Delta}; \theta)^T \big)^{-1} \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big)$$

and

$$G_n^*(\theta) = \sum_{i=1}^n \dot F(X_{(i-1)\Delta}; \theta)^T\, \Phi(X_{(i-1)\Delta}; \theta)^{-1} \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big),$$

where $\Phi$ is assumed to be positive definite and $\dot b$ and $\dot F$ denote the $d \times k$-dimensional matrices of partial derivatives with respect to the components of $\theta$.
The asymptotic properties of the estimator $\hat\theta_n$ we obtain from the martingale estimating functions (3.48), (3.50) and (3.52), or more generally from the class of martingale estimating functions $G_n$ of the form (3.49), are discussed by Bibby and Sørensen [8]. Under natural regularity conditions (see Bibby and Sørensen [8], pp. 7-9) we have

Theorem 4 An estimator $\hat\theta_n$, which solves the equation

$$G_n(\hat\theta_n) = 0,$$

exists with probability tending to one as $n \to \infty$ under $P_{\theta_0}$. Moreover, as $n \to \infty$,

$$\hat\theta_n \to \theta_0$$

in probability under $P_{\theta_0}$, and $\hat\theta_n$ is asymptotically normal in distribution under $P_{\theta_0}$.

For the proof we refer to [8].
As a rst example we consider the Ornste<strong>in</strong>-Uhlenbeck process where the<br />
transition densities are well-known.<br />
Example 1<br />
The Ornste<strong>in</strong>-Uhlenbeck process is the solution of the stochastic dierential<br />
equation<br />
dX t = X t dt + dW t ; (3.53)<br />
with X 0 = x 0 . In this case the drift coecientisb(x; ) =x and the diusion<br />
coecient (x; ) is assumed to be known. The transition probability is<br />
normal with mean F (x; ) =xe and variance () = 2<br />
2 (e2 , 1). Hence<br />
the estimat<strong>in</strong>g function ~G n has the form<br />
~G n () = 1 2 nX<br />
i=1<br />
and we obta<strong>in</strong> ^ n as solution of ~G n () =0:<br />
X (i,1) (X i , X (i,1) e ) ;<br />
^ n = 1 P ni=1<br />
log X (i,1) X i<br />
Pni=1 :<br />
X 2 (i,1)<br />
The estimators ^ n we obta<strong>in</strong> from the mart<strong>in</strong>gale estimat<strong>in</strong>g functions G +<br />
and G are the same because G + and G are proportional to ~G.<br />
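The explicit estimator of Example 1 is easy to check by simulation, since the Gaussian transitions can be sampled exactly. All numerical values below are illustrative choices of our own:

```python
import numpy as np

# Simulate an Ornstein-Uhlenbeck path on a grid of step delta (theta < 0 gives
# mean reversion) and apply the explicit estimator from Example 1.
rng = np.random.default_rng(42)
theta, sigma, delta, n = -0.5, 1.0, 0.1, 20000

x = np.empty(n + 1)
x[0] = 1.0
for i in range(n):
    # exact transition: X_{i delta} | X_{(i-1) delta} ~ N(x e^{theta delta}, phi)
    mean = x[i] * np.exp(theta * delta)
    var = sigma**2 / (2 * theta) * (np.exp(2 * theta * delta) - 1)
    x[i + 1] = mean + np.sqrt(var) * rng.standard_normal()

# theta_hat = (1/delta) log( sum X_{i-1} X_i / sum X_{i-1}^2 )
theta_hat = np.log(np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)) / delta
print(theta_hat)   # close to the true theta = -0.5
```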
In the next example we consider a wider class of stochastic processes.

Example 2

The solutions of the stochastic differential equation

$$dX_t = (\alpha + \beta X_t)\, dt + \sigma(X_t)\, dW_t, \qquad (3.54)$$

where $X_0 = x_0$ and the function $\sigma$ takes positive values in $\mathbb{R}$, are called mean-reverting processes (see also model (1.5)). The unknown parameters are $\alpha$ and $\beta$. Our aim is to be able to calculate the martingale estimating functions $\tilde G_n$ and $G_n^*$.

Lemma 1 The function

$$f(t) \equiv E_{\alpha,\beta}(X_t \mid X_0)$$

solves

$$f'(t) = \alpha + \beta f(t). \qquad (3.55)$$
Proof: Write (3.54) in integral form

$$X_t = X_0 + \int_0^t (\alpha + \beta X_s)\, ds + \int_0^t \sigma(X_s)\, dW_s.$$

Conditioning on $X_0$ we have

$$E_{\alpha,\beta}(X_t \mid X_0) = E_{\alpha,\beta}(X_0 \mid X_0) + E_{\alpha,\beta}\Big( \int_0^t (\alpha + \beta X_s)\, ds \,\Big|\, X_0 \Big) + \underbrace{E_{\alpha,\beta}\Big( \int_0^t \sigma(X_s)\, dW_s \,\Big|\, X_0 \Big)}_{=0},$$

and equivalently

$$E_{\alpha,\beta}(X_t \mid X_0) = X_0 + \alpha t + \beta \int_0^t E_{\alpha,\beta}(X_s \mid X_0)\, ds.$$

We conclude

$$\frac{d E_{\alpha,\beta}(X_t \mid X_0)}{dt} = \alpha + \beta E_{\alpha,\beta}(X_t \mid X_0),$$

and the claim follows. Note that the function $f_r(t) = E_{\alpha,\beta}(X_t \mid X_r)$, $0 \le r \le t$, also solves (3.55), for the proof stays the same apart from

$$E_{\alpha,\beta}\Big( \int_0^t \sigma(X_s)\, dW_s \,\Big|\, X_r \Big) = \int_0^r \sigma(X_s)\, dW_s,$$

which is independent of $t$ and thus plays no role in the derivative. $\Box$
Corollary 2 For

$$F(X_{(i-1)\Delta}; \alpha, \beta) = E_{\alpha,\beta}\big( X_{i\Delta} \mid X_{(i-1)\Delta} \big)$$

we have

$$F(x; \alpha, \beta) = x\, e^{\beta\Delta} + \frac{\alpha}{\beta}\big(e^{\beta\Delta} - 1\big). \qquad (3.56)$$

Proof: The solution of

$$f'(t) = \alpha + \beta f(t), \quad f(t_0) = f_0, \quad t \ge t_0,$$

is

$$f(t) = f_0\, e^{\beta(t - t_0)} + \frac{\alpha}{\beta}\big(e^{\beta(t - t_0)} - 1\big).$$

Hence we have for $E_{\alpha,\beta}(X_{t_i} \mid X_{t_{i-1}}) \equiv f(t_i)$, with constant $\Delta \equiv t_i - t_{i-1}$ for all $i$,

$$E_{\alpha,\beta}(X_{t_i} \mid X_{t_{i-1}}) = E(X_{t_{i-1}} \mid X_{t_{i-1}})\, e^{\beta\Delta} + \frac{\alpha}{\beta}\big(e^{\beta\Delta} - 1\big) = X_{t_{i-1}}\, e^{\beta\Delta} + \frac{\alpha}{\beta}\big(e^{\beta\Delta} - 1\big),$$

and the claim follows. $\Box$
With (3.56) we are now able to calculate $\tilde G_n$ and $G_n^*$ in the following way:

$$\tilde G_n(\alpha, \beta) = \left[ \sum_{i=1}^n \frac{1}{\sigma^2(X_{(i-1)\Delta})} \Big( X_{i\Delta} - X_{(i-1)\Delta}\, e^{\beta\Delta} + \frac{\alpha}{\beta}\big(1 - e^{\beta\Delta}\big) \Big),\; \sum_{i=1}^n \frac{X_{(i-1)\Delta}}{\sigma^2(X_{(i-1)\Delta})} \Big( X_{i\Delta} - X_{(i-1)\Delta}\, e^{\beta\Delta} + \frac{\alpha}{\beta}\big(1 - e^{\beta\Delta}\big) \Big) \right]^T,$$

$$G_n^*(\alpha, \beta) = \left[ \sum_{i=1}^n \frac{(e^{\beta\Delta} - 1)/\beta}{\phi(X_{(i-1)\Delta}; \alpha, \beta)} \Big( X_{i\Delta} - X_{(i-1)\Delta}\, e^{\beta\Delta} + \frac{\alpha}{\beta}\big(1 - e^{\beta\Delta}\big) \Big),\; \sum_{i=1}^n \frac{\Delta e^{\beta\Delta}\big(X_{(i-1)\Delta} + \frac{\alpha}{\beta}\big) + \frac{\alpha}{\beta^2}\big(1 - e^{\beta\Delta}\big)}{\phi(X_{(i-1)\Delta}; \alpha, \beta)} \Big( X_{i\Delta} - X_{(i-1)\Delta}\, e^{\beta\Delta} + \frac{\alpha}{\beta}\big(1 - e^{\beta\Delta}\big) \Big) \right]^T.$$

Considering $G^+$ is not interesting since $F$ is known.
The estimating equation $\tilde G_n(\alpha, \beta) = 0$ can be solved explicitly. Abbreviating $\kappa_{i-1}^2 \equiv \sigma^2(X_{(i-1)\Delta})$, we obtain

$$e^{\tilde\beta_n \Delta} = \frac{\Big(\sum_{i=1}^n \frac{X_{(i-1)\Delta}}{\kappa_{i-1}^2}\Big)\Big(\sum_{i=1}^n \frac{X_{i\Delta}}{\kappa_{i-1}^2}\Big) - \Big(\sum_{i=1}^n \frac{X_{(i-1)\Delta} X_{i\Delta}}{\kappa_{i-1}^2}\Big)\Big(\sum_{i=1}^n \frac{1}{\kappa_{i-1}^2}\Big)}{\Big(\sum_{i=1}^n \frac{X_{(i-1)\Delta}}{\kappa_{i-1}^2}\Big)^2 - \Big(\sum_{i=1}^n \frac{X_{(i-1)\Delta}^2}{\kappa_{i-1}^2}\Big)\Big(\sum_{i=1}^n \frac{1}{\kappa_{i-1}^2}\Big)} \qquad (3.57)$$

and

$$\tilde\alpha_n = \frac{\tilde\beta_n}{1 - e^{\tilde\beta_n \Delta}} \cdot \frac{\Big(\sum_{i=1}^n \frac{X_{(i-1)\Delta}}{\kappa_{i-1}^2}\Big)\, e^{\tilde\beta_n \Delta} - \Big(\sum_{i=1}^n \frac{X_{i\Delta}}{\kappa_{i-1}^2}\Big)}{\sum_{i=1}^n \frac{1}{\kappa_{i-1}^2}}. \qquad (3.58)$$
As regards the martingale estimating function $G_n^*$, we are able to find a closed expression for $\phi$ only in a few cases where the diffusion coefficient is rather simple. For instance, if $\sigma(x) = \tau\sqrt{x}$ (as in the square-root diffusion model (1.5), the Cox-Ingersoll-Ross model) we have, with a similar argument as for $F$ (that is, the conditional second moment solves an ordinary differential equation),

$$\phi(x; \alpha, \beta) = \frac{\tau^2}{2\beta^2} \Big( (\alpha + 2\beta x)\, e^{2\beta\Delta} - 2(\alpha + \beta x)\, e^{\beta\Delta} + \alpha \Big).$$
As an extension we consider the mean-reverting process where $\sigma > 0$ enters the diffusion coefficient,

$$dX_t = -\beta X_t\, dt + \sqrt{\tau + \sigma X_t^2}\, dW_t.$$

With the same argument as above we obtain the conditional variance

$$\phi(x; \beta, \sigma) = x^2\, e^{-2\beta\Delta} \big(e^{\sigma\Delta} - 1\big) + \frac{\tau}{2\beta - \sigma} \big(1 - e^{(\sigma - 2\beta)\Delta}\big).$$
We remark that in practical applications the estimating equations corresponding to $G_n^*$ can be solved using a generalization of Newton's method. Furthermore, if no explicit expressions for the conditional mean $F$ and the conditional variance $\phi$ are known, then $F$ and $\phi$ can be approximated by the sample mean and sample variance of a large number of simulated realizations of the diffusion process at the relevant time point.
(4) As an extension we shall combine martingale estimating functions in an 'optimal' way.

The following approach is proposed by Bibby [5]. If the unknown parameter $\theta$ is multi-dimensional and a part of $\theta$ is found only in the diffusion coefficient, then using the martingale estimating functions generated by the conditional mean, $\tilde G_n$ and $G_n^*$, leads to fewer estimating equations than parameters. That is, in this case neither $G^*$ nor $\tilde G$ can be used to estimate the part of $\theta$ that only enters the diffusion coefficient. This
disadvantage of $\tilde G_n$ and $G_n^*$ motivates us to consider martingale estimating functions generated by higher-order conditional moments, for example by the conditional variance:

$$H_n(\theta) = \sum_{i=1}^n h(X_{(i-1)\Delta}; \theta) \Big( \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big)^2 - \phi(X_{(i-1)\Delta}; \theta) \Big), \qquad (3.59)$$
with $\phi$ given by (3.51) and $F$ given by (3.47). In analogy to (3.50) we obtain the optimal estimating function within the class (3.59), in the sense of Godambe and Heyde [33]. This optimal function takes the form

$$H_n^*(\theta) = \sum_{i=1}^n \frac{\dot\phi(X_{(i-1)\Delta}; \theta)}{\psi(X_{(i-1)\Delta}; \theta)} \Big( \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big)^2 - \phi(X_{(i-1)\Delta}; \theta) \Big), \qquad (3.60)$$

where $\phi$ is assumed to be differentiable in $\theta$ and $\psi$ is the fourth conditional cumulant

$$\psi(X_{(i-1)\Delta}; \theta) = E_\theta\Big( \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big)^4 \,\Big|\, X_{(i-1)\Delta} \Big) - \phi(X_{(i-1)\Delta}; \theta)^2, \quad i = 1, \ldots, n.$$
Combining the functions $G_n^*$ and $H_n^*$ leads to further martingale estimating functions that may have better properties. Following the optimal way of combining $G_n^*$ and $H_n^*$ described in Heyde [35], the function $K_n^*$ is obtained:

$$K_n^*(\theta) = \sum_{i=1}^n \left[ \frac{\dot F(\theta)\, \psi(\theta) - \dot\phi(\theta)\, \rho(\theta)}{\phi(\theta)\, \psi(\theta) - \rho^2(\theta)} \big(X_{i\Delta} - F(\theta)\big) + \frac{\dot\phi(\theta)\, \phi(\theta) - \dot F(\theta)\, \rho(\theta)}{\phi(\theta)\, \psi(\theta) - \rho^2(\theta)} \Big( \big(X_{i\Delta} - F(\theta)\big)^2 - \phi(\theta) \Big) \right], \qquad (3.61)$$

where for abbreviation the first argument of all functions on the right-hand side, that is $X_{(i-1)\Delta}$, has been left out, and where $\rho$ denotes the third conditional central moment

$$\rho(X_{(i-1)\Delta}; \theta) = E_\theta\Big[ \big(X_{i\Delta} - F(X_{(i-1)\Delta}; \theta)\big)^3 \,\Big|\, X_{(i-1)\Delta} \Big], \quad i = 1, \ldots, n.$$
For $K_n^*$, as for $G_n^*$, it can be shown that an estimator obtained from the estimating equation $K_n^*(\theta) = 0$ exists, is consistent and asymptotically normal. Under some regularity conditions (see Bibby [5], pp. 4-6) we have

Theorem 5 An estimator $\hat\theta_n$ exists for every $n$, which on a set $C_n$ solves the equation

$$K_n^*(\hat\theta_n) = 0,$$

where $P_{\theta_0}(C_n) \to 1$ as $n \to \infty$. Moreover, as $n \to \infty$,

$$\hat\theta_n \to \theta_0$$

in probability under $P_{\theta_0}$, and $\hat\theta_n$ is asymptotically normal in distribution under $P_{\theta_0}$.

For the proof we refer to [5, 8].
3.2 Discrete models

In section 1.2 we dealt with three kinds of stochastic volatility models, following an AR(p) process, an ARCH, or even a GARCH process. Now we discuss Maximum Likelihood (ML) estimation for these models and derive the properties of the estimators. Finally, we treat the Bayesian analysis.
3.2.1 AR models

In our presentation of estimation theory for the AR(1) process we follow [67] §5. The parameter estimation problem for linear time series (i.e. ARMA processes) is to be found in many textbooks; see for instance [16], [34], [64]. Below we discuss a slightly different approach, which will turn out to be useful in more general models. The AR(1) case should therefore be seen as a pedagogical example with respect to this more general methodology.
Consider the autoregressive AR(1) model

$$X_n = \theta X_{n-1} + \varepsilon_n, \quad n \ge 1, \qquad (3.62)$$

where $\theta \in \mathbb{R}$ is the parameter to be estimated, $X_0$ is a random variable with $E(X_0) = 0$, and $\{\varepsilon_n,\, n \ge 1\}$ is 'noise': the $\varepsilon_n \sim N(0, \sigma^2)$ are i.i.d. with known $\sigma^2$.

From equation (3.62) we conclude

$$X_n = \varepsilon_n + \theta\, \varepsilon_{n-1} + \cdots + \theta^{n-1} \varepsilon_1 + \theta^n X_0,$$

and thus the probabilistic properties of $\{X_n\}$ essentially depend on the joint distribution of $X_0, \varepsilon_1, \varepsilon_2, \ldots$. We have

$$\operatorname{Var} X_n = \sigma^2 \big(1 + \theta^2 + \cdots + \theta^{2(n-1)}\big) + \theta^{2n} \operatorname{Var} X_0 \qquad (3.63)$$

and

$$E(X_n X_{n-k}) = \sigma^2 \theta^k \big(1 + \theta^2 + \cdots + \theta^{2(n-k-1)}\big) + \theta^k\, \theta^{2(n-k)} \operatorname{Var} X_0. \qquad (3.64)$$
In the case $|\theta| < 1$, we obtain from (3.63) and (3.64), by choosing $X_0 \sim N\Big(0, \dfrac{\sigma^2}{1 - \theta^2}\Big)$,

$$E X_n = 0, \quad \operatorname{Var} X_n = \frac{\sigma^2}{1 - \theta^2}, \quad E(X_n X_{n-k}) = \frac{\sigma^2 \theta^k}{1 - \theta^2},$$

and $\{X_n\}$ has a stationary distribution.
In the case $|\theta| \ge 1$ we have $\operatorname{Var} X_n \to \infty$ as $n \to \infty$, that is, the process explodes. In both cases $\theta = \pm 1$, $\{X_n\}$ reduces to a random walk.

From these considerations we see that, as for the probabilistic properties of estimators of $\theta$, e.g. asymptotic behaviour, we have to distinguish between the cases $|\theta| > 1$, $|\theta| < 1$ and $|\theta| = 1$.

In the following we assume for the AR(1) model (3.62) that $X_0 = 0$ and $\sigma^2 = 1$, for convenience.
Denoting by $p_\theta(X_1, \ldots, X_n)$ the joint density of $X_1, \ldots, X_n$, the Maximum Likelihood Estimator (MLE) $\hat\theta_n$ is defined to be a value such that the joint density $p_\theta$ reaches a maximum, that is,

$$\hat\theta_n = \operatorname{argmax}_\theta\, p_\theta(X_1, \ldots, X_n)$$

(for the definition of the MLE in continuous time see §3.1.1, p. 17). We know the joint density $p_\theta$ of $X_1, \ldots, X_n$,

$$p_\theta(X_1, \ldots, X_n) = (2\pi)^{-\frac{n}{2}} \exp\Big[ -\frac{1}{2} \sum_{i=1}^n (X_i - \theta X_{i-1})^2 \Big],$$

and hence are able to calculate $\hat\theta_n$ by solving $\frac{d}{d\theta} p_\theta = 0$:

$$\hat\theta_n = \frac{\sum_{i=1}^n X_{i-1} X_i}{\sum_{i=1}^n X_{i-1}^2}.$$

Inserting (3.62) for $X_i$ we obtain

$$\hat\theta_n = \theta + \frac{\sum_{i=1}^n X_{i-1}\, \varepsilon_i}{\sum_{i=1}^n X_{i-1}^2} \quad P_\theta\text{-a.s.} \qquad (3.65)$$
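The explicit MLE is easy to check by simulation; the parameter values below are illustrative choices of our own (a stationary case, $|\theta| < 1$):

```python
import numpy as np

# Simulate the AR(1) model (3.62) with X_0 = 0 and sigma^2 = 1, then compute
# the MLE theta_hat_n = sum X_{i-1} X_i / sum X_{i-1}^2.
rng = np.random.default_rng(7)
theta, n = 0.6, 10000
x = np.zeros(n + 1)
for i in range(1, n + 1):
    x[i] = theta * x[i - 1] + rng.standard_normal()

theta_hat = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)
print(theta_hat)   # close to 0.6; the error is O(n^{-1/2}) in the stationary case
```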
Denoting

$$M_n = \sum_{i=1}^n X_{i-1}\, \varepsilon_i,$$

we see immediately that the process $M_n$ is a martingale with respect to the filtration $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$ under $P_\theta$ for any $\theta$. The martingale $M_n$ is square integrable, i.e. $E M_n^2 < \infty$, $n \ge 0$, and we know that the stochastic sequence $M_n^2$ is a submartingale (see [66] §7.1, p. 455). By means of the Doob decomposition (see e.g. [66], p. 454) there is a martingale $m_n$ and a predictable$^2$, increasing$^3$ sequence $\langle M \rangle_n$ such that

$$M_n^2 = m_n + \langle M \rangle_n.$$

$^2$ A process $X_n$ is predictable if $X_n$ is $\mathcal{F}_{n-1}$-measurable.
$^3$ A process $X_n$ is increasing if $X_0 = 0$ and $X_n \le X_{n+1}$ $P$-a.s.
The sequence $\langle M \rangle_n$ is called the square characteristic or the quadratic variation of $M_n$. Here

$$\langle M \rangle_n = \sum_{i=1}^n X_{i-1}^2,$$

see [66], p. 455. Hence, equation (3.65) can be written as

$$\hat\theta_n - \theta = \frac{M_n}{\langle M \rangle_n}. \qquad (3.66)$$
Recall the definition of the Fisher information in the continuous-time case (see 3.1.1, p. 19). Here in discrete time the Fisher information equals

$$I_n(\theta) = E_\theta\Big[ -\frac{\partial^2 \ln p_\theta(x_1, \ldots, x_n)}{\partial\theta^2} \Big].$$

Direct calculation shows

$$I_n(\theta) = E_\theta \sum_{i=1}^n X_{i-1}^2,$$

hence

$$I_n(\theta) = E_\theta \langle M \rangle_n,$$

and therefore $\langle M \rangle_n$ is often called the stochastic Fisher information. Calculations based on (3.63) show that for large $n$ the Fisher information is approximately

$$I_n(\theta) \approx \begin{cases} \dfrac{n}{1 - \theta^2}, & |\theta| < 1, \\[1ex] \dfrac{n^2}{2}, & |\theta| = 1, \\[1ex] \dfrac{\theta^{2n}}{(\theta^2 - 1)^2}, & |\theta| > 1. \end{cases}$$

Since

$$\langle M \rangle_n \to \infty \quad P_\theta\text{-a.s.},$$

we can apply the 'law of large numbers for square integrable martingales' and obtain

$$\frac{M_n}{\langle M \rangle_n} \to 0 \quad P_\theta\text{-a.s.},$$

see [66], p. 487, Theorem 4. Thus, with (3.66) we conclude that the estimator is strongly consistent, that is,

$$\hat\theta_n \to \theta \quad P_\theta\text{-a.s.}$$

as $n$ tends to infinity.

As for the asymptotic behaviour of the estimator, we only state the results and refer to [67] §5 for a detailed treatment.
Theorem 6 Depending on the value of $\theta$ we have the following asymptotic behaviour for the normalized deviation $\sqrt{I_n(\theta)}\, (\hat\theta_n - \theta)$:

$$\lim_{n \to \infty} P_\theta\Big( \sqrt{I_n(\theta)}\, (\hat\theta_n - \theta) \le z \Big) = \begin{cases} \Phi(z), & |\theta| < 1, \\ H_\theta(z), & |\theta| = 1, \\ Ch(z), & |\theta| > 1, \end{cases}$$

where $\Phi(z)$ is the standard normal distribution, $Ch(z)$ is the Cauchy distribution, and $H_\theta(z)$ is the distribution of the random variable

$$\frac{W^2(1) - 1}{2^{3/2} \int_0^1 W^2(s)\, ds},$$

where $W$ denotes a Wiener process.
Hence, in the stationary case $|\theta| < 1$ the normalized deviation $\sqrt{I_n(\theta)}\, (\hat\theta_n - \theta)$ is asymptotically normally distributed, whereas in the other cases it has as limit distribution the Cauchy distribution or the quite unexpected $H_\theta$ distribution. The question arises whether we can reduce the number of limit distributions by modifying the normalizing factor $\sqrt{I_n(\theta)}$. Indeed, instead of using the Fisher information $I_n(\theta) = E_\theta \langle M \rangle_n$, choosing the stochastic Fisher information $\langle M \rangle_n$ as normalizing factor, we obtain

Theorem 7

$$\lim_{n \to \infty} P_\theta\Big( \sqrt{\langle M \rangle_n}\, (\hat\theta_n - \theta) \le z \Big) = \begin{cases} \Phi(z), & |\theta| \ne 1, \\ H_\theta(z), & |\theta| = 1. \end{cases}$$

Summarizing: the MLE $\hat\theta_n$ is (strongly) consistent, and the normalized deviations $\sqrt{I_n(\theta)}\, (\hat\theta_n - \theta)$ and $\sqrt{\langle M \rangle_n}\, (\hat\theta_n - \theta)$ are asymptotically distributed as shown in the two theorems.
3.2.2 ARCH and GARCH models

In the following we concentrate on ARCH and GARCH models and especially on estimation in these models. We refer to the fundamental papers by Engle [22] and Bollerslev [12], and to Bollerslev, Chou and Kroner [13] and Bollerslev, Engle and Nelson [14].
Conventional econometric time series models assume constant variance. However, over a decade ago risk and uncertainty considerations led to the development of new econometric time series models that allow for the modeling
of time-varying variances and covariances. Engle [22] introduced such a new class of stochastic processes, called the class of AutoRegressive Conditional Heteroscedastic (ARCH) processes, where the conditional variances and covariances depend on the past. These are discrete-time stochastic processes $\{\varepsilon_t\}$ of the form

$$\varepsilon_t = z_t\, \sigma_t \qquad (3.67)$$

with $z_t$ i.i.d., $E(z_t) = 0$, $\operatorname{Var}(z_t) = 1$, and with time-varying positive $\sigma_t$ which is time $(t-1)$ measurable. Assume in the following that $z_t \sim N(0, 1)$ and that $\varepsilon_t$ is a scalar process. The extension to the multivariate case is straightforward, see e.g. [13].
First we focus directly on the process $\{\varepsilon_t\}$ (3.67) and assume that $\varepsilon_t$ is itself observable. However, note that in many applications $\varepsilon_t$ corresponds to the innovations of some other stochastic process $y_t$, where $y_t = f(b; x_t) + \varepsilon_t$ with $\varepsilon_t$ conditionally distributed $N(0, \sigma_t^2)$, and $f$ a function of $x_t$, which is time $(t-1)$ measurable, and of a parameter vector $b$. In this context we will later consider the ARCH regression model, see (3.75), where $f(b; x_t) \equiv x_t' b$, that is, the conditional mean of $y_t$ is given as $x_t' b$.
The conditional variance of $\varepsilon_t$ equals $\sigma_t^2$. We will concentrate in the following on some frequently used models for $\sigma_t^2$. Engle [22] suggests one possible parametrization for $\sigma_t^2$ as a linear function of the past $q$ squared values of the process,

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^q \alpha_i\, \varepsilon_{t-i}^2, \qquad (3.68)$$

where for the model to be well-defined $\alpha_0 > 0$ and $\alpha_i \ge 0$ for $i = 1, \ldots, q$. This model is known as the linear ARCH(q) model.
Now we discuss Maximum Likelihood (ML) estimation in the linear ARCH(q) model. For a discussion of ML estimation see section 3.1.1. Apart from some constants, the log-likelihood of the $t$-th observation is

$$l_t = -\frac{1}{2} \log \sigma_t^2 - \frac{1}{2}\, \varepsilon_t^2 / \sigma_t^2, \qquad (3.69)$$

where the term $-\frac{1}{2} \log \sigma_t^2$ arises from the transformation from $z_t$ to $\varepsilon_t$. The log-likelihood function for the full sample $\varepsilon_T, \varepsilon_{T-1}, \ldots, \varepsilon_1$ is

$$L = \frac{1}{T} \sum_{t=1}^T l_t. \qquad (3.70)$$
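A minimal sketch of ML estimation in the ARCH(1) case, maximizing the sample criterion (3.69)-(3.70) numerically; the parameter values, starting point and optimizer choice are illustrative assumptions of our own:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, eps):
    """Average negative log-likelihood (3.69)-(3.70), up to additive constants,
    for the linear ARCH(1) model sigma_t^2 = a0 + a1 * eps_{t-1}^2."""
    a0, a1 = params
    if a0 <= 0 or a1 < 0:          # enforce the well-definedness constraints
        return np.inf
    s2 = a0 + a1 * eps[:-1] ** 2   # sigma_t^2 for t = 2, ..., T
    lt = -0.5 * np.log(s2) - 0.5 * eps[1:] ** 2 / s2
    return -np.mean(lt)

# simulate an ARCH(1) series with z_t ~ N(0, 1)
rng = np.random.default_rng(3)
a0, a1, T = 0.5, 0.4, 20000
eps = np.zeros(T)
for t in range(1, T):
    s2 = a0 + a1 * eps[t - 1] ** 2
    eps[t] = np.sqrt(s2) * rng.standard_normal()

res = minimize(neg_loglik, x0=[1.0, 0.1], args=(eps,), method="Nelder-Mead")
print(res.x)   # roughly [0.5, 0.4]
```

In practice one would set the first-order conditions (3.71) below to zero; the direct numerical maximization shown here is equivalent for this small model.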
To estimate the unknown parameters $\alpha' \equiv (\alpha_0, \alpha_1, \ldots, \alpha_q)$ we maximize the log-likelihood function, that is, the first-order derivatives are set equal to zero. Denoting $u_t' \equiv (1, \varepsilon_{t-1}^2, \ldots, \varepsilon_{t-q}^2)$ and $h_t \equiv \sigma_t^2$, we abbreviate (3.68) by

$$h_t = u_t' \alpha.$$

With this notation the first-order derivatives are

$$\frac{\partial l_t}{\partial \alpha} = \frac{1}{2 h_t}\, u_t \left( \frac{\varepsilon_t^2}{h_t} - 1 \right). \qquad (3.71)$$
In order to obtain the limiting distributions for the normalized deviations of the estimators in (3.76) below, we have to estimate the Fisher information matrix $\mathcal{F}$, see §3.1.1, p. 19, which is the negative expectation of the Hessian averaged over all observations. Therefore we calculate the Hessian

$$\frac{\partial^2 l_t}{\partial\alpha\, \partial\alpha'} = -\frac{1}{2 h_t^2}\, u_t u_t' \left( \frac{\varepsilon_t^2}{h_t} \right) + \left[ \frac{\varepsilon_t^2}{h_t} - 1 \right] \frac{\partial}{\partial\alpha'} \left( \frac{1}{2 h_t}\, u_t \right). \qquad (3.72)$$

Since the conditional expectation of the factor $\varepsilon_t^2 / h_t$ is one, and that of the second term in (3.72) is zero, the Fisher information matrix is given by

$$\mathcal{F} = \frac{1}{2T} \sum_t E\left[ \frac{1}{h_t^2}\, u_t u_t' \right]$$

and consistently estimated by

$$\hat{\mathcal{F}} = \frac{1}{T} \sum_t \left[ \frac{1}{2 h_t^2}\, u_t u_t' \right].$$

Later we will discuss the properties of the estimators, see p. 49.
In many applications of the linear ARCH(q) model a large number of parameters and a long lag length $q$ are needed. These problems are avoided by an alternative, more general parametrization of $h_t$ introduced by Bollerslev [12]. This more general model is called the Generalized ARCH, or GARCH(p, q), model:

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^q \alpha_i\, \varepsilon_{t-i}^2 + \sum_{i=1}^p \beta_i\, \sigma_{t-i}^2,$$

where $q > 0$, $p \ge 0$, $\alpha_0 > 0$, $\alpha_i \ge 0$ for $i = 1, \ldots, q$, and $\beta_i \ge 0$ for $i = 1, \ldots, p$. The generalization in the GARCH(p, q) model in comparison to the ARCH(q) model is that, besides past values of the process, past conditional variances also enter.
Denote $\theta' \equiv (\alpha_0, \alpha_1, \ldots, \alpha_q, \beta_1, \ldots, \beta_p)$, $h_t \equiv \sigma_t^2$ and $v_t' \equiv (1, \varepsilon_{t-1}^2, \ldots, \varepsilon_{t-q}^2, h_{t-1}, \ldots, h_{t-p})$. As for the ARCH model we estimate $\theta$ by differentiating the log-likelihood with respect to $\theta$:

$$\frac{\partial l_t}{\partial \theta} = \frac{1}{2 h_t}\, \frac{\partial h_t}{\partial \theta} \left( \frac{\varepsilon_t^2}{h_t} - 1 \right).$$
47
The Hessian is
\[
\frac{\partial^2 l_t}{\partial \omega\, \partial \omega'} = -\frac{1}{2h_t^2}\, \frac{\partial h_t}{\partial \omega} \frac{\partial h_t}{\partial \omega'}\, \frac{\varepsilon_t^2}{h_t} + \left[ \frac{\varepsilon_t^2}{h_t} - 1 \right] \frac{\partial}{\partial \omega'} \left( \frac{1}{2h_t}\, \frac{\partial h_t}{\partial \omega} \right), \tag{3.73}
\]
where
\[
\frac{\partial h_t}{\partial \omega} = v_t + \sum_{i=1}^{p} \beta_i\, \frac{\partial h_{t-i}}{\partial \omega}. \tag{3.74}
\]
The difference from the ARCH model is the inclusion of the recursive part in (3.74). As for the ARCH model, the Fisher information matrix is consistently estimated by the sample analogue of the first term in (3.73).
As mentioned earlier, we consider a generalization of model (3.67), (3.68), the ARCH regression model
\[
\varepsilon_t = y_t - x_t' b, \qquad h_t = u_t' \alpha, \tag{3.75}
\]
where $\varepsilon_t$ is conditionally distributed as $N(0, h_t)$, $b$ is a parameter vector and $x_t$ is time-$(t-1)$ measurable. As in the linear ARCH(q) model the conditional mean of $\varepsilon_t$ is zero, but the conditional mean of $y_t$ is given by $x_t' b$. The conditional variance of both $\varepsilon_t$ and $y_t$ is $h_t$.
As for the linear ARCH(q) model we consider ML estimation for the ARCH regression model. In addition to the parameter $\alpha$, estimated as in the linear ARCH(q) model, we have to estimate the parameter $b$. The derivative with respect to $b$ is given by
\[
\frac{\partial l_t}{\partial b} = \frac{\varepsilon_t x_t'}{h_t} + \frac{1}{2h_t}\, \frac{\partial h_t}{\partial b} \left( \frac{\varepsilon_t^2}{h_t} - 1 \right).
\]
The Hessian is
\[
\frac{\partial^2 l_t}{\partial b\, \partial b'} = -\frac{x_t' x_t}{h_t} - \frac{1}{2h_t^2}\, \frac{\partial h_t}{\partial b} \frac{\partial h_t}{\partial b'}\, \frac{\varepsilon_t^2}{h_t} - \frac{2 \varepsilon_t x_t'}{h_t^2}\, \frac{\partial h_t}{\partial b} + \left( \frac{\varepsilon_t^2}{h_t} - 1 \right) \frac{\partial}{\partial b'} \left( \frac{1}{2h_t}\, \frac{\partial h_t}{\partial b} \right).
\]
Taking conditional expectations, the last two terms of the Hessian vanish and $\varepsilon_t^2 / h_t$ becomes one, so that the part of the Fisher information matrix corresponding to $b$ is given by
\[
F_{bb} = \frac{1}{T} \sum_t \mathrm{E}\left[ \frac{x_t' x_t}{h_t} + \frac{1}{2h_t^2}\, \frac{\partial h_t}{\partial b} \frac{\partial h_t}{\partial b'} \right],
\]
and by substituting the linear variance function, $F_{bb}$ is consistently estimated by
\[
\hat F_{bb} = \frac{1}{T} \sum_t \left[ \frac{x_t' x_t}{h_t} + 2 \sum_j \alpha_j^2\, \varepsilon_{t-j}^2\, \frac{x_{t-j}' x_{t-j}}{h_t^2} \right].
\]
Finally we remark on the ML estimation for the GARCH regression model
\[
\varepsilon_t = y_t - x_t' b, \qquad h_t = v_t' \omega,
\]
with $v_t$ and $\omega$ as above. In order to estimate the mean parameters $b$ we differentiate with respect to $b$ as shown for the ARCH regression model, the single difference being
\[
\frac{\partial h_t}{\partial b} = -2 \sum_{j=1}^{q} \alpha_j\, x_{t-j}\, \varepsilon_{t-j} + \sum_{j=1}^{p} \beta_j\, \frac{\partial h_{t-j}}{\partial b}.
\]
Paying attention to this difference, the part of the Fisher information matrix corresponding to $b$ is consistently estimated as in the ARCH regression model.
We denote by $F_{\alpha\alpha}$, respectively $F_{\omega\omega}$, the part of the Fisher information matrix corresponding to $\alpha$, respectively $\omega$, and the elements in the off-diagonal block of the information matrix by $F_{b\alpha}$, respectively $F_{b\omega}$. The elements $F_{b\alpha}$, respectively $F_{b\omega}$, may be shown to be zero. Because of this asymptotic independence, $\alpha$, respectively $\omega$, can be estimated without loss of asymptotic efficiency based on a consistent estimate of $b$, and vice versa. Using this fact, Engle [22], §6, formulates a simple scoring algorithm for the ML estimation of the parameters $\alpha$ and $b$.
As for the properties of the estimators, Bollerslev [14] remarks that for the general ARCH class of models the verification of sufficient regularity conditions for the MLE to be consistent and asymptotically normally distributed is very difficult. A detailed proof is only worked out in a few cases. Normally one assumes that these regularity conditions are satisfied, so that the ML estimators $\hat\alpha$, respectively $\hat\omega$, and $\hat b$ are consistent and asymptotically normally distributed with limiting distributions
\[
\sqrt{T}\, (\hat\alpha - \alpha) \longrightarrow N(0, F_{\alpha\alpha}^{-1}), \quad \text{resp.} \quad \sqrt{T}\, (\hat\omega - \omega) \longrightarrow N(0, F_{\omega\omega}^{-1}),
\]
and
\[
\sqrt{T}\, (\hat b - b) \longrightarrow N(0, F_{bb}^{-1}). \tag{3.76}
\]
Closing our considerations on estimation in ARCH/GARCH models, we remark that Geweke [31] develops Bayesian inference procedures for ARCH models by using Monte Carlo methods to determine the a posteriori distribution. The Bayesian analysis is discussed in the next section.
3.2.3 The Bayesian estimation method

Suppose we have data $x = (x_1, \ldots, x_n)$ with distribution $p(x \mid \theta)$, where $\theta$ is the unknown parameter we want to estimate. The basic idea of the Bayesian approach is to treat the parameter $\theta$ as a random variable, to use a guess or a priori knowledge of the distribution $\pi(\theta)$ of $\theta$, and then to estimate $\theta$ by calculating the a posteriori distribution $\pi(\theta \mid x)$ of $\theta$. For details on Bayesian theory we refer to [3] and [55]. First the Bayesian method will be described in the case where the parameter is one-dimensional. Afterwards, the $k$-dimensional case and the hierarchical model will be discussed.
The one-dimensional case

In the one-dimensional case the a posteriori distribution $\pi(\theta \mid x)$ of $\theta$ is calculated from the a priori distribution $\pi(\theta)$ by the so-called Bayes formula,
\[
\pi(\theta \mid x) = \frac{p(x \mid \theta)\, \pi(\theta)}{\int p(x \mid \theta)\, \pi(\theta)\, d\theta}, \tag{3.77}
\]
where the denominator is a proportionality constant making the total a posteriori probability equal to one. Using the a posteriori distribution $\pi(\theta \mid x)$, the parameter $\theta$ can now be estimated, for example via the mode of the a posteriori distribution,
\[
\hat\theta = \arg\max_\theta\, \pi(\theta \mid x),
\]
by the mean,
\[
\hat\theta = \mathrm{E}\left[ \pi(\theta \mid x) \right],
\]
or by the median,
\[
\hat\theta = \mathrm{median}\left[ \pi(\theta \mid x) \right].
\]
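As a toy illustration (ours, not the report's), the a posteriori distribution and the three point estimates can be computed numerically on a grid; the data and all parameter values below are made up, and the likelihood/prior are normal as in the example that follows:

```python
import numpy as np

# grid approximation of pi(theta|x) = p(x|theta) * pi(theta) / normalizer
x = np.array([0.8, 1.2, 1.0])          # hypothetical observations
theta = np.linspace(-4.0, 6.0, 4001)   # grid over the parameter
sigma2, mu0, tau0sq = 1.0, 0.0, 4.0    # known variance; prior mean/variance

log_lik = -0.5 * np.sum((x[:, None] - theta[None, :]) ** 2, axis=0) / sigma2
log_prior = -0.5 * (theta - mu0) ** 2 / tau0sq
post = np.exp(log_lik + log_prior)
dt = theta[1] - theta[0]
post /= post.sum() * dt                # the Bayes-formula denominator

theta_mode = theta[np.argmax(post)]    # mode of the posterior
theta_mean = (theta * post).sum() * dt # posterior mean
cdf = np.cumsum(post) * dt
theta_median = theta[np.searchsorted(cdf, 0.5)]
```

For this conjugate setup all three estimates essentially coincide, which the closed-form posterior of the next example confirms.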
For the asymptotic properties of the estimator we refer to [3], §5.3, Asymptotic Analysis.
Example (one-dimensional case)

Suppose that $x = (x_1, \ldots, x_n)$ and $x_i \mid \theta \sim N(\theta, \sigma^2)$ i.i.d., where $\sigma^2$ is known. Choose an a priori distribution $\theta \sim N(\mu_0, \tau_0^2)$. With the Bayes formula,
\[
\pi(\theta \mid x_i) \propto \exp\left\{ \frac{-(x_i - \theta)^2}{2\sigma^2} \right\} \exp\left\{ \frac{-(\theta - \mu_0)^2}{2\tau_0^2} \right\}
\propto \exp\left\{ -\frac{1}{2} \left( \frac{1}{\sigma^2} + \frac{1}{\tau_0^2} \right) \left[ \theta^2 - 2\theta\, \frac{x_i \tau_0^2 + \mu_0 \sigma^2}{\sigma^2 + \tau_0^2} + \cdots \right] \right\},
\]
where the terms and factors not written out do not involve $\theta$. Hence
\[
\theta \mid x_i \sim N\left( \frac{x_i \tau_0^2 + \mu_0 \sigma^2}{\sigma^2 + \tau_0^2},\; \frac{1}{\sigma^{-2} + \tau_0^{-2}} \right),
\]
and with
\[
\pi(\theta \mid x) \propto \prod_{i=1}^{n} p(x_i \mid \theta)\, \pi(\theta) \propto \exp\left\{ \frac{-(\bar x - \theta)^2}{2\sigma^2 / n} \right\} \exp\left\{ \frac{-(\theta - \mu_0)^2}{2\tau_0^2} \right\},
\]
we have
\[
\theta \mid x \sim N\left( \frac{\bar x\, n \tau_0^2 + \mu_0 \sigma^2}{n \tau_0^2 + \sigma^2},\; \frac{1}{n \sigma^{-2} + \tau_0^{-2}} \right),
\]
where $\bar x = \frac{1}{n} \sum x_i$.
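The closed-form posterior above is easy to code directly; the following sketch (ours) returns the posterior mean and variance of the normal-normal model:

```python
import numpy as np

def normal_posterior(x, sigma2, mu0, tau0sq):
    """Posterior of theta for x_i | theta ~ N(theta, sigma2) i.i.d. with
    prior theta ~ N(mu0, tau0sq); returns (posterior mean, posterior variance)
    according to the closed-form expressions derived above."""
    n = len(x)
    xbar = np.mean(x)
    post_var = 1.0 / (n / sigma2 + 1.0 / tau0sq)
    post_mean = (xbar * n * tau0sq + mu0 * sigma2) / (n * tau0sq + sigma2)
    return post_mean, post_var
```

Note how the posterior mean is a precision-weighted compromise between the sample mean and the prior mean, and how the prior's influence vanishes as $n$ grows.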
The multi-dimensional case

In the multi-dimensional case $\theta = (\theta_1, \ldots, \theta_k)$, the a posteriori distribution of $\theta$ can be calculated by the Bayes formula as follows:
\[
\pi(\theta \mid x) = \frac{p(x \mid \theta)\, \pi(\theta)}{\int \cdots \int p(x \mid \theta)\, \pi(\theta)\, d\theta_1 \cdots d\theta_k}. \tag{3.78}
\]
Using the marginal distributions $\pi(\theta_i \mid x)$ of the joint a posteriori distribution $\pi(\theta \mid x)$,
\[
\pi(\theta_i \mid x) = \int \cdots \int \pi(\theta \mid x)\, d\theta_1 \cdots d\theta_{i-1}\, d\theta_{i+1} \cdots d\theta_k, \tag{3.79}
\]
we are able to estimate $\theta$ in the ways described in the one-dimensional case above.
Usually problems arise in calculating the integrals in (3.79), which require approximation techniques. We will discuss simulation of distributions using so-called Markov Chain Monte Carlo (MCMC) methods. The key idea of the MCMC methods is described in the following; for details we refer to [71], [3] p. 353, or [55]. Suppose we want to generate a sample from an a posteriori distribution $\pi(\theta \mid x)$ for $\theta \in \Theta \subseteq \mathrm{I\!R}^k$, but cannot do this directly. However, suppose we are able to construct a Markov chain with state space $\Theta$ and with equilibrium distribution $\pi(\theta \mid x)$. Then, under suitable regularity conditions, asymptotic results exist showing in which sense the sample output from such a chain with equilibrium distribution $\pi(\theta \mid x)$ can be used to mimic a random sample from $\pi(\theta \mid x)$, or to estimate the expected value of a function $f(\theta)$ with respect to $\pi(\theta \mid x)$. If $\theta^1, \theta^2, \ldots, \theta^t, \ldots$ is a realization from a suitable chain, then
\[
\theta^t \longrightarrow \theta \sim \pi(\theta \mid x)
\]
in distribution as $t$ tends to infinity, and
\[
\frac{1}{t} \sum_{i=1}^{t} f(\theta^i) \longrightarrow \mathrm{E}_{\pi(\cdot \mid x)}\left[ f(\theta) \right] \quad \text{a.s.}
\]
as $t$ tends to infinity. Now we need algorithms to construct such chains with specified equilibrium distributions. We will discuss two particular forms of Markov chain schemes, the Gibbs sampling algorithm and the Metropolis–Hastings algorithm.
The Gibbs sampling algorithm

Suppose $\theta = (\theta_1, \ldots, \theta_k)$ is the vector of unknown quantities. We want to simulate $\pi(\theta_i \mid x)$ for $i = 1, \ldots, k$. Denote by $\pi(\theta \mid x) = \pi(\theta_1, \ldots, \theta_k \mid x)$ the joint density and by $\pi(\theta_i \mid x, \theta_j, j \ne i)$ the so-called induced full conditional densities for each of the components $\theta_i$ given values of the other components $\theta_j$, $j \ne i$, for $i = 1, \ldots, k$. Suppose that we are able to sample from each of these one-dimensional distributions.

The Gibbs sampling algorithm (see Geman and Geman [29]):

1) Choose an arbitrary starting point $\theta^0 = (\theta_1^0, \ldots, \theta_k^0)$.

2) Make random drawings from the full conditional distributions as follows:
\[
\begin{aligned}
\theta_1^1 &\ \text{from}\ \pi(\theta_1 \mid x, \theta_j^0, j \ne 1), \\
\theta_2^1 &\ \text{from}\ \pi(\theta_2 \mid x, \theta_1^1, \theta_3^0, \ldots, \theta_k^0), \\
\theta_3^1 &\ \text{from}\ \pi(\theta_3 \mid x, \theta_1^1, \theta_2^1, \theta_4^0, \ldots, \theta_k^0), \\
&\ \vdots \\
\theta_k^1 &\ \text{from}\ \pi(\theta_k \mid x, \theta_j^1, j \ne k).
\end{aligned}
\]
This completes a transition from $\theta^0 = (\theta_1^0, \ldots, \theta_k^0)$ to $\theta^1 = (\theta_1^1, \ldots, \theta_k^1)$.

3) Iterating step 2) produces a sequence $\theta^0, \theta^1, \ldots, \theta^t, \ldots$, which is a realization of a Markov chain with transition probability $K$ from $\theta^t$ to $\theta^{t+1}$ given by
\[
K(\theta^t, \theta^{t+1}) = \prod_{l=1}^{k} \pi\left( \theta_l^{t+1} \,\middle|\, x,\ \theta_j^t,\ j > l,\ \theta_j^{t+1},\ j < l \right).
\]
The important property of the Gibbs sampling algorithm is that we only sample from the full conditional distributions. As $t$ tends to infinity, $(\theta_1^t, \ldots, \theta_k^t)$ tends in distribution to a random vector with joint density $\pi(\theta \mid x)$, see e.g. [29]. In particular, $\theta_i^t$ tends in distribution to a random quantity with density $\pi(\theta_i \mid x)$. The above algorithm is assumed to be replicated $m$ times independently; that means we have $m$ replicates of $\theta^t = (\theta_1^t, \ldots, \theta_k^t)$. Then for large $t$ the replicates $(\theta_{i1}^t, \ldots, \theta_{im}^t)$ are approximately a random sample from $\pi(\theta_i \mid x)$. For $m$ sufficiently large we obtain an estimate $\hat\pi(\theta_i \mid x)$ of $\pi(\theta_i \mid x)$ from
\[
\hat\pi(\theta_i \mid x) = \frac{1}{m} \sum_{l=1}^{m} \pi\left( \theta_i \,\middle|\, x,\ \theta_{jl}^t,\ j \ne i \right).
\]
For more details we refer to [55], p. 226.
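The scheme is easiest to see in a case where the full conditionals are known exactly. The following sketch (our illustration, not from the report) runs a Gibbs sampler for a bivariate normal target with correlation $\rho$, where each full conditional is itself normal:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for (theta1, theta2) ~ N(0, [[1, rho], [rho, 1]]).
    Each full conditional is theta_i | theta_j ~ N(rho*theta_j, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)
    th1, th2 = 0.0, 0.0              # step 1): arbitrary starting point
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):          # steps 2)-3): sweep the full conditionals
        th1 = rng.normal(rho * th2, sd)
        th2 = rng.normal(rho * th1, sd)
        draws[t] = th1, th2
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_iter=20000)
```

After discarding an initial burn-in segment, the sample moments of `draws` approximate those of the equilibrium distribution, as the asymptotic results quoted above assert.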
The Metropolis–Hastings algorithm
In order to construct a Markov chain $\theta^1, \ldots, \theta^t, \ldots$ with state space $\Theta$ and equilibrium distribution $\pi(\theta \mid x)$, we find the transition probability from $\theta^t$ to the next state $\theta^{t+1}$ via the Metropolis–Hastings algorithm as follows. Denote by $q(\theta, \theta')$ a transition probability function such that, in the case $\theta^t = \theta$, a value $\theta'$ drawn from $q(\theta, \theta')$ is considered as a proposed possible value for $\theta^{t+1}$. However, a further randomization takes place: with probability $\alpha(\theta, \theta')$ we accept $\theta^{t+1} = \theta'$, otherwise we reject the value generated from $q(\theta, \theta')$ and set $\theta^{t+1} = \theta$.

This construction defines a Markov chain with transition probabilities given by
\[
p(\theta, \theta') =
\begin{cases}
q(\theta, \theta')\, \alpha(\theta, \theta'), & \text{if } \theta' \ne \theta, \\[2pt]
1 - \sum_{\theta'' \ne \theta} q(\theta, \theta'')\, \alpha(\theta, \theta''), & \text{if } \theta' = \theta.
\end{cases}
\]
With the definition
\[
\alpha(\theta, \theta') =
\begin{cases}
\min\left\{ \dfrac{\pi(\theta' \mid x)\, q(\theta', \theta)}{\pi(\theta \mid x)\, q(\theta, \theta')},\ 1 \right\}, & \text{if } \pi(\theta \mid x)\, q(\theta, \theta') > 0, \\[6pt]
1, & \text{if } \pi(\theta \mid x)\, q(\theta, \theta') = 0,
\end{cases}
\]
provided that $q(\theta, \theta')$ is chosen to be irreducible and aperiodic on a suitable state space, we have that $\pi(\theta \mid x)$ is the equilibrium distribution of the constructed chain. For more details see [71].
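As a concrete sketch (ours; the report does not prescribe an implementation), the following random-walk variant uses a symmetric proposal $q$, so the Hastings ratio reduces to $\pi(\theta' \mid x) / \pi(\theta \mid x)$; the target below is a standard normal, specified only up to its normalizing constant:

```python
import numpy as np

def metropolis_hastings(log_target, theta0, scale, n_iter, seed=0):
    """Random-walk Metropolis-Hastings: propose theta' ~ N(theta, scale^2);
    since this q is symmetric, q(theta', theta)/q(theta, theta') = 1 and the
    acceptance probability alpha reduces to min{pi(theta')/pi(theta), 1}."""
    rng = np.random.default_rng(seed)
    theta = theta0
    lp = log_target(theta)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + scale * rng.normal()
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with probability alpha
            theta, lp = prop, lp_prop              # move to the proposed state
        chain[t] = theta                           # otherwise keep the old state
    return chain

# target: standard normal density, up to the normalizing constant
chain = metropolis_hastings(lambda th: -0.5 * th ** 2, 0.0, 2.4, 50000)
```

Only ratios of the target enter, which is exactly why the method applies to posteriors whose normalizing integral (3.78) is intractable.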
Hierarchical models

A hierarchical model forms a structure in the following way: the distribution of the data $x$ is written conditionally on parameters $\theta_1$ as $p(x \mid \theta_1)$, the distribution of $\theta_1$ is written conditionally on 'hyperparameters' $\theta_2$ as $p(\theta_1 \mid \theta_2)$, and we have the a priori distribution $\pi(\theta_2)$ of $\theta_2$. This is the so-called three-stage hierarchical model, which is considered in many applications. The three stages are $x$, $\theta_1$ and $\theta_2$ as above; that means a three-stage model has the structure $p(x \mid \theta_1)$, $p(\theta_1 \mid \theta_2)$, $\pi(\theta_2)$. We could continue this process and write the distribution of $\theta_2$ conditionally on other hyperparameters $\theta_3$ as $p(\theta_2 \mid \theta_3)$, and so on. We obtain the $k$-stage hierarchical model $p(x \mid \theta_1)$, $p(\theta_1 \mid \theta_2)$, $p(\theta_2 \mid \theta_3), \ldots, p(\theta_{k-2} \mid \theta_{k-1})$, $\pi(\theta_{k-1})$.
The model includes the assumption of conditional independence: conditioning on $\theta_j$, the parameters $x, \theta_1, \theta_2, \ldots, \theta_{j-1}$ and the parameters $\theta_{j+1}, \ldots, \theta_{k-1}$ are independent. Consider in particular the three-stage model with $p(x \mid \theta_1)$, $p(\theta_1 \mid \theta_2)$, $\pi(\theta_2)$. The distribution $p(x \mid \theta_1)$ is formally the distribution of $x$ given $\theta_1$ and $\theta_2$. However, if we know $\theta_1$, then knowing $\theta_2$ would not add any information about $x$; this means $x$ and $\theta_2$ are independent given $\theta_1$. In the Bayesian analysis of such a three-stage hierarchical model, the special structure allows us to write the a posteriori distribution $\pi(\theta_1, \theta_2 \mid x)$ in the form
\[
\pi(\theta_1, \theta_2 \mid x) \propto p(x \mid \theta_1, \theta_2)\, p(\theta_1, \theta_2) \propto p(x \mid \theta_1)\, p(\theta_1 \mid \theta_2)\, \pi(\theta_2),
\]
and inference for $\theta_2$ is given by its marginal distribution
\[
\pi(\theta_2 \mid x) \propto p(x \mid \theta_2)\, \pi(\theta_2).
\]
The marginal distributions to be used for inference on the parameters can be calculated by MCMC methods.
Example (hierarchical model)

An example of a hierarchical three-stage stochastic model can be found in [42], where the conditional variance follows a log-AR(1) process:
\[
y_t = \sqrt{h_t}\, u_t, \qquad \ln h_t = \alpha + \delta \ln h_{t-1} + \nu_t,
\]
where $u_t \sim N(0, 1)$ and $\nu_t \sim N(0, \sigma_\nu^2)$ are independent. More generally, we can consider the log-AR(p) model
\[
y_t = \sqrt{h_t}\, u_t, \qquad \ln h_t = \alpha + \delta_1 \ln h_{t-1} + \delta_2 \ln h_{t-2} + \cdots + \delta_p \ln h_{t-p} + \nu_t,
\]
with $u_t$ and $\nu_t$ as before. In this model the time series of the data $y$ is generated from a probability model $p(y \mid h)$, where $h$ denotes the vector of volatilities. The volatilities $h$ are unobserved and are assumed to be generated by $p(h \mid \theta)$ with $\theta^T = (\alpha, \delta, \sigma_\nu)$. Denote $\theta_1 = h$ and $\theta_2 = \theta$. Via the Bayes formula we have
\[
\pi(\theta_1, \theta_2 \mid y) \propto p(y \mid \theta_1)\, p(\theta_1 \mid \theta_2)\, \pi(\theta_2).
\]
The marginal distributions $\pi(\theta_1 \mid y)$ and $\pi(\theta_2 \mid y)$ can be calculated using MCMC methods such as the Gibbs sampling or Metropolis–Hastings algorithms.
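Simulating the first two stages of this model is straightforward; the sketch below (our illustration, with made-up parameter values and $\nu_t \sim N(0,1)$ for simplicity) generates data from the log-AR(1) specification, started at the stationary mean of $\ln h_t$:

```python
import numpy as np

def simulate_sv(alpha, delta, T, seed=0):
    """Simulate y_t = sqrt(h_t)*u_t, ln h_t = alpha + delta*ln h_{t-1} + nu_t,
    with u_t, nu_t i.i.d. N(0, 1); |delta| < 1 is assumed so that ln h_t has
    a stationary mean alpha / (1 - delta), used as the starting value."""
    rng = np.random.default_rng(seed)
    log_h = np.empty(T)
    log_h[0] = alpha / (1.0 - delta)
    for t in range(1, T):
        log_h[t] = alpha + delta * log_h[t - 1] + rng.normal()
    y = np.exp(0.5 * log_h) * rng.normal(size=T)
    return y, np.exp(log_h)

y, h = simulate_sv(alpha=-0.4, delta=0.9, T=5000)
```

A Bayesian analysis would treat `h` as the unobserved first stage and $(\alpha, \delta)$ as the second, sampling both blocks by the MCMC schemes above.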
Chapter 4

Nonparametric Estimation

4.1 Diffusion models
Nonparametric estimation in general deals with estimating functionals in situations where the latter are not determined by a finite number of parameters, e.g. estimating a probability density at some fixed point, derivatives of a density, and others. For a thorough treatment of nonparametric estimation we refer to Ibragimov and Has'minskii [38], §4 and §7.

We consider estimation of a probability density at a point based on observations in $\mathrm{I\!R}$. Suppose $X_1, \ldots, X_n$ is a sample from a population random variable $X$ taking values in $\mathrm{I\!R}$ with unknown density $f(x)$. If we only know that $f(x)$ belongs to a class $\mathcal{F}$ of functions, estimating $f(x)$ typically is an infinite-dimensional nonparametric problem. Using the function
\[
u(x) =
\begin{cases}
0, & x < 0, \\
1, & x \ge 0,
\end{cases}
\]
we can define the empirical distribution function
\[
F_n(x) = \frac{1}{n} \sum_{k=1}^{n} u(x - X_k), \tag{4.1}
\]
which converges to the distribution function
\[
F(x) = \int_{-\infty}^{x} f(y)\, dy.
\]
The latter statement can be made precise for instance through the Borel–Cantelli lemma and its various refinements (see [68]).
In order to construct a density estimator $f_n(x)$ starting from the empirical distribution function $F_n(x)$ in (4.1), we have to smooth $F_n$. One possibility is to use so-called kernel density estimators
\[
f_n(x) = \frac{1}{n h_n} \sum_{k=1}^{n} V\left( \frac{x - X_k}{h_n} \right),
\]
where the kernel $V : \mathrm{I\!R} \longrightarrow \mathrm{I\!R}_+$ satisfies
\[
\int_{-\infty}^{\infty} V(x)\, dx = 1,
\]
and where the so-called bandwidth sequence $(h_n)$ is such that
\[
h_n \longrightarrow 0, \qquad n h_n \longrightarrow \infty.
\]
This class of estimators is referred to as the Parzen–Rosenblatt estimators. Under certain restrictions on $f(x)$ the sequence $f_n(x)$ converges in some probabilistic sense to $f(x)$; see [38], §4.4, and [69] for a detailed discussion.
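A minimal sketch (ours) of a Parzen–Rosenblatt estimator with a Gaussian kernel follows; the bandwidth choice $h_n = n^{-1/5}$ is one common rate satisfying $h_n \to 0$ and $n h_n \to \infty$, not a recommendation from the report:

```python
import numpy as np

def kernel_density(x, sample, h):
    """Parzen-Rosenblatt estimator f_n(x) = (1/(n*h)) * sum_k V((x - X_k)/h),
    here with the Gaussian kernel V, which integrates to one."""
    u = (x - np.asarray(sample)) / h
    return np.exp(-0.5 * u ** 2).sum() / (np.sqrt(2 * np.pi) * len(sample) * h)

# h_n = n^(-1/5): h_n -> 0 and n*h_n -> infinity, as required above
rng = np.random.default_rng(0)
sample = rng.normal(size=20000)
f0 = kernel_density(0.0, sample, h=len(sample) ** -0.2)
```

For this simulated standard-normal sample, `f0` approximates the true density value $f(0) = 1/\sqrt{2\pi} \approx 0.399$.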
Ibragimov and Has'minskii [38], §7, present some interesting approaches to, and examples of, nonparametric estimators of a signal $S(t)$ belonging to a certain functional space. Two models for signals are considered. The first has the form
\[
dX(t) = S(t)\, dt + \varepsilon\, db(t), \tag{4.2}
\]
with $0 \le t \le 1$, $(b(t))$ a Wiener process, $\varepsilon$ small (typically $\varepsilon \downarrow 0$), and $X(t)$ an observed signal on $[0, 1]$. The second model is of the form
\[
dX(t) = S(t)\, dt + db(t), \tag{4.3}
\]
with $0 \le t \le n$, $S(t)$ a one-periodic function and $X(t)$ observed over $n$ time periods.

As a basic example, consider estimation of the functional $F$ of $S(t)$,
\[
F(S) = \int_0^1 f(t)\, S(t)\, dt,
\]
with $S, f \in L^2(0, 1)$. The functional $F$ has to be estimated based on observations of (4.2), respectively (4.3). The estimator
\[
\hat F_1 = \int_0^1 f(t)\, dX(t)
\]
for model (4.2) and the estimator
\[
\hat F_2 = \frac{1}{n} \int_0^n f(t)\, dX(t)
\]
for model (4.3) can both be shown to have good properties. Both estimators are normally distributed with mean $F(S)$ and variance $\varepsilon^2 \|f\|^2$, respectively $\|f\|^2 / n$. We refer to [38], §7.4 and §7.5, for more details.
4.2 Discrete models

In Section 3.2 we dealt with parametrizations of volatility, e.g. with the ARCH(q) process $\{\varepsilon_t\}$ of (3.67), where $\varepsilon_t \mid \mathcal{F}_{t-1} \sim N(0, \sigma_t^2)$, $\mathcal{F}_t$ being the information $\sigma$-algebra at time $t$, and
\[
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i\, \varepsilon_{t-i}^2. \tag{4.4}
\]
Instead of considering a parametric approach as in (4.4), we now discuss nonparametric techniques for the estimation of the volatility $\sigma_t^2$.
Basically there are two different nonparametric approaches: kernel estimators and Fourier-type estimators. For details we refer to Pagan and Schwert [57]. In order to estimate $\sigma_s^2 = \mathrm{E}(\varepsilon_s^2 \mid \mathcal{F}_{s-1})$, kernel methods essentially use a weighted sum over $\varepsilon_j^2$, $j = 1, \ldots, T$, $j \ne s$:
\[
\hat\sigma_s^2 = \sum_{\substack{j=1 \\ j \ne s}}^{T} w_j^{(s)}\, \varepsilon_j^2, \tag{4.5}
\]
where the sum over all $(T - 1)$ weights $w_j^{(s)}$, $j = 1, \ldots, T$, $j \ne s$, equals one. The aim is to obtain an estimate of $\mathrm{E}(\varepsilon_s^2 \mid \varepsilon_{s-1}, \varepsilon_{s-2}, \ldots, \varepsilon_{s-m})$ for $m$ suitably chosen. If the preceding values of $\varepsilon_j$, that is $\varepsilon_{j-1}, \varepsilon_{j-2}, \ldots, \varepsilon_{j-m}$, are similar to the preceding values of $\varepsilon_s$, that is $\varepsilon_{s-1}, \varepsilon_{s-2}, \ldots, \varepsilon_{s-m}$, then $\varepsilon_j^2$ is expected to give useful information about $\mathrm{E}(\varepsilon_s^2 \mid \varepsilon_{s-1}, \ldots, \varepsilon_{s-m})$; in this case the weight $w_j^{(s)}$ is large. If the values that preceded $\varepsilon_j$ differ 'substantially' from the preceding values of $\varepsilon_s$, then $\varepsilon_j^2$ is expected to give only little information about $\mathrm{E}(\varepsilon_s^2 \mid \varepsilon_{s-1}, \ldots, \varepsilon_{s-m})$, and therefore $w_j^{(s)}$ is close to zero. We remark that in the sum (4.5) the element $w_s^{(s)} \varepsilon_s^2$ is left out to avoid the situation where 'outliers' in the data lead to a weight $w_s^{(s)}$ close to unity while all the other $w_j^{(s)}$ are close to zero, so that only $\varepsilon_s^2$ determines $\hat\sigma_s^2$.
There are many possible weighting schemes. A popular one uses a Gaussian kernel. With the notation $z_j' = (\varepsilon_{j-1}, \varepsilon_{j-2}, \ldots, \varepsilon_{j-m})$ we have
\[
w_j^{(s)} = \frac{K(z_s - z_j)}{\sum_{r=1}^{T} K(z_r - z_s)},
\]
where $K(\cdot)$ is the Gaussian kernel
\[
K(z_s - z_j) = \frac{1}{\sqrt{(2\pi)^m |H|}} \exp\left( -\frac{1}{2}\, (z_s - z_j)'\, H^{-1} (z_s - z_j) \right),
\]
with $H = \mathrm{diag}(h_1, \ldots, h_m)$ containing the bandwidths, which were set to $h_k = \hat\sigma_k\, T^{-1/(4+m)}$, where $\hat\sigma_k$ is the sample standard deviation of $\varepsilon_{s-k}$, the $k$-th component of $z_s$, $k = 1, \ldots, m$.
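The estimator (4.5) with these Gaussian weights can be sketched as follows (ours; the normalizing constant of the kernel cancels in the weights, and the simulated data are illustrative):

```python
import numpy as np

def kernel_volatility(eps, m):
    """Estimate sigma_hat2[s] = sum_{j != s} w_j^(s) * eps_j^2 with
    Gaussian-kernel weights on the lag vectors z_t = (eps_{t-1},...,eps_{t-m});
    bandwidths h_k = sigma_hat_k * T^(-1/(4+m)) as in the text."""
    T = len(eps)
    # lag matrix: row s corresponds to z_{m+s} = (eps[m+s-1], ..., eps[s])
    Z = np.column_stack([eps[m - k:T - k] for k in range(1, m + 1)])
    e2 = eps[m:] ** 2
    n = len(e2)
    h = Z.std(axis=0, ddof=1) * n ** (-1.0 / (4 + m))
    sig2 = np.empty(n)
    for s in range(n):
        d2 = (((Z - Z[s]) / h) ** 2).sum(axis=1)
        K = np.exp(-0.5 * d2)      # Gaussian kernel; constants cancel in w_j
        K[s] = 0.0                 # leave out j = s, as discussed above
        sig2[s] = K @ e2 / K.sum()
    return sig2

rng = np.random.default_rng(0)
sig2 = kernel_volatility(rng.normal(size=300), m=1)
```

For i.i.d. standard normal input the estimates hover around the true conditional variance of one, with local variation driven by the kernel weights.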
Another approach to nonparametric estimation of volatility is an approximation using a series expansion. The most frequently used series expansion in economics is the Flexible Fourier Form (FFF) introduced by Gallant [28], which leads to a volatility estimate of the form
\[
\hat\sigma_t^2 = \alpha_0 + \sum_{j=1}^{m} \left\{ \alpha_j\, \varepsilon_{t-j} + \beta_j\, \varepsilon_{t-j}^2 + \sum_{k=1}^{2} \left( \gamma_{jk} \cos(k \varepsilon_{t-j}) + \delta_{jk} \sin(k \varepsilon_{t-j}) \right) \right\};
\]
that means $\sigma_t^2$ is estimated by the sum of a low-order polynomial and trigonometric terms based on $\varepsilon_{t-j}$, $j = 1, \ldots, m$. Note the disadvantage of the FFF that the estimates of $\sigma_t^2$ may be negative. We refer to [57] for more details.
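Since the FFF estimate is linear in its coefficients, it can be fitted by ordinary least squares; a sketch (ours, with simulated data and illustrative choices of $m$ and the regressor set) follows:

```python
import numpy as np

def fff_design(eps, m):
    """Regressor matrix for the FFF volatility regression of eps_t^2 on
    1, eps_{t-j}, eps_{t-j}^2, cos(k*eps_{t-j}), sin(k*eps_{t-j}), k = 1, 2,
    for lags j = 1, ..., m."""
    T = len(eps)
    rows = []
    for t in range(m, T):
        row = [1.0]
        for j in range(1, m + 1):
            e = eps[t - j]
            row += [e, e ** 2, np.cos(e), np.sin(e), np.cos(2 * e), np.sin(2 * e)]
        rows.append(row)
    return np.array(rows), eps[m:] ** 2

eps = np.random.default_rng(0).normal(size=500)
X, y = fff_design(eps, m=2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2_hat = X @ coef      # fitted volatilities; these may indeed be negative
```

Nothing constrains the fitted values to be nonnegative, which is precisely the disadvantage noted above.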
Chapter 5

Some Diffusion Models with Explicit Solutions

In this chapter our presentation follows [45], §4.2–4.4.

5.1 Linear stochastic differential equations

We first consider the class of linear stochastic differential equations. To simplify the exposition we concentrate on the scalar case and afterwards give the extension to the multivariate case.

The general form of a scalar linear stochastic differential equation is
\[
dX_t = (a_1(t) X_t + a_2(t))\, dt + (b_1(t) X_t + b_2(t))\, dW_t, \tag{5.1}
\]
where $a_1, a_2, b_1, b_2$ are "nice"¹ functions of time $t$, or constants. As in the case of ordinary differential equations, the method of solution also involves a fundamental solution of an associated homogeneous differential equation. Here the homogeneous linear equation belonging to the general equation (5.1) is
\[
dX_t = a_1(t) X_t\, dt + b_1(t) X_t\, dW_t. \tag{5.2}
\]
In order to find its fundamental solution $\Phi_{t,t_0}$, that means the solution of (5.2) satisfying $\Phi_{t_0,t_0} = 1$, we first look at the special case $b_1(t) \equiv 0$, that means at the ordinary differential equation
\[
\frac{dX_t}{dt} = a_1(t) X_t. \tag{5.3}
\]

¹ They are Lebesgue measurable and bounded on an interval $0 \le t \le T$. Then a unique (strong) solution $X_t$ on $t_0 \le t \le T$ exists for each $0 \le t_0 \le T$.
We know the fundamental solution of (5.3),
\[
\Phi_{t,t_0} = \exp\left( \int_{t_0}^{t} a_1(s)\, ds \right), \tag{5.4}
\]
or equivalently
\[
d\left[ \ln \Phi_{t,t_0} \right] = a_1(t)\, dt. \tag{5.5}
\]
Equation (5.5) motivates us to consider $\ln \Phi_{t,t_0}$ also in the general case, with $\Phi_{t,t_0}$ solving equation (5.2). Applying the Itô formula² to $\ln \Phi_{t,t_0}$ we obtain
\[
d\left[ \ln \Phi_{t,t_0} \right] = \left( a_1(t) - \tfrac{1}{2} b_1^2(t) \right) dt + b_1(t)\, dW_t; \tag{5.6}
\]
thus, with $\Phi_{t_0,t_0} = 1$,
\[
\ln \Phi_{t,t_0} = \int_{t_0}^{t} \left( a_1(s) - \tfrac{1}{2} b_1^2(s) \right) ds + \int_{t_0}^{t} b_1(s)\, dW_s,
\]
or
\[
\Phi_{t,t_0} = \exp\left( \int_{t_0}^{t} \left( a_1(s) - \tfrac{1}{2} b_1^2(s) \right) ds + \int_{t_0}^{t} b_1(s)\, dW_s \right). \tag{5.7}
\]

² See Appendix C.1.
Now that we have derived the solution $\Phi_{t,t_0}$ of the homogeneous linear equation (5.2), we want to obtain the general solution $X_t$ of (5.1). In a way analogous to (5.6) we derive by means of the Itô formula³
\[
d\left[ \Phi_{t,t_0}^{-1} \right] = \left( -a_1(t) + b_1^2(t) \right) \Phi_{t,t_0}^{-1}\, dt - b_1(t)\, \Phi_{t,t_0}^{-1}\, dW_t. \tag{5.8}
\]
As we will see below, we have to consider the process $\Phi_{t,t_0}^{-1} X_t$ in order to find an explicit expression for $X_t$. The processes $\Phi_{t,t_0}^{-1}$ and $X_t$ both involve the same Wiener process $W_t$, and (5.2) and (5.8) can be seen as a two-dimensional stochastic differential equation. In order to calculate $d(\Phi_{t,t_0}^{-1} X_t)$ we therefore have to use the Itô formula for vector-valued processes⁴ with the transformation $U(X_t, \Phi_{t,t_0}^{-1}) = \Phi_{t,t_0}^{-1} X_t$. We obtain
\[
d\left[ \Phi_{t,t_0}^{-1} X_t \right] = \left( a_2(t) - b_1(t) b_2(t) \right) \Phi_{t,t_0}^{-1}\, dt + b_2(t)\, \Phi_{t,t_0}^{-1}\, dW_t.
\]
Hence, with $\Phi_{t_0,t_0} = 1$, we get by integration
\[
\Phi_{t,t_0}^{-1} X_t = X_{t_0} + \int_{t_0}^{t} \left( a_2(s) - b_1(s) b_2(s) \right) \Phi_{s,t_0}^{-1}\, ds + \int_{t_0}^{t} b_2(s)\, \Phi_{s,t_0}^{-1}\, dW_s,
\]

³ See Appendix C.1.
⁴ See Appendix C.2.
and thus
\[
X_t = \Phi_{t,t_0} \left( X_{t_0} + \int_{t_0}^{t} \left( a_2(s) - b_1(s) b_2(s) \right) \Phi_{s,t_0}^{-1}\, ds + \int_{t_0}^{t} b_2(s)\, \Phi_{s,t_0}^{-1}\, dW_s \right). \tag{5.9}
\]
Summarizing: the solution $X_t$ of the general scalar linear stochastic differential equation
\[
dX_t = (a_1(t) X_t + a_2(t))\, dt + (b_1(t) X_t + b_2(t))\, dW_t \tag{5.10}
\]
is
\[
X_t = \Phi_{t,t_0} \left( X_{t_0} + \int_{t_0}^{t} \left( a_2(s) - b_1(s) b_2(s) \right) \Phi_{s,t_0}^{-1}\, ds + \int_{t_0}^{t} b_2(s)\, \Phi_{s,t_0}^{-1}\, dW_s \right), \tag{5.11}
\]
with the fundamental solution
\[
\Phi_{t,t_0} = \exp\left( \int_{t_0}^{t} \left( a_1(s) - \tfrac{1}{2} b_1^2(s) \right) ds + \int_{t_0}^{t} b_1(s)\, dW_s \right). \tag{5.12}
\]
All the special cases (only additive or only multiplicative noise, homogeneous or inhomogeneous equations, constant or variable coefficients) are contained in the general case (5.10).
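The explicit formula can be checked numerically. For the special case of constant coefficients with $a_2 = b_2 = 0$ (geometric Brownian motion, $dX_t = a_1 X_t\, dt + b_1 X_t\, dW_t$), (5.11)-(5.12) reduce to $X_t = X_0 \exp((a_1 - \frac{1}{2} b_1^2) t + b_1 W_t)$; the sketch below (ours, with illustrative constants) compares this with an Euler-Maruyama approximation driven by the same Brownian increments:

```python
import numpy as np

# dX = a1*X dt + b1*X dW; explicit solution vs. Euler-Maruyama on one path
rng = np.random.default_rng(1)
a1, b1, x0, T, n = 0.05, 0.2, 1.0, 1.0, 200_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)   # Brownian increments

x_euler = x0
for dw in dW:                               # Euler-Maruyama recursion
    x_euler += a1 * x_euler * dt + b1 * x_euler * dw

# explicit solution from (5.11)-(5.12), with W_T = sum of the increments
x_explicit = x0 * np.exp((a1 - 0.5 * b1 ** 2) * T + b1 * dW.sum())
```

With a fine time grid the two values agree closely, which is a useful sanity check whenever an explicit solution of this kind is available.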
At this point we give a short extension of the scalar to the multidimensional case. The general form of a $d$-dimensional linear stochastic differential equation is
\[
dX_t = (A(t) X_t + a(t))\, dt + \sum_{i=1}^{m} (B^i(t) X_t + b^i(t))\, dW_t^i, \tag{5.13}
\]
where $A(t), B^1(t), B^2(t), \ldots, B^m(t)$ are $d \times d$-matrix functions, $a(t), b^1(t), b^2(t), \ldots, b^m(t)$ are $d$-dimensional vector functions, and $W = \{W_t, t \ge 0\}$ is an $m$-dimensional Wiener process with components $W_t^1, W_t^2, \ldots, W_t^m$, which are independent scalar Wiener processes. Using the same arguments as for the scalar case above, we find the solution of (5.13):
\[
X_t = \Phi_{t,t_0} \left[ X_{t_0} + \int_{t_0}^{t} \Phi_{s,t_0}^{-1} \left( a(s) - \sum_{i=1}^{m} B^i(s)\, b^i(s) \right) ds + \sum_{i=1}^{m} \int_{t_0}^{t} \Phi_{s,t_0}^{-1}\, b^i(s)\, dW_s^i \right], \tag{5.14}
\]
where $\Phi_{t,t_0}$ is the $d \times d$ fundamental matrix with $\Phi_{t_0,t_0} = I$ satisfying the homogeneous matrix stochastic differential equation
\[
d\Phi_{t,t_0} = A(t)\, \Phi_{t,t_0}\, dt + \sum_{i=1}^{m} B^i(t)\, \Phi_{t,t_0}\, dW_t^i, \tag{5.15}
\]
which can be read column vector by column vector as $d$ vector stochastic differential equations. We remark that, unlike in the case of scalar homogeneous linear equations, we cannot in general solve (5.15) for $\Phi_{t,t_0}$ explicitly.
5.2 Nonlinear stochastic differential equations

In the previous section we saw how to solve a scalar linear stochastic differential equation (5.10) explicitly. Under special conditions some classes of nonlinear stochastic differential equations can also be solved explicitly by a reduction to a linear equation. In the following we examine those classes and their reductions in the scalar case. We remark that the two models (1.4) and (1.5) considered in Chapter 1 do not belong to these classes.

Certain nonlinear stochastic differential equations
\[
dX_t = a(t, X_t)\, dt + b(t, X_t)\, dW_t, \tag{5.16}
\]
with $a(t, X_t), b(t, X_t)$ differentiable functions, can be reduced under special conditions to linear stochastic differential equations
\[
dY_t = (a_1(t) Y_t + a_2(t))\, dt + (b_1(t) Y_t + b_2(t))\, dW_t \tag{5.17}
\]
by a transformation $U(t, X_t) = Y_t$. We know by means of the Inverse Function Theorem that in the case $\frac{\partial U}{\partial x}(t, x) \ne 0$ a local inverse $x = V(t, y)$ of $y = U(t, x)$ exists.
Our purpose is to determine conditions on $a$ and $b$ in (5.16) so that such a suitable transformation $U(t, x) = y$ (with $\frac{\partial U}{\partial x}(t, x) \ne 0$) and coefficient functions $a_1(t), a_2(t), b_1(t)$ and $b_2(t)$ can be found. Then we solve the linear equation (5.17) explicitly (by means of (5.11)), and with $X_t = V(t, Y_t)$ we get a solution of (5.16).

In order to determine conditions for the reducibility of (5.16) to (5.17), that means conditions on $a$ and $b$, we apply the Itô formula⁵ to the transformation $U(t, x)$:
\[
dU(t, X_t) = \left[ \frac{\partial U}{\partial t} + a\, \frac{\partial U}{\partial x} + \frac{1}{2}\, b^2\, \frac{\partial^2 U}{\partial x^2} \right] dt + b\, \frac{\partial U}{\partial x}\, dW_t. \tag{5.18}
\]
A transformation $U$ exists if equations (5.17) and (5.18) are the same, that means if the following conditions on the coefficients are satisfied:
\[
\frac{\partial U}{\partial t}(t, x) + a\, \frac{\partial U}{\partial x}(t, x) + \frac{1}{2}\, b^2(t, x)\, \frac{\partial^2 U}{\partial x^2}(t, x) = a_1(t)\, U(t, x) + a_2(t) \tag{5.19}
\]
and
\[
b(t, x)\, \frac{\partial U}{\partial x}(t, x) = b_1(t)\, U(t, x) + b_2(t). \tag{5.20}
\]

⁵ See Appendix C.1.
Without further specialization we cannot proceed much further, and therefore we restrict ourselves to two cases.

First, we consider the case $a_1(t) \equiv b_1(t) \equiv 0$ and denote $a_2(t) =: \alpha(t)$ and $b_2(t) =: \beta(t)$. In order to obtain conditions on $a$ and $b$ it is useful to differentiate equation (5.19) with respect to $x$ and equation (5.20) with respect to $t$:
$$\frac{\partial^2 U}{\partial t\,\partial x}(t,x) + \frac{\partial}{\partial x}\left(a(t,x)\frac{\partial U}{\partial x}(t,x) + \frac{1}{2}b^2(t,x)\frac{\partial^2 U}{\partial x^2}(t,x)\right) = 0 \qquad (5.21)$$
and
$$b(t,x)\frac{\partial^2 U}{\partial t\,\partial x}(t,x) + \frac{\partial b}{\partial t}(t,x)\frac{\partial U}{\partial x}(t,x) = \beta'(t). \qquad (5.22)$$
By means of (5.21) and (5.22) we obtain
$$\beta'(t) = \beta(t)\left[\frac{1}{b(t,x)}\frac{\partial b}{\partial t}(t,x) - b(t,x)\frac{\partial}{\partial x}\left(\frac{a(t,x)}{b(t,x)} - \frac{1}{2}\frac{\partial b}{\partial x}(t,x)\right)\right]. \qquad (5.23)$$
In (5.21) and hence in (5.23) we assumed that the left-hand side of (5.19) is independent of $x$, in other words, that $\alpha(t)$ can be determined. Since $\beta$ is independent of $x$, the condition for the determination of $\beta$ from equation (5.23) is
$$\frac{\partial f}{\partial x}(t,x) = 0, \qquad (5.24)$$
where
$$f(t,x) = \frac{1}{b(t,x)}\frac{\partial b}{\partial t}(t,x) - b(t,x)\frac{\partial}{\partial x}\left(\frac{a(t,x)}{b(t,x)} - \frac{1}{2}\frac{\partial b}{\partial x}(t,x)\right). \qquad (5.25)$$
If condition (5.24) is satisfied we are able to determine $\alpha$ and $\beta$. Thus condition (5.24) is sufficient for the reducibility of equation (5.16) to the linear equation
$$dY_t = \alpha(t)\,dt + \beta(t)\,dW_t \qquad (5.26)$$
by means of a transformation $U$. Integrating equations (5.20) and (5.23) we obtain the explicit expression
$$U(t,x) = C\,\exp\left(\int_0^t f(s,x)\,ds\right)\int_0^x \frac{dz}{b(t,z)}, \qquad (5.27)$$
with an arbitrary constant $C$.
The second case to consider is the time-independent case. We want to reduce the nonlinear autonomous stochastic differential equation
$$dX_t = a(X_t)\,dt + b(X_t)\,dW_t \qquad (5.28)$$
to the linear stochastic differential equation
$$dY_t = (a_1 Y_t + a_2)\,dt + (b_1 Y_t + b_2)\,dW_t \qquad (5.29)$$
by means of a suitable transformation $Y_t = U(X_t)$.
Equations (5.19) and (5.20) simplify to
$$a(x)\frac{dU}{dx}(x) + \frac{1}{2}b^2(x)\frac{d^2U}{dx^2}(x) = a_1 U(x) + a_2 \qquad (5.30)$$
and
$$b(x)\frac{dU}{dx}(x) = b_1 U(x) + b_2. \qquad (5.31)$$
We assume $b(x) \neq 0$ and $b_1 \neq 0$ and conclude from (5.31)
$$U(x) = C\,\exp\bigl(b_1 h(x)\bigr) - \frac{b_2}{b_1}, \qquad (5.32)$$
with an arbitrary constant $C$ and
$$h(x) = \int_{x_0}^x \frac{ds}{b(s)}. \qquad (5.33)$$
As in the previous case we want to find conditions on $a$ and $b$. Therefore we substitute (5.32) into (5.30), differentiate, multiply by
$$\left(\frac{dU}{dx}\right)^{-1} = \frac{b(x)}{C\,b_1}\exp\bigl(-b_1 h(x)\bigr)$$
and obtain an expression with only $a_1$ on the right-hand side. Differentiating again we get the following condition:
$$b_1\frac{dA}{dx}(x) + \frac{d}{dx}\left(b(x)\frac{dA}{dx}(x)\right) = 0 \qquad (5.34)$$
with
$$A(x) = \frac{a(x)}{b(x)} - \frac{1}{2}\frac{db}{dx}(x). \qquad (5.35)$$
This condition is satisfied in the trivial case $\frac{dA}{dx} \equiv 0$ for any $b_1$, or in the case
$$\frac{d}{dx}\left(\frac{\dfrac{d}{dx}\left(b\,\dfrac{dA}{dx}\right)}{\dfrac{dA}{dx}}\right) = 0, \qquad (5.36)$$
where we assume the choice of $b_1$
$$b_1 = -\,\frac{\dfrac{d}{dx}\left(b\,\dfrac{dA}{dx}\right)}{\dfrac{dA}{dx}}. \qquad (5.37)$$
If $b_1 \neq 0$ then we may choose $b_2 = 0$ and use the transformation
$$U(x) = C\,\exp\bigl(b_1 h(x)\bigr). \qquad (5.38)$$
If $b_1 = 0$ we use
$$U(x) = b_2 h(x) + C. \qquad (5.39)$$
Now we apply the derived theory to three groups of examples.

Example 1 The stochastic differential equation
$$dX_t = \tfrac{1}{2}g(X_t)g'(X_t)\,dt + g(X_t)\,dW_t, \qquad (5.40)$$
where $g$ is a given differentiable function, is reducible with the general solution
$$X_t = h^{-1}\bigl(W_t + h(X_0)\bigr), \qquad (5.41)$$
with
$$h(x) = \int_{x_0}^x \frac{ds}{g(s)}. \qquad (5.42)$$
In the following we show how to obtain the general solution (5.41). With the notation of the theory above we see that $A(x) \equiv 0$, and hence (5.34) is satisfied for all $b_1$. We choose $b_1 = 0$, $b_2 = 1$ and obtain
$$U(x) = h(x) + C.$$
Inserting $U$ into (5.30) gives $a_1(h(x)+C) + a_2 = 0$, and with $a_1 = 0$, $a_2 = 0$, (5.29) reduces to $dY_t = dW_t$ with the solution $Y_t = Y_0 + W_t$. With $C = 0$ we have $Y_t = h(X_t)$, in particular $Y_0 = h(X_0)$, and hence obtain (5.41).

We remark that in the special case (5.40) we may find a solution in another, more convenient way. Equation (5.40) is equivalent to the Stratonovich stochastic differential equation (for the Stratonovich integral see e.g. [56], p. 16, or [45], §4.9)
$$dX_t = g(X_t) \circ dW_t.$$
That means that, instead of reducing (5.40) to a linear stochastic differential equation, we may just as well integrate the Stratonovich stochastic differential equation directly, obtaining (5.41) at once.
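The general solution (5.41) can also be checked directly with the Itô formula of Appendix C.1; the following short verification is added here as an illustration.

```latex
Write $F(w) := h^{-1}\bigl(w + h(X_0)\bigr)$, so that $X_t = F(W_t)$.
Since $h'(x) = 1/g(x)$, the inverse function rule gives
\[
F'(w) = g\bigl(F(w)\bigr), \qquad
F''(w) = g'\bigl(F(w)\bigr)\,F'(w) = g'\bigl(F(w)\bigr)\,g\bigl(F(w)\bigr),
\]
and the It\^o formula applied to $X_t = F(W_t)$ then yields
\[
dX_t = \tfrac{1}{2}F''(W_t)\,dt + F'(W_t)\,dW_t
     = \tfrac{1}{2}g(X_t)g'(X_t)\,dt + g(X_t)\,dW_t ,
\]
which is exactly equation (5.40).
```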
Applications of Example 1:
$$dX_t = \tfrac{1}{2}a^2 X_t\,dt + aX_t\,dW_t, \qquad X_t = X_0\exp(aW_t). \qquad (5.43)$$
$$dX_t = \tfrac{1}{2}a(a-1)X_t^{1-2/a}\,dt + aX_t^{1-1/a}\,dW_t, \qquad X_t = \left(W_t + X_0^{1/a}\right)^a. \qquad (5.44)$$
$$dX_t = dt + 2\sqrt{X_t}\,dW_t, \qquad X_t = \left(W_t + \sqrt{X_0}\right)^2. \qquad (5.45)$$
$$dX_t = \tfrac{1}{2}a^2 m X_t^{2m-1}\,dt + aX_t^m\,dW_t, \quad m \neq 1, \qquad X_t = \left(X_0^{1-m} - a(m-1)W_t\right)^{1/(1-m)}. \qquad (5.46)$$
$$dX_t = \tfrac{1}{2}X_t\,dt + \sqrt{X_t^2+1}\,dW_t, \qquad X_t = \sinh\bigl(W_t + \operatorname{arcsinh} X_0\bigr). \qquad (5.47)$$
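As a sanity check of (5.43) (a sketch added here, not part of the original text): simulating $dX_t = \frac{1}{2}a^2X_t\,dt + aX_t\,dW_t$ with the Euler scheme of Appendix B along one Brownian path should reproduce the closed-form solution $X_t = X_0\exp(aW_t)$ up to discretization error.

```python
import math
import random

def euler_path(x0, a_coef, T, n_steps, rng):
    """Euler scheme for dX = 0.5*a^2*X dt + a*X dW along one Brownian path.

    Returns the Euler endpoint and the exact endpoint X0*exp(a*W_T)
    driven by the same Brownian increments.
    """
    dt = T / n_steps
    x, w = x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += 0.5 * a_coef**2 * x * dt + a_coef * x * dw  # drift + diffusion of (5.43)
        w += dw
    return x, x0 * math.exp(a_coef * w)

rng = random.Random(42)
approx, exact = euler_path(x0=1.0, a_coef=0.5, T=1.0, n_steps=20000, rng=rng)
print(abs(approx - exact) / exact)  # small; the error vanishes as dt -> 0
```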
Example 2 The stochastic differential equation
$$dX_t = \left(\alpha g(X_t) + \tfrac{1}{2}g(X_t)g'(X_t)\right)dt + g(X_t)\,dW_t, \qquad (5.48)$$
with a given differentiable function $g$, is reducible with the general solution
$$X_t = h^{-1}\bigl(\alpha t + W_t + h(X_0)\bigr), \qquad (5.49)$$
where $h$ is given by (5.42).

Again we want to show how (5.49) can be derived. We see $A(x) = \alpha$, and hence $b_1$ can be chosen arbitrarily. We choose $b_1 = 0$ and $b_2 = 1$ and obtain
$$U(x) = h(x) + C.$$
Inserting $U$ into (5.30) gives $\alpha = a_1 h(x) + a_2$, and hence $a_1 = 0$, $a_2 = \alpha$. We have $dY_t = \alpha\,dt + dW_t$ with its solution $Y_t = \alpha t + W_t + Y_0$. With $C = 0$ we have $Y_t = h(X_t)$, in particular $Y_0 = h(X_0)$, and we obtain (5.49).

As in the previous example we remark that (5.48) is equivalent to the Stratonovich stochastic differential equation
$$dX_t = \alpha g(X_t)\,dt + g(X_t) \circ dW_t,$$
which can be integrated directly in order to obtain the general solution (5.49).

The applications of Example 1 can all be modified to give applications of Example 2. For instance, consider
$$dX_t = \left(\tfrac{1}{2}X_t + \alpha\sqrt{X_t^2+1}\right)dt + \sqrt{X_t^2+1}\,dW_t, \qquad X_t = \sinh\bigl(\alpha t + W_t + \operatorname{arcsinh} X_0\bigr). \qquad (5.50)$$
Example 3 The stochastic differential equation
$$dX_t = \left(\alpha g(X_t)h(X_t) + \tfrac{1}{2}g(X_t)g'(X_t)\right)dt + g(X_t)\,dW_t, \qquad (5.51)$$
where $g$ is a given differentiable function and $h$ is given by (5.42), can be reduced to the Langevin equation of the form
$$dY_t = -aY_t\,dt + b\,dW_t,$$
with $a = -\alpha$ and $b = 1$, and has the general solution
$$X_t = h^{-1}\left(e^{\alpha t}h(X_0) + e^{\alpha t}\int_0^t e^{-\alpha s}\,dW_s\right). \qquad (5.52)$$
Again let us show how we derive the solution (5.52). We see $A(x) = \alpha h(x)$ and by (5.37) we obtain $b_1 = 0$. With $b_2 = 1$ we have
$$U(x) = h(x) + C.$$
Inserting $U$ into (5.30) gives $\alpha h(x) = a_1 h(x) + a_2$ and hence $a_1 = \alpha$ and $a_2 = 0$. We have $dY_t = \alpha Y_t\,dt + dW_t$ with its solution
$$Y_t = e^{\alpha t}Y_0 + e^{\alpha t}\int_0^t e^{-\alpha s}\,dW_s.$$
With $C = 0$ we have $Y_t = h(X_t)$, in particular $Y_0 = h(X_0)$, and we obtain (5.52).

The applications of Example 1 can all be modified to give applications of Example 3. For instance, consider
$$dX_t = \left(2a X_t + 1\right)dt + 2\sqrt{X_t}\,dW_t, \qquad X_t = \left(e^{at}\sqrt{X_0} + e^{at}\int_0^t e^{-as}\,dW_s\right)^2. \qquad (5.53)$$
We remark that some examples of nonlinear reducible stochastic differential equations that are not included in the preceding three classes are listed in [45], pp. 124-126.
Finally, we again stress the fact that the two models (1.4) and (1.5) considered in chapter 1 are not reducible and cannot be solved explicitly.
Appendix A
The Kalman-Bucy Filter

A.1 The continuous case

For the Kalman-Bucy filter in the continuous case we refer to e.g. Kallianpur [43], §10. In our notation we follow Øksendal [56], §4, pp. 46-68.
A.1.1 The general filtering problem

Consider
$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dU_t, \qquad \text{(System)} \qquad (A.1)$$
$$dZ_t = c(t,X_t)\,dt + \gamma(t,X_t)\,dV_t, \qquad \text{(Observations)} \qquad (A.2)$$
where $U$ and $V$ are independent Brownian motions, $U_t$ being $p$-dimensional and $V_t$ being $r$-dimensional.

The filtering problem is the following: given the observations $Z_s$ satisfying (A.2) for $0 \leq s \leq t$, what is the best estimate $\hat{X}_t$ of the state $X_t$ of (A.1) based on these observations?

To solve this problem we formulate it mathematically. Let $\mathcal{G}_t$ be the $\sigma$-algebra generated by $\{Z_s(\cdot),\ s \leq t\}$, let $(\Omega,\mathcal{F},P)$ be the probability space corresponding to the $(p+r)$-dimensional Brownian motion $(U_t,V_t)$, and let
$$\mathcal{K}_t = \left\{S : \Omega \to \mathbb{R}^n;\ S \in L^2(\Omega,\mathcal{F},P),\ S\ \mathcal{G}_t\text{-measurable}\right\}.$$
"Based on the observations $Z_s$" means that the estimate $\hat{X}_t$ is $\mathcal{G}_t$-measurable. $\hat{X}_t$ shall be "the best estimate based on the observations $Z_s$", which means that
$$E\bigl[|X_t - \hat{X}_t|^2\bigr] = \inf_{S \in \mathcal{K}_t} E\bigl[|X_t - S|^2\bigr].$$
Let $P_{\mathcal{K}_t}(X_t)$ denote the orthogonal projection from $L^2(\Omega,\mathcal{F},P)$ onto the subspace $\mathcal{K}_t$. The following statement holds:
$$\hat{X}_t = P_{\mathcal{K}_t}(X_t) = E(X_t \,|\, \mathcal{G}_t).$$
We will concentrate on the linear case, which allows an explicit solution in terms of a stochastic differential equation for $\hat{X}_t$. This method is called the Kalman-Bucy filter.
A.1.2 The linear filtering problem

To focus on the main ideas, consider only the one-dimensional case
$$dX_t = F(t)X_t\,dt + C(t)\,dU_t, \qquad \text{(System)} \qquad (A.3)$$
$$dZ_t = G(t)X_t\,dt + D(t)\,dV_t, \qquad \text{(Observations)} \qquad (A.4)$$
where the real functions $F(t)$, $G(t)$, $C(t)$, $D(t)$ are bounded on bounded intervals. Furthermore assume that $D(t)$ is bounded away from $0$ on bounded intervals, $Z_0 = 0$, and that $X_0$ is normally distributed and independent of $\{U_t\}$, $\{V_t\}$ with $E[X_0] = 0$ (see Øksendal [56], p. 48f). The ideas and techniques of the one-dimensional case extend in an analogous way to the multi-dimensional case.

By means of some technical transformations of $P_{\mathcal{K}_t}(X_t)$ and calculations we obtain:

The Kalman-Bucy Filter
The solution
$$\hat{X}_t = P_{\mathcal{K}_t}(X_t) = E(X_t \,|\, \mathcal{G}_t)$$
of the linear filtering problem (A.3), (A.4) satisfies the stochastic differential equation
$$d\hat{X}_t = \left[F(t) - \frac{G^2(t)S(t)}{D^2(t)}\right]\hat{X}_t\,dt + \frac{G(t)S(t)}{D^2(t)}\,dZ_t, \qquad \hat{X}_0 = E[X_0], \qquad (A.5)$$
where $S(t) = E[(X_t - \hat{X}_t)^2]$ satisfies the "Riccati equation"
$$\frac{dS}{dt} = 2F(t)S(t) - \frac{G^2(t)}{D^2(t)}S^2(t) + C^2(t), \qquad (A.6)$$
with $S(0) = E[(X_0 - E[X_0])^2]$.
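For constant coefficients $F, G, C, D$ the long-run behaviour of (A.6) can be read off directly; the following computation is an illustration added here, not part of the cited sources. Setting $dS/dt = 0$ gives the stationary mean square error:

```latex
\[
0 = 2FS_\infty - \frac{G^2}{D^2}S_\infty^2 + C^2
\quad\Longrightarrow\quad
S_\infty = \frac{D^2}{G^2}\left(F + \sqrt{F^2 + \frac{G^2C^2}{D^2}}\right),
\]
where the positive root is taken since $S(t) = E[(X_t - \hat{X}_t)^2] \geq 0$.
One checks that for $C \neq 0$ the equilibrium $S_\infty$ is attracting, so
$S(t) \to S_\infty$ as $t \to \infty$ for any initial value $S(0) \geq 0$.
```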
A.2 The discrete case

For reasons of completeness we again state here the Kalman-Bucy filter for the discrete case, as already given in section 3.1.2.

Consider the stochastic system
$$X_i = D_i X_{i-1} + S_i + \varepsilon_i, \qquad i = 1,\ldots,n, \qquad \text{(System)} \qquad (A.7)$$
where $\{X_i\}_{i=0}^n$ are random $d \times 1$ vectors, $\{D_i\}_{i=1}^n$ are non-random $d \times d$ matrices, $\{S_i\}_{i=1}^n$ are non-random $d \times 1$ vectors, $X_0 \sim N_d(x_0, V_0)$, $\varepsilon_i \sim N_d(0, V_i)$, $i = 1,\ldots,n$, and $X_0, \varepsilon_1, \ldots, \varepsilon_n$ are stochastically independent. Assume the observable quantities are $Y_0, Y_1, \ldots, Y_n$, given by
$$Y_i = T_i X_i + U_i + e_i, \qquad i = 0,1,\ldots,n, \qquad \text{(Observations)} \qquad (A.8)$$
where $\{T_i\}_{i=0}^n$ are non-random $k \times d$ matrices ($k \leq d$), $\{U_i\}_{i=0}^n$ are non-random $k \times 1$ vectors, $e_i \sim N_k(0, W_i)$, and $X_0, \varepsilon_1, \ldots, \varepsilon_n, e_0, e_1, \ldots, e_n$ are stochastically independent, $i = 0,1,\ldots,n$.
The Kalman-Bucy filter
Under some assumptions (see Pedersen [59], pp. 4-5), we have for given observations $y_0, y_1, \ldots, y_n$ of $Y_0, Y_1, \ldots, Y_n$
$$X_i \,|\, Y_i = y_i \sim N_d\bigl(\xi_i(y_i),\ \Sigma_i\bigr), \qquad (A.9)$$
$$X_i \,|\, Y_{i-1} = y_{i-1} \sim N_d\bigl(D_i\xi_{i-1}(y_{i-1}) + S_i,\ R_i\bigr), \qquad (A.10)$$
$$Y_i \,|\, Y_{i-1} = y_{i-1} \sim N_k\bigl(T_i(D_i\xi_{i-1}(y_{i-1}) + S_i) + U_i,\ T_iR_iT_i^T + W_i\bigr), \qquad (A.11)$$
where $R_i = D_i\Sigma_{i-1}D_i^T + V_i$ is positive definite, and where
$$\xi_0(y_0) = x_0 + V_0T_0^T\bigl(T_0V_0T_0^T + W_0\bigr)^{-1}(y_0 - T_0x_0 - U_0), \qquad (A.12)$$
$$\Sigma_0 = V_0 - V_0T_0^T\bigl(T_0V_0T_0^T + W_0\bigr)^{-1}T_0V_0, \qquad (A.13)$$
$$\xi_i(y_i) = D_i\xi_{i-1}(y_{i-1}) + S_i + R_iT_i^T\bigl(T_iR_iT_i^T + W_i\bigr)^{-1}\Bigl(y_i - T_i\bigl(D_i\xi_{i-1}(y_{i-1}) + S_i\bigr) - U_i\Bigr), \qquad (A.14)$$
$$\Sigma_i = R_i - R_iT_i^T\bigl(T_iR_iT_i^T + W_i\bigr)^{-1}T_iR_i. \qquad (A.15)$$
Appendix B
Numerical Methods

In our presentation we follow [45], §9 and §10.

B.1 The Euler Scheme

The Euler approximation is a basic discrete-time method for approximating an Itô process. Consider an Itô process $X = \{X_t,\ t_0 \leq t \leq T\}$ satisfying the scalar stochastic differential equation
$$dX_t = a(t,X_t)\,dt + b(t,X_t)\,dW_t,$$
with $t_0 \leq t \leq T$ and the initial condition $X_{t_0} = X_0$. For a discretization $t_0 = \tau_0 < \tau_1 < \cdots < \tau_N = T$ of the time interval $[t_0,T]$, the Euler scheme is given by the iterative rule
$$Y_{n+1} = Y_n + a(\tau_n,Y_n)(\tau_{n+1}-\tau_n) + b(\tau_n,Y_n)\bigl(W_{\tau_{n+1}} - W_{\tau_n}\bigr), \qquad (B.1)$$
for $n = 0,1,\ldots,N-1$, with $Y_0 = X_0$. With the abbreviations $\Delta_n = \tau_{n+1} - \tau_n$ and $\Delta W_n = W_{\tau_{n+1}} - W_{\tau_n}$ we can write the Euler scheme (B.1) in the compact form
$$Y_{n+1} = Y_n + a\,\Delta_n + b\,\Delta W_n \qquad (B.2)$$
for $n = 0,1,\ldots,N-1$.
In order to compute the sequence $\{Y_n,\ n = 0,1,\ldots,N\}$ of values of the Euler approximation we have to generate the random increments $\Delta W_n$, $n = 0,1,\ldots,N-1$, of the Wiener process $W = \{W_t,\ t \geq 0\}$. These increments are independent Gaussian random variables with $E(\Delta W_n) = 0$ and $\mathrm{Var}(\Delta W_n) = \Delta_n$ and can be generated by a random number generator (see e.g. [45], §1.3).
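A minimal implementation of the scheme (B.2) might look as follows (an illustrative sketch; the coefficient functions passed in are arbitrary examples). Python's standard `random.gauss` serves as the random number generator for the increments $\Delta W_n$.

```python
import math
import random

def euler_scheme(a, b, x0, t0, T, N, rng):
    """Euler approximation (B.2) on an equidistant grid with N steps.

    a, b : drift and diffusion coefficient functions a(t, x), b(t, x)
    Returns the list [Y_0, Y_1, ..., Y_N].
    """
    dt = (T - t0) / N
    y = [x0]
    for n in range(N):
        t = t0 + n * dt
        dW = rng.gauss(0.0, math.sqrt(dt))  # E(dW) = 0, Var(dW) = dt
        y.append(y[-1] + a(t, y[-1]) * dt + b(t, y[-1]) * dW)
    return y

# With b = 0 the scheme reduces to the deterministic Euler scheme for x' = a(t, x):
path = euler_scheme(a=lambda t, x: x, b=lambda t, x: 0.0,
                    x0=1.0, t0=0.0, T=1.0, N=1000, rng=random.Random(0))
print(path[-1])  # (1 + 1/N)^N, close to e
```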
For the multi-dimensional case of the Euler scheme see e.g. [45], §10.2. Note that when the diffusion coefficient $b$ is identically zero, the stochastic iterative scheme (B.2) reduces to the well-known deterministic Euler scheme for the ordinary differential equation $x' = a(t,x)$.
We introduce the notion of strong convergence.

Definition 1 A time discrete approximation $Y^\delta$ with maximum step size $\delta$ converges strongly to $X$ at time $T$ if
$$\lim_{\delta \downarrow 0} E\bigl(|X_T - Y^\delta(T)|\bigr) = 0.$$
The rate of strong convergence is crucial if we want to compare different time discrete approximation methods.

Definition 2 A time discrete approximation $Y^\delta$ converges strongly with order $\gamma > 0$ at time $T$ if there exists a constant $C > 0$, independent of $\delta$, and a $\delta_0 > 0$ such that
$$E\bigl(|X_T - Y^\delta(T)|\bigr) \leq C\,\delta^\gamma$$
for each $\delta \in (0,\delta_0)$.

Under some regularity assumptions the Euler scheme converges strongly with order $\gamma = 0.5$:
Theorem 8 Suppose
$$E\bigl(|X_0|^2\bigr) < \infty,$$
$$E\bigl(|X_0 - Y_0^\delta|^2\bigr)^{1/2} \leq C_1\,\delta^{1/2},$$
$$|a(t,x) - a(t,y)| + |b(t,x) - b(t,y)| \leq C_2\,|x-y|,$$
$$|a(t,x)| + |b(t,x)| \leq C_3\,(1+|x|)$$
and
$$|a(s,x) - a(t,x)| + |b(s,x) - b(t,x)| \leq C_4\,(1+|x|)\,|s-t|^{1/2}$$
for all $s,t \in [0,T]$ and $x,y \in \mathbb{R}^d$. Then
$$E\bigl(|X_T - Y^\delta(T)|\bigr) \leq C_5\,\delta^{1/2}$$
for the Euler approximation $Y^\delta$.
For the proof see [45], §10.2. We will compare the Euler method with the Milstein method described in the next section.
B.2 The Milstein Scheme

By adding to the Euler scheme (B.2) the term
$$\frac{1}{2}bb'\bigl[(\Delta W_n)^2 - \Delta_n\bigr]$$
we obtain the Milstein scheme
$$Y_{n+1} = Y_n + a\,\Delta_n + b\,\Delta W_n + \frac{1}{2}bb'\bigl[(\Delta W_n)^2 - \Delta_n\bigr]. \qquad (B.3)$$
If $a$ and $b$ are sufficiently smooth, the Milstein scheme can be shown to have order of strong convergence $\gamma = 1.0$. In comparison with the Euler method the strong convergence order is increased from $\gamma = 0.5$ to $\gamma = 1.0$, and we conclude that the Milstein method is an improvement of the Euler method.

We remark that in the case of vanishing diffusion coefficient $b = 0$, that is, in the deterministic case, the (deterministic) Euler scheme has strong convergence order $\gamma = 1.0$. Therefore, as far as the order of strong convergence is concerned, the Milstein scheme can be seen as a generalization of the deterministic Euler scheme.
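The difference in strong order can be observed numerically. The sketch below (an illustration added here) applies both schemes to the linear equation (5.43), $dX_t = \frac{1}{2}a^2X_t\,dt + aX_t\,dW_t$, whose exact solution $X_t = X_0\exp(aW_t)$ is known, and compares the mean pathwise endpoint errors.

```python
import math
import random

def strong_errors(a_coef, T, N, n_paths, seed):
    """Mean absolute endpoint error of Euler (B.2) and Milstein (B.3)
    for dX = 0.5*a^2*X dt + a*X dW, whose exact solution is X_T = exp(a*W_T)."""
    rng = random.Random(seed)
    dt = T / N
    err_e = err_m = 0.0
    for _ in range(n_paths):
        ye = ym = 1.0
        w = 0.0
        for _ in range(N):
            dW = rng.gauss(0.0, math.sqrt(dt))
            ye += 0.5 * a_coef**2 * ye * dt + a_coef * ye * dW
            # Milstein correction (B.3): (1/2) b b' [(dW)^2 - dt] with b = a*x, b' = a
            ym += (0.5 * a_coef**2 * ym * dt + a_coef * ym * dW
                   + 0.5 * a_coef**2 * ym * (dW**2 - dt))
            w += dW
        exact = math.exp(a_coef * w)
        err_e += abs(ye - exact)
        err_m += abs(ym - exact)
    return err_e / n_paths, err_m / n_paths

euler_err, milstein_err = strong_errors(a_coef=1.0, T=1.0, N=100,
                                        n_paths=500, seed=1)
print(euler_err, milstein_err)  # the Milstein error is markedly smaller
```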
Appendix C
The Itô Formula

C.1 The one-dimensional case

Let the function $U : [0,T] \times \mathbb{R} \to \mathbb{R}$ have continuous partial derivatives $\frac{\partial U}{\partial t}$, $\frac{\partial U}{\partial x}$ and $\frac{\partial^2 U}{\partial x^2}$, and let $X_t$ satisfy the one-dimensional Itô stochastic differential equation
$$dX_t = a(t,\omega)\,dt + b(t,\omega)\,dW_t,$$
where $\sqrt{|a|}$ and $b$ are in the space $L^2$. Define a process $Y_t$ by $Y_t = U(t,X_t)$ for $0 \leq t \leq T$. Then
$$dY_t = \left[\frac{\partial U}{\partial t}(t,X_t) + a_t\frac{\partial U}{\partial x}(t,X_t) + \frac{1}{2}b_t^2\frac{\partial^2 U}{\partial x^2}(t,X_t)\right]dt + b_t\frac{\partial U}{\partial x}(t,X_t)\,dW_t,$$
w.p. 1 for $0 \leq t \leq T$, and with the notation $a_t = a(t,\omega)$, $b_t = b(t,\omega)$.
C.2 The multi-dimensional case

Let the process $X_t$ satisfy the $d$-dimensional Itô stochastic differential equation
$$dX_t = a(t,\omega)\,dt + B(t,\omega)\,dW_t,$$
where $\{W_t,\ t \geq 0\}$ is an $m$-dimensional Wiener process with independent components, $W_t = (W_t^1, W_t^2, \ldots, W_t^m)$, and $a : [0,T] \times \Omega \to \mathbb{R}^d$, $B : [0,T] \times \Omega \to \mathbb{R}^{d \times m}$, satisfying $\sqrt{|a^k|}$ and $B^{k,j} \in L^2$ for $k = 1,\ldots,d$, $j = 1,\ldots,m$.

Let the function $U : [0,T] \times \mathbb{R}^d \to \mathbb{R}$ have continuous partial derivatives $\frac{\partial U}{\partial t}$, $\frac{\partial U}{\partial x_k}$ and $\frac{\partial^2 U}{\partial x_k\,\partial x_i}$ for $k,i = 1,2,\ldots,d$. Define a scalar process $\{Y_t,\ 0 \leq t \leq T\}$ by $Y_t = U(t,X_t) = U(t, X_t^1, X_t^2, \ldots, X_t^d)$ w.p. 1. Then
$$dY_t = \left[\frac{\partial U}{\partial t} + \sum_{k=1}^d a_t^k\frac{\partial U}{\partial x_k} + \frac{1}{2}\sum_{j=1}^m\sum_{i,k=1}^d B_t^{i,j}B_t^{k,j}\frac{\partial^2 U}{\partial x_i\,\partial x_k}\right]dt + \sum_{j=1}^m\sum_{i=1}^d B_t^{i,j}\frac{\partial U}{\partial x_i}\,dW_t^j,$$
w.p. 1 for $0 \leq t \leq T$, where the partial derivatives are evaluated at $(t,X_t)$ and where we denote $a_t^k = a^k(t,\omega)$, $B_t^{i,j} = B^{i,j}(t,\omega)$.
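As a simple illustration of the one-dimensional formula (added here as an example), take $U(t,x) = x^2$:

```latex
\[
U(t,x) = x^2, \qquad
\frac{\partial U}{\partial t} = 0, \quad
\frac{\partial U}{\partial x} = 2x, \quad
\frac{\partial^2 U}{\partial x^2} = 2,
\]
so that for $dX_t = a_t\,dt + b_t\,dW_t$ the It\^o formula gives
\[
d(X_t^2) = \bigl(2a_tX_t + b_t^2\bigr)\,dt + 2b_tX_t\,dW_t ,
\]
i.e.\ the square picks up the correction term $b_t^2\,dt$ that is absent in the
classical chain rule.
```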
For an extension of the Itô formula to a wider class of non-smooth functions we refer to the recently published paper by Föllmer, Protter and Shiryayev [26].
Appendix D
The Radon-Nikodym Theorem

In our notation we follow Billingsley [9], §32.

1) A measure $\mu$ on a $\sigma$-field $\mathcal{F}$ in $\Omega$ is called finite if $\mu(\Omega) < \infty$.

2) If $\Omega = A_1 \cup A_2 \cup \ldots$ for some finite or countable sequence of $\mathcal{F}$-sets satisfying $\mu(A_k) < \infty$, then $\mu$ is $\sigma$-finite.

3) A measure $\nu$ is absolutely continuous with respect to $\mu$ if for each $A$ in $\mathcal{F}$
$$\mu(A) = 0 \;\Rightarrow\; \nu(A) = 0.$$
This relation is indicated by $\nu \ll \mu$.
Bibliography

[1] Ball, C.A., 1993, "A Review of Stochastic Volatility Models with Application to Option Pricing", Financial Markets, Institutions and Instruments, Vol. 2, No. 5, pp. 55-71.

[2] Basawa, I.V. and B.L.S. Prakasa Rao, 1980, "Statistical Inference for Stochastic Processes", Academic Press Inc.

[3] Bernardo, J.M. and A.F.M. Smith, 1994, "Bayesian Theory", Wiley, New York.

[4] Bibby, B.M., 1994, "A Two-Compartment Model with Additive White Noise", Research Report No. 290, Dept. Theor. Statist., University of Aarhus.

[5] Bibby, B.M., 1994, "Optimal Combination of Martingale Estimating Functions for Discretely Observed Diffusion Processes", Research Report No. 298, Dept. Theor. Statist., University of Aarhus.

[6] Bibby, B.M., 1995, "Analysis of a Tracer Experiment Based on Compartmental Diffusion Models", Research Report No. 303, Dept. Theor. Statist., University of Aarhus.

[7] Bibby, B.M., 1995, "On Estimation in Compartmental Diffusion Models", Research Report No. 305, Dept. Theor. Statist., University of Aarhus.

[8] Bibby, B.M. and M. Sørensen, 1995, "Martingale Estimation Functions for Discretely Observed Diffusion Processes", Bernoulli Vol. 1, No. 1/2, pp. 17-39.

[9] Billingsley, P., 1986, "Probability and Measure", 2nd ed., Wiley, New York.
[10] Billingsley, P., 1968, "Convergence of Probability Measures", Wiley, New York.

[11] Black, F., 1990, "Living up to the model", Risk Magazine, March 1990.

[12] Bollerslev, T., 1986, "Generalized Autoregressive Conditional Heteroskedasticity", Journal of Econometrics 31, pp. 307-327.

[13] Bollerslev, T., R.Y. Chou and K.F. Kroner, 1992, "ARCH Modeling in Finance", Journal of Econometrics 52, pp. 5-59.

[14] Bollerslev, T., R.F. Engle and D.B. Nelson, 1994, "ARCH Models", The Handbook of Econometrics, Vol. 4.

[15] Breiman, L., 1968, "Probability", Addison-Wesley Publishing Company.

[16] Brockwell, P.J. and R.A. Davis, 1987, "Time Series: Theory and Methods", Springer-Verlag, New York.

[17] Cramér, H., 1946, "Mathematical Methods of Statistics", Princeton University Press.

[18] Dacunha-Castelle, D. and M. Duflo, 1993, "Probabilités et Statistiques", Masson, Paris.

[19] Dacunha-Castelle, D. and D. Florens-Zmirou, 1986, "Estimation of the Coefficients of a Diffusion from Discrete Observations", Stochastics 19, pp. 263-284.

[20] Deelstra, G. and G. Parker, 1995, "A Covariance Equivalent Discretisation of the CIR Model", Proceedings of the 5th AFIR International Colloquium, Vol. 2, pp. 731-747.

[21] Duffie, D. and K.J. Singleton, 1995, "An Econometric Model of the Term Structure of Interest Rate Swap Yields", Working paper, Graduate School of Business, Stanford University.

[22] Engle, R.F., 1982, "Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation", Econometrica 50, pp. 987-1008.

[23] Engle, R.F. and M. Rothschild, 1992, "ARCH Models in Finance", Journal of Econometrics, Annals 1992-1, Vol. 52.

[24] Feigin, P.D., 1976, "Maximum Likelihood Estimation for Continuous-time Stochastic Processes", Adv. Appl. Probab., Vol. 8, pp. 712-736.
[25] Florens-Zmirou, D., 1989, "Approximate Discrete-Time Schemes for Statistics of Diffusion Processes", Statistics 20, pp. 547-557.

[26] Föllmer, H., P. Protter and A.N. Shiryayev, 1995, "Quadratic covariation and an extension of Itô's formula", Bernoulli Vol. 1, No. 1/2, pp. 149-169.

[27] Fournié, E. and D. Talay, "Application de la Statistique des Diffusions à un Modèle de Taux d'Intérêt", INRIA, forthcoming in Finance.

[28] Gallant, A.R., 1981, "On the Bias in Flexible Functional Forms and an Essentially Unbiased Form: The Fourier Flexible Form", Journal of Econometrics 15, pp. 211-244.

[29] Geman, S. and D. Geman, 1984, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-6, No. 6, pp. 721-741.

[30] Genon-Catalot, V. and D. Picard, 1993, "Éléments de statistique asymptotique", Springer-Verlag France, Paris.

[31] Geweke, J., 1989, "Bayesian Inference in Econometric Models Using Monte Carlo Integration", Econometrica 57, pp. 1317-1339.

[32] Gihman, I.I. and A.V. Skorokhod, 1972, "Stochastic Differential Equations", Springer-Verlag, Berlin.

[33] Godambe, V.P. and C.C. Heyde, 1987, "Quasi-likelihood and Optimal Estimation", Int. Statist. Rev. 55, pp. 231-244.

[34] Hamilton, J.D., 1994, "Time Series Analysis", Princeton University Press.

[35] Heyde, C.C., 1987, "On Combining Quasi-likelihood Estimating Functions", Stochastic Processes and their Applications 25, pp. 281-287.

[36] Heyde, C.C., 1989, "Quasi-Likelihood and Optimality for Estimating Functions: Some Current Unifying Themes", Fisher Lecture, I.S.I., 47th Session, Paris, Aug 29 - Sep 6, 1989.

[37] Hutton, J.E. and P.I. Nelson, 1986, "Quasi-Likelihood Estimation for Semimartingales", Stochastic Processes and their Applications 22, pp. 245-257.
[38] Ibragimov, I.A. and R.Z. Has'minskii, 1981, "Statistical Estimation", Springer-Verlag, New York.

[39] Jacod, J., 1994, "Statistics of Diffusion Processes: Some Elements", Istituto per le Applicazioni della Matematica e dell'Informatica 94.17.

[40] Jacod, J., "La variation quadratique du brownien en présence d'erreurs d'arrondi", Working Paper, Laboratoire de Probabilités, Université Pierre et Marie Curie, Paris.

[41] Jacod, J. and A.N. Shiryaev, 1987, "Limit Theorems for Stochastic Processes", Springer-Verlag, Berlin.

[42] Jacquier, E., N.G. Polson and P.E. Rossi, 1994, "Bayesian Analysis of Stochastic Volatility Models", Journal of Business and Economic Statistics, Oct 1994, Vol. 12, No. 4.

[43] Kallianpur, G., 1980, "Stochastic Filtering Theory", Springer-Verlag, New York.

[44] Kallianpur, G. and R.L. Karandikar, 1988, "White Noise Theory of Prediction, Filtering and Smoothing", Gordon and Breach Science Publishers.

[45] Kloeden, P.E. and E. Platen, 1992, "Numerical Solution of Stochastic Differential Equations", Springer-Verlag, New York.

[46] Kloeden, P.E., H. Schurz, E. Platen and M. Sørensen, 1992, "On Effects of Discretization on Estimators of Drift Parameters for Diffusion Processes", Research Report No. 249, Dept. Theor. Statist., University of Aarhus.

[47] Kutoyants, Yu.A., 1994, "Identification of Dynamical Systems with Small Noise", Kluwer Academic Publishers.

[48] Kutoyants, Yu.A., 1984, "Parameter Estimation for Stochastic Processes", Heldermann-Verlag, Berlin.

[49] Lindgren, B.W., 1976, "Statistical Theory", 3rd ed., Macmillan Publishing Co.

[50] Linton, O., 1993, "Adaptive Estimation in ARCH Models", Econometric Theory 9, pp. 539-569.
[51] Liptser, R.S. and A.N. Shiryaev, 1977, "Statistics of Random Processes I, II", Springer-Verlag, New York.

[52] Luschgy, H. and A.L. Rukhin, 1993, "Asymptotic Properties of Tests for a Class of Diffusion Processes: Optimality and Adaption", Mathematical Methods of Statistics, Allerton Press, Inc., Vol. 2, No. 1, pp. 42-51.

[53] Mills, T.C., 1993, "The Econometric Modelling of Financial Time Series", Cambridge University Press.

[54] Nelson, D.B., 1990, "ARCH Models as Diffusion Approximations", Journal of Econometrics 45, pp. 7-38.

[55] O'Hagan, A., 1994, "Bayesian Inference", Kendall's Advanced Theory of Statistics, Arnold, Vol. 2B.

[56] Øksendal, B., 1989, "Stochastic Differential Equations", Second Edition, Springer-Verlag, Berlin.

[57] Pagan, A.R. and G.W. Schwert, 1990, "Alternative Models for Conditional Stock Volatility", Journal of Econometrics 45, pp. 267-290.

[58] Pedersen, A.R., 1993, "A New Approach to Maximum Likelihood Estimation for Stochastic Differential Equations Based on Discrete Observations", Research Report No. 264, Dept. Theor. Statist., University of Aarhus.

[59] Pedersen, A.R., 1993, "Maximum Likelihood Estimation Based on Incomplete Observations for a Class of Discrete Time Stochastic Processes by Means of the Kalman Filter", Research Report No. 272, Dept. Theor. Statist., University of Aarhus.

[60] Pedersen, A.R., 1995, "Consistency and Asymptotic Normality of an Approximate Maximum Likelihood Estimator for Discretely Observed Diffusion Processes", Bernoulli Vol. 1, No. 3, pp. 257-279.

[61] Pedersen, A.R., 1994, "Uniform Residuals for Discretely Observed Diffusion Processes", Research Report No. 290, Dept. Theor. Statist., University of Aarhus.

[62] Pedersen, A.R., 1994, "Quasi-Likelihood Inference for Discretely Observed Diffusion Processes", Research Report No. 295, Dept. Theor. Statist., University of Aarhus.
[63] Pedersen, A.R., 1994, "Statistical Analysis of Gaussian Diffusion Processes Based on Incomplete Discrete Observations", Research Report No. 297, Dept. Theor. Statist., University of Aarhus.

[64] Priestley, M.B., 1981, "Spectral Analysis and Time Series", Vol. 1 and 2, Academic Press.

[65] Rogers, L.C.G. and D. Williams, 1987, "Diffusions, Markov Processes and Martingales, Vol. 2: Itô Calculus", Wiley, Chichester.

[66] Shiryayev, A.N., 1984, "Probability", Springer-Verlag, New York.

[67] Shiryayev, A.N. and V.G. Spokoiny, "Statistical Experiments and Decisions: Asymptotic Theory", forthcoming in Springer-Verlag.

[68] Shorack, G.R. and J.A. Wellner, 1986, "Empirical Processes with Applications to Statistics", Wiley, New York.

[69] Silverman, B.W., 1986, "Density Estimation for Statistics and Data Analysis", Chapman and Hall, London.

[70] Skorokhod, A.V., 1989, "Asymptotic Methods in the Theory of Stochastic Differential Equations", American Mathematical Society, Translations of Mathematical Monographs Vol. 78.

[71] Smith, A.F.M. and G.O. Roberts, 1993, "Bayesian Computation via the Gibbs Sampler and Related Markov Chain Monte Carlo Methods", J. R. Statist. Soc. B, 55, No. 1, pp. 3-23.

[72] Stein, E.M. and J.C. Stein, 1991, "Stock Price Distributions with Stochastic Volatility: An Analytic Approach", The Review of Financial Studies, Vol. 4, No. 4, pp. 727-752.

[73] Stroock, D.W. and S.R.S. Varadhan, 1979, "Multidimensional Diffusion Processes", Springer-Verlag, Berlin.

[74] Taylor, S., 1986, "Modelling Financial Time Series", Wiley, Chichester.

[75] Wiggins, J.B., 1987, "Option Values Under Stochastic Volatility: Theory and Empirical Estimates", Journal of Financial Economics, 19, pp. 351-372.

[76] Wong, E. and B. Hajek, 1985, "Stochastic Processes in Engineering Systems", Springer-Verlag, New York.