
Introduction to
Local Level Model and Kalman Filter

S.J. Koopman
http://staff.feweb.vu.nl/koopman

January 2011


[Figure] U.K. yearly inflation: signal extraction, 1760–2000.


[Figure] Dow Jones (Industrial Avg, Monthly Closures), 1930–2010: index level, LogReturn and LSqrLReturn panels.


[Figure] LogSqr LogReturns Dow Jones: signal extraction (four panels, 1940–2010).


Classical Decomposition

Basic Model

A basic model for representing a time series is the additive model

  yt = µt + γt + εt,   t = 1, ..., n,

also known as the Classical Decomposition.

  yt = observation,
  µt = slowly changing component (trend),
  γt = periodic component (seasonal),
  εt = irregular component (disturbance).

Unobserved Components Time Series Model

In a Structural Time Series Model (STSM) or Unobserved Components Model (UCM), the various components are modelled explicitly as stochastic processes.


Local Level Model

◮ Components can be deterministic functions of time (e.g. polynomials), or stochastic processes;
◮ Deterministic example: yt = µ + εt with εt ∼ NID(0, σ²ε).
◮ Stochastic example: the Random Walk plus Noise, or Local Level model:

  yt = µt + εt,    εt ∼ NID(0, σ²ε),
  µt+1 = µt + ηt,  ηt ∼ NID(0, σ²η);

◮ The disturbances εt, ηs are independent for all s, t;
◮ The model is incomplete without a specification for µ1 (note the non-stationarity):

  µ1 ∼ N(a, P).


Local Level Model

General framework

  yt = µt + εt,    εt ∼ NID(0, σ²ε),
  µt+1 = µt + ηt,  ηt ∼ NID(0, σ²η),
  µ1 ∼ N(a, P).

◮ The level µt and the irregular εt are unobservables;
◮ Parameters: σ²ε and σ²η;
◮ Trivial special cases:
  ◮ σ²η = 0 =⇒ yt ∼ NID(µ1, σ²ε) (WN with constant level);
  ◮ σ²ε = 0 =⇒ yt+1 = yt + ηt (pure RW);
◮ Local Level is a model representation for EWMA forecasting.
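The simulated series shown on the next few slides can be generated directly from the two model equations. Below is a minimal simulation sketch in Python/NumPy; the course exercises use Ox, and the function name `simulate_local_level` and the seed are illustrative choices, not from the slides.

```python
import numpy as np

def simulate_local_level(n, sigma2_eps, sigma2_eta, mu1=0.0, seed=0):
    """Simulate y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), n)   # irregular disturbances
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), n)   # level disturbances
    mu = mu1 + np.concatenate(([0.0], np.cumsum(eta[:-1])))  # random walk level
    y = mu + eps
    return y, mu

# the three parameter settings used on the following slides
for s2e, s2h in [(0.1, 1.0), (1.0, 1.0), (1.0, 0.1)]:
    y, mu = simulate_local_level(100, s2e, s2h)
    print(s2e, s2h, y[:3].round(2))
```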


[Figure] Simulated LL data, σ²ε = 0.1, σ²η = 1: y and µ, t = 0, ..., 100.


[Figure] Simulated LL data, σ²ε = 1, σ²η = 1, t = 0, ..., 100.


[Figure] Simulated LL data, σ²ε = 1, σ²η = 0.1, t = 0, ..., 100.


[Figure] Simulated LL data, three panels: (σ²ε = 0.1, σ²η = 1), (σ²ε = 1, σ²η = 1), (σ²ε = 1, σ²η = 0.1), t = 0, ..., 100.


Local Level Model

Its properties

  yt = µt + εt,    εt ∼ NID(0, σ²ε),
  µt+1 = µt + ηt,  ηt ∼ NID(0, σ²η).

◮ First difference is stationary:

  ∆yt = ∆µt + ∆εt = ηt−1 + εt − εt−1.

◮ Dynamic properties of ∆yt:

  E(∆yt) = 0,
  γ0 = E(∆yt ∆yt) = σ²η + 2σ²ε,
  γ1 = E(∆yt ∆yt−1) = −σ²ε,
  γτ = E(∆yt ∆yt−τ) = 0 for τ ≥ 2.
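As a quick sanity check on these moment expressions, the sketch below simulates a long local level series and compares the sample autocovariances of ∆yt with σ²η + 2σ²ε and −σ²ε. The parameter values and variable names are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n, s2e, s2h = 200_000, 1.0, 0.5          # long sample so sample moments are close
eps = rng.normal(0, np.sqrt(s2e), n)
eta = rng.normal(0, np.sqrt(s2h), n)
mu = np.concatenate(([0.0], np.cumsum(eta[:-1])))
dy = np.diff(mu + eps)                   # Delta y_t = eta_{t-1} + eps_t - eps_{t-1}

gamma0 = np.mean(dy * dy)                # should be close to s2h + 2*s2e = 2.5
gamma1 = np.mean(dy[1:] * dy[:-1])       # should be close to -s2e = -1.0
print(gamma0, gamma1)
```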


Properties of the LL model

◮ The ACF of ∆yt is

  ρ1 = −σ²ε / (σ²η + 2σ²ε) = −1 / (q + 2),   q = σ²η / σ²ε,
  ρτ = 0,   τ ≥ 2;

◮ q is called the signal–noise ratio;
◮ The model for ∆yt is MA(1) with restricted parameters such that

  −1/2 ≤ ρ1 ≤ 0,

  i.e., yt is ARIMA(0,1,1);
◮ Write ∆yt = ξt + θξt−1, ξt ∼ NID(0, σ²), to solve for θ:

  θ = ( √(q² + 4q) − 2 − q ) / 2.
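The mapping from the signal–noise ratio q to the invertible MA(1) coefficient θ can be checked numerically: the implied first autocorrelation θ/(1 + θ²) should equal −1/(q + 2). A small sketch with an illustrative function name:

```python
import math

def theta_from_q(q):
    """Invertible MA(1) coefficient of Delta y_t implied by signal-to-noise ratio q."""
    return 0.5 * (math.sqrt(q * q + 4.0 * q) - 2.0 - q)

for q in [0.0, 0.1, 1.0, 10.0]:
    th = theta_from_q(q)
    rho1 = th / (1.0 + th * th)          # should equal -1/(q+2)
    print(q, round(th, 4), round(rho1, 4), round(-1.0 / (q + 2.0), 4))
```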


Local Level Model

◮ The model parameters are estimated by Maximum Likelihood;
◮ Advantages of the model-based approach: assumptions can be tested, parameters are estimated, ...;
◮ The model with estimated parameters is used for the signal extraction of components;
◮ The estimated level µt is effectively a locally weighted average of the data;
◮ The distribution of weights can be compared with kernel functions in nonparametric regression;
◮ On the basis of the model, the methods yield minimum mean square error (MMSE) forecasts and the associated confidence intervals.


[Figure] Nile data: decomposition. Top panel: Nile level ± 2SE, 1870–1970; bottom panel: Nile irregular, 1870–1970.


[Figure] Nile data: decomposition weights. Top panel: Nile level, 1870–1970; bottom panel: weights of the level estimate over leads and lags −20, ..., 20.


[Figure] Nile data: forecasts, Nile ± SE, 1870–1980.


Kalman Filter

◮ The Kalman filter calculates the mean and variance of the unobserved state, given the observations.
◮ The state is Gaussian: the complete distribution is characterized by the mean and variance.
◮ The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.
◮ To start the recursion, we need a1 and P1, which we assumed given.
◮ There are various ways to initialize when a1 and P1 are unknown, which we will not discuss here. See the discussion in the DK book, Chapter 2.


Kalman Filter

The unobserved variable µt can be estimated from the observations with the Kalman filter:

  vt = yt − at,
  Ft = Pt + σ²ε,
  Kt = Pt Ft⁻¹,
  at+1 = at + Kt vt,
  Pt+1 = Pt + σ²η − K²t Ft,

for t = 1, ..., n and starting with given values for a1 and P1.

◮ Writing Yt = {y1, ..., yt}, define

  at+1 = E(µt+1 | Yt),   Pt+1 = var(µt+1 | Yt).
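These recursions translate almost line by line into code. Below is a minimal sketch in Python/NumPy; the function name, the simulated input series and the large diffuse-looking starting value P1 = 10⁷ are illustrative assumptions rather than prescriptions from the slides.

```python
import numpy as np

def kalman_filter_ll(y, sigma2_eps, sigma2_eta, a1, P1):
    """Local level Kalman filter: returns a_t, P_t, v_t, F_t for t = 1..n."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    v = np.empty(n); F = np.empty(n)
    a[0], P[0] = a1, P1
    for t in range(n):
        v[t] = y[t] - a[t]                    # prediction error
        F[t] = P[t] + sigma2_eps              # prediction error variance
        K = P[t] / F[t]                       # Kalman gain
        a[t + 1] = a[t] + K * v[t]            # state update
        P[t + 1] = P[t] + sigma2_eta - K * K * F[t]
    return a[:-1], P[:-1], v, F

# illustrative use on a simulated local-level-type series
rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(0, 1, 100)) + rng.normal(0, 1, 100)
a, P, v, F = kalman_filter_ll(y, 1.0, 1.0, a1=0.0, P1=1e7)
```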


Kalman Filter

Local level model: µt+1 = µt + ηt,  yt = µt + εt.

◮ Writing Yt = {y1, ..., yt}, define

  at+1 = E(µt+1 | Yt),   Pt+1 = var(µt+1 | Yt);

◮ The prediction error is

  vt = yt − E(yt | Yt−1)
     = yt − E(µt + εt | Yt−1)
     = yt − E(µt | Yt−1)
     = yt − at;

◮ It follows that vt = (µt − at) + εt and E(vt) = 0;
◮ The prediction error variance is Ft = var(vt) = Pt + σ²ε.


Regression theory

The proof of the Kalman filter uses lemmas from multivariate Normal regression theory.

Lemma 1

Suppose x, y are jointly Normally distributed vectors. Then

  E(x | y) = E(x) + Σxy Σyy⁻¹ y,
  var(x | y) = Σxx − Σxy Σyy⁻¹ Σ′xy.

Lemma 2

Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σyz = 0. Then

  E(x | y, z) = E(x | y) + Σxz Σzz⁻¹ z,
  var(x | y, z) = var(x | y) − Σxz Σzz⁻¹ Σ′xz.


Derivation of the Kalman Filter

Local level model: µt+1 = µt + ηt,  yt = µt + εt.

◮ We have Yt = {Yt−1, yt} = {Yt−1, vt} and E(vt yt−j) = 0 for j = 1, ..., t − 1;
◮ The lemma is E(x | y, z) = E(x | y) + Σxz Σzz⁻¹ z. In our case, take x = µt+1, y = Yt−1 and z = vt = (µt − at) + εt;
◮ E(x | y) implies that

  E(µt+1 | Yt−1) = E(µt | Yt−1) + E(ηt | Yt−1) = at;

◮ Further, Σxz provides the expression

  E(µt+1 vt) = E(µt vt) + E(ηt vt)
             = E[(µt − at)(yt − at)] + E(ηt vt)
             = E[(µt − at)(µt − at)] + E[(µt − at) εt] + E(ηt vt)
             = Pt;

◮ Since Σzz = Ft, we can apply the lemma and obtain the state update

  at+1 = E(µt+1 | Yt−1, yt) = at + Pt Ft⁻¹ vt = at + Kt vt,   with Kt = Pt Ft⁻¹.


Kalman Filter Derived

◮ Our best prediction of yt based on its past is at. When the actual observation arrives, calculate the prediction error vt = yt − at and its variance Ft = Pt + σ²ε.
◮ The best estimate of the state mean for the next period is based on both the current estimate at and the new information vt:

  at+1 = at + Kt vt,

  and similarly for the variance:

  Pt+1 = Pt + σ²η − Kt Ft K′t.

◮ The Kalman gain

  Kt = Pt Ft⁻¹

  is the optimal weighting matrix for the new evidence.
◮ You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).


[Figure] Kalman filter for the Nile data, 1880–1960: (i) at; (ii) Pt; (iii) vt; (iv) Ft.


Steady State Kalman Filter

The Kalman filter variance Pt converges to a positive value, say Pt → P̄. We would then have

  Ft → P̄ + σ²ε,   Kt → P̄ / (P̄ + σ²ε).

The state prediction variance updating leads to

  P̄ = P̄ (1 − P̄ / (P̄ + σ²ε)) + σ²η,

which reduces to the quadratic

  x² − xq − q = 0,

where x = P̄ / σ²ε and q = σ²η / σ²ε, with solution

  P̄ = σ²ε ( q + √(q² + 4q) ) / 2.
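The closed-form P̄ can be checked against the variance recursion Pt+1 = Pt + σ²η − P²t /(Pt + σ²ε), which settles on the same value after a handful of iterations. The variance values in the sketch below are only illustrative (chosen to be of a Nile-like order of magnitude, not quoted from the slides).

```python
import numpy as np

s2e, s2h = 15000.0, 1500.0        # illustrative variances, not estimates from the slides
q = s2h / s2e
P_bar = s2e * (q + np.sqrt(q * q + 4.0 * q)) / 2.0   # closed-form steady-state variance

# iterate P_{t+1} = P_t + s2h - P_t^2 / (P_t + s2e) until it settles
P = 1e7
for _ in range(100):
    P = P + s2h - P * P / (P + s2e)
print(P_bar, P)                   # the two values should agree closely
```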


Smoothing

◮ The filter calculates the mean and variance conditional on Yt;
◮ The Kalman smoother calculates the mean and variance conditional on the full set of observations Yn;
◮ After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first.

Define

  µ̂t = E(µt | Yn),   Vt = var(µt | Yn),
  rt = weighted sum of future innovations,   Nt = var(rt),
  Lt = 1 − Kt.

Starting with rn = 0 and Nn = 0, the smoothing recursions are given by

  rt−1 = Ft⁻¹ vt + Lt rt,   Nt−1 = Ft⁻¹ + L²t Nt,
  µ̂t = at + Pt rt−1,        Vt = Pt − P²t Nt−1.
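A compact sketch of these backward recursions in Python/NumPy, with the forward filter pass included so the code is self-contained; the function name and starting values are illustrative assumptions.

```python
import numpy as np

def smooth_ll(y, s2e, s2h, a1, P1):
    """Local level Kalman filter followed by the backward smoothing recursions."""
    n = len(y)
    a = np.empty(n); P = np.empty(n); v = np.empty(n); F = np.empty(n); K = np.empty(n)
    at, Pt = a1, P1
    for t in range(n):                       # forward filter pass
        a[t], P[t] = at, Pt
        v[t] = y[t] - at
        F[t] = Pt + s2e
        K[t] = Pt / F[t]
        at = at + K[t] * v[t]
        Pt = Pt + s2h - K[t] ** 2 * F[t]
    r, N = 0.0, 0.0                          # r_n = 0, N_n = 0
    mu_hat = np.empty(n); V = np.empty(n)
    for t in range(n - 1, -1, -1):           # backward smoothing pass
        L = 1.0 - K[t]
        r_prev = v[t] / F[t] + L * r         # r_{t-1}
        N_prev = 1.0 / F[t] + L * L * N      # N_{t-1}
        mu_hat[t] = a[t] + P[t] * r_prev
        V[t] = P[t] - P[t] ** 2 * N_prev
        r, N = r_prev, N_prev
    return mu_hat, V
```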


[Figure] Kalman smoothing for the Nile data, 1880–1960: (i) µ̂t; (ii) Vt; (iii) rt; (iv) Nt.


Missing Observations

Missing observations are very easy to handle in Kalman filtering:

◮ suppose yj is missing;
◮ put vj = 0, Kj = 0 and Fj = ∞ in the algorithm;
◮ proceed with the further calculations as normal.

The filter algorithm extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations.
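In code, the rule vj = 0, Kj = 0, Fj = ∞ simply means skipping the update step for a missing observation, so the state prediction is extrapolated with at+1 = at and Pt+1 = Pt + σ²η. A sketch that marks missing values with NaN (an illustrative convention, not from the slides):

```python
import numpy as np

def filter_ll_missing(y, s2e, s2h, a1, P1):
    """Local level Kalman filter where NaN marks a missing observation."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    a[0], P[0] = a1, P1
    for t in range(n):
        if np.isnan(y[t]):
            # y_t missing: v_t = 0, K_t = 0, F_t = infinity -> pure extrapolation
            a[t + 1] = a[t]
            P[t + 1] = P[t] + s2h
        else:
            v = y[t] - a[t]
            F = P[t] + s2e
            K = P[t] / F
            a[t + 1] = a[t] + K * v
            P[t + 1] = P[t] + s2h - K * K * F
    return a[:-1], P[:-1]
```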


[Figure] Nile data with missing observations, 1880–1960: (i) at; (ii) Pt; (iii) µ̂t; (iv) Vt.


Forecasting

Forecasting requires no extra theory: just treat future observations as missing:

◮ put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, ..., n + k;
◮ proceed with the further calculations as normal;
◮ the forecast for yj is aj.
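Equivalently, after filtering the observed sample one keeps running the prediction step with Kj = 0: the point forecast stays at the last at, while the forecast variance Pj + σ²ε grows by σ²η each period. A minimal sketch with illustrative names:

```python
import numpy as np

def forecast_ll(y, k, s2e, s2h, a1, P1):
    """Filter over y, then forecast k steps ahead by treating future observations as missing."""
    at, Pt = a1, P1
    for obs in y:                      # filter over the observed sample
        v, F = obs - at, Pt + s2e
        K = Pt / F
        at, Pt = at + K * v, Pt + s2h - K * K * F
    fc, fvar = [], []
    for _ in range(k):                 # future y_j missing: v = 0, K = 0, F = inf
        fc.append(at)                  # forecast of y_j is a_j (stays flat)
        fvar.append(Pt + s2e)          # forecast error variance F_j
        Pt = Pt + s2h                  # state variance keeps growing
    return np.array(fc), np.array(fvar)
```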


[Figure] Nile data: forecasting, four panels (i)–(iv), 1870–2000.


Parameters in the Local Level Model

We recall the Local Level Model:

General framework

  yt = µt + εt,    εt ∼ NID(0, σ²ε),
  µt+1 = µt + ηt,  ηt ∼ NID(0, σ²η),
  µ1 ∼ N(a, P).

◮ The unknown µt's can be estimated by prediction, filtering and smoothing;
◮ The other parameters are given by the variances σ²ε and σ²η;
◮ We estimate these parameters by Maximum Likelihood;
◮ Parameters can be transformed: σ²ε = exp(ψε) and σ²η = exp(ψη);
◮ Parameter vector ψ = (ψε, ψη)′.


Parameter Estimation by ML

The parameters in any state space model can be collected in some vector ψ. When the model is linear and Gaussian, we can estimate ψ by Maximum Likelihood.

The loglikelihood of a time series is

  log L = Σ_{t=1}^{n} log p(yt | Yt−1).

In the state space model, p(yt | Yt−1) is a Gaussian density with mean at and variance Ft:

  log L = −(n/2) log 2π − (1/2) Σ_{t=1}^{n} ( log Ft + Ft⁻¹ v²t ),

with vt and Ft from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.
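A direct implementation of the prediction error decomposition: run the Kalman filter for a given ψ = (ψε, ψη)′, accumulate the Gaussian log densities, and maximise numerically. The sketch below uses SciPy's Nelder–Mead optimiser and a crude large-P1 initialisation rather than an exact diffuse likelihood; the simulated data, names and starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(psi, y, a1=0.0, P1=1e7):
    """Negative loglikelihood via the prediction error decomposition; psi = (psi_eps, psi_eta)."""
    s2e, s2h = np.exp(psi)             # variances parameterised as exp(psi) > 0
    at, Pt, ll = a1, P1, 0.0
    for obs in y:
        v, F = obs - at, Pt + s2e
        ll += -0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
        K = Pt / F
        at, Pt = at + K * v, Pt + s2h - K * K * F
    return -ll

# illustrative use on simulated data with true s2e = 4, s2h = 1
rng = np.random.default_rng(3)
mu = np.cumsum(rng.normal(0, 1, 200))
y = mu + rng.normal(0, 2, 200)
res = minimize(negloglik, x0=np.log([1.0, 1.0]), args=(y,), method="Nelder-Mead")
print(np.exp(res.x))                   # estimated (s2e, s2h)
```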


Diagnostics

◮ Null hypothesis: the standardised residuals satisfy vt / √Ft ∼ NID(0, 1);
◮ Apply standard tests for Normality, heteroskedasticity and serial correlation;
◮ A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
◮ Model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).
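A small sketch of the residual diagnostics: compute et = vt/√Ft from the filter and apply a Normality test and a first-order serial correlation check. The Jarque–Bera test and the simulated data are illustrative choices, not prescriptions from the slides.

```python
import numpy as np
from scipy.stats import jarque_bera

def standardised_residuals(y, s2e, s2h, a1=0.0, P1=1e7):
    """Return e_t = v_t / sqrt(F_t) from the local level Kalman filter."""
    at, Pt, e = a1, P1, []
    for obs in y:
        v, F = obs - at, Pt + s2e
        e.append(v / np.sqrt(F))
        K = Pt / F
        at, Pt = at + K * v, Pt + s2h - K * K * F
    return np.array(e)

# illustrative check on simulated data: under the null e_t ~ NID(0, 1)
rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(0, 1, 300)) + rng.normal(0, 1, 300)
e = standardised_residuals(y, 1.0, 1.0)
print(jarque_bera(e))                         # Normality test
print(np.corrcoef(e[1:], e[:-1])[0, 1])       # first-order serial correlation
```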


[Figure] Nile data: diagnostics, four panels (i)–(iv): standardised residuals over time plus distribution and correlogram diagnostics.


Three exercises

1. Consider the LL model (see slides, see DK chapter 2).
  ◮ The reduced form is an ARIMA(0,1,1) process. Derive the relationship between the signal-to-noise ratio q of the LL model and the θ coefficient of the ARIMA model;
  ◮ Derive the reduced form in the case ηt = √q εt and notice the difference with the general case;
  ◮ Give the elements of the mean vector and variance matrix of y = (y1, ..., yn)′ when yt is generated by a LL model;
  ◮ Show that the forecasts of the Kalman filter (in a steady state) are the same as those generated by the exponentially weighted moving average (EWMA) method of forecasting: ŷt+1 = ŷt + λ(yt − ŷt) for t = 1, ..., n. Derive the relationship between λ and the signal-to-noise ratio q.


Three exercises (cont.)

2. Derive a Kalman filter for the local level model

  yt = µt + εt,  εt ∼ N(0, σ²ε),   ∆µt+1 = ηt ∼ N(0, σ²η),

  with E(εt ηt) = σεη ≠ 0 and E(εt ηs) = 0 for all t, s with t ≠ s. Also discuss the problem of missing observations in this case.

3. Write Ox program(s) that produce all Figures in Ch 2 of DK except Fig. 2.4. Data: http://www.ssfpack.com/dkbook.html


Selected references

◮ A.C. Harvey (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
◮ G. Kitagawa & W. Gersch (1996). Smoothness Priors Analysis of Time Series. Springer-Verlag.
◮ M. West & J. Harrison (1997). Bayesian Forecasting and Dynamic Models. Springer-Verlag.
◮ R.H. Shumway & D.S. Stoffer (2000). Time Series Analysis and its Applications. Springer-Verlag.
◮ J. Durbin & S.J. Koopman (2011). Time Series Analysis by State Space Methods. Second Edition, Oxford University Press.
◮ J. Commandeur & S.J. Koopman (2007). An Introduction to State Space Time Series Analysis. Oxford University Press.
