
Introduction to
Local Level Model and Kalman Filter

S.J. Koopman
http://staff.feweb.vu.nl/koopman

January 2011


[Figure] U.K. yearly inflation: signal extraction, 1760–2000.


[Figure] Dow Jones (Industrial Avg, Monthly Closures), 1930–2010: index level, LogReturn and LSqrLReturn panels.


[Figure] LogSqr LogReturns Dow Jones: signal extraction (four panels, 1940–2010).


Classical Decomposition

Basic Model

A basic model for representing a time series is the additive model

  yt = µt + γt + εt,   t = 1, ..., n,

also known as the Classical Decomposition.

  yt = observation,
  µt = slowly changing component (trend),
  γt = periodic component (seasonal),
  εt = irregular component (disturbance).

Unobserved Components Time Series Model

In a Structural Time Series Model (STSM) or Unobserved Components Model (UCM), the various components are modelled explicitly as stochastic processes.


Local Level Model

◮ Components can be deterministic functions of time (e.g. polynomials), or stochastic processes;
◮ Deterministic example: yt = µ + εt with εt ∼ NID(0, σ²ε).
◮ Stochastic example: the Random Walk plus Noise, or Local Level model:

  yt = µt + εt,    εt ∼ NID(0, σ²ε),
  µt+1 = µt + ηt,  ηt ∼ NID(0, σ²η);

◮ The disturbances εt, ηs are independent for all s, t;
◮ The model is incomplete without a specification for µ1 (note the non-stationarity):

  µ1 ∼ N(a, P).


Local Level Model

General framework

  yt = µt + εt,    εt ∼ NID(0, σ²ε),
  µt+1 = µt + ηt,  ηt ∼ NID(0, σ²η),
  µ1 ∼ N(a, P).

◮ The level µt and the irregular εt are unobservables;
◮ Parameters: σ²ε and σ²η;
◮ Trivial special cases:
  ◮ σ²η = 0 =⇒ yt ∼ NID(µ1, σ²ε) (WN with constant level);
  ◮ σ²ε = 0 =⇒ yt+1 = yt + ηt (pure RW);
◮ Local Level is a model representation for EWMA forecasting.
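The simulated series shown on the next few slides can be generated directly from the two model equations. Below is a minimal simulation sketch in Python/NumPy; the course exercises use Ox, and the function name `simulate_local_level` and the seed are illustrative choices, not from the slides.

```python
import numpy as np

def simulate_local_level(n, sigma2_eps, sigma2_eta, mu1=0.0, seed=0):
    """Simulate y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), n)   # irregular disturbances
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), n)   # level disturbances
    mu = mu1 + np.concatenate(([0.0], np.cumsum(eta[:-1])))  # random walk level
    y = mu + eps
    return y, mu

# the three parameter settings used on the following slides
for s2e, s2h in [(0.1, 1.0), (1.0, 1.0), (1.0, 0.1)]:
    y, mu = simulate_local_level(100, s2e, s2h)
    print(s2e, s2h, y[:3].round(2))
```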


[Figure] Simulated LL data, σ²ε = 0.1, σ²η = 1: y and µ, t = 0, ..., 100.


[Figure] Simulated LL data, σ²ε = 1, σ²η = 1, t = 0, ..., 100.


[Figure] Simulated LL data, σ²ε = 1, σ²η = 0.1, t = 0, ..., 100.


[Figure] Simulated LL data, three panels: (σ²ε = 0.1, σ²η = 1), (σ²ε = 1, σ²η = 1), (σ²ε = 1, σ²η = 0.1), t = 0, ..., 100.


Local Level Model

Its properties

  yt = µt + εt,    εt ∼ NID(0, σ²ε),
  µt+1 = µt + ηt,  ηt ∼ NID(0, σ²η).

◮ First difference is stationary:

  ∆yt = ∆µt + ∆εt = ηt−1 + εt − εt−1.

◮ Dynamic properties of ∆yt:

  E(∆yt) = 0,
  γ0 = E(∆yt ∆yt) = σ²η + 2σ²ε,
  γ1 = E(∆yt ∆yt−1) = −σ²ε,
  γτ = E(∆yt ∆yt−τ) = 0 for τ ≥ 2.
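As a quick sanity check on these moment expressions, the sketch below simulates a long local level series and compares the sample autocovariances of ∆yt with σ²η + 2σ²ε and −σ²ε. The parameter values and variable names are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n, s2e, s2h = 200_000, 1.0, 0.5          # long sample so sample moments are close
eps = rng.normal(0, np.sqrt(s2e), n)
eta = rng.normal(0, np.sqrt(s2h), n)
mu = np.concatenate(([0.0], np.cumsum(eta[:-1])))
dy = np.diff(mu + eps)                   # Delta y_t = eta_{t-1} + eps_t - eps_{t-1}

gamma0 = np.mean(dy * dy)                # should be close to s2h + 2*s2e = 2.5
gamma1 = np.mean(dy[1:] * dy[:-1])       # should be close to -s2e = -1.0
print(gamma0, gamma1)
```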


Properties of the LL model

◮ The ACF of ∆yt is

  ρ1 = −σ²ε / (σ²η + 2σ²ε) = −1 / (q + 2),   q = σ²η / σ²ε,
  ρτ = 0,   τ ≥ 2;

◮ q is called the signal–noise ratio;
◮ The model for ∆yt is MA(1) with restricted parameters such that

  −1/2 ≤ ρ1 ≤ 0,

  i.e., yt is ARIMA(0,1,1);
◮ Write ∆yt = ξt + θξt−1, ξt ∼ NID(0, σ²), to solve for θ:

  θ = ( √(q² + 4q) − 2 − q ) / 2.
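The mapping from the signal–noise ratio q to the invertible MA(1) coefficient θ can be checked numerically: the implied first autocorrelation θ/(1 + θ²) should equal −1/(q + 2). A small sketch with an illustrative function name:

```python
import math

def theta_from_q(q):
    """Invertible MA(1) coefficient of Delta y_t implied by signal-to-noise ratio q."""
    return 0.5 * (math.sqrt(q * q + 4.0 * q) - 2.0 - q)

for q in [0.0, 0.1, 1.0, 10.0]:
    th = theta_from_q(q)
    rho1 = th / (1.0 + th * th)          # should equal -1/(q+2)
    print(q, round(th, 4), round(rho1, 4), round(-1.0 / (q + 2.0), 4))
```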


Local Level Model

◮ The model parameters are estimated by Maximum Likelihood;
◮ Advantages of the model-based approach: assumptions can be tested, parameters are estimated, ...;
◮ The model with estimated parameters is used for the signal extraction of components;
◮ The estimated level µt is effectively a locally weighted average of the data;
◮ The distribution of weights can be compared with kernel functions in nonparametric regression;
◮ On the basis of the model, the methods yield minimum mean square error (MMSE) forecasts and the associated confidence intervals.


[Figure] Nile data: decomposition. Top panel: Nile level ± 2SE, 1870–1970; bottom panel: Nile irregular, 1870–1970.


[Figure] Nile data: decomposition weights. Top panel: Nile level, 1870–1970; bottom panel: weights of the level estimate over leads and lags −20, ..., 20.


[Figure] Nile data: forecasts, Nile ± SE, 1870–1980.


Kalman Filter

◮ The Kalman filter calculates the mean and variance of the unobserved state, given the observations.
◮ The state is Gaussian: the complete distribution is characterized by the mean and variance.
◮ The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.
◮ To start the recursion, we need a1 and P1, which we assumed given.
◮ There are various ways to initialize when a1 and P1 are unknown, which we will not discuss here. See the discussion in the DK book, Chapter 2.


Kalman Filter

The unobserved variable µt can be estimated from the observations with the Kalman filter:

  vt = yt − at,
  Ft = Pt + σ²ε,
  Kt = Pt Ft⁻¹,
  at+1 = at + Kt vt,
  Pt+1 = Pt + σ²η − K²t Ft,

for t = 1, ..., n and starting with given values for a1 and P1.

◮ Writing Yt = {y1, ..., yt}, define

  at+1 = E(µt+1 | Yt),   Pt+1 = var(µt+1 | Yt).
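These recursions translate almost line by line into code. Below is a minimal sketch in Python/NumPy; the function name, the simulated input series and the large diffuse-looking starting value P1 = 10⁷ are illustrative assumptions rather than prescriptions from the slides.

```python
import numpy as np

def kalman_filter_ll(y, sigma2_eps, sigma2_eta, a1, P1):
    """Local level Kalman filter: returns a_t, P_t, v_t, F_t for t = 1..n."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    v = np.empty(n); F = np.empty(n)
    a[0], P[0] = a1, P1
    for t in range(n):
        v[t] = y[t] - a[t]                    # prediction error
        F[t] = P[t] + sigma2_eps              # prediction error variance
        K = P[t] / F[t]                       # Kalman gain
        a[t + 1] = a[t] + K * v[t]            # state update
        P[t + 1] = P[t] + sigma2_eta - K * K * F[t]
    return a[:-1], P[:-1], v, F

# illustrative use on a simulated local-level-type series
rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(0, 1, 100)) + rng.normal(0, 1, 100)
a, P, v, F = kalman_filter_ll(y, 1.0, 1.0, a1=0.0, P1=1e7)
```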


Kalman Filter

Local level model: µt+1 = µt + ηt,  yt = µt + εt.

◮ Writing Yt = {y1, ..., yt}, define

  at+1 = E(µt+1 | Yt),   Pt+1 = var(µt+1 | Yt);

◮ The prediction error is

  vt = yt − E(yt | Yt−1)
     = yt − E(µt + εt | Yt−1)
     = yt − E(µt | Yt−1)
     = yt − at;

◮ It follows that vt = (µt − at) + εt and E(vt) = 0;
◮ The prediction error variance is Ft = var(vt) = Pt + σ²ε.


Regression theory

The proof of the Kalman filter uses lemmas from multivariate Normal regression theory.

Lemma 1

Suppose x, y are jointly Normally distributed vectors. Then

  E(x | y) = E(x) + Σxy Σyy⁻¹ y,
  var(x | y) = Σxx − Σxy Σyy⁻¹ Σ′xy.

Lemma 2

Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σyz = 0. Then

  E(x | y, z) = E(x | y) + Σxz Σzz⁻¹ z,
  var(x | y, z) = var(x | y) − Σxz Σzz⁻¹ Σ′xz.


Derivation of the Kalman Filter

Local level model: µt+1 = µt + ηt,  yt = µt + εt.

◮ We have Yt = {Yt−1, yt} = {Yt−1, vt} and E(vt yt−j) = 0 for j = 1, ..., t − 1;
◮ The lemma is E(x | y, z) = E(x | y) + Σxz Σzz⁻¹ z. In our case, take x = µt+1, y = Yt−1 and z = vt = (µt − at) + εt;
◮ E(x | y) implies that

  E(µt+1 | Yt−1) = E(µt | Yt−1) + E(ηt | Yt−1) = at;

◮ Further, Σxz provides the expression

  E(µt+1 vt) = E(µt vt) + E(ηt vt)
             = E[(µt − at)(yt − at)] + E(ηt vt)
             = E[(µt − at)(µt − at)] + E[(µt − at) εt] + E(ηt vt)
             = Pt;

◮ Since Σzz = Ft, we can apply the lemma and obtain the state update

  at+1 = E(µt+1 | Yt−1, yt) = at + Pt Ft⁻¹ vt = at + Kt vt,   with Kt = Pt Ft⁻¹.


Kalman Filter Derived

◮ Our best prediction of yt based on its past is at. When the actual observation arrives, calculate the prediction error vt = yt − at and its variance Ft = Pt + σ²ε.
◮ The best estimate of the state mean for the next period is based on both the current estimate at and the new information vt:

  at+1 = at + Kt vt,

  and similarly for the variance:

  Pt+1 = Pt + σ²η − Kt Ft K′t.

◮ The Kalman gain

  Kt = Pt Ft⁻¹

  is the optimal weighting matrix for the new evidence.
◮ You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).


[Figure] Kalman filter for the Nile data, 1880–1960: (i) at; (ii) Pt; (iii) vt; (iv) Ft.


Steady State Kalman Filter

The Kalman filter variance Pt converges to a positive value, say Pt → P̄. We would then have

  Ft → P̄ + σ²ε,   Kt → P̄ / (P̄ + σ²ε).

The state prediction variance updating leads to

  P̄ = P̄ (1 − P̄ / (P̄ + σ²ε)) + σ²η,

which reduces to the quadratic

  x² − xq − q = 0,

where x = P̄ / σ²ε and q = σ²η / σ²ε, with solution

  P̄ = σ²ε ( q + √(q² + 4q) ) / 2.
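The closed-form P̄ can be checked against the variance recursion Pt+1 = Pt + σ²η − P²t /(Pt + σ²ε), which settles on the same value after a handful of iterations. The variance values in the sketch below are only illustrative (chosen to be of a Nile-like order of magnitude, not quoted from the slides).

```python
import numpy as np

s2e, s2h = 15000.0, 1500.0        # illustrative variances, not estimates from the slides
q = s2h / s2e
P_bar = s2e * (q + np.sqrt(q * q + 4.0 * q)) / 2.0   # closed-form steady-state variance

# iterate P_{t+1} = P_t + s2h - P_t^2 / (P_t + s2e) until it settles
P = 1e7
for _ in range(100):
    P = P + s2h - P * P / (P + s2e)
print(P_bar, P)                   # the two values should agree closely
```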


Smoothing

◮ The filter calculates the mean and variance conditional on Yt;
◮ The Kalman smoother calculates the mean and variance conditional on the full set of observations Yn;
◮ After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first.

Define

  µ̂t = E(µt | Yn),   Vt = var(µt | Yn),
  rt = weighted sum of future innovations,   Nt = var(rt),
  Lt = 1 − Kt.

Starting with rn = 0 and Nn = 0, the smoothing recursions are given by

  rt−1 = Ft⁻¹ vt + Lt rt,   Nt−1 = Ft⁻¹ + L²t Nt,
  µ̂t = at + Pt rt−1,        Vt = Pt − P²t Nt−1.
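A compact sketch of these backward recursions in Python/NumPy, with the forward filter pass included so the code is self-contained; the function name and starting values are illustrative assumptions.

```python
import numpy as np

def smooth_ll(y, s2e, s2h, a1, P1):
    """Local level Kalman filter followed by the backward smoothing recursions."""
    n = len(y)
    a = np.empty(n); P = np.empty(n); v = np.empty(n); F = np.empty(n); K = np.empty(n)
    at, Pt = a1, P1
    for t in range(n):                       # forward filter pass
        a[t], P[t] = at, Pt
        v[t] = y[t] - at
        F[t] = Pt + s2e
        K[t] = Pt / F[t]
        at = at + K[t] * v[t]
        Pt = Pt + s2h - K[t] ** 2 * F[t]
    r, N = 0.0, 0.0                          # r_n = 0, N_n = 0
    mu_hat = np.empty(n); V = np.empty(n)
    for t in range(n - 1, -1, -1):           # backward smoothing pass
        L = 1.0 - K[t]
        r_prev = v[t] / F[t] + L * r         # r_{t-1}
        N_prev = 1.0 / F[t] + L * L * N      # N_{t-1}
        mu_hat[t] = a[t] + P[t] * r_prev
        V[t] = P[t] - P[t] ** 2 * N_prev
        r, N = r_prev, N_prev
    return mu_hat, V
```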


[Figure] Kalman smoothing for the Nile data, 1880–1960: (i) µ̂t; (ii) Vt; (iii) rt; (iv) Nt.


Missing Observations

Missing observations are very easy to handle in Kalman filtering:

◮ suppose yj is missing;
◮ put vj = 0, Kj = 0 and Fj = ∞ in the algorithm;
◮ proceed with the further calculations as normal.

The filter algorithm extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations.
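In code, the rule vj = 0, Kj = 0, Fj = ∞ simply means skipping the update step for a missing observation, so the state prediction is extrapolated with at+1 = at and Pt+1 = Pt + σ²η. A sketch that marks missing values with NaN (an illustrative convention, not from the slides):

```python
import numpy as np

def filter_ll_missing(y, s2e, s2h, a1, P1):
    """Local level Kalman filter where NaN marks a missing observation."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    a[0], P[0] = a1, P1
    for t in range(n):
        if np.isnan(y[t]):
            # y_t missing: v_t = 0, K_t = 0, F_t = infinity -> pure extrapolation
            a[t + 1] = a[t]
            P[t + 1] = P[t] + s2h
        else:
            v = y[t] - a[t]
            F = P[t] + s2e
            K = P[t] / F
            a[t + 1] = a[t] + K * v
            P[t + 1] = P[t] + s2h - K * K * F
    return a[:-1], P[:-1]
```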


[Figure] Nile data with missing observations, 1880–1960: (i) at; (ii) Pt; (iii) µ̂t; (iv) Vt.


Forecasting

Forecasting requires no extra theory: just treat future observations as missing:

◮ put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, ..., n + k;
◮ proceed with the further calculations as normal;
◮ the forecast for yj is aj.
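Equivalently, after filtering the observed sample one keeps running the prediction step with Kj = 0: the point forecast stays at the last at, while the forecast variance Pj + σ²ε grows by σ²η each period. A minimal sketch with illustrative names:

```python
import numpy as np

def forecast_ll(y, k, s2e, s2h, a1, P1):
    """Filter over y, then forecast k steps ahead by treating future observations as missing."""
    at, Pt = a1, P1
    for obs in y:                      # filter over the observed sample
        v, F = obs - at, Pt + s2e
        K = Pt / F
        at, Pt = at + K * v, Pt + s2h - K * K * F
    fc, fvar = [], []
    for _ in range(k):                 # future y_j missing: v = 0, K = 0, F = inf
        fc.append(at)                  # forecast of y_j is a_j (stays flat)
        fvar.append(Pt + s2e)          # forecast error variance F_j
        Pt = Pt + s2h                  # state variance keeps growing
    return np.array(fc), np.array(fvar)
```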


[Figure] Nile data: forecasting, four panels (i)–(iv), 1870–2000.


Parameters in the Local Level Model

We recall the Local Level Model:

General framework

  yt = µt + εt,    εt ∼ NID(0, σ²ε),
  µt+1 = µt + ηt,  ηt ∼ NID(0, σ²η),
  µ1 ∼ N(a, P).

◮ The unknown µt's can be estimated by prediction, filtering and smoothing;
◮ The other parameters are given by the variances σ²ε and σ²η;
◮ We estimate these parameters by Maximum Likelihood;
◮ Parameters can be transformed: σ²ε = exp(ψε) and σ²η = exp(ψη);
◮ Parameter vector ψ = (ψε, ψη)′.


Parameter Estimation by ML

The parameters in any state space model can be collected in some vector ψ. When the model is linear and Gaussian, we can estimate ψ by Maximum Likelihood.

The loglikelihood of a time series is

  log L = Σ_{t=1}^{n} log p(yt | Yt−1).

In the state space model, p(yt | Yt−1) is a Gaussian density with mean at and variance Ft:

  log L = −(n/2) log 2π − (1/2) Σ_{t=1}^{n} ( log Ft + Ft⁻¹ v²t ),

with vt and Ft from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.
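A direct implementation of the prediction error decomposition: run the Kalman filter for a given ψ = (ψε, ψη)′, accumulate the Gaussian log densities, and maximise numerically. The sketch below uses SciPy's Nelder–Mead optimiser and a crude large-P1 initialisation rather than an exact diffuse likelihood; the simulated data, names and starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(psi, y, a1=0.0, P1=1e7):
    """Negative loglikelihood via the prediction error decomposition; psi = (psi_eps, psi_eta)."""
    s2e, s2h = np.exp(psi)             # variances parameterised as exp(psi) > 0
    at, Pt, ll = a1, P1, 0.0
    for obs in y:
        v, F = obs - at, Pt + s2e
        ll += -0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
        K = Pt / F
        at, Pt = at + K * v, Pt + s2h - K * K * F
    return -ll

# illustrative use on simulated data with true s2e = 4, s2h = 1
rng = np.random.default_rng(3)
mu = np.cumsum(rng.normal(0, 1, 200))
y = mu + rng.normal(0, 2, 200)
res = minimize(negloglik, x0=np.log([1.0, 1.0]), args=(y,), method="Nelder-Mead")
print(np.exp(res.x))                   # estimated (s2e, s2h)
```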


Diagnostics

◮ Null hypothesis: the standardised residuals satisfy vt / √Ft ∼ NID(0, 1);
◮ Apply standard tests for Normality, heteroskedasticity and serial correlation;
◮ A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
◮ Model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).
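A small sketch of the residual diagnostics: compute et = vt/√Ft from the filter and apply a Normality test and a first-order serial correlation check. The Jarque–Bera test and the simulated data are illustrative choices, not prescriptions from the slides.

```python
import numpy as np
from scipy.stats import jarque_bera

def standardised_residuals(y, s2e, s2h, a1=0.0, P1=1e7):
    """Return e_t = v_t / sqrt(F_t) from the local level Kalman filter."""
    at, Pt, e = a1, P1, []
    for obs in y:
        v, F = obs - at, Pt + s2e
        e.append(v / np.sqrt(F))
        K = Pt / F
        at, Pt = at + K * v, Pt + s2h - K * K * F
    return np.array(e)

# illustrative check on simulated data: under the null e_t ~ NID(0, 1)
rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(0, 1, 300)) + rng.normal(0, 1, 300)
e = standardised_residuals(y, 1.0, 1.0)
print(jarque_bera(e))                         # Normality test
print(np.corrcoef(e[1:], e[:-1])[0, 1])       # first-order serial correlation
```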


[Figure] Nile data: diagnostics, four panels (i)–(iv): standardised residuals over time plus distribution and correlogram diagnostics.


Three exercises

1. Consider the LL model (see slides, see DK chapter 2).
  ◮ The reduced form is an ARIMA(0,1,1) process. Derive the relationship between the signal-to-noise ratio q of the LL model and the θ coefficient of the ARIMA model;
  ◮ Derive the reduced form in the case ηt = √q εt and notice the difference with the general case;
  ◮ Give the elements of the mean vector and variance matrix of y = (y1, ..., yn)′ when yt is generated by a LL model;
  ◮ Show that the forecasts of the Kalman filter (in a steady state) are the same as those generated by the exponentially weighted moving average (EWMA) method of forecasting: ŷt+1 = ŷt + λ(yt − ŷt) for t = 1, ..., n. Derive the relationship between λ and the signal-to-noise ratio q.


Three exercises (cont.)

2. Derive a Kalman filter for the local level model

  yt = µt + εt,  εt ∼ N(0, σ²ε),   ∆µt+1 = ηt ∼ N(0, σ²η),

  with E(εt ηt) = σεη ≠ 0 and E(εt ηs) = 0 for all t, s with t ≠ s. Also discuss the problem of missing observations in this case.

3. Write Ox program(s) that produce all Figures in Ch 2 of DK except Fig. 2.4. Data: http://www.ssfpack.com/dkbook.html


Selected references

◮ A.C. Harvey (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
◮ G. Kitagawa & W. Gersch (1996). Smoothness Priors Analysis of Time Series. Springer-Verlag.
◮ M. West & J. Harrison (1997). Bayesian Forecasting and Dynamic Models. Springer-Verlag.
◮ R.H. Shumway & D.S. Stoffer (2000). Time Series Analysis and its Applications. Springer-Verlag.
◮ J. Durbin & S.J. Koopman (2011). Time Series Analysis by State Space Methods. Second Edition, Oxford University Press.
◮ J. Commandeur & S.J. Koopman (2007). An Introduction to State Space Time Series Analysis. Oxford University Press.
