Time Series - STAT - EPFL
Time Series
Anthony Davison
© 2010
http://stat.epfl.ch
Second-order theory of stationary random processes 126
    Reminder: Basic definitions 127
    Spectrum 128
    Examples 129
    Normalized spectra for AR(1) models 130
    Comments 131
    Spectral distribution 132
    Example 133
    Linear filters 134
    Effect of filtering 135
    General linear process 136
ARMA models 137
    Autoregressive process 138
    Moving average process 139
    Invertibility 140
    ARMA models 141
    Causality, invertibility 142
    Example 143
    Comments 144
Second-order theory of stationary random processes  slide 126
Reminder: Basic definitions
Definition 28 (a) A random function is a set of random variables {Y(t)} such that the time index t can take any real value.
(b) The trend of {Y(t)} is the non-random function µ(t) = E{Y(t)}, and its autocovariance function is
  γ(t, s) = cov{Y(t), Y(s)} = E[{Y(t) − µ(t)}{Y(s) − µ(s)}],  t, s ∈ R.
The mean and covariance functions constitute the second-order properties of {Y(t)}. They determine the entire distribution of the random function if the joint distribution of any finite collection of random variables {Y(t_1), ..., Y(t_k)} is multivariate normal.
(c) The random function is stationary if µ(t) = µ and γ(t, s) = γ(|t − s|): the trend is constant and the covariance between Y(t) and Y(s) depends only on their time separation t − s. In this case we can define the autocorrelation function ρ(t) = γ(t, 0)/γ(0, 0).
(d) A random sequence is a collection of random variables {Y_t} in which the time index takes only integer values, t ∈ Z; we use a subscript notation to make this clear.
(e) A white noise sequence is a random sequence of mutually independent random variables, each with mean zero and variance σ².
Time Series Spring 2010 – slide 127
Spectrum
Definition 29 (a) The autocovariance generating function of a stationary random sequence {Y_t} with autocovariances γ_k = cov(Y_t, Y_{t+k}) is
  G(z) = ∑_{k=−∞}^{∞} γ_k z^k,  z ∈ C.
(b) The spectrum of {Y_t} is
  f(ω) = G(e^{−2πiω}) = ∑_{k=−∞}^{∞} γ_k e^{−2πikω},  ω ∈ R,
where i² = −1; this may also be written as the real-valued function
  f(ω) = γ_0 + 2 ∑_{k=1}^{∞} γ_k cos(2πkω),  ω ∈ R.  (5)
The normalised spectrum is f*(ω) = f(ω)/γ_0.
The spectrum summarises the second-order properties of the process in a single function, and also shows very simply the effect of linear operations on the series.
Time Series Spring 2010 – slide 128
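When only finitely many autocovariances are non-zero, the sum in (5) is finite and can be evaluated directly. A small Python sketch with the hypothetical autocovariances γ_0 = 1.25, γ_{±1} = 0.5 and γ_k = 0 otherwise (values chosen purely for illustration):

```python
import numpy as np

# Hypothetical autocovariance sequence: gamma_0 = 1.25, gamma_{+-1} = 0.5,
# gamma_k = 0 for |k| > 1, so the sum in (5) is finite.
gamma = {0: 1.25, 1: 0.5}

def spectrum(omega):
    # f(w) = gamma_0 + 2 * sum_{k >= 1} gamma_k cos(2 pi k w), as in (5)
    return gamma[0] + 2 * sum(g * np.cos(2 * np.pi * k * omega)
                              for k, g in gamma.items() if k >= 1)

# f is even and has period 1, so values on [0, 1/2] determine it everywhere
print(spectrum(0.0))   # 2.25
print(spectrum(0.5))   # 0.25, non-negative as a spectrum must be
```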
Examples
Example 30 Find the spectrum and normalised spectrum for white noise.
Example 31 Show that the spectrum of the AR(1) process Y_t = αY_{t−1} + ε_t with |α| < 1 is
  f(ω) = σ² / {1 − 2α cos(2πω) + α²},
and that the normalised spectrum is
  f*(ω) = (1 − α²) / {1 − 2α cos(2πω) + α²}.
Time Series Spring 2010 – slide 129
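The closed form in Example 31 can be checked against the defining sum (5), using the AR(1) autocovariances γ_k = σ² α^{|k|}/(1 − α²); a numerical sketch in Python, where the truncation point K is an arbitrary choice:

```python
import numpy as np

def ar1_spectrum_direct(omega, alpha, sigma2=1.0):
    # Closed form from Example 31
    return sigma2 / (1 - 2 * alpha * np.cos(2 * np.pi * omega) + alpha**2)

def ar1_spectrum_sum(omega, alpha, sigma2=1.0, K=200):
    # Truncated version of (5), using gamma_k = gamma_0 * alpha^k for k >= 0
    gamma0 = sigma2 / (1 - alpha**2)
    k = np.arange(1, K + 1)
    return gamma0 * (1 + 2 * np.sum(alpha**k * np.cos(2 * np.pi * k * omega)))

omega, alpha = 0.1, 0.5
print(abs(ar1_spectrum_direct(omega, alpha) - ar1_spectrum_sum(omega, alpha)))  # tiny
```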
Normalized spectra for AR(1) models
[Figure: simulated AR(1) series of length 200 (Y against Time) and the corresponding normalised spectra f*(ω) for 0 ≤ ω ≤ 0.5, for α = −0.5, α = 0.5 and α = 0.9.]
Time Series Spring 2010 – slide 130
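Panels like those above can be reproduced along the following lines; a Python sketch (seed, burn-in and series length are arbitrary choices), in which the lag-1 sample autocorrelation of the simulated series should be close to α:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(n, alpha, sigma=1.0, burn=500):
    """Simulate Y_t = alpha * Y_{t-1} + eps_t, discarding a burn-in period."""
    eps = sigma * rng.standard_normal(n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = alpha * y[t - 1] + eps[t]
    return y[burn:]

def normalised_spectrum(omega, alpha):
    # f*(w) from Example 31
    return (1 - alpha**2) / (1 - 2 * alpha * np.cos(2 * np.pi * omega) + alpha**2)

y = simulate_ar1(20000, 0.9)
r1 = np.corrcoef(y[:-1], y[1:])[0, 1]      # lag-1 sample autocorrelation, near 0.9
print(r1, normalised_spectrum(0.0, 0.9))   # the spectrum peaks at the origin for alpha > 0
```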
Comments
□ The form of the spectrum shows that
  f(ω) = f(−ω),  f(ω) = f(ω + k),  k ∈ Z,
so f need only be defined on 0 < ω < 1/2: recall the definition of the periodogram.
□ The variance of the average Ȳ of data Y_1, ..., Y_n from a stationary process satisfies
  lim_{n→∞} n var(Ȳ) = lim_{n→∞} {γ_0 + 2 ∑_{h=1}^{n−1} (1 − h/n) γ_h} = γ_0 + 2 ∑_{h=1}^{∞} γ_h = f(0).
Thus the spectral density at the origin, if finite, equals the limiting rescaled variance of Ȳ.
□ We easily see that ∫_{−1/2}^{1/2} f(ω) dω = γ_0.
□ The spectrum is the discrete Fourier transform of the autocovariance sequence.
□ The inverse Fourier transform recovers the autocovariance function:
  γ_k = ∫_{−1/2}^{1/2} e^{2πikω} f(ω) dω = 2 ∫_0^{1/2} cos(2πkω) f(ω) dω,  k ∈ Z.  (6)
□ The spectral density need not always exist; cf. Example 33.
Time Series Spring 2010 – slide 131
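Both ∫_{−1/2}^{1/2} f(ω) dω = γ_0 and the inverse transform (6) can be checked numerically for the AR(1) spectrum of Example 31; a Python sketch using a simple midpoint rule (α = 0.6 and k = 3 are arbitrary choices):

```python
import numpy as np

alpha, sigma2 = 0.6, 1.0
gamma0 = sigma2 / (1 - alpha**2)   # AR(1): gamma_k = sigma^2 * alpha^|k| / (1 - alpha^2)

def f(w):
    # AR(1) spectrum from Example 31
    return sigma2 / (1 - 2 * alpha * np.cos(2 * np.pi * w) + alpha**2)

# midpoint rule on (-1/2, 1/2); the interval has length 1, so a mean approximates the integral
n = 20000
w = -0.5 + (np.arange(n) + 0.5) / n

integral_f = np.mean(f(w))                          # should equal gamma_0
gamma3 = np.mean(np.cos(2 * np.pi * 3 * w) * f(w))  # (6) with k = 3; should equal alpha^3 * gamma_0

print(integral_f, gamma0)
print(gamma3, alpha**3 * gamma0)
```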
Spectral distribution
Theorem 32 (a) A set of numbers {γ_h}_{h∈Z} is the autocovariance function of a stationary random sequence iff there exists a unique bounded function F defined on [−1/2, 1/2] such that F(−1/2) = 0, F is right-continuous and non-decreasing, with symmetric increments about zero, and
  γ_h = ∫_{(−1/2,1/2]} e^{2πihu} dF(u),  h ∈ Z.
The function F is called the spectral distribution function of γ_h, and its derivative, if it exists, is called the spectral density function, or spectrum.
(b) If ∑_h |γ_h| < ∞, then f exists.
(c) A function f(ω) defined on [−1/2, 1/2] is the spectrum of a stationary process if and only if f(ω) = f(−ω), f(ω) ≥ 0, and ∫_{−1/2}^{1/2} f(ω) dω < ∞.
□ The symmetric increments property means that if 0 ≤ a < b ≤ 1/2, then F(b) − F(a) = F(−a) − F(−b).
□ The interpretation of F is that F(ω_2) − F(ω_1) measures the variation accounted for by fluctuations at frequencies in the interval (ω_1, ω_2).
□ Part (c) of the theorem suggests how we may construct covariance functions with desired properties, by choosing an appropriate spectrum; note that f may be any scaled density function.
Time Series Spring 2010 – slide 132
Example
Example 33 Show that the covariance function of the stationary random sequence given by
  Y_t = U_1 cos(2πω_0 t) + U_2 sin(2πω_0 t),  U_1, U_2 iid N(0, σ²),
may be written as
  γ_h = ∫_{−1/2}^{1/2} e^{2πiωh} dF(ω),
where the spectral distribution function is the step function
  F(ω) = 0 for ω < −ω_0;  F(ω) = σ²/2 for −ω_0 ≤ ω < ω_0;  F(ω) = σ² for ω_0 ≤ ω.
Time Series Spring 2010 – slide 133
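Direct calculation in Example 33 gives γ_h = σ² cos(2πω_0 h), which is exactly what the two jumps of size σ²/2 in F at ±ω_0 integrate to. A Monte Carlo sketch in Python (the values of ω_0, t, h and the replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2, omega0 = 1.0, 0.1      # hypothetical illustration values
t, h = 7, 3                    # any t gives the same answer, by stationarity
n = 500_000                    # Monte Carlo replications

u1 = np.sqrt(sigma2) * rng.standard_normal(n)
u2 = np.sqrt(sigma2) * rng.standard_normal(n)

def y(s):
    return u1 * np.cos(2 * np.pi * omega0 * s) + u2 * np.sin(2 * np.pi * omega0 * s)

emp = np.mean(y(t) * y(t + h))                  # empirical cov(Y_t, Y_{t+h}); E(Y_t) = 0
theo = sigma2 * np.cos(2 * np.pi * omega0 * h)  # the integral of e^{2 pi i w h} against dF
print(emp, theo)                                # close, and independent of t
```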
Linear filters
Definition 34 A linear filter is a transformation of the random sequence {U_t} of the form
  Y_t = ∑_{j=−∞}^{∞} a_j U_{t−j}.  (7)
□ If {U_t} is stationary and
  – only a finite number of the a_j are non-zero, then {Y_t} is stationary;
  – infinitely many of the a_j are non-zero, then the properties of {Y_t} depend on their values.
□ The relation between the spectra of the sequences is given by the following theorem:
Theorem 35 The spectra of two stationary random sequences {U_t} and {Y_t} satisfying (7) are related by
  f_Y(ω) = |a(ω)|² f_U(ω),
where a(ω) = ∑_{j=−∞}^{∞} a_j e^{−2πijω} is the transfer function of the linear filter.
Time Series Spring 2010 – slide 134
Effect of filtering
Example 36 Find the spectrum of a three-point moving average of an AR(1) process.
[Figure: the AR(1) spectrum f(ω), the squared modulus |a(ω)|² of the three-point moving-average transfer function, and the filtered spectrum f(ω)|a(ω)|², each plotted for 0 ≤ ω ≤ 0.5.]
Time Series Spring 2010 – slide 135
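Theorem 35 gives the filtered spectrum in Example 36 directly once the filter weights are fixed; a Python sketch, assuming the equal-weight three-point moving average a_{−1} = a_0 = a_1 = 1/3 (other weightings are possible):

```python
import numpy as np

alpha, sigma2 = 0.7, 1.0       # hypothetical AR(1) parameters

def f_U(w):
    # AR(1) spectrum, from Example 31
    return sigma2 / (1 - 2 * alpha * np.cos(2 * np.pi * w) + alpha**2)

def transfer(w, a):
    # a(w) = sum_j a_j exp(-2 pi i j w); `a` maps lag j to coefficient a_j
    return sum(c * np.exp(-2j * np.pi * lag * w) for lag, c in a.items())

a = {-1: 1/3, 0: 1/3, 1: 1/3}  # equal-weight three-point moving average
w = 0.2
f_Y = abs(transfer(w, a))**2 * f_U(w)   # Theorem 35: f_Y(w) = |a(w)|^2 f_U(w)

# for this symmetric filter a(w) = (1 + 2 cos(2 pi w)) / 3, which is real
print(abs(transfer(w, a) - (1 + 2 * np.cos(2 * np.pi * w)) / 3))  # ~ 0
```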
General linear process
Theorem 37 The spectrum of the general linear process
  Y_t = ∑_{j=0}^{∞} a_j ε_{t−j},  (8)
where {ε_t} is white noise, may be written as
  f(ω) = b_0 + ∑_{m=1}^{∞} b_m cos(2πmω),  −1/2 ≤ ω ≤ 1/2.  (9)
□ Any real-valued even continuous function that satisfies f(ω) = f(ω + k) for integer k can be expressed as the Fourier series (9), so any stationary random sequence with a continuous spectrum can be represented as a general linear process, at least so far as second-order properties are concerned.
□ The general linear representation (8) is useful only if it involves few parameters; hence the use of ARMA models, which are flexible linear models with finitely many parameters.
□ The computations leading to (9) implicitly presuppose that ∑ a_j² < ∞, which in turn implies that {Y_t} is stationary with finite variance and covariance function.
Time Series Spring 2010 – slide 136
ARMA models  slide 137
Autoregressive process
Definition 38 An autoregressive process of order p, AR(p), is of the form
  Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + · · · + φ_p Y_{t−p} + ε_t,  (10)
where {Y_t} is stationary, φ_1, ..., φ_p are constants, and φ_p ≠ 0. Unless otherwise mentioned, we assume here that the ε_t are iid N(0, σ²). A process with non-zero mean is obtained by replacing Y_t in (10) by Y_t − µ, etc.
The backshift operator B can be used to write (10) in the form
  (1 − φ_1 B − · · · − φ_p B^p)Y_t = φ(B)Y_t = ε_t,
where φ(B) is the autoregressive operator, and this suggests writing
  Y_t = φ(B)^{−1} ε_t
to get the causal representation Y_t = ∑_{j=0}^{∞} ψ_j ε_{t−j} = ψ(B)ε_t, if it exists. To find the coefficients of ψ(B), we suppose that such a representation exists, and then match terms on the left and right of the equation φ(B)Y_t = φ(B)ψ(B)ε_t = ε_t.
Time Series Spring 2010 – slide 138
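The term-matching step can be coded as a short recursion: equating powers of B in φ(B)ψ(B) = 1 gives ψ_0 = 1 and ψ_j = φ_1 ψ_{j−1} + · · · + φ_p ψ_{j−p} for j ≥ 1. A Python sketch (the AR(1) check uses the known result ψ_j = α^j):

```python
import numpy as np

def psi_coefficients(phi, nmax):
    """First nmax + 1 coefficients of psi(B) = phi(B)^{-1} for an AR(p),
    phi = [phi_1, ..., phi_p], obtained by matching powers of B in
    phi(B) psi(B) = 1."""
    psi = np.zeros(nmax + 1)
    psi[0] = 1.0
    for j in range(1, nmax + 1):
        psi[j] = sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
    return psi

# AR(1) sanity check: for Y_t = alpha Y_{t-1} + eps_t we know psi_j = alpha^j
alpha = 0.6
print(psi_coefficients([alpha], 5))   # 1, 0.6, 0.36, 0.216, ...
```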
Moving average process
Definition 39 A moving average model of order q, MA(q), is of the form
  Y_t = ε_t + θ_1 ε_{t−1} + · · · + θ_q ε_{t−q},  (11)
where θ_1, ..., θ_q are constants, θ_q ≠ 0, and the ε_t are iid N(0, σ²). A process with non-zero mean is obtained by replacing Y_t in (11) by Y_t − µ.
The backshift operator B can be used to write (11) in the form
  Y_t = (1 + θ_1 B + · · · + θ_q B^q)ε_t = θ(B)ε_t,
where θ(B) is the moving average operator. The process (11) is stationary for any values of the θ_r.
Example 40 Show that the MA(1) processes with parameters θ_1 and 1/θ_1 are statistically indistinguishable.
Time Series Spring 2010 – slide 139
Invertibility
Definition 41 A moving average process {Y_t} is called invertible if it has an infinite autoregressive representation
  ε_t = ∑_{j=0}^{∞} a_j Y_{t−j}.
□ This definition is needed simply to ensure the identifiability of MA processes.
□ In Example 40 it is easy to check which version is invertible: we write
  ε_t = (1 + θ_1 B)^{−1} Y_t = ∑_{j=0}^{∞} (−θ_1 B)^j Y_t = ∑_{j=0}^{∞} (−θ_1)^j Y_{t−j},
which converges iff |θ_1| < 1.
Time Series Spring 2010 – slide 140
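The expansion above can be verified by simulation: truncating the infinite autoregressive representation after J terms recovers ε_t up to an error of order |θ_1|^J. A Python sketch with the hypothetical values θ_1 = 0.5, J = 60:

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, J = 0.5, 1000, 60     # invertible, since |theta| < 1; J truncates the sum

eps = rng.standard_normal(n)
y = eps.copy()
y[1:] += theta * eps[:-1]       # MA(1): Y_t = eps_t + theta * eps_{t-1}

# truncated AR(infinity) representation: eps_t = sum_{j >= 0} (-theta)^j Y_{t-j}
t = n - 1
eps_hat = sum((-theta) ** j * y[t - j] for j in range(J + 1))
print(abs(eps_hat - eps[t]))    # truncation error is O(|theta|^J), here negligible
```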
ARMA models
Definition 42 A time series {Y_t} is an autoregressive-moving average process of order (p, q), ARMA(p, q), if it is stationary and of the form
  Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + · · · + φ_p Y_{t−p} + ε_t + θ_1 ε_{t−1} + · · · + θ_q ε_{t−q},  (12)
where φ_1, ..., φ_p, θ_1, ..., θ_q are constants with φ_p, θ_q ≠ 0. Unless otherwise mentioned, we assume that the ε_t are iid N(0, σ²). A process with non-zero mean is obtained by replacing the Y_t in (12) by Y_t − µ, etc.
□ We use the autoregressive and moving average operators to write (12) as
  φ(B)Y_t = θ(B)ε_t.
The properties of the process are intimately tied to the polynomials φ(z), θ(z), where we take z ∈ C. Let D = {z ∈ C : |z| ≤ 1} denote the unit disk in the complex plane.
□ We remove common factors from φ(B) and θ(B) to eliminate overparametrisation, also called parameter redundancy.
Time Series Spring 2010 – slide 141
Causality, invertibility
Definition 43 An ARMA(p, q) process φ(B)Y_t = θ(B)ε_t is causal if it can be written as the linear process
  Y_t = ∑_{j=0}^{∞} ψ_j ε_{t−j} = ψ(B)ε_t,
where ∑ |ψ_j| < ∞ and we set ψ_0 = 1. It is invertible if it can be written as
  ε_t = ∑_{j=0}^{∞} π_j Y_{t−j} = π(B)Y_t,
where ∑ |π_j| < ∞ and we set π_0 = 1.
Theorem 44 (a) An ARMA(p, q) process φ(B)Y_t = θ(B)ε_t is causal iff φ(z) ≠ 0 for z ∈ D. If so, the coefficients of ψ(z) satisfy ψ(z) = θ(z)/φ(z) for z ∈ D.
(b) The process is invertible iff θ(z) ≠ 0 for z ∈ D. If so, the coefficients of π(z) satisfy π(z) = φ(z)/θ(z) for z ∈ D.
Thus the process is causal iff the roots of φ(z) lie outside D, and invertible iff the roots of θ(z) lie outside D.
Time Series Spring 2010 – slide 142
Example
Example 45 Investigate the properties of the process
  Y_t = 0.4Y_{t−1} + 0.45Y_{t−2} + ε_t + ε_{t−1} + 0.25ε_{t−2}.
Time Series Spring 2010 – slide 143
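One way to begin Example 45 is numerically, by inspecting the roots of φ(z) = 1 − 0.4z − 0.45z² and θ(z) = 1 + z + 0.25z²; a Python sketch (this complements, rather than replaces, the algebraic factorisation):

```python
import numpy as np

# np.roots takes coefficients from the highest power of z downwards
phi_roots = np.roots([-0.45, -0.4, 1.0])   # phi(z) = 1 - 0.4 z - 0.45 z^2
theta_roots = np.roots([0.25, 1.0, 1.0])   # theta(z) = 1 + z + 0.25 z^2

print(phi_roots)     # -2 and 10/9, both outside the unit disk D
print(theta_roots)   # a double root at (numerically, very near) -2

# z = -2 is a root of both polynomials, so phi and theta share the factor (1 + z/2):
# phi(z) = (1 + z/2)(1 - 0.9 z) and theta(z) = (1 + z/2)^2. After cancelling the
# common factor, the process is the ARMA(1,1) model (1 - 0.9 B) Y_t = (1 + 0.5 B) eps_t,
# whose roots 10/9 and -2 lie outside D: it is causal and invertible (Theorem 44).
```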
Comments
□ Why use ARMA processes?
  – They are usually empirical models, using φ_1, ..., φ_p, θ_1, ..., θ_q as summary statistics, with no implication that the model has a 'scientific', explanatory basis in terms of the underlying data-generating mechanism.
  – The spectrum of an ARMA process can take many forms without p or q being very large, so such processes provide a flexible and parsimonious way to approximate a wide range of second-order properties.
  – They are useful for forecasting, and for other settings where the autocorrelation structure of the data is not of primary interest.
□ ARMA models are not usually useful when the focus is on understanding the underlying mechanism that generates the data.
□ AR and MA models separately may provide more interpretable models in such cases:
  – AR models have Markov structure, which may be interpretable;
  – MA models stem from weighted moving averages, which may be interpretable.
Time Series Spring 2010 – slide 144