
CHAPTER 2

Univariate Time Series Models

2.1 Least Squares Regression

We begin our discussion of univariate and multivariate time series methods by considering the idea of a simple regression model, which we have met before in other contexts. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate linear regression. In this case, we assume that there is some collection of fixed, known functions of time, say $z_{t1}, z_{t2}, \ldots, z_{tq}$, that are influencing our output $y_t$, which we know to be random. We express this relation between the inputs and outputs as
$$
y_t = \beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq} + e_t \qquad (2.1)
$$
at the time points $t = 1, 2, \ldots, n$, where $\beta_1, \ldots, \beta_q$ are unknown fixed regression coefficients and $e_t$ is a random error or noise term assumed to be white noise; this means that the errors have zero mean, equal variance $\sigma^2$, and are independent. We traditionally assume also that the white noise series $e_t$ is Gaussian, or normally distributed.

Example 2.1:

We have assumed implicitly that the model
$$
y_t = \beta_1 + \beta_2 t + e_t
$$
is reasonable in our discussion of detrending in Chapter 1. This is in the form of the regression model (2.1) when one makes the identification $z_{t1} = 1$, $z_{t2} = t$. The problem in detrending is to estimate the coefficients $\beta_1$ and $\beta_2$ in the above equation and detrend by constructing the estimated residual series $\hat e_t$. We discuss the precise way in which this is accomplished below.
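As a concrete numerical illustration of the detrending step, the following short Python sketch (assuming only numpy; the series y is simulated here rather than taken from one of the Chapter 1 data sets) fits the straight-line model by least squares and forms the estimated residual series.

    import numpy as np

    # Simulated example series: a linear trend plus white noise
    rng = np.random.default_rng(0)
    n = 123
    t = np.arange(1, n + 1)
    y = 10.0 + 0.01 * t + rng.normal(scale=0.1, size=n)

    # Design matrix Z with columns z_t1 = 1 and z_t2 = t, as in Example 2.1
    Z = np.column_stack([np.ones(n), t])

    # Least squares estimate beta_hat, equivalent to (Z'Z)^{-1} Z'y
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)

    # Detrended (residual) series e_hat_t = y_t - beta1_hat - beta2_hat * t
    e_hat = y - Z @ beta_hat
    print(beta_hat, e_hat.var())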

The linear regression model described by Equation (2.1) can be conveniently written in slightly more general matrix notation by defining the column vectors $z_t = (z_{t1}, \ldots, z_{tq})'$ and $\beta = (\beta_1, \ldots, \beta_q)'$, so that we write (2.1) in the alternate form
$$
y_t = \beta' z_t + e_t. \qquad (2.2)
$$

To find estimators for $\beta$ and $\sigma^2$, it is natural to determine the coefficient vector $\beta$ minimizing $\sum_t e_t^2$ with respect to $\beta$. This yields the least squares or maximum likelihood estimator $\hat\beta$ and the maximum likelihood estimator for $\sigma^2$, which is proportional to the unbiased estimator
$$
\hat\sigma^2 = \frac{1}{n-q} \sum_{t=1}^{n} (y_t - \hat\beta' z_t)^2. \qquad (2.3)
$$
An alternate way of writing the model (2.2) is as
$$
y = Z\beta + e, \qquad (2.4)
$$

where $Z' = (z_1, z_2, \ldots, z_n)$ is a $q \times n$ matrix composed of the values of the input variables at the observed time points and $y' = (y_1, y_2, \ldots, y_n)$ is the vector of observed outputs, with the errors stacked in the vector $e' = (e_1, e_2, \ldots, e_n)$. The ordinary least squares estimator $\hat\beta$ is the solution to the normal equations
$$
Z'Z \hat\beta = Z'y.
$$
You need not be concerned with how the above equation is solved in practice, as all computer packages have efficient software for inverting the $q \times q$ matrix $Z'Z$ to obtain
$$
\hat\beta = (Z'Z)^{-1} Z'y. \qquad (2.5)
$$

An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say
$$
\widehat{\mathrm{cov}}\{\hat\beta\} = \hat\sigma^2 (Z'Z)^{-1}. \qquad (2.6)
$$
If $c_{ij}$ denotes an element of $C = (Z'Z)^{-1}$, then $\mathrm{cov}(\hat\beta_i, \hat\beta_j) = \sigma^2 c_{ij}$ and a $100(1-\alpha)\%$ confidence interval for $\beta_i$ is
$$
\hat\beta_i \pm t_{n-q}(\alpha/2)\, \hat\sigma \sqrt{c_{ii}}, \qquad (2.7)
$$
where $t_{df}(\alpha/2)$ denotes the upper $100(\alpha/2)\%$ point of a $t$ distribution with $df$ degrees of freedom.
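The computations in (2.3)-(2.7) amount to a few lines of linear algebra. The sketch below (a hypothetical helper named ols_with_ci, assuming numpy and scipy are available) returns the estimates, the estimated covariance matrix, and the confidence intervals.

    import numpy as np
    from scipy import stats

    def ols_with_ci(Z, y, alpha=0.05):
        """Least squares fit (2.5) with the uncertainty measures (2.3), (2.6), (2.7).

        A sketch assuming Z is the n x q design matrix and y the output vector.
        """
        n, q = Z.shape
        C = np.linalg.inv(Z.T @ Z)                  # C = (Z'Z)^{-1}
        beta_hat = C @ Z.T @ y                      # (2.5)
        resid = y - Z @ beta_hat
        sigma2_hat = resid @ resid / (n - q)        # unbiased variance estimate (2.3)
        cov_beta = sigma2_hat * C                   # (2.6)
        t_crit = stats.t.ppf(1 - alpha / 2, df=n - q)
        half = t_crit * np.sqrt(np.diag(cov_beta))
        ci = np.column_stack([beta_hat - half, beta_hat + half])  # (2.7)
        return beta_hat, sigma2_hat, cov_beta, ci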

Example 2.2:

Consider estimating the possible global warming trend alluded to in Section 1.1.2. The global temperature series, shown previously in Figure 1.3, suggests the possibility of a gradually increasing average temperature over the 123 year period covered by the land-based series. If we fit the model in Example 2.1, replacing $t$ by $t/100$ to convert to a 100 year base so that the increase will be in degrees per 100 years, we obtain $\hat\beta_1 = 38.72$, $\hat\beta_2 = .9501$ using (2.5). The error variance, from (2.3), is .0752, with $q = 2$ and $n = 123$. Then (2.6) yields
$$
\widehat{\mathrm{cov}}(\hat\beta_1, \hat\beta_2) =
\begin{pmatrix}
1.8272 & -.0941 \\
-.0941 & .0048
\end{pmatrix},
$$
leading to an estimated standard error of $\sqrt{.0048} = .0696$. The value of $t$ with $n - q = 123 - 2 = 121$ degrees of freedom for $\alpha = .025$ is about 1.98, leading to a narrow confidence interval of $.95 \pm .138$ for the slope, and hence to a confidence interval on the one hundred year increase of about .81 to 1.09 degrees. We would conclude from this analysis that there is a substantial increase in global temperature amounting to an increase of roughly one degree F per 100 years.

Figure 2.1 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.

If the model is reasonable, the residuals $\hat e_t = y_t - \hat\beta_1 - \hat\beta_2 t$ should be essentially independent and identically distributed, with no correlation evident. The plot that we have made in Figure 1.3 of the detrended global temperature series shows that this is probably not the case, because of the long low-frequency swing in the observed residuals. However, the differenced series, also shown in Figure 1.3 (second panel), appears to be more independent, suggesting that perhaps the apparent global warming is more consistent with a long-term swing in an underlying random walk than with a fixed 100 year trend. If we check the autocorrelation function of the regression residuals, shown here in Figure 2.1, it is clear that the significant values at higher lags imply that there is significant correlation in the residuals. Such correlation can be important, since the estimated standard errors of the coefficients under the assumption that the least squares residuals are uncorrelated are often too small. We can partially repair the damage caused by the correlated residuals by looking at a model with correlated errors. The procedure and techniques for dealing with correlated errors are based on the autoregressive moving average (ARMA) models to be considered in the next sections. Another method of reducing correlation is to apply a first difference $\Delta x_t = x_t - x_{t-1}$ to the global trend data. The ACF of the differenced series, also shown in Figure 2.1, seems to have lower correlations at the higher lags. Figure 1.3 shows qualitatively that this transformation also eliminates the trend in the original series.

Since we have again made some rather arbitrary-looking specifications for the configuration of dependent variables in the above regression examples, the reader may wonder how to select among various plausible models. We mention that two criteria which reward reducing the squared error and penalize for additional parameters are the Akaike Information Criterion
$$
AIC(K) = \log \hat\sigma^2 + \frac{2K}{n} \qquad (2.8)
$$
and the Schwarz Information Criterion
$$
SIC(K) = \log \hat\sigma^2 + \frac{K \log n}{n}, \qquad (2.9)
$$
(Schwarz, 1978) where $K$ is the number of parameters fitted (exclusive of variance parameters) and $\hat\sigma^2$ is the maximum likelihood estimator for the variance. The latter is sometimes termed the Bayesian Information Criterion, BIC, and will often yield models with fewer parameters than the other selection methods. A modification to $AIC(K)$ that is particularly well suited for small samples was suggested by Hurvich and Tsai (1989). This is the corrected AIC, given by
$$
AICC(K) = \log \hat\sigma^2 + \frac{n+K}{n-K-2}. \qquad (2.10)
$$
The rule for all three measures above is to choose the value of $K$ leading to the smallest value of $AIC(K)$ or $SIC(K)$ or $AICC(K)$. We will give an example later comparing the above simple least squares model with a model where the errors have a time series correlation structure.
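The three criteria are simple functions of the fitted variance, so a small helper suffices to compare candidate models; the following Python sketch (the name information_criteria is ours, not from any package) implements (2.8)-(2.10) directly.

    import numpy as np

    def information_criteria(sigma2_hat, n, K):
        """Model selection criteria (2.8)-(2.10).

        sigma2_hat is the maximum likelihood variance estimate, n the sample
        size, and K the number of fitted parameters (excluding variances).
        """
        aic = np.log(sigma2_hat) + 2 * K / n                 # (2.8)
        sic = np.log(sigma2_hat) + K * np.log(n) / n         # (2.9)
        aicc = np.log(sigma2_hat) + (n + K) / (n - K - 2)    # (2.10)
        return aic, sic, aicc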

The organization of this chapter is patterned after the landmark approach to developing models for time series data pioneered by Box and Jenkins (see Box et al., 1994). This assumes that there will be a representation of time series data in terms of a difference equation that relates the current value to its past. Such models should be flexible enough to include non-stationary realizations like the random walk given above and seasonal behavior, where the current value is related to past values at multiples of an underlying season; a common one might be multiples of 12 months (1 year) for monthly data. The models are constructed from difference equations driven by random input shocks and are labeled in the most general formulation as ARIMA, i.e., AutoRegressive Integrated Moving Average processes. The analogies with differential equations, which model many physical processes, are obvious. For clarity, we develop the separate components of the model sequentially, considering the integrated, autoregressive and moving average components in order, followed by the seasonal modification. The Box-Jenkins approach suggests three steps in a procedure that they summarize as identification, estimation and forecasting. Identification uses model selection techniques, combining the ACF and PACF as diagnostics with the versions of AIC given above to find a parsimonious (simple) model for the data. Estimation of parameters in the model will be the next step. Statistical techniques based on maximum likelihood and least squares are paramount for this stage and will only be sketched in this course. Finally, forecasting of time series based on the estimated parameters, with sensible estimates of uncertainty, is the bottom line for any assumed model.

2.2 Integrated (I) Models

We begin our study of time correlation by mentioning a simple model that will introduce strong correlations over time. This is the random walk model, which defines the current value of the time series as just the immediately preceding value with additive noise. The model forms the basis, for example, of the random walk theory of stock price behavior. In this model we define
$$
x_t = x_{t-1} + w_t, \qquad (2.11)
$$
where $w_t$ is a white noise series with mean zero and variance $\sigma^2$. Figure 2.2 shows a typical realization of such a series, and we observe that it bears a passing resemblance to the global temperature series. Appealing to (2.11), the best prediction of the current value would be expected to be given by its immediately preceding value. The model is, in a sense, unsatisfactory, because one would think that better results would be possible by a more efficient use of the past.
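The behavior just described is easy to reproduce by simulation. The following Python sketch (numpy only; sample_acf is a hypothetical helper written out for completeness) generates a random walk, takes its first difference, and compares the two sample ACFs.

    import numpy as np

    # Simulate the random walk (2.11) and its first difference
    rng = np.random.default_rng(1)
    n = 200
    w = rng.normal(size=n)          # white noise shocks w_t
    x = np.cumsum(w)                # random walk x_t = x_{t-1} + w_t
    dx = np.diff(x)                 # first difference recovers the noise w_t

    def sample_acf(z, max_lag=20):
        """Sample autocorrelation function of a series z up to max_lag."""
        z = z - z.mean()
        denom = np.sum(z ** 2)
        return np.array([np.sum(z[h:] * z[:len(z) - h]) / denom
                         for h in range(max_lag + 1)])

    print(sample_acf(x)[:5])   # decays slowly, as expected for nonstationarity
    print(sample_acf(dx)[:5])  # near zero beyond lag 0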

The ACF of the original series, shown in Figure 2.3, exhibits a slow decay as lags increase. In order to model such a series without knowing that it is necessarily generated by (2.11), one might try looking at a first difference and comparing the result to a white noise or completely independent process.

Figure 2.2 A typical realization of the random walk series (top panel) and the first difference of the series (bottom panel).

It is clear from (2.11) that the first difference would be $\Delta x_t = x_t - x_{t-1} = w_t$, which is just white noise. The ACF of the differenced process, in this case, would be expected to be zero at all lags $h \neq 0$, and the sample ACF should reflect this behavior. The first difference of the random walk in Figure 2.2 is also shown in Figure 2.3, and we note that it appears to be much more random. The ACF, shown in Figure 2.3, reflects this predicted behavior, with no significant values for lags other than zero. It is clear that (2.11) is a reasonable model for this data. The original series is nonstationary, with an autocorrelation function that depends on time, of the form
$$
\rho(x_{t+h}, x_t) =
\begin{cases}
\sqrt{\dfrac{t}{t+h}}, & h \geq 0, \\[2ex]
\sqrt{\dfrac{t+h}{t}}, & h < 0.
\end{cases}
$$


Figure 2.3 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the random walk (top panel) and the first difference (bottom panel) series.

The above example, using a difference transformation to make a random walk stationary, shows a very particular case of the model identification procedure advocated by Box et al. (1994). Namely, we seek a linearly filtered transformation of the original series, based strictly on the past values, that will reduce it to completely random white noise. This gives a model that enables prediction to be done with a residual noise that satisfies the usual statistical assumptions about model error.

We will introduce, in the following discussion, more general versions of this simple model that are useful for modeling and forecasting series with observations that are correlated in time. The notation and terminology were introduced in the landmark work by Box and Jenkins (1970) (see Box et al., 1994). A requirement for the ARMA model of Box and Jenkins is that the underlying process be stationary. Clearly the first difference of the random walk is stationary, but the ACF of the first difference shows relatively little dependence on the past, meaning that the differenced process is not predictable in terms of its past behavior.

To introduce a notation that has advantages for treating more general models, define the backshift operator $B$ as the result of shifting the series back by one time unit, i.e.,
$$
B x_t = x_{t-1}, \qquad (2.12)
$$
and applying successively higher powers, $B^k x_t = x_{t-k}$. The operator has many of the usual algebraic properties and allows, for example, writing the random walk model (2.11) as
$$
(1 - B) x_t = w_t.
$$
Note that the difference operator discussed previously in Section 1.2.2 is just $\nabla = 1 - B$.

Identifying nonstationarity is an important first step in the Box-Jenkins procedure. From the above discussion, we note that the ACF of a nonstationary process will tend to decay rather slowly as a function of lag $h$. For example, a straight line would be perfectly correlated, regardless of lag. Based on this observation, we mention the following properties that aid in identifying non-stationarity.

Property P2.1: ACF and PACF of a non-stationary time series

The ACF of a non-stationary time series decays very slowly as a function of lag $h$. The PACF of a non-stationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.

2.3 Autoregressive (AR) Models

Now, extending the notions above to more general linear combinations of past values might suggest writing
$$
x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t \qquad (2.13)
$$
as a function of $p$ past values and an additive noise component $w_t$. The model given by (2.13) is called an autoregressive model of order $p$, since it is assumed that one needs $p$ past values to predict $x_t$. The coefficients $\phi_1, \phi_2, \ldots, \phi_p$ are autoregressive coefficients, chosen to produce a good fit between the observed $x_t$ and its prediction based on $x_{t-1}, x_{t-2}, \ldots, x_{t-p}$. It is convenient to rewrite (2.13), using the backshift operator, as
$$
\phi(B) x_t = w_t, \qquad (2.14)
$$
where
$$
\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \qquad (2.15)
$$
is a polynomial with roots (solutions of $\phi(B) = 0$) outside the unit circle ($|B_k| > 1$). The restrictions are necessary for expressing the solution $x_t$ of (2.14) in terms of present and past values of $w_t$. That solution has the form
$$
x_t = \psi(B) w_t, \qquad (2.16)
$$


where
$$
\psi(B) = \sum_{k=0}^{\infty} \psi_k B^k \qquad (2.17)
$$
is an infinite polynomial ($\psi_0 = 1$), with coefficients determined by equating coefficients of $B$ in
$$
\psi(B)\phi(B) = 1. \qquad (2.18)
$$
Equation (2.16) can be obtained formally by choosing $\psi(B)$ to satisfy (2.18) and multiplying both sides of (2.14) by $\psi(B)$, which gives the representation (2.16). It is clear that the random walk has root $B_1 = 1$, which does not satisfy the restriction, and the process is nonstationary.

Example 2.2

Suppose that we have an autoregressive model (2.13) with $p = 1$, i.e., $x_t - \phi_1 x_{t-1} = (1 - \phi_1 B) x_t = w_t$. Then (2.18) becomes
$$
(1 + \psi_1 B + \psi_2 B^2 + \cdots)(1 - \phi_1 B) = 1.
$$
Equating coefficients of $B$ implies that $\psi_1 - \phi_1 = 0$, or $\psi_1 = \phi_1$. For $B^2$, we would get $\psi_2 - \phi_1 \psi_1 = 0$, or $\psi_2 = \phi_1^2$. Continuing, we obtain $\psi_k = \phi_1^k$ and the representation is
$$
\psi(B) = 1 + \sum_{k=1}^{\infty} \phi_1^k B^k,
$$
and we have
$$
x_t = \sum_{k=0}^{\infty} \phi_1^k w_{t-k}. \qquad (2.19)
$$
The representation (2.16) is fundamental for developing approximate forecasts and also exhibits the series as a linear process of the form considered in Problem 1.4.
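The coefficient matching in (2.18) can be carried out numerically for any AR(p). The sketch below (hypothetical helper psi_weights, in Python with numpy) reproduces the AR(1) result of Example 2.2 as a check.

    import numpy as np

    def psi_weights(phi, n_weights=10):
        """Psi weights from psi(B) phi(B) = 1, i.e. equation (2.18).

        phi holds the AR coefficients (phi_1, ..., phi_p); a sketch only.
        """
        p = len(phi)
        psi = np.zeros(n_weights + 1)
        psi[0] = 1.0
        for k in range(1, n_weights + 1):
            # psi_k = sum_{j=1}^{min(k,p)} phi_j * psi_{k-j}
            psi[k] = sum(phi[j - 1] * psi[k - j]
                         for j in range(1, min(k, p) + 1))
        return psi

    print(psi_weights([0.5], 5))  # AR(1): psi_k = 0.5**k, matching (2.19)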

For data involving such autoregressive (AR) models as defined above, the main selection problems are deciding that the autoregressive structure is appropriate and then determining the value of $p$ for the model. The ACF of the process is a potential aid for determining the order of the process, as are the model selection measures (2.8)-(2.10). To determine the ACF of the $p$th order AR in (2.13), write the equation as
$$
x_t - \sum_{k=1}^{p} \phi_k x_{t-k} = w_t
$$


and multiply both sides by $x_{t-h}$, $h = 0, 1, 2, \ldots$. Assuming that the mean $E(x_t) = 0$, and using the definition of the autocovariance function (1.2), leads to the equation
$$
E\Big[\Big(x_t - \sum_{k=1}^{p} \phi_k x_{t-k}\Big) x_{t-h}\Big] = E[w_t x_{t-h}].
$$
The left-hand side immediately becomes
$$
\gamma_x(h) - \sum_{k=1}^{p} \phi_k \gamma_x(h-k).
$$
The representation (2.16) implies that
$$
E[w_t x_{t-h}] = E[w_t (w_{t-h} + \psi_1 w_{t-h-1} + \psi_2 w_{t-h-2} + \cdots)].
$$
For $h = 0$, we get $\sigma_w^2$. For all other $h$, the fact that the $w_t$ are independent implies that the right-hand side will be zero. Hence, we may write the equations for determining $\gamma_x(h)$ as
$$
\gamma_x(0) - \sum_{k=1}^{p} \phi_k \gamma_x(k) = \sigma_w^2 \qquad (2.20)
$$
and
$$
\gamma_x(h) - \sum_{k=1}^{p} \phi_k \gamma_x(h-k) = 0 \qquad (2.21)
$$
for $h = 1, 2, 3, \ldots$. Note that one will need the property $\gamma_x(-h) = \gamma_x(h)$ in solving these equations. Equations (2.20) and (2.21) are called the Yule-Walker equations (see Yule, 1927, Walker, 1931).
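For a given set of autoregressive coefficients, the Yule-Walker equations form a small linear system. The following Python sketch (hypothetical helper yule_walker_acf) solves (2.20)-(2.21) for the autocovariances and returns the implied autocorrelations.

    import numpy as np

    def yule_walker_acf(phi, sigma2_w=1.0, max_lag=10):
        """Solve (2.20)-(2.21) for gamma_x(h) of an AR(p); a sketch only."""
        p = len(phi)
        # Unknowns gamma(0), ..., gamma(p)
        A = np.zeros((p + 1, p + 1))
        b = np.zeros(p + 1)
        # Row for (2.20): gamma(0) - sum_k phi_k gamma(k) = sigma2_w
        A[0, 0] = 1.0
        for k in range(1, p + 1):
            A[0, k] -= phi[k - 1]
        b[0] = sigma2_w
        # Rows for (2.21), h = 1..p, using gamma(-h) = gamma(h)
        for h in range(1, p + 1):
            A[h, h] += 1.0
            for k in range(1, p + 1):
                A[h, abs(h - k)] -= phi[k - 1]
        gamma = list(np.linalg.solve(A, b))
        # Extend recursively with (2.21) for h > p
        for h in range(p + 1, max_lag + 1):
            gamma.append(sum(phi[k - 1] * gamma[h - k] for k in range(1, p + 1)))
        gamma = np.array(gamma)
        return gamma / gamma[0]   # autocorrelations rho_x(h)

    print(yule_walker_acf([0.9], max_lag=5))  # AR(1): rho(h) = 0.9**h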

Example 2.3

Consider finding the ACF of the first-order autoregressive model. First, (2.20) implies that $\gamma_x(0) - \phi_1 \gamma_x(1) = \sigma_w^2$. For $h = 1, 2, \ldots$, we obtain
$$
\gamma_x(h) - \phi_1 \gamma_x(h-1) = 0.
$$
Solving these successively gives
$$
\gamma_x(h) = \gamma_x(0)\, \phi_1^h.
$$
Combining with (2.20) yields
$$
\gamma_x(0) = \frac{\sigma_w^2}{1 - \phi_1^2}.
$$
It follows that the autocovariance function is
$$
\gamma_x(h) = \frac{\sigma_w^2}{1 - \phi_1^2}\, \phi_1^h.
$$


Taking into account that $\gamma_x(-h) = \gamma_x(h)$ and using (1.3), we obtain
$$
\rho_x(h) = \phi_1^{|h|}
$$
for $h = 0, \pm 1, \pm 2, \ldots$.

The exponential decay is typical of autoregressive behavior, and there may also be some periodic structure. However, the most effective diagnostic of AR structure is in the PACF and is summarized by the following identification property:

Property P2.2: PACF for AR Process

The partial autocorrelation function $\phi_{hh}$ as a function of lag $h$ is zero for $h > p$, the order of the autoregressive process. This enables one to make a preliminary identification of the order $p$ of the process using the partial autocorrelation function (PACF). Simply choose the order beyond which most of the sample values of the PACF are approximately zero.

To verify the above, note that the PACF (see Section 1.3.3) is basically the last coefficient obtained when minimizing the squared error
$$
MSE = E\Big[\Big(x_{t+h} - \sum_{k=1}^{h} a_k x_{t+h-k}\Big)^2\Big].
$$
Setting the derivatives with respect to $a_j$ equal to zero leads to the equations
$$
E\Big[\Big(x_{t+h} - \sum_{k=1}^{h} a_k x_{t+h-k}\Big) x_{t+h-j}\Big] = 0.
$$
This can be written as
$$
\gamma_x(j) - \sum_{k=1}^{h} a_k \gamma_x(j-k) = 0
$$
for $j = 1, 2, \ldots, h$. Now, from Equation (2.21), it is clear that, for an AR($p$), we may take $a_k = \phi_k$ for $k \leq p$ and $a_k = 0$ for $k > p$ to get a solution to the above equations. This implies Property P2.2 above.

Having decided on the order $p$ of the model, it is clear that, for the estimation step, one may write the model (2.13) in the regression form
$$
x_t = \phi' z_t + w_t, \qquad (2.22)
$$
where $\phi = (\phi_1, \phi_2, \ldots, \phi_p)'$ corresponds to $\beta$ and $z_t = (x_{t-1}, x_{t-2}, \ldots, x_{t-p})'$ is the vector of dependent variables in (2.2). Taking into account the fact that $x_t$ is not observed for $t \leq 0$, we may run the regression approach in Section 3.1 for $t = p, p+1, \ldots, n-1$ to get estimators for $\phi$ and for $\sigma^2$, the variance of the white noise process. These so-called conditional maximum likelihood estimators are commonly used because the exact maximum likelihood estimators involve solving nonlinear equations.

Example 2.4

We consider the simple problem of modeling the recruit series shown in Figure 1.1 using an autoregressive model. The bottom panel of Figure 1.9 shows the autocorrelation (ACF) and partial autocorrelation (PACF) functions of the recruit series. The PACF has large values for $h = 1, 2$ and then is essentially zero for higher-order lags. This implies, by Property P2.2 above, that a second-order ($p = 2$) AR model might provide a good fit. Running the regression program for the model
$$
x_t = \beta_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t
$$
leads to the estimators
$$
\hat\beta_0 = 6.74\,(1.11), \quad \hat\phi_1 = 1.35\,(.04), \quad \hat\phi_2 = -.46\,(.04), \quad \hat\sigma^2 = 90.31,
$$
where the estimated standard deviations are in parentheses. To determine whether the above order is the best choice, we fitted models for $p = 1, \ldots, 10$, obtaining corrected AICC values of 5.75, 5.52, 5.53, 5.54, 5.54, 5.55, 5.55, 5.56, 5.57, and 5.58, respectively, using (2.10) with $K = 2$. This shows that the minimum AICC obtains for $p = 2$ and we choose the second-order model.
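The conditional least squares fit used in Example 2.4 is an ordinary regression on lagged values. The sketch below (hypothetical helper fit_ar_ols, assuming numpy; the recruit data are not reproduced here, so the final selection line is left as a comment) sets up the lagged design matrix and computes an AICC for order selection.

    import numpy as np

    def fit_ar_ols(x, p):
        """Conditional least squares fit of an AR(p) with intercept.

        A sketch: regresses x_t on (1, x_{t-1}, ..., x_{t-p}).
        Returns (beta0, phi_1, ..., phi_p), sigma2_hat and an AICC value.
        """
        x = np.asarray(x, dtype=float)
        n = len(x)
        Z = np.column_stack([np.ones(n - p)] +
                            [x[p - k: n - k] for k in range(1, p + 1)])
        y = x[p:]
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        sigma2_hat = resid @ resid / len(y)
        # (2.10); here K is taken as p, a simplification of the text's usage
        aicc = np.log(sigma2_hat) + (len(y) + p) / (len(y) - p - 2)
        return coef, sigma2_hat, aicc

    # Order selection: fit p = 1,...,10 and keep the smallest AICC, e.g.
    # best_p = min(range(1, 11), key=lambda p: fit_ar_ols(x, p)[2])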

Example 2.5

The previous example used various autoregressive models for the recruits series, fitting a second-order regression model. We may also use this regression idea to fit the model to other series, such as a detrended version of the Southern Oscillation Index (SOI) given in previous discussions. We have noted in our discussion of Figure 1.9, from the partial autocorrelation function (PACF), that a plausible model for this series might be a first-order autoregression of the form given above with $p = 1$. Again, putting the model above into the regression framework (2.2) for a single coefficient leads to the estimators $\hat\phi_1 = .59$ with standard error .04, $\hat\sigma^2 = .09218$ and $AICC(1) = -1.375$. The ACF of these residuals (not shown), however, will still show cyclical variation, and it is clear that they still have a number of values exceeding the $\pm 1.96/\sqrt{n}$ threshold (see Equation 1.14). A suggested procedure is to try higher-order autoregressive models; successive models for $p = 1, 2, \ldots, 30$ were fitted, and the $AICC(K)$ values are plotted in Figure 3.10 of Chapter 3, so we do not repeat the plot here. There is a clear minimum for a $p = 16$th order model. The coefficient vector is $\hat\phi$ with components .40, .07, .15, .08, -.04, -.08, -.09, -.08, .00, .11, .16, .15, .03, -.20, -.14 and -.06, and $\hat\sigma^2 = .07354$.

Finally, we give a general approach to forecasting for any process that can be written in the form (2.16). This includes the AR, MA and ARMA processes. We begin by defining an h-step forecast of the process $x_t$ as
$$
x_{t+h}^{t} = E[x_{t+h} \mid x_t, x_{t-1}, \ldots]. \qquad (2.23)
$$
Note that this is not exactly right, because we only have $x_1, x_2, \ldots, x_t$ available, so that conditioning on the infinite past is only an approximation. From this definition it is reasonable to intuit that $x_s^t = x_s$ for $s \leq t$, and
$$
E[w_s \mid x_t, x_{t-1}, \ldots] = E[w_s \mid w_t, w_{t-1}, \ldots] = w_s \qquad (2.24)
$$
for $s \leq t$. For $s > t$, we use $x_s^t$ and
$$
E[w_s \mid x_t, x_{t-1}, \ldots] = E[w_s \mid w_t, w_{t-1}, \ldots] = E[w_s] = 0, \qquad (2.25)
$$
since $w_s$ will be independent of past values of $w_t$. We define the h-step forecast variance as
$$
P_{t+h}^{t} = E[(x_{t+h} - x_{t+h}^{t})^2 \mid x_t, x_{t-1}, \ldots]. \qquad (2.26)
$$
To develop an expression for this mean square error, note that, with $\psi_0 = 1$, we can write
$$
x_{t+h} = \sum_{k=0}^{\infty} \psi_k w_{t+h-k}.
$$
Then, since $w_{t+h-k}^{t} = 0$ for $t + h - k > t$, i.e. $k < h$, we have
$$
x_{t+h}^{t} = \sum_{k=h}^{\infty} \psi_k w_{t+h-k},
$$
so that the residual is
$$
x_{t+h} - x_{t+h}^{t} = \sum_{k=0}^{h-1} \psi_k w_{t+h-k}.
$$
Hence, the mean square error (2.26) is just the variance of a linear combination of independent zero mean errors, with common variance $\sigma_w^2$,
$$
P_{t+h}^{t} = \sigma_w^2 \sum_{k=0}^{h-1} \psi_k^2. \qquad (2.27)
$$


As an example, we consider forecasting the second-order model developed for the recruit series in Example 2.4.

Example 2.6

Consider the one-step forecast $x_{t+1}^{t}$ first. Writing the defining equation for $t + 1$ gives
$$
x_{t+1} = \phi_1 x_t + \phi_2 x_{t-1} + w_{t+1},
$$
so that
$$
x_{t+1}^{t} = \phi_1 x_t^t + \phi_2 x_{t-1}^t + w_{t+1}^t = \phi_1 x_t + \phi_2 x_{t-1} + 0.
$$
Continuing in this vein, we obtain
$$
x_{t+2}^{t} = \phi_1 x_{t+1}^t + \phi_2 x_t^t + w_{t+2}^t = \phi_1 x_{t+1}^t + \phi_2 x_t + 0.
$$
Then,
$$
x_{t+h}^{t} = \phi_1 x_{t+h-1}^t + \phi_2 x_{t+h-2}^t + w_{t+h}^t = \phi_1 x_{t+h-1}^t + \phi_2 x_{t+h-2}^t
$$
for $h > 2$. Forecasts out to lag $h = 4$ and beyond, if necessary, can be found by solving (2.18) for $\psi_1, \psi_2$ and $\psi_3$, and substituting into (2.27). By equating coefficients of $B$, $B^2$ and $B^3$ in
$$
(1 - \phi_1 B - \phi_2 B^2)(1 + \psi_1 B + \psi_2 B^2 + \psi_3 B^3 + \cdots) = 1,
$$
we obtain $\psi_1 = \phi_1$, $\psi_2 - \phi_2 + \phi_1 \psi_1 = 0$ and $\psi_3 - \phi_1 \psi_2 - \phi_2 \psi_1 = 0$. This gives the coefficients $\psi_1 = \phi_1$, $\psi_2 = \phi_2 - \phi_1^2$, $\psi_3 = 2\phi_2 \phi_1 - \phi_1^2$. From Example 2.4, we have $\hat\phi_1 = 1.35$, $\hat\phi_2 = -.46$, $\hat\sigma_w^2 = 90.31$ and $\hat\beta_0 = 6.74$. The forecasts are of the form
$$
x_{t+h}^{t} = 6.74 + 1.35\, x_{t+h-1}^{t} - .46\, x_{t+h-2}^{t}.
$$
For the forecast variance, we evaluate $\psi_1 = 1.35$, $\psi_2 = -2.282$, $\psi_3 = -3.065$, leading to 90.31, 90.31(2.288), 90.31(7.495) and 90.31(16.890) for forecasts at $h = 1, 2, 3, 4$. The standard errors of the forecasts are therefore 9.50, 14.37, 26.02 and 39.06. The recruit series values range from 20 to 100, so the forecast uncertainty will be rather large.
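The forecast recursion and the variance formula (2.27) can be coded directly. The following Python sketch (hypothetical helper ar2_forecast) computes the psi weights from (2.18), so its numbers will differ slightly from the rounded coefficients quoted above.

    import numpy as np

    def ar2_forecast(x, beta0, phi1, phi2, sigma2_w, h_max=4):
        """Recursive forecasts and forecast variances for an AR(2); a sketch."""
        # Forecast recursion x^t_{t+h} = beta0 + phi1 x^t_{t+h-1} + phi2 x^t_{t+h-2}
        hist = list(x[-2:])                       # last two observed values
        forecasts = []
        for _ in range(h_max):
            nxt = beta0 + phi1 * hist[-1] + phi2 * hist[-2]
            forecasts.append(nxt)
            hist.append(nxt)

        # Psi weights from (1 - phi1 B - phi2 B^2) psi(B) = 1
        psi = np.zeros(h_max)
        psi[0] = 1.0
        for k in range(1, h_max):
            psi[k] = phi1 * psi[k - 1] + (phi2 * psi[k - 2] if k >= 2 else 0.0)

        # Forecast variances P^t_{t+h} = sigma2_w * sum_{k=0}^{h-1} psi_k^2  (2.27)
        variances = sigma2_w * np.cumsum(psi ** 2)
        return np.array(forecasts), variances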


2.4 Moving Average (MA) Models

We may also consider processes that contain linear combinations of underlying unobserved shocks, say, represented by a white noise series $w_t$. These moving average components generate a series of the form
$$
x_t = w_t - \theta_1 w_{t-1} - \theta_2 w_{t-2} - \cdots - \theta_q w_{t-q}, \qquad (2.28)
$$
where $q$ denotes the order of the moving average component and $\theta_1, \theta_2, \ldots, \theta_q$ are parameters to be estimated. Using the shift notation, the above equation can be written in the form
$$
x_t = \theta(B) w_t, \qquad (2.29)
$$
where
$$
\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \qquad (2.30)
$$
is another polynomial in the shift operator $B$. It should be noted that the MA process of order $q$ is a linear process of the form considered earlier in Problem 1.4, with $\psi_0 = 1$, $\psi_1 = -\theta_1, \ldots, \psi_q = -\theta_q$. This implies that the ACF will be zero for lags larger than $q$, because terms in the form of the covariance function given in Problem 1.4 of Chapter 1 will all be zero. Specifically, the exact forms are
$$
\gamma_x(0) = \sigma_w^2 \Big(1 + \sum_{k=1}^{q} \theta_k^2\Big) \qquad (2.31)
$$
for $h = 0$, and
$$
\gamma_x(h) = \sigma_w^2 \Big({-\theta_h} + \sum_{k=1}^{q-h} \theta_{k+h} \theta_k\Big) \qquad (2.32)
$$
for $h = 1, \ldots, q-1$, with $\gamma_x(q) = -\sigma_w^2 \theta_q$, and $\gamma_x(h) = 0$ for $h > q$. Hence, we will have
Property P2.3: ACF for MA Series

For a moving average series of order $q$, note that the autocorrelation function (ACF) is zero for lags $h > q$, i.e. $\rho_x(h) = 0$ for $h > q$. Such a result enables us to diagnose the order of a moving average component by examining $\hat\rho_x(h)$ and choosing $q$ as the value beyond which the coefficients are essentially zero.

Example 2.7

Consider the varve thicknesses in Figure 1.10, which are described in Problem 1.7 of Chapter 1. Figure 2.4 shows the ACF and PACF of the original log-transformed varve series and of the first differences. The ACF of the original series indicates possible non-stationary behavior and suggests taking a first difference, interpreted here as the percentage yearly change in deposition. The ACF of the first difference shows a clear peak at $h = 1$ and no other significant peaks, suggesting a first-order moving average. Fitting the first-order moving average model $x_t = w_t - \theta_1 w_{t-1}$ to this data using the Gauss-Newton procedure described next leads to $\hat\theta_1 = .77$ and $\hat\sigma_w^2 = .2358$.

Figure 2.4 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.

Fitting the pure moving average term turns into a nonlinear problem, as we can see by noting that either maximum likelihood or regression involves solving (2.28) or (2.29) for $w_t$ and minimizing the sum of the squared errors. Suppose that the roots of $\theta(B) = 0$ are all outside the unit circle; then this is possible by solving $\pi(B)\theta(B) = 1$, so that, for the vector parameter $\theta = (\theta_1, \ldots, \theta_q)'$, we may write
$$
w_t(\theta) = \pi(B) x_t \qquad (2.33)
$$
and minimize
$$
SSE(\theta) = \sum_{t=q+1}^{n} w_t^2(\theta)
$$
as a function of the vector parameter $\theta$. We do not really need to find the operator $\pi(B)$ but can simply solve (2.33) recursively for $w_t$, with $w_1, w_2, \ldots, w_q = 0$ and
$$
w_t(\theta) = x_t + \sum_{k=1}^{q} \theta_k w_{t-k}
$$
for $t = q+1, \ldots, n$. It is easy to verify that $SSE(\theta)$ will be a nonlinear function of $\theta_1, \theta_2, \ldots, \theta_q$. However, note that
$$
w_t(\theta) \approx w_t(\theta_0) + \Big(\frac{\partial w_t}{\partial \theta}\Big)' (\theta - \theta_0),
$$
where the derivative is evaluated at the previous guess $\theta_0$. Rearranging the above equation leads to
$$
w_t(\theta_0) \approx -\Big(\frac{\partial w_t}{\partial \theta}\Big)' (\theta - \theta_0) + w_t(\theta), \qquad (2.34)
$$
which is just the regression model (2.2). Hence, we can begin with an initial guess $\theta_0 = (.1, .1, \ldots, .1)'$, say, and successively minimize $SSE(\theta)$ until convergence.
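A compact version of this Gauss-Newton scheme is sketched below (hypothetical helpers ma_errors and fit_ma_gauss_newton, in Python with numpy); for simplicity the derivatives of $w_t$ with respect to $\theta$ are approximated by finite differences rather than computed analytically.

    import numpy as np

    def ma_errors(x, theta):
        """Recover w_t(theta) recursively from (2.33), with w_1..w_q = 0."""
        q = len(theta)
        w = np.zeros(len(x))
        for t in range(q, len(x)):
            w[t] = x[t] + sum(theta[k] * w[t - k - 1] for k in range(q))
        return w

    def fit_ma_gauss_newton(x, q=1, n_iter=20, eps=1e-6):
        """Gauss-Newton minimization of SSE(theta); a sketch of the procedure."""
        theta = np.full(q, 0.1)                      # initial guess (.1, ..., .1)'
        for _ in range(n_iter):
            w0 = ma_errors(x, theta)
            # Numerical derivative matrix D[t, j] = d w_t / d theta_j
            D = np.zeros((len(x), q))
            for j in range(q):
                tj = theta.copy()
                tj[j] += eps
                D[:, j] = (ma_errors(x, tj) - w0) / eps
            # Regression step (2.34): regress w_t(theta0) on -dw_t/dtheta
            step, *_ = np.linalg.lstsq(-D, w0, rcond=None)
            theta = theta + step
        sigma2_w = np.mean(ma_errors(x, theta)[q:] ** 2)
        return theta, sigma2_w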

In order to forecast a moving average series, note that
$$
x_{t+h} = w_{t+h} - \sum_{k=1}^{q} \theta_k w_{t+h-k}.
$$
The results below (2.24) imply that
$$
x_{t+h}^{t} = -\sum_{k=h}^{q} \theta_k w_{t+h-k}^{t},
$$
where the $w_t$ values needed for the above are computed recursively as before. Because of (2.17), it is clear that $\psi_0 = 1$ and $\psi_k = -\theta_k$, $k = 1, 2, \ldots, q$, and these values can be substituted directly into the variance formula (2.27).

2.5 Autoregressive Integrated Moving Average (ARIMA) Models

Now combining the autoregressive and moving average components leads to the autoregressive moving average ARMA($p, q$) model, written as
$$
\phi(B) x_t = \theta(B) w_t, \qquad (2.35)
$$
where the polynomials in $B$ are as defined earlier in (2.15) and (2.30), with $p$ autoregressive coefficients and $q$ moving average coefficients. In the difference equation form, this becomes
$$
x_t - \sum_{k=1}^{p} \phi_k x_{t-k} = w_t - \sum_{k=1}^{q} \theta_k w_{t-k}. \qquad (2.36)
$$


The mixed processes do not satisfy the properties P2.1-P2.3 any more, but they tend to behave in approximately the same way. Estimation and forecasting for such problems are treated in essentially the same manner as for the AR and MA processes. We note that we can formally divide both sides of (2.35) by $\phi(B)$ and note that the usual representation (2.16) holds when
$$
\psi(B)\phi(B) = \theta(B). \qquad (2.37)
$$
For forecasting, we determine the $\psi_1, \psi_2, \ldots$ by equating coefficients of $B, B^2, B^3, \ldots$ in (2.37), as before, assuming that all the roots of $\phi(B) = 0$ are greater than one in absolute value. Similarly, we can always solve for the residuals, say
$$
w_t = x_t - \sum_{k=1}^{p} \phi_k x_{t-k} + \sum_{k=1}^{q} \theta_k w_{t-k}, \qquad (2.38)
$$
to get the terms needed for forecasting and estimation.

Example 2.8

Consider the above mixed process with $p = q = 1$, i.e. ARMA(1, 1). By (2.36), we may write
$$
x_t = \phi_1 x_{t-1} + w_t - \theta_1 w_{t-1}.
$$
Now,
$$
x_{t+1} = \phi_1 x_t + w_{t+1} - \theta_1 w_t,
$$
so that
$$
x_{t+1}^{t} = \phi_1 x_t + 0 - \theta_1 w_t
$$
and $x_{t+h}^{t} = \phi_1 x_{t+h-1}^{t}$ for $h > 1$, leading to very simple forecasts in this case. Equating coefficients of $B^k$ in
$$
(1 - \phi_1 B)(1 + \psi_1 B + \psi_2 B^2 + \cdots) = (1 - \theta_1 B)
$$
leads to
$$
\psi_k = (\phi_1 - \theta_1)\, \phi_1^{k-1}
$$
for $k = 1, 2, \ldots$. Using (2.27) leads to the expression
$$
P_{t+h}^{t} = \sigma_w^2 \Big[ 1 + (\phi_1 - \theta_1)^2 \sum_{k=1}^{h-1} \phi_1^{2(k-1)} \Big]
= \sigma_w^2 \left[ 1 + \frac{(\phi_1 - \theta_1)^2 \big(1 - \phi_1^{2(h-1)}\big)}{1 - \phi_1^2} \right]
$$
for the forecast variance.


In the first example of this chapter, it was noted that nonstationary processes are characterized by a slow decay in the ACF, as in Figure 2.3. In many of the cases where slow decay is present, the use of a first-order difference
$$
\Delta x_t = x_t - x_{t-1} = (1 - B) x_t
$$
will reduce the nonstationary process $x_t$ to a stationary series $\Delta x_t$. One can check to see whether the slow decay has been eliminated in the ACF of the transformed series. Higher-order differences, $\Delta^d x_t = \Delta \Delta^{d-1} x_t$, are possible, and we call the process obtained when the $d$th difference is an ARMA series an ARIMA($p, d, q$) series, where $p$ is the order of the autoregressive component, $d$ is the order of differencing needed and $q$ is the order of the moving average component. Symbolically, the form is
$$
\phi(B)\, \Delta^d x_t = \theta(B)\, w_t. \qquad (2.39)
$$
The principles of model selection for ARIMA($p, d, q$) series are obtained using the extensions of (2.8)-(2.10), which replace $K$ by $K = p + q$, the total number of ARMA parameters.

2.6 Seasonal ARIMA Models

When the autoregressive, differencing, or moving average behavior seems to occur at multiples of some underlying period $s$, a seasonal ARIMA series may result. The seasonal nonstationarity is characterized by slow decay at multiples of $s$ and can often be eliminated by a seasonal differencing operator of the form
$$
\nabla_s^D x_t = (1 - B^s)^D x_t.
$$
For example, when we have monthly data, it is reasonable that a yearly phenomenon will induce $s = 12$, and the ACF will be characterized by slowly decaying spikes at $12, 24, 36, 48, \ldots$; we can obtain a stationary series by transforming with the operator $(1 - B^{12}) x_t = x_t - x_{t-12}$, which is the difference between the current month and the value one year, or 12 months, ago.

If the autoregressive or moving average behavior is seasonal at period $s$, we define formally the operators
$$
\Phi(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps} \qquad (2.40)
$$
and
$$
\Theta(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs}. \qquad (2.41)
$$
The final form of the ARIMA($p, d, q$) $\times$ ($P, D, Q$)$_s$ model is
$$
\Phi(B^s)\,\phi(B)\, \Delta_s^D \Delta^d x_t = \Theta(B^s)\,\theta(B)\, w_t. \qquad (2.42)
$$


We may also note the properties below, corresponding to P2.1-P2.3.

Property P2.1': ACF and PACF of a seasonally non-stationary time series

The ACF of a seasonally non-stationary time series decays very slowly at lag multiples $s, 2s, 3s, \ldots$, with zeros in between, where $s$ denotes a seasonal period, usually 12. The PACF of a non-stationary time series tends to have a peak very near unity at lag $s$.

Property P2.2': PACF for Seasonal AR Series

The partial autocorrelation function $\phi_{hh}$ as a function of lag $h$ has nonzero values at $s, 2s, 3s, \ldots, Ps$, with zeros in between, and is zero for $h > Ps$, the order of the seasonal autoregressive process. There should be some exponential decay.

Property P2.3': ACF for a Seasonal MA Series

For a seasonal moving average series of order $Q$, note that the autocorrelation function (ACF) has nonzero values at $s, 2s, 3s, \ldots, Qs$ and is zero for $h > Qs$.

Example 2.9:

We illustrate by fitting the monthly birth series from 1948-1979 shown in Figure 2.5. The period encompasses the boom that followed the Second World War, and there is the expected rise, which persists for about 13 years, followed by a decline to around 1974. The series appears to have long-term swings, with seasonal effects superimposed. The long-term swings indicate possible non-stationarity, and we verify that this is the case by checking the ACF and PACF shown in the top panel of Figure 2.6. Note that, by Property P2.1, slow decay of the ACF indicates non-stationarity, and we respond by taking a first difference. The results shown in the second panel of Figure 2.5 indicate that the first difference has eliminated the strong low-frequency swing. The ACF, shown in the second panel from the top in Figure 2.6, shows peaks at 12, 24, 36, 48, ..., with no decay. This behavior implies seasonal non-stationarity, by Property P2.1' above, with $s = 12$. A seasonal difference of the first difference generates an ACF and PACF in Figure 2.6 that we expect for stationary series.

Taking the seasonal difference of the first difference gives a series that looks stationary and has an ACF with peaks at 1 and 12 and a PACF with a substantial peak at 12 and lesser peaks at 24, 36, .... This suggests trying either a first-order moving average term, by Property P2.3, or a first-order seasonal moving average term with $s = 12$, by Property P2.3' above.


Figure 2.5 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0, 1, 1) × (0, 1, 1)_{12} model.


Figure 2.6 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels), an ARIMA(0, 1, 0) × (0, 1, 1)_{12} model (third two panels) and an ARIMA(0, 1, 1) × (0, 1, 1)_{12} model (last two panels).

We choose to eliminate the largest peak first, by applying a first-order seasonal moving average model with $s = 12$. The ACF and PACF of the residual series from this model, i.e. from ARIMA(0, 1, 0) $\times$ (0, 1, 1)$_{12}$, written as
$$
(1 - B)(1 - B^{12}) x_t = (1 - \Theta_1 B^{12}) w_t,
$$


Figure 2.7 A 36 month forecast for the birth series with 95% uncertainty limits.

is shown in the fourth panel from the top in Figure 2.6. We note that the peak at lag one is still there, with attending exponential decay in the PACF. This can be eliminated by fitting a first-order moving average term, and we consider the model ARIMA(0, 1, 1) $\times$ (0, 1, 1)$_{12}$, written as
$$
(1 - B)(1 - B^{12}) x_t = (1 - \theta_1 B)(1 - \Theta_1 B^{12}) w_t.
$$
The ACF of the residuals from this model is relatively well behaved, with a number of peaks either near or exceeding the 95% test of no correlation. Fitting this final ARIMA(0, 1, 1) $\times$ (0, 1, 1)$_{12}$ model leads to
$$
(1 - B)(1 - B^{12}) x_t = (1 - .4896 B)(1 - .6844 B^{12}) w_t,
$$
with AICc = 4.95, $R^2 = .9804^2 = .961$, and P-values = .000, .000. $R^2$ is computed by saving the predicted values and then plotting them against the observed values using the 2-D plot option. The format in which ASTSA puts out these results is shown below.

    ARIMA(0,1,1)x(0,1,1)x12 from U.S. Births
    AICc = 4.94684   variance = 51.1906   d.f. = 358   Start values = .1

    predictor   coef    st. error   t-ratio   p-value
    MA(1)       .4896   .04620      10.5966   .000
    SMA(1)      .6844   .04013      17.0541   .000

    (D1) (D(12)1) x(t) = (1 -.49B1) (1 -.68B12) w(t)

The ARIMA search in ASTSA leads to the model
$$
(1 - .0578 B^{12})(1 - B)(1 - B^{12}) x_t = (1 - .4119 B - .1515 B^2)(1 - .8136 B^{12}) w_t,
$$
with AICc = 4.8526, somewhat lower than the previous model. The seasonal autoregressive coefficient is not statistically significant and should probably be omitted from the model. The new model becomes
$$
(1 - B)(1 - B^{12}) x_t = (1 - .4088 B - .1645 B^2)(1 - .6990 B^{12}) w_t,
$$
yielding AICc = 4.92 and $R^2 = .981^2 = .962$, slightly better than the ARIMA(0, 1, 1) $\times$ (0, 1, 1)$_{12}$ model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability.

The model is expanded as
$$
(1 - B)(1 - B^{12}) x_t = (1 - \theta_1 B)(1 - \Theta_1 B^{12}) w_t,
$$
so that
$$
(1 - B - B^{12} + B^{13}) x_t = (1 - \theta_1 B - \Theta_1 B^{12} + \theta_1 \Theta_1 B^{13}) w_t,
$$
or
$$
x_t - x_{t-1} - x_{t-12} + x_{t-13} = w_t - \theta_1 w_{t-1} - \Theta_1 w_{t-12} + \theta_1 \Theta_1 w_{t-13},
$$
i.e.,
$$
x_t = x_{t-1} + x_{t-12} - x_{t-13} + w_t - \theta_1 w_{t-1} - \Theta_1 w_{t-12} + \theta_1 \Theta_1 w_{t-13}.
$$
The forecasts are
$$
x_{t+1}^{t} = x_t + x_{t-11} - x_{t-12} - \theta_1 w_t - \Theta_1 w_{t-11} + \theta_1 \Theta_1 w_{t-12},
$$
$$
x_{t+2}^{t} = x_{t+1}^{t} + x_{t-10} - x_{t-11} - \Theta_1 w_{t-10} + \theta_1 \Theta_1 w_{t-11}.
$$
Continuing in the same manner, we obtain
$$
x_{t+12}^{t} = x_{t+11}^{t} + x_t - x_{t-1} - \Theta_1 w_t + \theta_1 \Theta_1 w_{t-1}
$$
for the 12 month forecast.


The forecast limits are quite variable, with a standard error that rises to 20% of the mean by the end of the forecast period. The plot shows that the general trend is upward, rising from about 250,000 to about 290,000 births per year. One could check the actual records from the years 1979-1982. The direction is not certain because of the large uncertainty. One could compute the probability
$$
P(B_{t+47} \leq 250{,}000) = \Phi\left(\frac{250 - 290}{60}\right) = .25,
$$
so there is a 75% chance of an increase.

A website where the forecasts can be compared on a yearly basis is
http://www.cdc.gov/nccdphp/drh/pdf/nvs/nvs48 tb1.pdf
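Readers without access to ASTSA can reproduce a fit of this form with other software. The following sketch (assuming the Python package statsmodels; births.dat is a hypothetical file standing in for the monthly birth series, and the signs of the reported MA coefficients may differ by parameterization from the $(1 - \theta B)$ convention used here) fits the ARIMA(0, 1, 1) $\times$ (0, 1, 1)$_{12}$ model and produces a 36 month forecast with 95% limits, as in Figure 2.7.

    import numpy as np
    import statsmodels.api as sm

    # Monthly birth series; births.dat is a hypothetical placeholder file
    births = np.loadtxt("births.dat")

    # ARIMA(0,1,1) x (0,1,1)_12, as selected in Example 2.9
    model = sm.tsa.SARIMAX(births, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
    fit = model.fit(disp=False)
    print(fit.params)                 # MA(1) and seasonal MA(1) estimates

    # 36-month forecast with 95% limits
    fc = fit.get_forecast(steps=36)
    print(fc.predicted_mean)
    print(fc.conf_int(alpha=0.05))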

Example 2.10:

Figure 2.8 shows the autocorrelation function of the log-transformed J&J earnings series that is plotted in Figure 1.4, and we note the slow decay indicating the nonstationarity which has already been obvious in the Chapter 1 discussion. We may also compare the ACF with that of a random walk, shown in Figure 3.2, and note the close similarity. The partial autocorrelation function is very high at lag one which, under ordinary circumstances, would indicate a first-order autoregressive AR(1) model, except that, in this case, the value is close to unity, indicating a root close to 1 on the unit circle. The only question would be whether differencing or detrending is the better transformation to stationarity. Following in the Box-Jenkins tradition, differencing leads to the ACF and PACF shown in the second panel, and no simple structure is apparent. To force a next step, we interpret the peaks at $4, 8, 12, 16, \ldots$ as contributing to a possible seasonal autoregressive term, leading to a possible ARIMA(0, 1, 0) $\times$ (1, 0, 0)$_4$, and we simply fit this model and look at the ACF and PACF of the residuals, shown in the third two panels. The fit improves somewhat, with significant peaks still remaining at lag 1 in both the ACF and PACF. The peak in the ACF seems more isolated, and there remains some exponentially decaying behavior in the PACF, so we try a model with a first-order moving average. The bottom two panels show the ACF and PACF of the resulting ARIMA(0, 1, 1) $\times$ (1, 0, 0)$_4$, and we note only relatively minor excursions above and below the 95% intervals under the assumption that the theoretical ACF is white noise. The final model suggested is ($y_t = \log x_t$)
$$
(1 - \Phi_1 B^4)(1 - B) y_t = (1 - \theta_1 B) w_t,
$$
where $\hat\Phi_1 = .820\,(.058)$, $\hat\theta_1 = .508\,(.098)$ and $\hat\sigma_w^2 = .0086$. The model can be written in forecast form as
$$
y_t = y_{t-1} + \Phi_1 (y_{t-4} - y_{t-5}) + w_t - \theta_1 w_{t-1}.
$$


To forecast the original series for, say, 4 quarters, we compute the forecast limits for $y_t = \log x_t$ and then exponentiate, i.e.
$$
x_{t+h}^{t} = \exp\{y_{t+h}^{t}\}.
$$
We note the large limits on the forecast values in Figure 2.9 and mention that the situation can be improved by the regression approach in the next section.

2.7 Regression Models With Correlated Errors

The standard method for dealing with correlated errors $e_t$ in the regression model
$$
y_t = \beta' z_t + e_t \qquad (2.2)'
$$
is to try to transform the errors $e_t$ into uncorrelated ones and then apply the standard least squares approach to the transformed observations. For example, let $P$ be an $n \times n$ matrix that transforms the vector $e = (e_1, \ldots, e_n)'$ into a set of independent, identically distributed variables with variance $\sigma^2$. Then, transform the matrix version (2.4) to
$$
P y = P Z \beta + P e
$$
and proceed as before. Of course, the major problem is deciding what to choose for $P$; but in the time series case, happily, there is a reasonable solution, based again on time series ARMA models. Suppose that we can find, for example, a reasonable ARMA model for the residuals, say the ARMA($p$, 0, 0) model
$$
e_t = \sum_{k=1}^{p} \phi_k e_{t-k} + w_t,
$$
which defines a linear transformation of the correlated $e_t$ to a sequence of uncorrelated $w_t$. We can ignore the problems near the beginning of the series by starting at $t = p$. In the ARMA notation, using the backshift operator $B$, we may write
$$
\phi(B) e_t = w_t, \qquad (2.43)
$$
where
$$
\phi(B) = 1 - \sum_{k=1}^{p} \phi_k B^k, \qquad (2.44)
$$
and applying the operator to both sides of (2.2) leads to the model
$$
\phi(B) y_t = \beta' \phi(B) z_t + w_t, \qquad (2.45)
$$


Figure 2.8 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels) and two sets of ARIMA residuals.


Figure 2.9 Observed and predicted values for the Johnson and Johnson Earnings Series with forecast values for the next four quarters, using the ARIMA(0, 1, 1) × (1, 0, 0)_4 model for the log-transformed data.

where the $w_t$ now satisfy the independence assumption. Doing ordinary least squares on the transformed model is the same as doing weighted least squares on the untransformed model. The only problem is that we do not know the values of the coefficients $\phi_k$, $k = 1, \ldots, p$, in the transformation (2.43). However, if we knew the residuals $e_t$, it would be easy to estimate the coefficients, since (2.43) can be written in the form
$$
e_t = \phi' e_{t-1} + w_t, \qquad (2.46)
$$
which is exactly the usual regression model (2.2), with $\phi' = (\phi_1, \ldots, \phi_p)$ replacing $\beta$ and $e_{t-1}' = (e_{t-1}, e_{t-2}, \ldots, e_{t-p})$ replacing $z_t$.

The above comments suggest a general approach known as the Cochrane-Orcutt procedure (Cochrane and Orcutt, 1949) for dealing with the problem of correlated errors in the time series context.

1. Begin by fitting the original regression model (2.2) by least squares, obtaining $\hat\beta$ and the residuals $\hat e_t = y_t - \hat\beta' z_t$.

2. Fit an ARMA model to the estimated residuals, say
$$
\phi(B) \hat e_t = \theta(B) w_t.
$$

3. Apply the ARMA transformation found to both sides of the regression equation (2.2)' to obtain
$$
\frac{\phi(B)}{\theta(B)}\, y_t = \beta' \frac{\phi(B)}{\theta(B)}\, z_t + w_t.
$$

4. Run an ordinary least squares regression on the transformed values to obtain the new $\hat\beta$.

5. Return to 2. if desired. Often, one iteration is enough to develop the estimators under a reasonable correlation structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood or weighted least squares estimators.
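As a small illustration of the procedure, the following Python sketch (hypothetical helper cochrane_orcutt, restricted to AR(1) errors so that step 2 reduces to regressing $\hat e_t$ on $\hat e_{t-1}$) carries out the transformation and refit.

    import numpy as np

    def cochrane_orcutt(Z, y, n_iter=1):
        """One or more Cochrane-Orcutt iterations with AR(1) errors; a sketch."""
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)      # step 1: OLS fit
        for _ in range(n_iter):
            e = y - Z @ beta                              # residuals e_hat_t
            phi = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])    # step 2: AR(1) coefficient
            # step 3: apply (1 - phi B) to both sides of the regression equation
            y_star = y[1:] - phi * y[:-1]
            Z_star = Z[1:] - phi * Z[:-1]
            beta, *_ = np.linalg.lstsq(Z_star, y_star, rcond=None)  # step 4
        return beta, phi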

Figure 2.10 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA(0, 0, 0) × (1, 0, 0)_4 residuals.


[Figure 2.11: Observed and predicted values for the Johnson and Johnson earnings series, with forecast values for the next four quarters, using the correlated regression model for the log-transformed data; vertical axis is earnings, horizontal axis is the quarter (− observed, −− predicted).]

Example 2.11:

We might consider an alternative approach to treating the Johnson and Johnson earnings series, assuming that

yt = log xt = β1 + β2t + et.

To analyze the data with this approach, we first fit the model above, obtaining ˆβ1 = −.6678(.0349) and ˆβ2 = .0417(.0071). The residuals êt = yt − ˆβ1 − ˆβ2t are easily computed, and their ACF and PACF are shown in the top two panels of Figure 2.10. The ACF and PACF suggest that a seasonal AR model will fit well, and the ACF and PACF of the residuals from that fit are shown in the bottom panels of Figure 2.10. The seasonal AR model is of the form

et = Φ1et−4 + wt,

and we obtain ˆΦ1 = .7614(.0639), with ˆσ²_w = .00779. Using these values, we transform yt to

yt − ˆΦ1yt−4 = β1(1 − ˆΦ1) + β2[t − ˆΦ1(t − 4)] + wt



using the estimated value ˆΦ1 = .7614. With this transformed regression, we obtain the new estimators ˆβ1 = −.7488(.1105) and ˆβ2 = .0424(.0018). The new estimators have the advantage of being unbiased and having a smaller generalized variance.

To forecast, we consider the original model with the newly estimated ˆβ1 and ˆβ2. We obtain the approximate forecasts

y^t_{t+h} = ˆβ1 + ˆβ2(t + h) + ê^t_{t+h}

for the log-transformed series, along with upper and lower limits based on an estimated variance that incorporates only the prediction variance of e^t_{t+h}, treating the trend and seasonal autoregressive parameters as fixed. The narrower upper and lower limits shown in Figure 2.11 are mainly a reflection of a slightly better fit to the residuals and the ability of the trend model to take care of the nonstationarity.
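The calculations in Example 2.11 can be reproduced along the following lines in R. This is a minimal sketch, assuming the quarterly earnings are available as a ts object (R's built-in JohnsonJohnson series is used here for illustration); the variable names are not from the text, and the standard errors printed by lm for the transformed regression are only approximate because ˆΦ1 is itself estimated.

jj <- log(JohnsonJohnson)        # y_t = log x_t, quarterly earnings
tt <- seq_along(jj)              # time index t = 1, ..., n

# Trend regression y_t = beta1 + beta2 t + e_t by ordinary least squares
trend_fit <- lm(jj ~ tt)
e_hat     <- residuals(trend_fit)

# Seasonal AR(1) at lag 4 for the residuals: e_t = Phi1 e_{t-4} + w_t
sar_fit <- arima(e_hat, order = c(0, 0, 0),
                 seasonal = list(order = c(1, 0, 0), period = 4),
                 include.mean = FALSE)
Phi1 <- coef(sar_fit)["sar1"]

# Transformed regression:
#   y_t - Phi1 y_{t-4} = beta1 (1 - Phi1) + beta2 [t - Phi1 (t - 4)] + w_t
n      <- length(jj)
y_star <- jj[-(1:4)] - Phi1 * jj[1:(n - 4)]
z1     <- rep(1 - Phi1, n - 4)
z2     <- tt[-(1:4)] - Phi1 * (tt[-(1:4)] - 4)
coef(lm(y_star ~ z1 + z2 - 1))   # new estimates of beta1 and beta2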

2.8 Chapter 2 Problems

2.1 Consider the regression model

yt = β1yt−1 + et,

where et is white noise with zero mean and variance σ²_e. Assume that we observe y1, y2, . . . , yn and consider the model above for t = 2, 3, . . . , n. Show that the least squares estimator of β1 is

ˆβ1 = ( Σ_{t=2}^n yt yt−1 ) / ( Σ_{t=2}^n y²_{t−1} ).

If we pretend that yt−1 are fixed, show that

var{ˆβ1} = σ²_e / Σ_{t=2}^n y²_{t−1}.

Relate your answer to a method for fitting a first-order AR model to the data yt.

2.2 Consider the autoregressive model (2.13) for p = 1, i.e.,

xt − φ1xt−1 = wt.

(a) Show that the necessary condition below (2.15) implies that |φ1| < 1.



(b) Show that

xt = Σ_{k=0}^∞ φ1^k wt−k

is the form of (2.16) in this case.

(c) Show that E[wtxt] = σ²_w and E[wtxt−1] = 0, so that future errors are uncorrelated with past data.

2.3 The autocovariance and autocorrelation functions for AR processes are often derived from the Yule–Walker equations, obtained by multiplying both sides of the defining equation successively by xt, xt−1, xt−2, . . ., using the result (2.16).

(a) Derive the Yule–Walker equations

γx(h) − φ1γx(h − 1) = σ²_w for h = 0, and 0 for h > 0.

(b) Use the Yule–Walker equations to show that

ρx(h) = φ1^{|h|}

for the first-order AR.

2.4 For an ARMA series we define the optimal forecast based on xt, xt−1, . . . as the conditional expectation

x^t_{t+h} = E[xt+h | xt, xt−1, . . .]

for h = 1, 2, 3, . . ..

(a) Show, for the general ARMA model, that

E[wt+h | xt, xt−1, . . .] = 0 for h > 0, and wt+h for h ≤ 0.

(b) For the first-order AR model, show that the optimal forecast is

x^t_{t+h} = φ1xt for h = 1, and φ1x^t_{t+h−1} for h > 1.

(c) Show that E[(x^t_{t+1} − xt+1)²] = σ²_w is the prediction error variance of the one-step forecast.



2.5 Suppose we have the simple linear trend model

yt = β1t + xt,  t = 1, 2, . . . , n,

where

xt = φ1xt−1 + wt.

Give the exact form of the equations that you would use for estimating β1, φ1 and σ²_w using the Cochrane–Orcutt procedure of Section 2.7.

[Figure 2.12: Los Angeles mortality, temperature and particulates (6-day increments); panels show LA cardiovascular mortality, temperature, and particulate level over 508 time points.]

2.6 Consider the file la regr.dat, in the syllabus, which contains cardiovascular mortality, temperature values and particulate levels over 6-day periods from Los Angeles County (1970–1979). The file also contains two dummy variables for regression purposes, a column of ones for the constant term and a time index. The order is as follows: Column 1: 508 cardiovascular mortality values (6-day averages), Column 2: 508 ones, Column 3: the integers 1, 2, . . . , 508, Column 4: temperature in degrees F, and Column 5: particulate levels. A reference is Shumway et al (1988). The point here is to examine possible relations between temperature and mortality in the presence of a time trend in cardiovascular mortality.

(a) Use scatter diagrams to argue that particulate level may be linearly related to mortality and that temperature has either a linear or quadratic relation. Check for lagged relations using the cross correlation function.



(b) Adjust temperature for its mean value, using the Scale option, and fit the model

Mt = β0 + β1t + β2(Tt − ¯T) + β3(Tt − ¯T)² + β4Pt + et,

where Mt, Tt and Pt denote the mortality, temperature and particulate pollution series. You can use Columns 2 and 3 as inputs for the trend terms and run the regression analysis without the constant option. Note that you need to transform temperature first. Retain the residuals for the next part of the problem.

(c) Plot the residuals and compute the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Do the residuals appear to be white? Suggest an ARIMA model for the residuals and fit it to the residuals. The simple ARIMA(2, 0, 0) model is a good compromise.

(d) Apply the ARIMA model obtained in part (c) to all of the input variables and to cardiovascular mortality using the ARIMA transformation option. Retain the forecast values for the transformed mortality, say ˆmt = Mt − ˆφ1Mt−1 − ˆφ2Mt−2.

2.7 Generate 10 realizations (n = 200 points each) of a series from an ARIMA(1,0,1) model with φ1 = .90, θ1 = .20 and σ² = .25. Fit the ARIMA model to each of the series and compare the estimators to the true values by computing the average of the estimators and their standard deviations.
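As a hint on the simulation mechanics only (the fitting and summarizing are the exercise), realizations of this kind can be generated in R with arima.sim; the seed and object names below are illustrative.

set.seed(1)   # illustrative seed
# Ten realizations, n = 200 each, from an ARMA(1,1) with phi1 = .90,
# theta1 = .20 and innovation variance .25 (standard deviation .5)
sims <- replicate(10, arima.sim(n = 200,
                                model = list(ar = 0.9, ma = 0.2),
                                sd = 0.5))
# Each column of sims is one realization; fit arima(sims[, i], order = c(1, 0, 1))
# to each column and tabulate the estimates.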

2.8 Consider the bivariate time series record containing monthly U.S. production, as measured by the Federal Reserve Board Production Index, and monthly unemployment, as given in the file frb.asd. The file contains n = 372 monthly values for each series. Before you begin, be sure to plot the series. Fit a seasonal ARIMA model of your choice to the Federal Reserve Production Index. Develop a 12-month forecast using the model.

[Figure 2.13: Federal Reserve Board Production Index and monthly unemployment (372 monthly values each) for Problem 2.8.]

2.9 The file labeled clim-hyd.asd has 454 months of measured values for the climatic variables Air Temperature, Dew Point, Cloud Cover, Wind Speed, Precipitation, and Inflow at Shasta Lake. We would like to look at possible relations between the weather factors and between the weather factors and the inflow to Shasta Lake.

(a) Fit the ARIMA(0, 0, 0) × (0, 1, 1)12 model to transformed precipitation Pt = √pt and transformed inflow It = log it. Save the residuals for transformed precipitation for use in part (b).

(b) Apply the ARIMA model fitted in part (a) for transformed precipitation to the flow series. Compute the cross correlation between the flow residuals obtained using the precipitation ARIMA model and the precipitation residuals using the precipitation model, and interpret. Use the coefficients from the ARIMA model in the transform option in the main menu to construct the transformed flow residuals. Suggest two possible models for relating the two series. More analysis can be done using the transfer function models of Chapter 4.



2.9 Chapter 2 R Notes

The function arima() is used to do ARIMA fits in R. If you want to fit an ARIMA(p, d, q) model to the time series x, the command would be

>model = arima(x,order=c(p,d,q))

To include a seasonal (P, D, Q)S component, use

>model = arima(x,order=c(p,d,q),seasonal=list(order=c(P,D,Q),period=S))

A call of

>model

will provide a decent summary of the ARIMA fit.

>model$residuals

will contain the residuals of your arima fit.

>model$loglik

will give the log-likelihood of the fit.

>model$aic

will give Akaike's information criterion (AIC) for the fit (recall that this is useful in model selection).

To get AICc, use the following code (we assume the object model holds the fit of your model, K is the number of parameters you are fitting, and n is the length of your series):

>AICc = log(model$sigma2)+(n+K)/(n-K-2)

One final note that may be of use: to predict (say 5) future observations given a fit, use

>future = predict(model,n.ahead=5)
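Putting these pieces together, the sketch below fits the seasonal model used for the Johnson and Johnson earnings example and forecasts four quarters ahead. The use of R's built-in JohnsonJohnson series and this particular model order are illustrative assumptions, not prescriptions from the notes.

x <- log(JohnsonJohnson)                      # quarterly earnings, logged

# ARIMA(0,1,1) x (1,0,0)_4 fit
model <- arima(x, order = c(0, 1, 1),
               seasonal = list(order = c(1, 0, 0), period = 4))
model                                         # summary of the fit
acf(model$residuals)                          # residual diagnostics
pacf(model$residuals)

# AICc as defined above (K fitted coefficients, series length n)
n <- length(x)
K <- length(model$coef)
AICc <- log(model$sigma2) + (n + K) / (n - K - 2)

future <- predict(model, n.ahead = 4)         # point forecasts and standard errors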



2.10 Chapter 2 ASTSA Notes

8. Regression Analysis

Time Domain → Multiple Regression
Model (without constant): yt = β1zt1 + β2zt2 + . . . + βqztq + et
Model (with constant): yt = β0 + β1zt1 + β2zt2 + . . . + βqztq + et
Series (dependent): yt
No. of independent series: q
series 1: zt1−h1   lag: h1 (often zero)
· · ·
series q: ztq−hq   lag: hq (often zero)
forecasts: 0
constant (y/n):
selector (AIC, AICc, BIC, FPEL, AICL): AICc
Save → Residuals
Save → Predicted

9. Fit ARIMA(p, d, q) × (P, D, Q)s

Time Domain → ARIMA
Series:
p: AR order
d: Difference
q: MA order
P: SAR order
D: Seasonal Difference
Q: SMA order
season: s
forecasts: h
use .1 guess (y/n): y
selector (AIC, AICc, BIC, FPEL, AICL): AICc
Save → Residuals
Save → Predicted

10. ARIMA Transformation

Transform → Transform → ARIMA Residual
Series:
p: AR order
d: Difference
q: MA order
P: SAR order
D: Seasonal Difference
Q: SMA order
season: s
