CHAPTER 2
Univariate Time Series Models

2.1 Least Squares Regression
We begin our discussion of univariate and multivariate time series methods by considering the idea of a simple regression model, which we have met before in other contexts. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate linear regression. In this case, we assume that there is some collection of fixed known functions of time, say $z_{t1}, z_{t2}, \ldots, z_{tq}$, that are influencing our output $y_t$, which we know to be random. We express this relation between the inputs and outputs as
$$y_t = \beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq} + e_t \eqno(2.1)$$
at the time points $t = 1, 2, \ldots, n$, where $\beta_1, \ldots, \beta_q$ are unknown fixed regression coefficients and $e_t$ is a random error or noise, assumed to be white noise; this means that the observations have zero means, equal variances $\sigma^2$, and are independent. We traditionally assume also that the white noise series, $e_t$, is Gaussian or normally distributed.
Example 2.1:
We have assumed implicitly that the model
$$y_t = \beta_1 + \beta_2 t + e_t$$
is reasonable in our discussion of detrending in Chapter 1. This is in the form of the regression model (2.1) when one makes the identification $z_{t1} = 1$, $z_{t2} = t$. The problem in detrending is to estimate the coefficients $\beta_1$ and $\beta_2$ in the above equation and detrend by constructing the estimated residual series $\hat e_t$. We discuss the precise way in which this is accomplished below.
The linear regression model described by Equation (2.1) can be conveniently written in slightly more general matrix notation by defining the column vectors $z_t = (z_{t1}, \ldots, z_{tq})'$ and $\beta = (\beta_1, \ldots, \beta_q)'$, so that we write (2.1) in the alternate form
$$y_t = \beta' z_t + e_t. \eqno(2.2)$$
To find estimators for $\beta$ and $\sigma^2$, it is natural to determine the coefficient vector $\beta$ minimizing $\sum e_t^2$ with respect to $\beta$. This yields the least squares or maximum likelihood estimator $\hat\beta$, and the maximum likelihood estimator for $\sigma^2$, which is proportional to the unbiased estimator
$$\hat\sigma^2 = \frac{1}{n-q} \sum_{t=1}^{n} \left(y_t - \hat\beta' z_t\right)^2. \eqno(2.3)$$
An alternate way of writing the model (2.2) is as
$$y = Z\beta + e, \eqno(2.4)$$
where $Z' = (z_1, z_2, \ldots, z_n)$ is a $q \times n$ matrix composed of the values of the input variables at the observed time points, $y' = (y_1, y_2, \ldots, y_n)$ is the vector of observed outputs, and the errors are stacked in the vector $e' = (e_1, e_2, \ldots, e_n)$. The ordinary least squares estimators $\hat\beta$ are the solutions to the normal equations
$$Z'Z\hat\beta = Z'y.$$
You need not be concerned as to how the above equation is solved in practice, as all computer packages have efficient software for inverting the $q \times q$ matrix $Z'Z$ to obtain
$$\hat\beta = (Z'Z)^{-1} Z'y. \eqno(2.5)$$
An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say
$$\widehat{\mathrm{cov}}\{\hat\beta\} = \hat\sigma^2 (Z'Z)^{-1}. \eqno(2.6)$$
If $c_{ij}$ denotes an element of $C = (Z'Z)^{-1}$, then $\mathrm{cov}(\hat\beta_i, \hat\beta_j) = \sigma^2 c_{ij}$, and a $100(1-\alpha)\%$ confidence interval for $\beta_i$ is
$$\hat\beta_i \pm t_{n-q}(\alpha/2)\,\hat\sigma\sqrt{c_{ii}}, \eqno(2.7)$$
where $t_{df}(\alpha/2)$ denotes the upper $100(\alpha/2)\%$ point of a $t$ distribution with $df$ degrees of freedom.
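As a sketch, equations (2.3) and (2.5)-(2.6) can be computed directly with NumPy; the simulated series and design matrix below are invented for illustration, not data from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 100, 2
t = np.arange(1, n + 1)
Z = np.column_stack([np.ones(n), t / 100])   # z_t1 = 1, z_t2 = t/100
beta_true = np.array([1.0, 0.5])
y = Z @ beta_true + rng.normal(0, 0.2, n)    # white noise errors

# Normal equations (2.5): beta_hat = (Z'Z)^{-1} Z'y
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Unbiased error variance estimator (2.3)
resid = y - Z @ beta_hat
sigma2_hat = resid @ resid / (n - q)

# Estimated covariance (2.6) and standard errors for the coefficients
C = np.linalg.inv(Z.T @ Z)
se = np.sqrt(sigma2_hat * np.diag(C))
```

Solving the normal equations with `np.linalg.solve` is preferable numerically to forming the inverse explicitly; the inverse is formed here only because (2.6) needs its diagonal.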
Example 2.2:
Consider estimating the possible global warming trend alluded to in Section 1.1.2. The global temperature series, shown previously in Figure 1.3, suggests the possibility of a gradually increasing average temperature over the 123 year period covered by the land-based series. If we fit the model in Example 2.1, replacing $t$ by $t/100$ to convert to a 100 year base so that the increase will be in degrees per 100 years, we obtain $\hat\beta_1 = 38.72$, $\hat\beta_2 = .9501$ using (2.5). The error variance, from (2.3), is $.0752$, with $q = 2$ and $n = 123$. Then (2.6) yields
$$\widehat{\mathrm{cov}}(\hat\beta_1, \hat\beta_2) = \begin{pmatrix} 1.8272 & -.0941 \\ -.0941 & .0048 \end{pmatrix},$$
leading to an estimated standard error of $\sqrt{.0048} = .0696$ for the slope. The value of $t$ with $n - q = 123 - 2 = 121$ degrees of freedom for $\alpha/2 = .025$ is about 1.98, leading to a narrow confidence interval of $.95 \pm .138$ for the slope, and hence to a confidence interval on the one hundred year increase of about .81 to 1.09 degrees. We would conclude from this analysis that there is a substantial increase in global temperature amounting to roughly one degree F per 100 years.
Figure 2.1: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.
If the model is reasonable, the residuals $\hat e_t = y_t - \hat\beta_1 - \hat\beta_2 t$ should be essentially independent and identically distributed, with no correlation evident. The plot that we have made in Figure 1.3 of the detrended global temperature series shows that this is probably not the case, because of the long low-frequency swings in the observed residuals. However, the differenced series, also shown in Figure 1.3 (second panel), appears to be more independent, suggesting that perhaps the apparent global warming is more consistent with a long term swing in an underlying random walk than with a fixed 100 year trend. If we check the autocorrelation function of the regression residuals, shown here in Figure 2.1, it is clear that the significant values at higher lags imply that there is significant correlation in the residuals. Such correlation can be important, since the estimated standard errors of the coefficients, computed under the assumption that the least squares residuals are uncorrelated, are often too small. We can partially repair the damage caused by the correlated residuals by looking at a model with correlated errors. The procedure and techniques for dealing with correlated errors are based on the autoregressive moving average (ARMA) models to be considered in the next sections. Another method of reducing correlation is to apply a first difference $\Delta x_t = x_t - x_{t-1}$ to the global trend data. The ACF of the differenced series, also shown in Figure 2.1, has lower correlations at the higher lags. Figure 1.3 shows qualitatively that this transformation also eliminates the trend in the original series.
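The two strategies just compared, detrending by regression versus first differencing, can be sketched as follows. This is an illustrative simulation (the random-walk-plus-drift series and the helper function are ours, not the actual temperature data):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1..max_lag) of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[h:] * x[:-h]) / denom
                     for h in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
n = 200
t = np.arange(1, n + 1)
x = 0.01 * t + np.cumsum(rng.normal(0, 0.1, n))  # drift plus random walk

# Detrend: regress on (1, t) and keep the residuals
Z = np.column_stack([np.ones(n), t])
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ x)
detrended = x - Z @ beta_hat

# Difference: Delta x_t = x_t - x_{t-1}
differenced = np.diff(x)

# Detrending a random walk leaves residuals that are still highly
# correlated at low lags; differencing reduces it to near white noise.
acf_detrended = sample_acf(detrended, 5)
acf_differenced = sample_acf(differenced, 5)
```

The qualitative outcome mirrors the temperature example: the detrended residuals keep a large lag-one correlation, while the differenced series does not.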
Since we have again made some rather arbitrary looking specifications for the configuration of dependent variables in the above regression examples, the reader may wonder how to select among various plausible models. We mention two criteria that reward reducing the squared error and penalize additional parameters: the Akaike Information Criterion
$$AIC(K) = \log \hat\sigma^2 + \frac{2K}{n} \eqno(2.8)$$
and the Schwarz Information Criterion
$$SIC(K) = \log \hat\sigma^2 + \frac{K \log n}{n}, \eqno(2.9)$$
(Schwarz, 1978), where $K$ is the number of parameters fitted (exclusive of variance parameters) and $\hat\sigma^2$ is the maximum likelihood estimator for the variance. The Schwarz criterion is sometimes termed the Bayesian Information Criterion, BIC, and will often yield models with fewer parameters than the other selection methods. A modification to $AIC(K)$ that is particularly well suited for small samples was suggested by Hurvich and Tsai (1989). This is the corrected AIC, given by
$$AICC(K) = \log \hat\sigma^2 + \frac{n + K}{n - K - 2}. \eqno(2.10)$$
The rule for all three measures above is to choose the value of $K$ leading to the smallest value of $AIC(K)$ or $SIC(K)$ or $AICC(K)$. We will give an example later comparing the above simple least squares model with a model where the errors have a time series correlation structure.
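The three criteria (2.8)-(2.10) are simple functions of $\hat\sigma^2$, $K$, and $n$; a minimal sketch (function names are ours):

```python
import math

def aic(sigma2_hat, K, n):
    """Akaike Information Criterion (2.8)."""
    return math.log(sigma2_hat) + 2 * K / n

def sic(sigma2_hat, K, n):
    """Schwarz (Bayesian) Information Criterion (2.9)."""
    return math.log(sigma2_hat) + K * math.log(n) / n

def aicc(sigma2_hat, K, n):
    """Corrected AIC of Hurvich and Tsai (2.10)."""
    return math.log(sigma2_hat) + (n + K) / (n - K - 2)

# For n larger than about e^2 (roughly 7.4), log(n) > 2, so SIC
# penalizes each extra parameter more heavily than AIC does and
# tends to select smaller models.
```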
The organization of this chapter is patterned after the landmark approach to developing models for time series data pioneered by Box and Jenkins (see Box et al., 1994). This assumes that there will be a representation of time series data in terms of a difference equation that relates the current value to its past. Such models should be flexible enough to include non-stationary realizations like the random walk given above, as well as seasonal behavior, where the current value is related to past values at multiples of an underlying season; a common one might be multiples of 12 months (1 year) for monthly data. The models are constructed from difference equations driven by random input shocks and are labeled in the most general formulation as ARIMA, i.e., AutoRegressive Integrated Moving Average processes. The analogies with differential equations, which model many physical processes, are obvious. For clarity, we develop the separate components of the model sequentially, considering the integrated, autoregressive and moving average parts in order, followed by the seasonal modification. The Box-Jenkins approach suggests three steps in a procedure that they summarize as identification, estimation and forecasting. Identification uses model selection techniques, combining the ACF and PACF as diagnostics with the versions of AIC given above, to find a parsimonious (simple) model for the data. Estimation of parameters in the model will be the next step. Statistical techniques based on maximum likelihood and least squares are paramount for this stage and will only be sketched in this course. Finally, forecasting of time series based on the estimated parameters, with sensible estimates of uncertainty, is the bottom line for any assumed model.
2.2 Integrated (I) Models

We begin our study of time correlation by mentioning a simple model that will introduce strong correlations over time. This is the random walk model, which defines the current value of the time series as just the immediately preceding value with additive noise. The model forms the basis, for example, of the random walk theory of stock price behavior. In this model we define
$$x_t = x_{t-1} + w_t, \eqno(2.11)$$
where $w_t$ is a white noise series with mean zero and variance $\sigma^2$. Figure 2.2 shows a typical realization of such a series, and we observe that it bears a passing resemblance to the global temperature series. Appealing to (2.11), the best prediction of the current value would be expected to be given by its immediately preceding value. The model is, in a sense, unsatisfactory, because one would think that better results would be possible by a more efficient use of the past.

The ACF of the original series, shown in Figure 2.3, exhibits a slow decay as lags increase. In order to model such a series without knowing that it is necessarily generated by (2.11), one might try looking at a first difference and comparing the result to a white noise or completely independent process. It is
Figure 2.2: A typical realization of the random walk series (top panel) and the first difference of the series (bottom panel).
clear from (2.11) that the first difference would be $\Delta x_t = x_t - x_{t-1} = w_t$, which is just white noise. The ACF of the differenced process, in this case, would be expected to be zero at all lags $h \neq 0$, and the sample ACF should reflect this behavior. The first difference of the random walk in Figure 2.2 is also shown in Figure 2.3, and we note that it appears to be much more random. The ACF, shown in Figure 2.3, reflects this predicted behavior, with no significant values for lags other than zero. It is clear that (2.11) is a reasonable model for this data. The original series is nonstationary, with an autocorrelation function that depends on time, of the form
$$\rho(x_{t+h}, x_t) = \begin{cases} \sqrt{\dfrac{t}{t+h}}, & h \ge 0 \\[1ex] \sqrt{\dfrac{t+h}{t}}, & h < 0. \end{cases}$$
Figure 2.3: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the random walk (top panel) and the first difference (bottom panel) series.
The above example, using a difference transformation to make a random walk stationary, shows a very particular case of the model identification procedure advocated by Box et al. (1994). Namely, we seek a linearly filtered transformation of the original series, based strictly on the past values, that will reduce it to completely random white noise. This gives a model that enables prediction to be done with a residual noise that satisfies the usual statistical assumptions about model error.

We will introduce, in the following discussion, more general versions of this simple model that are useful for modeling and forecasting series with observations that are correlated in time. The notation and terminology were introduced in the landmark work by Box and Jenkins (1970) (see Box et al., 1994). A requirement for the ARMA model of Box and Jenkins is that the underlying process be stationary. Clearly the first difference of the random walk is stationary, but the ACF of the first difference shows relatively little dependence on the past, meaning that the differenced process is not predictable in terms of its past behavior.
To introduce a notation that has advantages for treating more general models, define the backshift operator $B$ as the result of shifting the series back by one time unit, i.e.,
$$Bx_t = x_{t-1}, \eqno(2.12)$$
and applying successively higher powers, $B^k x_t = x_{t-k}$. The operator has many of the usual algebraic properties and allows, for example, writing the random walk model (2.11) as
$$(1 - B)x_t = w_t.$$
Note that the difference operator discussed previously in Section 1.2.2 is just $\nabla = 1 - B$.
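The action of a polynomial in $B$ on a series is just a one-sided linear filter. The helper below is our own illustrative function, not from the text; it applies a polynomial $c_0 + c_1 B + \cdots + c_p B^p$ to a series, with the difference operator $1 - B$ as the special case:

```python
def apply_poly_in_B(coeffs, x):
    """Apply coeffs[0] + coeffs[1]*B + ... + coeffs[p]*B^p to series x.

    Returns y with y[t] = sum_k coeffs[k] * x[t-k], computed only for
    t >= p so that no pre-sample values of x are needed.
    """
    p = len(coeffs) - 1
    return [sum(c * x[t - k] for k, c in enumerate(coeffs))
            for t in range(p, len(x))]

# The difference operator is 1 - B:
x = [1.0, 3.0, 6.0, 10.0]
dx = apply_poly_in_B([1.0, -1.0], x)   # -> [2.0, 3.0, 4.0]
```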
Identifying nonstationarity is an important first step in the Box-Jenkins procedure. From the above discussion, we note that the ACF of a nonstationary process will tend to decay rather slowly as a function of lag $h$. For example, a straight line would be perfectly correlated, regardless of lag. Based on this observation, we mention the following properties that aid in identifying non-stationarity.

Property P2.1: ACF and PACF of a non-stationary time series
The ACF of a non-stationary time series decays very slowly as a function of lag $h$. The PACF of a non-stationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.
2.3 Autoregressive (AR) Models

Now, extending the notions above to more general linear combinations of past values might suggest writing
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t \eqno(2.13)$$
as a function of $p$ past values and an additive noise component $w_t$. The model given by (2.13) is called an autoregressive model of order $p$, since it is assumed that one needs $p$ past values to predict $x_t$. The coefficients $\phi_1, \phi_2, \ldots, \phi_p$ are autoregressive coefficients, chosen to produce a good fit between the observed $x_t$ and its prediction based on $x_{t-1}, x_{t-2}, \ldots, x_{t-p}$. It is convenient to rewrite (2.13), using the backshift operator, as
$$\phi(B)x_t = w_t, \eqno(2.14)$$
where
$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \eqno(2.15)$$
is a polynomial with roots (solutions of $\phi(B) = 0$) outside the unit circle ($|B_k| > 1$). The restrictions are necessary for expressing the solution $x_t$ of (2.14) in terms of present and past values of $w_t$. That solution has the form
$$x_t = \psi(B)w_t, \eqno(2.16)$$
where
$$\psi(B) = \sum_{k=0}^{\infty} \psi_k B^k \eqno(2.17)$$
is an infinite polynomial ($\psi_0 = 1$), with coefficients determined by equating coefficients of $B$ in
$$\psi(B)\phi(B) = 1. \eqno(2.18)$$
Equation (2.16) can be obtained formally by choosing $\psi(B)$ satisfying (2.18) and multiplying both sides of (2.14) by $\psi(B)$, which gives the representation (2.16). It is clear that the random walk has $B_1 = 1$, which does not satisfy the restriction, and the process is nonstationary.
Example 2.2:
Suppose that we have an autoregressive model (2.13) with $p = 1$, i.e., $x_t - \phi_1 x_{t-1} = (1 - \phi_1 B)x_t = w_t$. Then (2.18) becomes
$$(1 + \psi_1 B + \psi_2 B^2 + \cdots)(1 - \phi_1 B) = 1.$$
Equating coefficients of $B$ implies that $\psi_1 - \phi_1 = 0$, or $\psi_1 = \phi_1$. For $B^2$, we would get $\psi_2 - \phi_1\psi_1 = 0$, or $\psi_2 = \phi_1^2$. Continuing, we obtain $\psi_k = \phi_1^k$, and the representation is
$$\psi(B) = 1 + \sum_{k=1}^{\infty} \phi_1^k B^k,$$
and we have
$$x_t = \sum_{k=0}^{\infty} \phi_1^k w_{t-k}. \eqno(2.19)$$
The representation (2.16) is fundamental for developing approximate forecasts and also exhibits the series as a linear process of the form considered in Problem 1.4.
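The coefficient-matching in (2.18) can be sketched in code: equating the coefficient of each power $B^j$ to zero gives a recursion for the $\psi$-weights. The helper below is an illustrative function of ours:

```python
def psi_weights(phi, n_weights):
    """psi-weights of an AR(p): solve psi(B) * phi(B) = 1 term by term.

    phi is [phi_1, ..., phi_p]; psi_0 = 1 and, for j >= 1,
    psi_j = sum_{k=1}^{min(j,p)} phi_k * psi_{j-k}.
    """
    psi = [1.0]
    for j in range(1, n_weights):
        psi.append(sum(phi[k - 1] * psi[j - k]
                       for k in range(1, min(j, len(phi)) + 1)))
    return psi

# AR(1) check from the example: psi_k = phi_1^k
print(psi_weights([0.5], 5))   # [1.0, 0.5, 0.25, 0.125, 0.0625]
```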
For data involving such autoregressive (AR) models as defined above, the main selection problems are deciding that the autoregressive structure is appropriate and then determining the value of $p$ for the model. The ACF of the process is a potential aid for determining the order of the process, as are the model selection measures (2.8)-(2.10). To determine the ACF of the $p$th order AR in (2.13), write the equation as
$$x_t - \sum_{k=1}^{p} \phi_k x_{t-k} = w_t$$
and multiply both sides by $x_{t-h}$, $h = 1, 2, \ldots$. Assuming that the mean $E(x_t) = 0$, and using the definition of the autocovariance function (1.2), leads to the equation
$$E\Big[\Big(x_t - \sum_{k=1}^{p} \phi_k x_{t-k}\Big)x_{t-h}\Big] = E[w_t x_{t-h}].$$
The left-hand side immediately becomes
$$\gamma_x(h) - \sum_{k=1}^{p} \phi_k \gamma_x(h-k).$$
The representation (2.16) implies that
$$E[w_t x_{t-h}] = E[w_t(w_{t-h} + \psi_1 w_{t-h-1} + \psi_2 w_{t-h-2} + \cdots)].$$
For $h = 0$, we get $\sigma_w^2$. For all other $h$, the fact that the $w_t$ are independent implies that the right-hand side will be zero. Hence, we may write the equations for determining $\gamma_x(h)$ as
$$\gamma_x(0) - \sum_{k=1}^{p} \phi_k \gamma_x(k) = \sigma_w^2 \eqno(2.20)$$
and
$$\gamma_x(h) - \sum_{k=1}^{p} \phi_k \gamma_x(h-k) = 0 \eqno(2.21)$$
for $h = 1, 2, 3, \ldots$. Note that one will need the property $\gamma_x(-h) = \gamma_x(h)$ in solving these equations. Equations (2.20) and (2.21) are called the Yule-Walker equations (see Yule, 1927; Walker, 1931).
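As a sketch, (2.21) for $h = 1, \ldots, p$ forms a linear (Toeplitz) system in the $\phi$'s, and (2.20) then gives $\sigma_w^2$; the following illustrative helper solves it given the autocovariances:

```python
import numpy as np

def yule_walker(gamma):
    """Solve (2.21) for phi_1..phi_p, then (2.20) for sigma_w^2.

    gamma = [gamma_x(0), gamma_x(1), ..., gamma_x(p)].
    Uses the symmetry gamma_x(-h) = gamma_x(h) to build the system.
    """
    gamma = np.asarray(gamma, dtype=float)
    p = len(gamma) - 1
    G = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(G, gamma[1:])
    sigma2 = gamma[0] - phi @ gamma[1:]
    return phi, sigma2

# AR(1) check: gamma_x(h) = sigma_w^2 phi^h / (1 - phi^2), phi = .5, sigma_w^2 = 1
g0 = 1 / (1 - 0.25)
phi, s2 = yule_walker([g0, 0.5 * g0])
```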
Example 2.3:
Consider finding the ACF of the first-order autoregressive model. First, (2.20) implies that $\gamma_x(0) - \phi_1\gamma_x(1) = \sigma_w^2$. For $h = 1, 2, \ldots$, (2.21) gives $\gamma_x(h) - \phi_1\gamma_x(h-1) = 0$. Solving these successively gives
$$\gamma_x(h) = \gamma_x(0)\phi_1^h.$$
Combining with (2.20) yields
$$\gamma_x(0) = \frac{\sigma_w^2}{1 - \phi_1^2}.$$
It follows that the autocovariance function is
$$\gamma_x(h) = \frac{\sigma_w^2}{1 - \phi_1^2}\,\phi_1^h.$$
Taking into account that $\gamma_x(-h) = \gamma_x(h)$ and using (1.3), we obtain
$$\rho_x(h) = \phi_1^{|h|}$$
for $h = 0, \pm 1, \pm 2, \ldots$.
The exponential decay is typical of autoregressive behavior, and there may also be some periodic structure. However, the most effective diagnostic of AR structure is in the PACF, and is summarized by the following identification property:

Property P2.2: PACF for AR Process
The partial autocorrelation function $\phi_{hh}$ as a function of lag $h$ is zero for $h > p$, the order of the autoregressive process. This enables one to make a preliminary identification of the order $p$ of the process using the partial autocorrelation function (PACF). Simply choose the order beyond which most of the sample values of the PACF are approximately zero.
To verify the above, note that the PACF (see Section 1.3.3) is basically the last coefficient obtained when minimizing the squared error
$$MSE = E\Big[\Big(x_{t+h} - \sum_{k=1}^{h} a_k x_{t+h-k}\Big)^2\Big].$$
Setting the derivatives with respect to $a_j$ equal to zero leads to the equations
$$E\Big[\Big(x_{t+h} - \sum_{k=1}^{h} a_k x_{t+h-k}\Big)x_{t+h-j}\Big] = 0.$$
This can be written as
$$\gamma_x(j) - \sum_{k=1}^{h} a_k \gamma_x(j-k) = 0$$
for $j = 1, 2, \ldots, h$. Now, from Equation (2.21), it is clear that, for an AR($p$), we may take $a_k = \phi_k$ for $k \le p$ and $a_k = 0$ for $k > p$ to get a solution for the above equations. This implies Property P2.2 above.
Having decided on the order $p$ of the model, it is clear that, for the estimation step, one may write the model (2.13) in the regression form
$$x_t = \phi' z_t + w_t, \eqno(2.22)$$
where $\phi = (\phi_1, \phi_2, \ldots, \phi_p)'$ corresponds to $\beta$ and $z_t = (x_{t-1}, x_{t-2}, \ldots, x_{t-p})'$ is the vector of dependent variables in (2.2). Taking into account the fact that $x_t$ is not observed for $t \le 0$, we may run the regression approach of Section 2.1 for $t = p+1, p+2, \ldots, n$ to get estimators for $\phi$ and for $\sigma^2$, the variance of the white noise process. These so-called conditional maximum likelihood estimators are commonly used because the exact maximum likelihood estimators involve solving nonlinear equations.
Example 2.4:
We consider the simple problem of modeling the recruit series shown in Figure 1.1 using an autoregressive model. The bottom panel of Figure 1.9 shows the autocorrelation (ACF) and partial autocorrelation (PACF) functions of the recruit series. The PACF has large values for $h = 1, 2$ and then is essentially zero for higher order lags. This implies, by Property P2.2 above, that a second order ($p = 2$) AR model might provide a good fit. Running the regression program for the model
$$x_t = \beta_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t$$
leads to the estimators
$$\hat\beta_0 = 6.74\ (1.11),\quad \hat\phi_1 = 1.35\ (.04),\quad \hat\phi_2 = -.46\ (.04),\quad \hat\sigma^2 = 90.31,$$
where the estimated standard deviations are in parentheses. To determine whether the above order is the best choice, we fitted models for $p = 1, \ldots, 10$, obtaining corrected AICC values of 5.75, 5.52, 5.53, 5.54, 5.54, 5.55, 5.55, 5.56, 5.57, and 5.58, respectively, using (2.10) with $K = p$. This shows that the minimum AICC obtains for $p = 2$, and we choose the second-order model.
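The conditional least squares fit described above is just an ordinary regression on lagged values, as in (2.22). The sketch below uses a simulation with the Example 2.4 coefficient values rather than the actual recruit data, and the helper name is ours:

```python
import numpy as np

def fit_ar(x, p):
    """Conditional least squares for an AR(p) with intercept.

    Regress x_t on (1, x_{t-1}, ..., x_{t-p}) for t = p+1, ..., n;
    returns (coefficients, sigma2_hat).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    Z = np.column_stack([np.ones(n - p)] +
                        [x[p - k:n - k] for k in range(1, p + 1)])
    y = x[p:]
    coef = np.linalg.solve(Z.T @ Z, Z.T @ y)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(y) - (p + 1))
    return coef, sigma2

# Simulate an AR(2) with phi_1 = 1.35, phi_2 = -.46 (values from Example 2.4)
rng = np.random.default_rng(2)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 1.35 * x[t - 1] - 0.46 * x[t - 2] + rng.normal()
coef, s2 = fit_ar(x, 2)   # coef = [intercept, phi_1_hat, phi_2_hat]
```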
Example 2.5:
The previous example used various autoregressive models for the recruits series, fitting a second-order regression model. We may also use this regression idea to fit the model to other series, such as a detrended version of the Southern Oscillation Index (SOI) given in previous discussions. We have noted in our discussion of Figure 1.9, from the partial autocorrelation function (PACF), that a plausible model for this series might be a first order autoregression of the form given above with $p = 1$. Again, putting the model above into the regression framework (2.2) for a single coefficient leads to the estimators $\hat\phi_1 = .59$ with standard error .04, $\hat\sigma^2 = .09218$ and $AICC(1) = -1.375$. The ACF of these residuals (not shown), however, will still show cyclical variation, and it is clear that they still have a number of values exceeding the $\pm 1.96/\sqrt{n}$ threshold (see Equation 1.14). A suggested procedure is to try higher order autoregressive models; successive models for $p = 1, 2, \ldots, 30$ were fitted, and the $AICC(K)$ values are plotted in Figure 3.10 of Chapter 3, so we do not repeat them here. There is a clear minimum for a $p = 16$th order model. The coefficient vector is $\hat\phi$ with components .40, .07, .15, .08, -.04, -.08, -.09, -.08, .00, .11, .16, .15, .03, -.20, -.14 and -.06, and $\hat\sigma^2 = .07354$.
Finally, we give a general approach to forecasting for any process that can be written in the form (2.16). This includes the AR, MA and ARMA processes. We begin by defining an $h$-step forecast of the process $x_t$ as
$$x_{t+h}^t = E[x_{t+h}|x_t, x_{t-1}, \ldots]. \eqno(2.23)$$
Note that this is not exactly right, because we only have $x_1, x_2, \ldots, x_t$ available, so that conditioning on the infinite past is only an approximation. From this definition, it is reasonable to intuit that $x_s^t = x_s$ for $s \le t$, and
$$E[w_s|x_t, x_{t-1}, \ldots] = E[w_s|w_t, w_{t-1}, \ldots] = w_s \eqno(2.24)$$
for $s \le t$. For $s > t$,
$$E[w_s|x_t, x_{t-1}, \ldots] = E[w_s|w_t, w_{t-1}, \ldots] = E[w_s] = 0, \eqno(2.25)$$
since $w_s$ will be independent of past values of $w_t$. We define the $h$-step forecast variance as
$$P_{t+h}^t = E[(x_{t+h} - x_{t+h}^t)^2|x_t, x_{t-1}, \ldots]. \eqno(2.26)$$
To develop an expression for this mean square error, note that, with $\psi_0 = 1$, we can write
$$x_{t+h} = \sum_{k=0}^{\infty} \psi_k w_{t+h-k}.$$
Then, since $w_{t+h-k}^t = 0$ for $t + h - k > t$, i.e., $k < h$, we have
$$x_{t+h}^t = \sum_{k=h}^{\infty} \psi_k w_{t+h-k},$$
so that the residual is
$$x_{t+h} - x_{t+h}^t = \sum_{k=0}^{h-1} \psi_k w_{t+h-k}.$$
Hence, the mean square error (2.26) is just the variance of a linear combination of independent zero mean errors, with common variance $\sigma_w^2$:
$$P_{t+h}^t = \sigma_w^2 \sum_{k=0}^{h-1} \psi_k^2. \eqno(2.27)$$
As an example, we consider forecasting the second order model developed for the recruit series in Example 2.4.
Example 2.6:
Consider the one-step forecast $x_{t+1}^t$ first. Writing the defining equation for $t + 1$ gives
$$x_{t+1} = \phi_1 x_t + \phi_2 x_{t-1} + w_{t+1},$$
so that
$$x_{t+1}^t = \phi_1 x_t^t + \phi_2 x_{t-1}^t + w_{t+1}^t = \phi_1 x_t + \phi_2 x_{t-1} + 0.$$
Continuing in this vein, we obtain
$$x_{t+2}^t = \phi_1 x_{t+1}^t + \phi_2 x_t^t + w_{t+2}^t = \phi_1 x_{t+1}^t + \phi_2 x_t + 0.$$
Then, for $h > 2$,
$$x_{t+h}^t = \phi_1 x_{t+h-1}^t + \phi_2 x_{t+h-2}^t + w_{t+h}^t = \phi_1 x_{t+h-1}^t + \phi_2 x_{t+h-2}^t + 0.$$
Forecast variances out to lag $h = 4$ and beyond, if necessary, can be found by solving (2.18) for $\psi_1$, $\psi_2$ and $\psi_3$, and substituting into (2.27). By equating coefficients of $B$, $B^2$ and $B^3$ in
$$(1 - \phi_1 B - \phi_2 B^2)(1 + \psi_1 B + \psi_2 B^2 + \psi_3 B^3 + \cdots) = 1,$$
we obtain $\psi_1 - \phi_1 = 0$, $\psi_2 - \phi_1\psi_1 - \phi_2 = 0$ and $\psi_3 - \phi_1\psi_2 - \phi_2\psi_1 = 0$. This gives the coefficients $\psi_1 = \phi_1$, $\psi_2 = \phi_1^2 + \phi_2$, $\psi_3 = \phi_1^3 + 2\phi_1\phi_2$. From Example 2.4, we have $\hat\phi_1 = 1.35$, $\hat\phi_2 = -.46$, $\hat\sigma_w^2 = 90.31$ and $\hat\beta_0 = 6.74$. The forecasts are of the form
$$x_{t+h}^t = 6.74 + 1.35\,x_{t+h-1}^t - .46\,x_{t+h-2}^t.$$
For the forecast variance, we evaluate $\psi_1 = 1.35$, $\psi_2 = 1.36$, $\psi_3 = 1.22$, leading to $90.31$, $90.31(2.82)$, $90.31(4.68)$ and $90.31(6.16)$ for the forecast variances at $h = 1, 2, 3, 4$. The corresponding standard errors of the forecasts are $9.50$, $15.97$, $20.56$ and $23.59$. The recruit series values range from 20 to 100, so the forecast uncertainty will be rather large.
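The recursions for the point forecasts and for the forecast standard errors via the $\psi$-weights of (2.18) and (2.27) can be sketched as follows (the helper names are ours; the numbers plugged in are the Example 2.4 estimates):

```python
def ar2_forecasts(x_last2, beta0, phi1, phi2, h_max):
    """h-step forecasts of an AR(2) with intercept, via the recursion
    x^t_{t+h} = beta0 + phi1 * x^t_{t+h-1} + phi2 * x^t_{t+h-2}."""
    past = list(x_last2)            # [x_{t-1}, x_t]
    out = []
    for _ in range(h_max):
        f = beta0 + phi1 * past[-1] + phi2 * past[-2]
        out.append(f)
        past.append(f)              # forecasts feed back into the recursion
    return out

def ar2_forecast_sd(sigma2, phi1, phi2, h_max):
    """Forecast standard errors from (2.27): P^t_{t+h} = sigma2 * sum psi_k^2."""
    psi = [1.0]
    for j in range(1, h_max):       # psi_j = phi1 psi_{j-1} + phi2 psi_{j-2}
        prev2 = psi[j - 2] if j >= 2 else 0.0
        psi.append(phi1 * psi[j - 1] + phi2 * prev2)
    cum, sds = 0.0, []
    for k in range(h_max):
        cum += psi[k] ** 2
        sds.append((sigma2 * cum) ** 0.5)
    return sds

sds = ar2_forecast_sd(90.31, 1.35, -0.46, 4)   # approx [9.50, 15.97, 20.56, 23.59]
```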
2.4 Moving Average (MA) Models

We may also consider processes that contain linear combinations of underlying unobserved shocks, say, represented by a white noise series $w_t$. These moving average components generate a series of the form
$$x_t = w_t - \theta_1 w_{t-1} - \theta_2 w_{t-2} - \cdots - \theta_q w_{t-q}, \eqno(2.28)$$
where $q$ denotes the order of the moving average component and $\theta_1, \theta_2, \ldots, \theta_q$ are parameters to be estimated. Using the shift notation, the above equation can be written in the form
$$x_t = \theta(B)w_t, \eqno(2.29)$$
where
$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \eqno(2.30)$$
is another polynomial in the shift operator $B$. It should be noted that the MA process of order $q$ is a linear process of the form considered earlier in Problem 1.4, with $\psi_0 = 1$, $\psi_1 = -\theta_1, \ldots, \psi_q = -\theta_q$. This implies that the ACF will be zero for lags larger than $q$, because terms in the form of the covariance function given in Problem 1.4 of Chapter 1 will all be zero. Specifically, the exact forms are
$$\gamma_x(0) = \sigma_w^2\Big(1 + \sum_{k=1}^{q} \theta_k^2\Big) \eqno(2.31)$$
for $h = 0$, and
$$\gamma_x(h) = \sigma_w^2\Big(-\theta_h + \sum_{k=1}^{q-h} \theta_{k+h}\theta_k\Big) \eqno(2.32)$$
for $h = 1, \ldots, q - 1$, with $\gamma_x(q) = -\sigma_w^2\theta_q$, and $\gamma_x(h) = 0$ for $h > q$. Hence, we will have
Property P2.3: ACF for MA Series
For a moving average series of order $q$, note that the autocorrelation function (ACF) is zero for lags $h > q$, i.e., $\rho_x(h) = 0$ for $h > q$. Such a result enables us to diagnose the order of a moving average component by examining $\hat\rho_x(h)$ and choosing $q$ as the value beyond which the coefficients are essentially zero.
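Equations (2.31)-(2.32) can be sketched as a small helper that returns the theoretical ACF of an MA(q) (illustrative function of ours, using the linear-process weights $\psi_0 = 1$, $\psi_k = -\theta_k$):

```python
def ma_acf(theta, h_max):
    """ACF of the MA(q) x_t = w_t - theta_1 w_{t-1} - ... - theta_q w_{t-q}.

    Implements (2.31)-(2.32); returns [rho(0), rho(1), ..., rho(h_max)].
    """
    q = len(theta)
    psi = [1.0] + [-t for t in theta]          # linear-process weights
    gamma0 = sum(p * p for p in psi)           # gamma_x(0) in units of sigma_w^2
    rho = []
    for h in range(h_max + 1):
        g = (sum(psi[k] * psi[k + h] for k in range(q + 1 - h))
             if h <= q else 0.0)               # gamma_x(h) = 0 for h > q
        rho.append(g / gamma0)
    return rho

# MA(1) with theta_1 = .77 (the Example 2.7 estimate):
# rho(1) = -theta_1 / (1 + theta_1^2), and rho(h) = 0 for h > 1
r = ma_acf([0.77], 3)
```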
Example 2.7<br />
Consider the varve thicknesses in Figure 1.10, which are described in Problem 1.7 of Chapter 1. Figure 2.4 shows the ACF and PACF of the original log-transformed varve series and the first differences. The ACF of the original series indicates possible non-stationary behavior and suggests taking a first difference, interpreted here as the percentage yearly change in deposition. The ACF of the first difference shows a clear peak at h = 1 and no other significant peaks, suggesting a first-order moving average. Fitting the first-order moving average model xt = wt − θ1wt−1 to this data using the Gauss-Newton procedure described next leads to ˆθ1 = .77 and ˆσ_w^2 = .2358.
Figure 2.4 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.
Fitting the pure moving average term turns into a nonlinear problem, as we can see by noting that either maximum likelihood or regression involves solving (2.28) or (2.29) for wt and minimizing the sum of the squared errors. Suppose that the roots of θ(B) = 0 are all outside the unit circle; then this is possible by solving π(B)θ(B) = 1, so that, for the vector parameter θ = (θ1, . . . , θq)′, we may write

w_t(θ) = π(B) x_t   (2.33)

and minimize

SSE(θ) = Σ_{t=q+1}^{n} w_t(θ)^2

as a function of the vector parameter θ. We do not really need to find the operator π(B), but can simply solve (2.28) recursively for wt, with w1, w2, . . . , wq = 0 and

w_t(θ) = x_t + Σ_{k=1}^{q} θ_k w_{t−k}
for t = q+1, . . . , n. It is easy to verify that SSE(θ) will be a nonlinear function<br />
of θ1, θ2, . . . , θq. However, note that<br />
w_t(θ) ≈ w_t(θ0) + (∂w_t/∂θ)′ (θ − θ0),

where the derivative is evaluated at the previous guess θ0. Rearranging the above equation leads to

w_t(θ0) ≈ −(∂w_t/∂θ)′ (θ − θ0) + w_t(θ),   (2.34)
which is just the regression model (2.2). Hence, we can begin with an initial guess θ0 = (.1, .1, . . . , .1)′, say, and successively minimize SSE(θ) until convergence.
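As a rough illustration (a pure-Python sketch on simulated data, not ASTSA's implementation), the fragment below fits an MA(1) by Gauss-Newton with true θ1 = .77 and the starting guess .1; the derivative z_t = ∂w_t/∂θ satisfies its own recursion z_t = w_{t−1} + θ z_{t−1}:

```python
import random

random.seed(42)

# simulate an MA(1) series x_t = w_t - 0.77 w_{t-1}
n = 5000
w = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
x = [w[t] - 0.77 * w[t - 1] for t in range(1, n + 1)]

theta = 0.1                      # initial guess, as in the text
for _ in range(50):
    wt, zt = 0.0, 0.0            # w_1 = 0 convention for q = 1
    num = den = 0.0
    for t in range(1, n):
        zt = wt + theta * zt     # z_t = w_{t-1} + theta * z_{t-1}
        wt = x[t] + theta * wt   # w_t = x_t + theta * w_{t-1}
        num += zt * wt
        den += zt * zt
    step = -num / den            # Gauss-Newton increment for SSE(theta)
    theta += step
    theta = max(min(theta, 0.99), -0.99)   # keep the guess invertible
    if abs(step) < 1e-8:
        break
# theta is now near the true value .77
```

The clipping line is a safeguard, not part of the textbook procedure; it simply keeps intermediate guesses inside the invertibility region.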
In order to forecast a moving average series, note that

x_{t+h} = w_{t+h} − Σ_{k=1}^{q} θ_k w_{t+h−k}.

The results below (2.24) imply that

x_{t+h}^t = − Σ_{k=h}^{q} θ_k w_{t+h−k}^t,

where the wt values needed for the above are computed recursively as before. Because of (2.17), it is clear that ψ0 = 1 and ψk = −θk, k = 1, 2, . . . , q, and these values can be substituted directly into the variance formula (2.27).
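The truncated forecast and its variance are short computations; the sketch below uses hypothetical MA(2) values (θ1 = .5, θ2 = .25, σ_w^2 = 1) and assumed known recent shocks, none of which come from the text:

```python
# Truncated forecasts for a pure MA(q).

def ma_forecast(thetas, w_recent, h):
    """x^t_{t+h} = -sum_{k=h}^{q} theta_k w_{t+h-k}; zero for h > q.
    w_recent holds w_t, w_{t-1}, ..., newest first."""
    q = len(thetas)
    if h > q:
        return 0.0
    return -sum(thetas[k - 1] * w_recent[k - h] for k in range(h, q + 1))

def ma_forecast_var(thetas, sigma2, h):
    """P^t_{t+h} = sigma2 * sum_{k=0}^{h-1} psi_k^2, psi_0 = 1, psi_k = -theta_k."""
    psi = [1.0] + [-th for th in thetas]
    return sigma2 * sum(p * p for p in psi[:min(h, len(psi))])

thetas, w_recent = [0.5, 0.25], [1.0, 2.0]
f1 = ma_forecast(thetas, w_recent, 1)   # -0.5*1.0 - 0.25*2.0 = -1.0
f3 = ma_forecast(thetas, w_recent, 3)   # 0.0: beyond the MA order
v3 = ma_forecast_var(thetas, 1.0, 3)    # 1 + .25 + .0625 = 1.3125
```

For h > q the forecast reverts to the series mean (zero here) and the variance stops growing, mirroring the ACF cutoff.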
2.5 Autoregressive Integrated Moving Average<br />
(ARIMA) <strong>Models</strong><br />
Now combining the autoregressive and moving average components leads<br />
to the autoregressive moving average ARMA(p, q) model, written as<br />
φ(B)xt = θ(B)wt, (2.35)<br />
where the polynomials in B are as defined earlier in (<strong>2.1</strong>5) and (2.29), with p<br />
autoregressive coefficients and q moving average coefficients. In the difference<br />
equation form, this becomes<br />
x_t − Σ_{k=1}^{p} φ_k x_{t−k} = w_t − Σ_{k=1}^{q} θ_k w_{t−k}.   (2.36)
The mixed processes no longer satisfy the properties P2.1-P2.3, but they tend to behave in approximately the same way. Estimation and forecasting for such problems are treated in essentially the same manner as for the AR and MA processes. We note that we can formally divide both sides of (2.35) by φ(B), and the usual representation (2.16) holds when

ψ(B)φ(B) = θ(B).   (2.37)

For forecasting, we determine ψ1, ψ2, . . . by equating coefficients of B, B^2, B^3, . . . in (2.37), as before, assuming that all the roots of φ(B) = 0 are greater than one in absolute value. Similarly, we can always solve for the residuals, say
w_t = x_t − Σ_{k=1}^{p} φ_k x_{t−k} + Σ_{k=1}^{q} θ_k w_{t−k}   (2.38)

to get the terms needed for forecasting and estimation.
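The recursion (2.38) is easy to check numerically. The sketch below (hypothetical φ1 = .6, θ1 = .4, not values from the text) generates an ARMA(1, 1) series from known shocks, starting everything at zero, and then recovers the shocks exactly:

```python
import random

random.seed(3)

phi1, theta1 = 0.6, 0.4   # hypothetical ARMA(1,1) parameters
n = 200

# generate the series from known shocks, with x_0 = w_0 = 0
w = [0.0] + [random.gauss(0.0, 1.0) for _ in range(n - 1)]
x = [0.0] * n
for t in range(1, n):
    x[t] = phi1 * x[t - 1] + w[t] - theta1 * w[t - 1]

# recover the shocks with (2.38): w_t = x_t - phi1 x_{t-1} + theta1 w_{t-1}
w_hat = [0.0] * n
for t in range(1, n):
    w_hat[t] = x[t] - phi1 * x[t - 1] + theta1 * w_hat[t - 1]
# w_hat now reproduces w
```

With real data the starting shocks are unknown, so the recovered residuals carry a transient that dies out geometrically when the MA part is invertible.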
Example 2.8<br />
Consider the above mixed process with p = q = 1, i.e., ARMA(1, 1). By (2.36), we may write

x_t = φ_1 x_{t−1} + w_t − θ_1 w_{t−1}.

Now,

x_{t+1} = φ_1 x_t + w_{t+1} − θ_1 w_t,

so that

x_{t+1}^t = φ_1 x_t + 0 − θ_1 w_t

and x_{t+h}^t = φ_1 x_{t+h−1}^t for h > 1, leading to very simple forecasts in this
case. Equating coefficients of B^k in

(1 − φ_1 B)(1 + ψ_1 B + ψ_2 B^2 + · · ·) = (1 − θ_1 B)

leads to

ψ_k = (φ_1 − θ_1) φ_1^{k−1}
for k = 1, 2, . . .. Using (2.27) leads to the expression

P_{t+h}^t = σ_w^2 [1 + (φ_1 − θ_1)^2 Σ_{k=1}^{h−1} φ_1^{2(k−1)}] = σ_w^2 [1 + (φ_1 − θ_1)^2 (1 − φ_1^{2(h−1)}) / (1 − φ_1^2)]

for the forecast variance.
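The closed form can be checked against the ψ-weight sum directly; the values below (φ1 = .8, θ1 = .3, σ_w^2 = 2) are hypothetical, chosen only for the check:

```python
phi1, theta1, sigma2, h = 0.8, 0.3, 2.0, 5   # hypothetical values

# psi-weights for the ARMA(1,1): psi_0 = 1, psi_k = (phi1 - theta1) phi1^{k-1}
psi = [1.0] + [(phi1 - theta1) * phi1 ** (k - 1) for k in range(1, h)]
var_psi = sigma2 * sum(p * p for p in psi)

# closed form from the example
var_closed = sigma2 * (1.0 + (phi1 - theta1) ** 2
                       * (1.0 - phi1 ** (2 * (h - 1))) / (1.0 - phi1 ** 2))
# var_psi and var_closed agree term for term
```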
In the first example of this chapter, it was noted that nonstationary processes<br />
are characterized by a slow decay in the ACF as in Figure 2.3. In many of the<br />
cases where slow decay is present, the use of a first order difference<br />
∆xt = xt − xt−1<br />
= (1 − B)xt<br />
will reduce the nonstationary process xt to a stationary series ∆xt. One can check to see whether the slow decay has been eliminated in the ACF of the transformed series. Higher order differences, ∆^d x_t = ∆∆^{d−1} x_t, are possible, and we call the process obtained when the dth difference is an ARMA series an ARIMA(p, d, q) series, where p is the order of the autoregressive component, d is the order of differencing needed and q is the order of the moving average
component. Symbolically, the form is<br />
φ(B)∆ d xt = θ(B)wt<br />
(2.39)<br />
The principles of model selection for ARIMA(p, d, q) series are obtained using the extensions of (2.8)-(2.10), which replace K by K = p + q, the total number of ARMA parameters.
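A quick pure-Python illustration (simulated data, not from the text): a random walk shows the persistent lag-one sample autocorrelation characteristic of nonstationarity, while its first difference does not:

```python
import random

random.seed(1)

def acf1(x):
    """Lag-one sample autocorrelation."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    c1 = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    return c1 / c0

# a random walk: nonstationary, with a very slowly decaying ACF
walk = [0.0]
for _ in range(1000):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

diff = [walk[t] - walk[t - 1] for t in range(1, len(walk))]   # (1 - B)x_t
# acf1(walk) is near one; acf1(diff) is near zero
```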
2.6 Seasonal ARIMA <strong>Models</strong><br />
When autoregressive, differencing, or moving average behavior seems to occur at multiples of some underlying period s, a seasonal ARIMA series may result. The seasonal nonstationarity is characterized by slow decay at multiples of s and can often be eliminated by a seasonal differencing operator of the form
∇ D s xt = (1 − B s ) D xt.<br />
For example, when we have monthly data, it is reasonable that a yearly phenomenon<br />
will induce s = 12 and the ACF will be characterized by slowly<br />
decaying spikes at 12, 24, 36, 48, . . . and we can obtain a stationary series by<br />
transforming with the operator (1−B 12 )xt = xt −xt−12 which is the difference<br />
between the current month and the value one year or 12 months ago.<br />
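The effect of (1 − B^12) is easy to see on a toy deterministic series: with a hypothetical fixed monthly pattern plus a small linear trend (values made up for illustration), the pattern cancels and only twelve times the trend slope remains:

```python
# a hypothetical repeating monthly pattern plus a 0.01-per-month trend
pattern = [3, 1, -2, 0, 4, -1, 2, 0, -3, 1, 2, -7]
x = [pattern[t % 12] + 0.01 * t for t in range(120)]

# seasonal difference (1 - B^12)x_t = x_t - x_{t-12}
sdiff = [x[t] - x[t - 12] for t in range(12, len(x))]
# every seasonally differenced value equals 0.01 * 12 = 0.12
```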
If the autoregressive or moving average behavior is seasonal at period s, we<br />
define formally the operators

Φ(B^s) = 1 − Φ_1 B^s − Φ_2 B^{2s} − · · · − Φ_P B^{Ps}   (2.40)

and

Θ(B^s) = 1 − Θ_1 B^s − Θ_2 B^{2s} − · · · − Θ_Q B^{Qs}.   (2.41)
The final form of the ARIMA(p, d, q) × (P, D, Q)_s model is

Φ(B^s) φ(B) ∇_s^D ∆^d x_t = Θ(B^s) θ(B) w_t.   (2.42)
We may also note the properties below corresponding to P2.1-P2.3.

Property P2.1': ACF and PACF of a seasonally non-stationary time series

The ACF of a seasonally non-stationary time series decays very slowly at lag multiples s, 2s, 3s, . . . with zeros in between, where s denotes a seasonal period, usually 12. The PACF of a non-stationary time series tends to have a peak very near unity at lag s.
Property P2.2’: PACF for Seasonal AR <strong>Series</strong><br />
The partial autocorrelation function φhh as a function of lag<br />
h has nonzero values at s, 2s, 3s, . . . , P s, with zeros in between,<br />
and is zero for h > P s, the order of the seasonal autoregressive<br />
process. There should be some exponential decay.<br />
P2.3': ACF for a Seasonal MA Series

For a seasonal moving average series of order Q, note that the autocorrelation function (ACF) has nonzero values at s, 2s, 3s, . . . , Qs and is zero for h > Qs.
Example 2.9:<br />
We illustrate by fitting the monthly birth series from 1948-1979 shown in Figure 2.5. The period encompasses the boom that followed the Second World War, and there is the expected rise, which persists for about 13 years, followed by a decline to around 1974. The series appears to have long-term swings, with seasonal effects superimposed. The long-term swings indicate possible non-stationarity, and we verify that this is the case by checking the ACF and PACF shown in the top panel of Figure 2.6. Note that, by Property P2.1, slow decay of the ACF indicates non-stationarity, and we respond by taking a first difference. The results shown in the second panel of Figure 2.5 indicate that the first difference has eliminated the strong low frequency swing. The ACF, shown in the second panel from the top in Figure 2.6, shows peaks at 12, 24, 36, 48, . . ., with no decay. This behavior implies seasonal non-stationarity, by Property P2.1' above, with s = 12. A seasonal difference of the first difference generates an ACF and PACF in Figure 2.6 that we expect for stationary series.

Taking the seasonal difference of the first difference gives a series that looks stationary and has an ACF with peaks at 1 and 12 and a PACF with a substantial peak at 12 and lesser peaks at 24, 36, . . .. This suggests trying either a first order moving average term, by Property P2.3, or a first order
Figure 2.5 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0, 1, 1) × (0, 1, 1)12 model.
Figure 2.6 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels), an ARIMA(0, 1, 0) × (0, 1, 1)12 model (third two panels) and an ARIMA(0, 1, 1) × (0, 1, 1)12 model (last two panels).
seasonal moving average term with s = 12, by Property P2.3' above. We choose to eliminate the largest peak first by applying a first-order seasonal moving average model with s = 12. The ACF and PACF of the residual series from this model, i.e. from ARIMA(0, 1, 0) × (0, 1, 1)12, written as

(1 − B)(1 − B^{12}) x_t = (1 − Θ_1 B^{12}) w_t,
Figure 2.7 A 36 month forecast (1979(2)-1982(1)) for the birth series with 95% uncertainty limits.
is shown in the fourth panel from the top in Figure 2.6. We note that<br />
the peak at lag one is still there, with attending exponential decay in<br />
the PACF. This can be eliminated by fitting a first-order moving average<br />
term and we consider the model ARIMA(0, 1, 1) × (0, 1, 1)12, written as<br />
(1 − B)(1 − B 12 )xt = (1 − θ1B)(1 − Θ1B 12 )wt<br />
The ACF of the residuals from this model is relatively well behaved, with a number of peaks either near or exceeding the 95% limits of the test of no correlation. Fitting this final ARIMA(0, 1, 1) × (0, 1, 1)12 model leads to the fitted model

(1 − B)(1 − B^{12}) x_t = (1 − .4896B)(1 − .6844B^{12}) w_t

with AICc = 4.95, R^2 = .9804^2 = .961, and P-values = .000, .000. R^2 is computed by saving the predicted values and then plotting them against the observed values using the 2-D plot option. The format in which ASTSA puts out these results is shown below.
ARIMA(0,1,1)x(0,1,1)x12 from U.S. Births AICc = 4.94684 variance =<br />
51.1906 d.f. = 358 Start values = .1
predictor coef st. error t-ratio p-value<br />
MA(1) .4896 .04620 10.5966 .000<br />
SMA(1) .6844 .04013 17.0541 .000<br />
(D1) (D(12)1) x(t) = (1 -.49B1) (1 -.68B12) w(t)<br />
The ARIMA search in ASTSA leads to the model<br />
(1−.0578B 12 )(1−B)(1−B 12 )xt = (1−.4119B−.1515B 2 )(1−.8136B 12 )wt<br />
with AICc = 4.8526, somewhat lower than the previous model. The seasonal<br />
autoregressive coefficient is not statistically significant and should<br />
probably be omitted from the model. The new model becomes<br />
(1 − B)(1 − B^{12}) x_t = (1 − .4088B − .1645B^2)(1 − .6990B^{12}) w_t,

yielding AICc = 4.92 and R^2 = .981^2 = .962, slightly better than the ARIMA(0, 1, 1) × (0, 1, 1)12 model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability.
The model is expanded as

(1 − B)(1 − B^{12}) x_t = (1 − θ_1 B)(1 − Θ_1 B^{12}) w_t,

so that

(1 − B − B^{12} + B^{13}) x_t = (1 − θ_1 B − Θ_1 B^{12} + θ_1 Θ_1 B^{13}) w_t,

or

x_t − x_{t−1} − x_{t−12} + x_{t−13} = w_t − θ_1 w_{t−1} − Θ_1 w_{t−12} + θ_1 Θ_1 w_{t−13}

x_t = x_{t−1} + x_{t−12} − x_{t−13} + w_t − θ_1 w_{t−1} − Θ_1 w_{t−12} + θ_1 Θ_1 w_{t−13}.
The forecasts are

x_{t+1}^t = x_t + x_{t−11} − x_{t−12} − θ_1 w_t − Θ_1 w_{t−11} + θ_1 Θ_1 w_{t−12}

x_{t+2}^t = x_{t+1}^t + x_{t−10} − x_{t−11} − Θ_1 w_{t−10} + θ_1 Θ_1 w_{t−11}.

Continuing in the same manner, we obtain

x_{t+12}^t = x_{t+11}^t + x_t − x_{t−1} − Θ_1 w_t + θ_1 Θ_1 w_{t−1}

for the 12 month forecast.
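The one-step recursion can be wrapped in a small helper; the check below uses made-up history values, not the birth data:

```python
def sarima_one_step(x, w, th1, Th1):
    """One-step forecast for ARIMA(0,1,1)x(0,1,1)_12:
    x^t_{t+1} = x_t + x_{t-11} - x_{t-12}
                - th1 w_t - Th1 w_{t-11} + th1 Th1 w_{t-12}.
    x and w hold at least the last 13 values, newest last."""
    return (x[-1] + x[-12] - x[-13]
            - th1 * w[-1] - Th1 * w[-12] + th1 * Th1 * w[-13])

# toy check with hypothetical values: x = 1, ..., 13 and constant shocks 0.1
x_hist = [float(v) for v in range(1, 14)]
w_hist = [0.1] * 13
f1 = sarima_one_step(x_hist, w_hist, 0.5, 0.5)
# 13 + 2 - 1 - 0.05 - 0.05 + 0.025 = 13.925
```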
The forecast limits are quite variable, with a standard error that rises to 20% of the mean by the end of the forecast period. The plot shows that the general trend is upward, rising from about 250,000 to about 290,000 births per year. One could check the actual records from the years 1979-1982. The direction is not certain because of the large uncertainty. One could compute the probability

P(B_{t+47} ≤ 250,000) = Φ((250 − 290)/60) = .25,

so there is a 75% chance of increase.
A website where the forecasts can be compared on a yearly basis is<br />
http://www.cdc.gov/nccdphp/drh/pdf/nvs/nvs48 tb1.pdf<br />
Example <strong>2.1</strong>0:<br />
Figure 2.8 shows the autocorrelation function of the log-transformed J&J<br />
earnings series that is plotted in Figure 1.4 and we note the slow decay<br />
indicating the nonstationarity which has already been obvious in the<br />
Chapter 1 discussion. We may also compare the ACF with that of a<br />
random walk, shown in Figure 3.2, and note the close similarity. The<br />
partial autocorrelation function is very high at lag one which, under ordinary circumstances, would indicate a first order autoregressive AR(1) model, except that, in this case, the value is close to unity, indicating a
root close to 1 on the unit circle. The only question would be whether<br />
differencing or detrending is the better transformation to stationarity.<br />
Following in the Box-Jenkins tradition, differencing leads to the ACF<br />
and PACF shown in the second panel and no simple structure is apparent.<br />
To force a next step, we interpret the peaks at 4, 8, 12, 16, . . . as<br />
contributing to a possible seasonal autoregressive term, leading to a possible<br />
ARIMA(0, 1, 0)×(1, 0, 0)4 and we simply fit this model and look at<br />
the ACF and PACF of the residuals, shown in the third two panels. The<br />
fit improves somewhat, with significant peaks still remaining at lag 1 in<br />
both the ACF and PACF. The peak in the ACF seems more isolated and<br />
there remains some exponentially decaying behavior in the PACF, so we<br />
try a model with a first-order moving average. The bottom two panels<br />
show the ACF and PACF of the resulting ARIMA(0, 1, 1) × (1, 0, 0)4<br />
and we note only relatively minor excursions above and below the 95%<br />
intervals under the assumption that the theoretical ACF is white noise.<br />
The final model suggested is (yt = log xt)

(1 − Φ_1 B^4)(1 − B) y_t = (1 − θ_1 B) w_t,
where ˆ Φ1 = .820(.058), ˆ θ1 = .508(.098) and ˆσ 2 w = .0086. The model can<br />
be written in forecast form as<br />
yt = yt−1 + Φ1(yt−4 − yt−5) + wt − θ1wt−1.
To forecast the original series for, say, 4 quarters, we compute the forecast limits for yt = log xt and then exponentiate, i.e.,

x_{t+h}^t = exp{y_{t+h}^t}.

We note the large limits on the forecast values in Figure 2.9 and mention that the situation can be improved by the regression approach in the next section.
2.7 Regression <strong>Models</strong> With Correlated Errors<br />
The standard method for dealing with correlated errors et in the regression model

y_t = β′ z_t + e_t   (2.2)′
is to try to transform the errors et into uncorrelated ones and then apply the<br />
standard least squares approach to the transformed observations. For example,<br />
let P be an n × n matrix that transforms the vector e = (e1, . . . , en) ′ into a<br />
set of independent identically distributed variables with variance σ 2 . Then,<br />
transform the matrix version (2.4) to<br />
P y = P Zβ + P e<br />
and proceed as before. Of course, the major problem is deciding on what to<br />
choose for P but in the time series case, happily, there is a reasonable solution,<br />
based again on time series ARMA models. Suppose that we can find a reasonable ARMA model for the residuals, say the ARMA(p, 0, 0) model

e_t = Σ_{k=1}^{p} φ_k e_{t−k} + w_t,
which defines a linear transformation of the correlated et to a sequence of<br />
uncorrelated wt. We can ignore the problems near the beginning of the series<br />
by starting at t = p. In the ARMA notation, using the backshift operator B,<br />
we may write<br />
φ(B)et = wt, (2.43)<br />
where

φ(B) = 1 − Σ_{k=1}^{p} φ_k B^k,   (2.44)

and applying the operator to both sides of (2.2) leads to the model

φ(B) y_t = β′ φ(B) z_t + w_t,   (2.45)
Figure 2.8 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels) and two sets of ARIMA residuals.
Figure 2.9 Observed and predicted values for the Johnson and Johnson Earnings Series with forecast values for the next four quarters, using the ARIMA(0, 1, 1) × (1, 0, 0)4 model for the log-transformed data.
where the wt now satisfy the independence assumption. Doing ordinary least squares on the transformed model is the same as doing weighted least squares on the untransformed model. The only problem is that we do not know the values of the coefficients φk, k = 1, . . . , p in the transformation (2.43). However, if we knew the residuals et, it would be easy to estimate the coefficients, since (2.43) can be written in the form

e_t = φ′ e_{t−1} + w_t,   (2.46)

which is exactly the usual regression model (2.2) with φ′ = (φ1, . . . , φp) replacing β and e′_{t−1} = (e_{t−1}, e_{t−2}, . . . , e_{t−p}) replacing z_t.
The above comments suggest a general approach, known as the Cochrane-Orcutt procedure (Cochrane and Orcutt, 1949), for dealing with the problem of correlated errors in the time series context.
1. Begin by fitting the original regression model (2.2) by least squares, obtaining ˆβ and the residuals ê_t = y_t − ˆβ′ z_t.

2. Fit an ARMA model to the estimated residuals, say

φ(B) ê_t = θ(B) w_t.
3. Apply the ARMA transformation found to both sides of the regression equation (2.2)′ to obtain

[φ(B)/θ(B)] y_t = β′ [φ(B)/θ(B)] z_t + w_t.

4. Run an ordinary least squares on the transformed values to obtain the new ˆβ.
5. Return to step 2 if desired. Often, one iteration is enough to develop the estimators under a reasonable correlation structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood or weighted least squares estimators.
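A minimal pure-Python sketch of one Cochrane-Orcutt pass, on simulated data with a linear trend and AR(1) errors (all parameter values hypothetical). Because t − ˆφ(t − 1) = (1 − ˆφ)t + ˆφ, the transformed regression reduces to an ordinary straight-line fit:

```python
import random

random.seed(7)

def ols_line(ts, ys):
    """Least squares fit y = a + b*t; returns (a, b)."""
    n = len(ts)
    tbar, ybar = sum(ts) / n, sum(ys) / n
    sxx = sum((t - tbar) ** 2 for t in ts)
    sxy = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
    b = sxy / sxx
    return ybar - b * tbar, b

# simulate y_t = beta1 + beta2 t + e_t with AR(1) errors e_t = .8 e_{t-1} + w_t
beta1, beta2, phi = 2.0, 0.5, 0.8
n, e = 300, 0.0
ts, ys = [], []
for t in range(1, n + 1):
    e = phi * e + random.gauss(0.0, 1.0)
    ts.append(float(t))
    ys.append(beta1 + beta2 * t + e)

# step 1: ordinary least squares and residuals
a0, b0 = ols_line(ts, ys)
res = [y - (a0 + b0 * t) for t, y in zip(ts, ys)]

# step 2: fit AR(1) to the residuals (lag-one regression through the origin)
phi_hat = (sum(res[t] * res[t - 1] for t in range(1, n))
           / sum(res[t - 1] ** 2 for t in range(1, n)))

# step 3: transform y_t -> y_t - phi_hat * y_{t-1}
ys2 = [ys[t] - phi_hat * ys[t - 1] for t in range(1, n)]
ts2 = ts[1:]

# step 4: the transformed model has slope beta2*(1-phi_hat) on t and
# intercept beta1*(1-phi_hat) + beta2*phi_hat; invert to recover the betas
a1, b1 = ols_line(ts2, ys2)
beta2_hat = b1 / (1.0 - phi_hat)
beta1_hat = (a1 - beta2_hat * phi_hat) / (1.0 - phi_hat)
```

One pass already lands near the true values here; step 5 of the procedure would iterate with residuals recomputed from the new estimates.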
Figure 2.10 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA(0, 0, 0) × (1, 0, 0)4 residuals.
Figure 2.8 Observed and predicted values for the Johnson and Johnson Earnings Series with forecast values for the next four quarters, using the correlated regression model for the log-transformed data.
Example <strong>2.1</strong>1:<br />
We might consider an alternative approach to treating the Johnson and<br />
Johnson Earnings <strong>Series</strong>, assuming that<br />
yt = log xt = β1 + β2t + et<br />
In order to analyze the data with this approach, we first fit the model above, obtaining ˆβ1 = −.6678(.0349) and ˆβ2 = .0417(.0071). The residuals ê_t = y_t − ˆβ1 − ˆβ2 t are easily computed, and their ACF and PACF are shown in the top two panels of Figure 2.10. Note that the ACF and PACF suggest that a seasonal AR series will fit well, and we show the ACF and PACF of the residuals from that fit in the bottom panels of Figure 2.10. The seasonal AR model is of the form

e_t = Φ_1 e_{t−4} + w_t
and we obtain ˆ Φ1 = .7614(.0639), with ˆσ 2 w = .00779. Using these values,<br />
we transform yt to<br />
yt − ˆ Φ1yt−4 = β1(1 − ˆ Φ1) + β2[t − ˆ Φ1(t − 4)] + wt
using the estimated value ˆ Φ1 = .7614. With this transformed regression,<br />
we obtain the new estimators ˆ β1 = −.7488(.1105) and ˆ β2 = .0424(.0018).<br />
The new estimator has the advantage of being unbiased and having a<br />
smaller generalized variance.<br />
To forecast, we consider the original model with the newly estimated ˆβ1 and ˆβ2. We obtain the approximate forecast

y_{t+h}^t = ˆβ1 + ˆβ2(t + h) + ê_{t+h}^t

for the log-transformed series, along with upper and lower limits depending on the estimated variance, which only incorporates the prediction variance of ê_{t+h}^t, considering the trend and seasonal autoregressive parameters as fixed. The narrower upper and lower limits shown in Figure 2.8 are mainly a reflection of a slightly better fit to the residuals and the ability of the trend model to take care of the nonstationarity.
2.8 Chapter 2 Problems<br />
<strong>2.1</strong> Consider the regression model<br />
yt = β1yt−1 + et<br />
where et is white noise with zero-mean and variance σ 2 e. Assume that we<br />
observe y1, y2, . . . , yn and consider the model above for t = 2, 3, . . . , n.<br />
Show that the least squares estimator of β1 is

ˆβ1 = Σ_{t=2}^{n} y_t y_{t−1} / Σ_{t=2}^{n} y_{t−1}^2.

If we pretend that the y_{t−1} are fixed, show that

var{ˆβ1} = σ_e^2 / Σ_{t=2}^{n} y_{t−1}^2.

Relate your answer to a method for fitting a first-order AR model to the data yt.
2.2 Consider the autoregressive model (<strong>2.1</strong>3) for p = 1, i.e.<br />
xt − φ1xt−1 = wt<br />
(a) Show that the necessary condition below (2.15) implies that |φ1| < 1.
(b) Show that

x_t = Σ_{k=0}^{∞} φ_1^k w_{t−k}

is the form of (2.16) in this case.
(c) Show that E[wtxt] = σ 2 w and E[wtxt−1] = 0, so that future errors<br />
are uncorrelated with past data.<br />
2.3 The autocovariance and autocorrelation functions for AR processes are<br />
often derived from the Yule-Walker equations, obtained by multiplying<br />
both sides of the defining equation, successively by xt, xt−1, xt−2, . . .,<br />
using the result (<strong>2.1</strong>6).<br />
(a) Derive the Yule-Walker equations

γ_x(h) − φ_1 γ_x(h − 1) = σ_w^2 for h = 0, and 0 for h > 0.
(b) Use the Yule-Walker equations to show that

ρ_x(h) = φ_1^{|h|}

for the first-order AR.
2.4 For an ARMA series we define the optimal forecast based on xt, xt−1, . . . as the conditional expectation

x_{t+h}^t = E[x_{t+h} | x_t, x_{t−1}, . . .]

for h = 1, 2, 3, . . ..

(a) Show, for the general ARMA model, that

E[w_{t+h} | x_t, x_{t−1}, . . .] = 0 for h > 0, and w_{t+h} for h ≤ 0.

(b) For the first-order AR model, show that the optimal forecast is

x_{t+h}^t = φ_1 x_t for h = 1, and φ_1 x_{t+h−1}^t for h > 1.

(c) Show that E[(x_{t+1}^t − x_{t+1})^2] = σ_w^2 is the prediction error variance of the one-step forecast.
2.5 Suppose we have the simple linear trend model

y_t = β_1 t + x_t,

t = 1, 2, . . . , n, where

x_t = φ_1 x_{t−1} + w_t.

Give the exact form of the equations that you would use for estimating β1, φ1 and σ_w^2 using the Cochrane-Orcutt procedure of Section 2.7.
Figure 2.9 Los Angeles Mortality, Temperature and Particulates (6-day increment).
2.6 Consider the file la regr.dat, in the syllabus, which contains cardiovascular mortality, temperature values and particulate levels over 6-day periods from Los Angeles County (1970-1979). The file also contains two dummy variables for regression purposes, a column of ones for the constant term and a time index. The order is as follows: Column 1: 508 cardiovascular mortality values (6-day averages), Column 2: 508 ones, Column 3: the integers 1, 2, . . . , 508, Column 4: Temperature in degrees F and Column 5: Particulate levels. A reference is Shumway et al (1988).
The point here is to examine possible relations between the temperature<br />
and mortality in the presence of a time trend in cardiovascular mortality.<br />
(a) Use scatter diagrams to argue that particulate level may be linearly<br />
related to mortality and that temperature has either a linear<br />
or quadratic relation. Check for lagged relations using the cross<br />
correlation function.
(b) Adjust temperature for its mean value, using the Scale option and<br />
fit the model<br />
Mt = β0 + β1(Tt − ¯ T ) + β2(Tt − ¯ T ) 2 + β3Pt + et,<br />
where Mt, Tt and Pt denote the mortality, temperature and particulate<br />
pollution series. You can use as inputs Columns 2 and 3 for the<br />
trend terms and run the regression analysis without the constant<br />
option. Note that you need to transform temperature first. Retain<br />
the residuals for the next part of the problem.<br />
(c) Plot the residuals and compute the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Do the residuals appear to be white? Suggest an ARIMA model and fit it to the residuals. The simple ARIMA(2, 0, 0) model is a good compromise.
(d) Apply the ARIMA model obtained in part (c) to all of the input<br />
variables and to cardiovascular mortality using the ARIMA transformation<br />
option. Retain the forecast values for the transformed<br />
mortality, say ˆmt = Mt − ˆ φ1Mt−1 − ˆ φ2Mt−2.<br />
2.7 Generate 10 realizations (n = 200 points each) of a series from an ARIMA(1, 0, 1) model with φ1 = .90, θ1 = .20 and σ^2 = .25. Fit the ARIMA model to each of the series and compare the estimators to the true values by computing the average of the estimators and their standard deviations.
2.8 Consider the bivariate time series record containing U.S. production, as measured monthly by the Federal Reserve Board Production Index, and unemployment, as given in the file frb.asd. The file contains n = 372 monthly values for each series. Before you begin, be sure to plot the series. Fit a seasonal ARIMA model of your choice to the Federal Reserve Production Index. Develop a 12 month forecast using the model.
2.9 The file labeled clim-hyd.asd has 454 months of measured values for the climatic variables Air Temperature, Dew Point, Cloud Cover, Wind Speed, Precipitation, and Inflow at Shasta Lake. We would like to look at possible relations among the weather factors and between the weather factors and the inflow to Shasta Lake.
(a) Fit the ARIMA(0, 0, 0) × (0, 1, 1)12 model to transformed precipitation Pt = √pt and transformed flow It = log it. Save the residuals for transformed precipitation for use in part (b).
(b) Apply the ARIMA model fitted in part (a) for transformed precipitation to the flow series. Compute the cross correlation between the flow residuals using the precipitation ARIMA model and the precipitation residuals using the precipitation model, and interpret.
[Figure: two panels, Federal Reserve Board Production Index (top) and Monthly Unemployment (bottom), plotted against month.]

Figure 2.10 Federal Reserve Board Production and Unemployment for Problem 2.8.
Use the coefficients from the ARIMA model in the transform option in the main menu to construct the transformed flow residuals. Suggest two possible models for relating the two series. More analysis can be done using the transfer function models of Chapter 4.
2.9 Chapter 2 R Notes
The function arima() is used to fit ARIMA models in R. To fit an ARIMA(p, d, q) model to the time series x, the command would be

> model = arima(x, order = c(p, d, q))

To include a seasonal (P, D, Q)S component, use

> model = arima(x, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S))
A call of

> model

will print a summary of the ARIMA fit.

> model$residuals

will contain the residuals from your arima fit.

> model$loglik

will give the log-likelihood of the fit.

> model$aic

will give Akaike's Information Criterion for the fit (recall that this is useful in model selection).
To get AICc, use the following code (we assume the object model holds the fit of your model, K is the number of parameters you are fitting, and n is the length of your series):

> AICc = log(model$sigma2) + (n + K)/(n - K - 2)
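As a sanity check on this formula, here is the same computation in Python; the numeric values in the test of intuition below are invented for illustration.

```python
import math

def aicc(sigma2, n, K):
    """AICc as defined above: log(sigmahat^2) + (n + K)/(n - K - 2)."""
    return math.log(sigma2) + (n + K) / (n - K - 2)
```

Note that for fixed sigma2 the penalty term grows with K, which is what makes AICc useful for comparing models of different sizes.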
One final note that may be of use: to predict (say, 5) future observations given a fit, use

> future = predict(model, n.ahead = 5)
2.10 Chapter 2 ASTSA Notes
8. Regression Analysis
Time Domain → Multiple Regression
Model (without constant): yt = β1zt1 + β2zt2 + · · · + βqztq + et
Model (with constant): yt = β0 + β1zt1 + β2zt2 + · · · + βqztq + et
Series (dependent): yt
No. of independent series: q
series 1: zt1−h1, lag: h1 (often zero)
· · ·
series q: ztq−hq, lag: hq (often zero)
forecasts: 0
constant (y/n):
selector (AIC, AICc, BIC, FPEL, AICL): AICc
Save → Residuals
Save → Predicted
9. Fit ARIMA(p, d, q) × (P, D, Q)s
Time Domain → ARIMA
Series:
p: AR order
d: Difference
q: MA order
P: SAR order
D: Seasonal Difference
Q: SMA order
season: s
forecasts: h
use .1 guess (y/n): y
selector (AIC, AICc, BIC, FPEL, AICL): AICc
Save → Residuals
Save → Predicted
10. ARIMA Transformation
Transform → Transform → ARIMA Residual
Series:
p: AR order
d: Difference
q: MA order
P: SAR order
D: Seasonal Difference
Q: SMA order
season: s