
Time Series
(ver. 1.5)

Oscar Torres-Reyna
Data Consultant
otorres@princeton.edu
http://dss.princeton.edu/training/


Date variable

If you have a format like ‘date1’ type:

----- Stata 10.x/11.x:
gen datevar = date(date1,"DMY", 2012)
format datevar %td /*For daily data*/

----- Stata 9.x:
gen datevar = date(date1,"dmy", 2012)
format datevar %td /*For daily data*/

If you have a format like ‘date2’ type:

----- Stata 10.x/11.x:
gen datevar = date(date2,"MDY", 2012)
format datevar %td /*For daily data*/

----- Stata 9.x:
gen datevar = date(date2,"mdy", 2012)
format datevar %td /*For daily data*/

If you have a format like ‘date3’ type:

----- Stata 10.x/11.x:
tostring date3, gen(date3a)
gen datevar = date(date3a,"YMD")
format datevar %td /*For daily data*/

----- Stata 9.x:
tostring date3, gen(date3a)
gen year = substr(date3a,1,4)
gen month = substr(date3a,5,2)
gen day = substr(date3a,7,2)
destring year month day, replace
gen datevar1 = mdy(month,day,year)
format datevar1 %td /*For daily data*/

If you have a format like ‘date4’, see http://www.princeton.edu/~otorres/Stata/


Date variable (cont.)

If the original date variable is a string (date and components stored together as text):

gen week = weekly(stringvar,"wy")
gen month = monthly(stringvar,"my")
gen quarter = quarterly(stringvar,"qy")
gen half = halfyearly(stringvar,"hy")
gen year = yearly(stringvar,"y")

If the components of the original date are stored in separate numeric variables (here named year, month, day, week, quarter and half):

gen daily = mdy(month,day,year)
gen datew = yw(year, week)      /* new names so the date does not overwrite the components */
gen datem = ym(year, month)
gen dateq = yq(year, quarter)
gen dateh = yh(year, half)

To extract days of the week (Monday, Tuesday, etc.) use the function dow():

gen dayofweek = dow(date)

Replace "date" with the date variable in your dataset. This creates the variable ‘dayofweek’, where 0 is ‘Sunday’, 1 is ‘Monday’, etc. (type help dow for more details).
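As an optional aside (not in the original handout), the numeric codes can be given value labels so listings display day names instead of numbers:

label define dowlbl 0 "Sunday" 1 "Monday" 2 "Tuesday" 3 "Wednesday" 4 "Thursday" 5 "Friday" 6 "Saturday"
label values dayofweek dowlbl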

To specify a range of dates (or integers in general) you can use the tin() and twithin() functions. tin() includes the first and last date; twithin() does not. Use the format of the date variable in your dataset.

/* Make sure to set your data as time series before using tin/twithin */
tsset date
regress y x1 x2 if tin(01jan1995,01jun1995)
regress y x1 x2 if twithin(01jan2000,01jan2001)

NOTE: Remember to format the date variable accordingly. After creating it type:

format datevar %t? /*Change ‘datevar’ with your date variable*/

Change "?" to the correct frequency letter: w (weekly), m (monthly), q (quarterly), h (half-yearly), y (yearly).
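For example, for monthly or quarterly date variables:

format datevar %tm /*For monthly data*/
format datevar %tq /*For quarterly data*/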



Date variable (example)

Time series data are data collected over time for a single variable or a group of variables.

For this kind of data the first thing to do is to check the variable that contains the time or date range and make sure it is the one you need: yearly, monthly, quarterly, daily, etc.

The next step is to verify it is in the correct format. In the example below the time variable is stored in "date", but it is a string variable, not a date variable. In Stata you need to convert this string variable to a date variable.*

A closer inspection of the variable shows that for the years 2000 and later the format changes, so we need to create a new variable with a uniform format. Type the following:

use http://dss.princeton.edu/training/tsdata.dta
gen date1 = substr(date,1,7)
gen datevar = quarterly(date1,"yq")
format datevar %tq
browse date date1 datevar

See next page. For more details type help date.

*Data source: Stock & Watson's companion materials


Date variable (example, cont.)

The date variable is now stored in datevar as a date variable.


Setting as time series: tsset

Once you have the date variable in a date format, you need to declare your data as time series in order to use the time series operators. In Stata type:

tsset datevar

. tsset datevar
        time variable:  datevar, 1957q1 to 2005q1
                delta:  1 quarter

Your time series may have gaps; for example, there may be no data available for weekends. This complicates the use of lags across those missing dates. In this case you may want to create a continuous time trend as follows:

gen time = _n

Then use it to set the time series:

tsset time

In the case of cross-sectional time series (panel data) type:

sort panel date
by panel: gen time = _n
xtset panel time


Filling gaps in time variables

Use the command tsfill to fill in gaps in the time series. You need to tsset or xtset the data before using tsfill. In the example below:

tsset quarters
tsfill

Type help tsfill for more details.
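A related option worth knowing (an aside, not in the original handout): with panel data that has been xtset, tsfill with the full option rectangularizes the panel so that every unit spans the entire time range:

xtset panel time
tsfill, full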


Subsetting with tin/twithin

Once the data are tsset (time series set) you can use two range functions: tin (‘times in’, from a to b, including both a and b) and twithin (‘times within’, between a and b, excluding both a and b). If you have yearly data just include the years.

. list datevar unemp if tin(2000q1,2000q4)

        datevar      unemp
 173.    2000q1   4.033333
 174.    2000q2   3.933333
 175.    2000q3          4
 176.    2000q4        3.9

. list datevar unemp if twithin(2000q1,2000q4)

        datevar      unemp
 174.    2000q2   3.933333
 175.    2000q3          4

/* Make sure to set your data as time series before using tin/twithin */
tsset date
regress y x1 x2 if tin(01jan1995,01jun1995)
regress y x1 x2 if twithin(01jan2000,01jan2001)


Merge/Append

See http://dss.princeton.edu/training/Merge101.pdf


Lag operators (lag)

Another set of time series tools are the lag, lead, difference and seasonal operators. It is common to analyze the impact of previous values on current ones.

To generate lagged (past) values use the "L" operator:

generate unempL1=L1.unemp
generate unempL2=L2.unemp
list datevar unemp unempL1 unempL2 in 1/5

. generate unempL1=L1.unemp
(1 missing value generated)

. generate unempL2=L2.unemp
(2 missing values generated)

. list datevar unemp unempL1 unempL2 in 1/5

       datevar      unemp    unempL1    unempL2
  1.    1957q1   3.933333          .          .
  2.    1957q2        4.1   3.933333          .
  3.    1957q3   4.233333        4.1   3.933333
  4.    1957q4   4.933333   4.233333        4.1
  5.    1958q1        6.3   4.933333   4.233333

In a regression you could type:

regress y x L1.x L2.x

or

regress y x L(1/5).x
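Time series operators can also be combined; a brief sketch (y and x are placeholders, as above): L(0/4).x includes the current value plus four lags, and LD.x is the lag of the first difference:

regress y L(0/4).x    /* current value and four lags */
regress y LD.x        /* lag of the first difference of x */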



Lag operators (forward)

To generate forward or lead values use the "F" operator:

generate unempF1=F1.unemp
generate unempF2=F2.unemp
list datevar unemp unempF1 unempF2 in 1/5

. generate unempF1=F1.unemp
(1 missing value generated)

. generate unempF2=F2.unemp
(2 missing values generated)

. list datevar unemp unempF1 unempF2 in 1/5

       datevar      unemp    unempF1    unempF2
  1.    1957q1   3.933333        4.1   4.233333
  2.    1957q2        4.1   4.233333   4.933333
  3.    1957q3   4.233333   4.933333        6.3
  4.    1957q4   4.933333        6.3   7.366667
  5.    1958q1        6.3   7.366667   7.333333

In a regression you could type:

regress y x F1.x F2.x

or

regress y x F(1/5).x


Lag operators (difference)

To generate the difference between current and previous values use the "D" operator:

generate unempD1=D1.unemp /* D1 = y_t - y_(t-1) */
generate unempD2=D2.unemp /* D2 = (y_t - y_(t-1)) - (y_(t-1) - y_(t-2)) */
list datevar unemp unempD1 unempD2 in 1/5

. generate unempD1=D1.unemp
(1 missing value generated)

. generate unempD2=D2.unemp
(2 missing values generated)

. list datevar unemp unempD1 unempD2 in 1/5

       datevar      unemp    unempD1    unempD2
  1.    1957q1   3.933333          .          .
  2.    1957q2        4.1   .1666665          .
  3.    1957q3   4.233333   .1333332  -.0333333
  4.    1957q4   4.933333   .7000003   .5666671
  5.    1958q1        6.3   1.366667   .6666665

In a regression you could type:

regress y x D1.x


Lag operators (seasonal)

To generate seasonal differences use the "S" operator:

generate unempS1=S1.unemp /* S1 = y_t - y_(t-1) */
generate unempS2=S2.unemp /* S2 = y_t - y_(t-2) */
list datevar unemp unempS1 unempS2 in 1/5

. generate unempS1=S1.unemp
(1 missing value generated)

. generate unempS2=S2.unemp
(2 missing values generated)

. list datevar unemp unempS1 unempS2 in 1/5

       datevar      unemp    unempS1    unempS2
  1.    1957q1   3.933333          .          .
  2.    1957q2        4.1   .1666665          .
  3.    1957q3   4.233333   .1333332   .2999997
  4.    1957q4   4.933333   .7000003   .8333335
  5.    1958q1        6.3   1.366667   2.066667

In a regression you could type:

regress y x S1.x
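Since the data here are quarterly, an added note: the seasonal span usually of interest is four quarters, i.e. the same quarter one year earlier:

generate unempS4=S4.unemp /* S4 = y_t - y_(t-4) */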


Correlograms: autocorrelation

To explore autocorrelation, which is the correlation between a variable and its previous values, use the command corrgram. The number of lags depends on theory, an AIC/BIC selection procedure, or experience. The output includes the autocorrelation coefficients (AC) and partial autocorrelation coefficients (PAC) used to specify an ARIMA model.

corrgram unemp, lags(12)

. corrgram unemp, lags(12)

 LAG        AC       PAC        Q     Prob>Q
   1     0.9641    0.9650    182.2    0.0000
   2     0.8921   -0.6305   339.02    0.0000
   3     0.8045    0.1091   467.21    0.0000
   4     0.7184    0.0424   569.99    0.0000
   5     0.6473    0.0836   653.86    0.0000
   6     0.5892   -0.0989   723.72    0.0000
   7     0.5356   -0.0384   781.77    0.0000
   8     0.4827    0.0744   829.17    0.0000
   9     0.4385    0.1879    868.5    0.0000
  10     0.3984   -0.1832   901.14    0.0000
  11     0.3594   -0.1396   927.85    0.0000
  12     0.3219    0.0745    949.4    0.0000

(In Stata the output also includes [Autocorrelation] and [Partial Autocor] columns: text plots of the AC and PAC values on a -1 to 1 scale.)

Interpreting the output:

• AC: the correlation between the current value of unemp and its value three quarters ago is 0.8045. AC can be used to define the q in an MA(q) model, but only for stationary series.

• PAC: the correlation between the current value of unemp and its value three quarters ago, net of the effect of the two intervening lags, is 0.1091. PAC can be used to define the p in an AR(p) model, again only for stationary series.

• Q: the Box-Pierce Q statistic tests the null hypothesis that all autocorrelations up to lag k are equal to 0. Here Prob>Q is below 0.05 at every k, so we reject the null: this series shows significant autocorrelation.

• The [Autocorrelation] text plot shows a slow decay in the AC values, suggesting nonstationarity. See also the ac command.

• The [Partial Autocor] text plot shows no spikes after the second lag, which suggests the higher-order lags add no information beyond the first two. See also the pac command.
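For graphical versions of these correlograms with confidence bands, the built-in ac and pac commands can be run directly (shown here with the same 12 lags used above):

ac unemp, lags(12)
pac unemp, lags(12)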



Correlograms: cross correlation

To explore the relationship between two time series use the command xcorr. The graph below shows the correlation between the GDP quarterly growth rate and unemployment. When using xcorr, list the independent variable first and the dependent variable second. Type:

xcorr gdp unemp, lags(10) xlabel(-10(1)10,grid)

[Cross-correlogram of gdp and unemp: cross-correlations (-1.00 to 1.00) plotted against lags -10 to 10.]

. xcorr gdp unemp, lags(10) table

 LAG      CORR
 -10   -0.1080
  -9   -0.1052
  -8   -0.1075
  -7   -0.1144
  -6   -0.1283
  -5   -0.1412
  -4   -0.1501
  -3   -0.1578
  -2   -0.1425
  -1   -0.1437
   0   -0.1853
   1   -0.1828
   2   -0.1685
   3   -0.1177
   4   -0.0716
   5   -0.0325
   6   -0.0111
   7   -0.0038
   8    0.0168
   9    0.0393
  10    0.0419

At lag 0 there is an immediate negative correlation between the GDP growth rate and unemployment: a drop in GDP is associated with an immediate increase in unemployment.


xcorr interest unemp, lags(10) xlabel(-10(1)10,grid)

[Cross-correlogram of interest and unemp: cross-correlations (-1.00 to 1.00) plotted against lags -10 to 10.]

. xcorr interest unemp, lags(10) table

 LAG     CORR
 -10   0.3297
  -9   0.3150
  -8   0.2997
  -7   0.2846
  -6   0.2685
  -5   0.2585
  -4   0.2496
  -3   0.2349
  -2   0.2323
  -1   0.2373
   0   0.2575
   1   0.3095
   2   0.3845
   3   0.4576
   4   0.5273
   5   0.5850
   6   0.6278
   7   0.6548
   8   0.6663
   9   0.6522
  10   0.6237

Interest rates are positively correlated with future levels of unemployment, with the correlation peaking at lag 8 (eight quarters, or two years): interest rates are most strongly correlated with unemployment rates eight quarters later.


Lag selection

. varsoc gdp cpi, maxlag(10)

Selection-order criteria
Sample: 1959q4 - 2005q1                          Number of obs = 182

 lag       LL        LR      df    p       FPE        AIC       HQIC       SBIC
  0    -1294.75                           5293.32    14.25     14.2642    14.2852
  1    -467.289   1654.9      4   0.000   .622031    5.20098    5.2438     5.30661
  2    -401.381   131.82      4   0.000   .315041    4.52067    4.59204    4.69672*
  3    -396.232   10.299      4   0.036   .311102    4.50804    4.60796    4.75451
  4    -385.514   21.435*     4   0.000   .288988*   4.43422*   4.56268*   4.7511
  5    -383.92    3.1886      4   0.527   .296769    4.46066    4.61766    4.84796
  6    -381.135   5.5701      4   0.234   .300816    4.47401    4.65956    4.93173
  7    -379.062   4.1456      4   0.387   .307335    4.49519    4.70929    5.02332
  8    -375.483   7.1585      4   0.128   .308865    4.49981    4.74246    5.09836
  9    -370.817   9.3311      4   0.053   .306748    4.4925     4.76369    5.16147
 10    -370.585   .46392      4   0.977   .319888    4.53391    4.83364    5.27329

Endogenous: gdp cpi
Exogenous: _cons

Too many lags can increase the error in the forecasts; too few can leave out relevant information*. Experience, knowledge and theory are usually the best guides for determining the number of lags needed. There are, however, information criteria to help settle on a proper number. Three commonly used ones are Schwarz's Bayesian information criterion (SBIC), Akaike's information criterion (AIC), and the Hannan-Quinn information criterion (HQIC). All of these are reported by the command varsoc in Stata.

When all three agree, the selection is clear, but what happens when they give conflicting results? A paper from the CEPR suggests, in the context of VAR models, that AIC tends to be more accurate with monthly data, HQIC works better for quarterly data on samples over 120, and SBIC works fine with any sample size for quarterly data (on VEC models)**. In our example above we have quarterly data with 182 observations, and HQIC suggests a lag of 4 (which is also suggested by AIC).

* See Stock & Watson for more details and for how to estimate BIC and SIC.
** Ivanov, V. and Kilian, L. 2001. 'A Practitioner's Guide to Lag-Order Selection for Vector Autoregressions'. CEPR Discussion Paper no. 2685. London, Centre for Economic Policy Research. http://www.cepr.org/pubs/dps/DP2685.asp
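As a follow-up sketch (not part of the original handout): having settled on four lags for these quarterly series, the corresponding VAR can be fit with:

var gdp cpi, lags(1/4)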



Unit roots

Having a unit root in a series means that the series is nonstationary: it follows a stochastic trend. One symptom is that the estimated relationship between variables changes across subsamples, as in the two regressions below.

. regress unemp gdp if tin(1965q1,1981q4)

      Source |       SS       df       MS              Number of obs =      68
-------------+------------------------------           F(  1,    66) =   19.14
       Model |  36.1635247     1  36.1635247           Prob > F      =  0.0000
    Residual |  124.728158    66  1.88982058           R-squared     =  0.2248
-------------+------------------------------           Adj R-squared =  0.2130
       Total |  160.891683    67   2.4013684           Root MSE      =  1.3747

       unemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |  -.4435909   .1014046    -4.37   0.000    -.6460517   -.2411302
       _cons |   7.087789   .3672397    19.30   0.000     6.354572    7.821007

. regress unemp gdp if tin(1982q1,2000q4)

      Source |       SS       df       MS              Number of obs =      76
-------------+------------------------------           F(  1,    74) =    3.62
       Model |  8.83437339     1  8.83437339           Prob > F      =  0.0608
    Residual |  180.395848    74  2.43778172           R-squared     =  0.0467
-------------+------------------------------           Adj R-squared =  0.0338
       Total |  189.230221    75  2.52306961           Root MSE      =  1.5613

       unemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .3306551    .173694     1.90   0.061    -.0154377    .6767479
       _cons |   5.997169   .2363599    25.37   0.000     5.526211    6.468126


Unit roots (cont.)

A plot of the unemployment rate:

line unemp datevar

[Line graph "Unemployment Rate": unemp (roughly 4 to 12) plotted against datevar from 1957q1 to 2005q1.]


Unit root test

The Dickey-Fuller test is one of the most commonly used tests for stationarity. The null hypothesis is that the series has a unit root. The first test below shows that the unemployment series has a unit root: the test statistic lies within the acceptance region, so we fail to reject the null.

One way to deal with a stochastic trend (unit root) is to take the first difference of the variable (second test below).

. dfuller unemp, lag(5)

Augmented Dickey-Fuller test for unit root          Number of obs   =      187

                              ---------- Interpolated Dickey-Fuller ----------
                  Test        1% Critical     5% Critical     10% Critical
               Statistic         Value           Value            Value
 Z(t)           -2.597          -3.481          -2.884           -2.574

MacKinnon approximate p-value for Z(t) = 0.0936        --> Unit root

. dfuller unempD1, lag(5)

Augmented Dickey-Fuller test for unit root          Number of obs   =      186

                              ---------- Interpolated Dickey-Fuller ----------
                  Test        1% Critical     5% Critical     10% Critical
               Statistic         Value           Value            Value
 Z(t)           -5.303          -3.481          -2.884           -2.574

MacKinnon approximate p-value for Z(t) = 0.0000        --> No unit root
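One hedged aside: dfuller also accepts a trend option to include a time trend in the test regression, which may be appropriate for trending series (type help dfuller):

dfuller unemp, lags(5) trend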


Testing for cointegration

Cointegration refers to the fact that two or more series share a common stochastic trend (Stock & Watson). Engle and Granger (1987) suggested a two-step process to test for cointegration (an OLS regression followed by a unit root test on the residuals), the EG-ADF test:

regress unemp gdp     /* Run an OLS regression */
predict e, resid      /* Get the residuals */
dfuller e, lags(10)   /* Run a unit root test on the residuals */

. dfuller e, lags(10)

Augmented Dickey-Fuller test for unit root          Number of obs   =      181

                              ---------- Interpolated Dickey-Fuller ----------
                  Test        1% Critical     5% Critical     10% Critical
               Statistic         Value           Value            Value
 Z(t)           -2.535          -3.483          -2.885           -2.575

MacKinnon approximate p-value for Z(t) = 0.1071        --> Unit root*

The residuals have a unit root, so the two variables are not cointegrated.

See Stock & Watson for a table of critical values for this unit root test and for the underlying theory.
*The critical value for one independent variable in the OLS regression, at 5%, is -3.41 (Stock & Watson).


Granger causality: using OLS

If you regress ‘y’ on lagged values of ‘y’ and ‘x’, and the coefficients on the lags of ‘x’ are statistically significantly different from 0, then you can argue that ‘x’ Granger-causes ‘y’; that is, ‘x’ can be used to predict ‘y’ (see Stock & Watson 2007, Greene 2008).

Step 1: run the regression with four lags of each variable.
Step 2: test the joint significance of the lags of ‘gdp’.

. regress unemp L(1/4).unemp L(1/4).gdp

      Source |       SS       df       MS              Number of obs =     188
-------------+------------------------------           F(  8,   179) =  668.37
       Model |  373.501653     8  46.6877066           Prob > F      =  0.0000
    Residual |  12.5037411   179  .069853302           R-squared     =  0.9676
-------------+------------------------------           Adj R-squared =  0.9662
       Total |  386.005394   187  2.06419997           Root MSE      =   .2643

       unemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       unemp |
         L1. |   1.625708   .0763035    21.31   0.000     1.475138    1.776279
         L2. |  -.7695503   .1445769    -5.32   0.000    -1.054845    -.484256
         L3. |   .0868131   .1417562     0.61   0.541    -.1929152    .3665415
         L4. |   .0217041   .0726137     0.30   0.765    -.1215849    .1649931
         gdp |
         L1. |   .0060996   .0136043     0.45   0.654    -.0207458    .0329451
         L2. |  -.0189398   .0128618    -1.47   0.143    -.0443201    .0064405
         L3. |   .0247494   .0130617     1.89   0.060    -.0010253    .0505241
         L4. |    .003637   .0129079     0.28   0.778    -.0218343    .0291083
       _cons |   .1702419    .096857     1.76   0.081    -.0208865    .3613704

. test L1.gdp L2.gdp L3.gdp L4.gdp

 ( 1)  L.gdp = 0
 ( 2)  L2.gdp = 0
 ( 3)  L3.gdp = 0
 ( 4)  L4.gdp = 0

       F(  4,   179) =    1.67
            Prob > F =    0.1601

You cannot reject the null hypothesis that all coefficients on the lags of ‘gdp’ are equal to 0. Therefore ‘gdp’ does not Granger-cause ‘unemp’.


Granger causality: using VAR

The following procedure uses a VAR model to estimate Granger causality, via the command vargranger:

. quietly var unemp gdp, lags(1/4)
. vargranger

Granger causality Wald tests

  Equation     Excluded       chi2     df   Prob > chi2
  unemp        gdp          6.9953      4      0.136
  unemp        ALL          6.9953      4      0.136
  gdp          unemp        6.8658      4      0.143
  gdp          ALL          6.8658      4      0.143

The null hypothesis is ‘var1 does not Granger-cause var2’. In both cases, we cannot reject the null: neither variable Granger-causes the other.


Chow test (testing for known breaks)

The Chow test allows you to test whether a particular date causes a break in the regression coefficients. It is named after Gregory Chow (1960)*.

Step 1. Create a dummy variable equal to 1 if date > break date and 0 otherwise. Here the break date is the last quarter of 1981:

generate break = (datevar>tq(1981q4))

Change "tq" to the correct date function: tw (weekly), tm (monthly), tq (quarterly), th (half-yearly), ty (yearly), with the corresponding date format inside the parentheses.

Step 2. Create interaction terms between the dummy and the lags of the independent and dependent variables. We will assume lag 1 for this example (the number of lags depends on your theory/data):

generate break_unemp = break*l1.unemp
generate break_gdp = break*l1.gdp

Step 3. Run a regression of the outcome variable (in this case ‘unemp’) on the lagged variables, the interactions and the dummy for the break:

reg unemp l1.unemp l1.gdp break break_unemp break_gdp

Step 4. Run an F-test on the coefficients of the interactions and the dummy for the break:

test break break_unemp break_gdp

. test break break_unemp break_gdp

 ( 1)  break = 0
 ( 2)  break_unemp = 0
 ( 3)  break_gdp = 0

       F(  3,   185) =    1.14
            Prob > F =    0.3351

The null hypothesis is no break. If the p-value is < 0.05, reject the null in favor of the alternative that there is a break. In this example, we fail to reject the null and conclude that the first quarter of 1982 does not cause a break in the regression coefficients.

* See Stock & Watson for more details.


Testing for unknown breaks

The Quandt likelihood ratio (QLR) test (Quandt, 1960), or sup-Wald statistic, is a modified version of the Chow test used to identify break dates when the break date is not known in advance. The following is a modified procedure taken from Stock & Watson's companion materials to their book Introduction to Econometrics; I strongly advise reading the corresponding chapter to better understand the procedure and to check the critical values for the QLR statistic. Below we check for breaks in a GDP per-capita series (quarterly).

/* Replace the variable names (here datevar and gdp) with your own variables; do not change anything else*/
/* The log file 'qlrtest.log' will have the list of QLR statistics (use Word to read it)*/
/* See next page for a graph*/
/* STEP 1. Copy-and-paste-run the code below in a do-file, double-check the quotes (re-type them if necessary)*/

log using qlrtest.log
tsset datevar
sum datevar
local time=r(max)-r(min)+1
local i = round(`time'*.15)
local f = round(`time'*.85)
local var = "gdp"
gen diff`var' = d.`var'
gen chow`var' = .
gen qlr`var' = .
set more off
while `i'<=`f' {
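/* The loop body is cut off in this copy of the handout. What follows is a hedged
   reconstruction of the missing steps, following the Stock & Watson procedure
   described above (a Chow-type regression at each candidate break date, storing
   the F statistic); the exact regressors and lag counts here are assumptions,
   not the original code. */
    qui gen di = (_n>=`i')                   /* break dummy at candidate observation `i' */
    qui gen diX = di*diff`var'               /* interaction with the differenced series */
    qui reg diff`var' L(1/4).diff`var' di L(1/4).diX
    qui test di L1.diX L2.diX L3.diX L4.diX  /* q = 5: the dummy plus four interactions */
    qui replace chow`var' = r(F) if _n==`i'  /* Chow F statistic at this candidate date */
    qui replace qlr`var' = r(F) if _n==`i'   /* the series summarized and plotted in STEP 2 */
    qui drop di diX
    local i = `i' + 1
}
log close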


Testing for unknown breaks: graph

/* Replace the variable names with your own variables; do not change anything else*/
/* The code will produce the graph shown on the next page*/
/* The critical value 3.66 is for q=5 (constant and four lags) and 5% significance*/
/* STEP 2. Copy-and-paste-run the code below in a do-file, double-check the quotes (re-type them if necessary)*/

sum qlr`var'
local maxvalue=r(max)
gen maxdate=datevar if qlr`var'==`maxvalue'
local maxvalue1=round(`maxvalue',0.01)
local critical=3.66 /*Replace with the appropriate critical value (see Stock & Watson)*/
sum datevar
local mindate=r(min)
sum maxdate
local maxdate=r(max)
gen break=datevar if qlr`var'>=`critical' & qlr`var'!=.
dis "Below are the break dates..."
list datevar qlr`var' if break!=.
levelsof break, local(break1)
twoway tsline qlr`var', title(Testing for breaks in GDP per-capita (1957-2005)) ///
    xlabel(`break1', angle(90) labsize(0.9) alternate) ///
    yline(`critical') ytitle(QLR statistic) xtitle(Time) ///
    ttext(`critical' `mindate' "Critical value 5% (`critical')", placement(ne)) ///
    ttext(`maxvalue' `maxdate' "Max QLR = `maxvalue1'", placement(e))

The list command returns the dates where the QLR statistic exceeds the critical value:

        datevar     qlrgdp
 129.    1989q1   3.823702
 131.    1989q3   6.285852
 132.    1989q4   6.902882
 133.    1990q1   5.416068
 134.    1990q2   7.769114
 135.    1990q3   8.354294
 136.    1990q4   5.399252
 137.    1991q1   8.524492
 138.    1991q2   6.007093
 139.    1991q3    4.44151
 140.    1991q4   4.946689
 141.    1992q1   3.699911
 160.    1996q4   3.899656
 161.    1997q1   3.906271


Testing for unknown breaks: graph (cont.)

[Graph "Testing for breaks in GDP per-capita (1957-2005)": the QLR statistic (vertical axis, 0 to 8) plotted over time, with a horizontal line at the 5% critical value (3.66). The dates where the statistic exceeds the critical value (1989q1 through 1992q1, plus 1996q4 and 1997q1) are marked on the time axis; the maximum, Max QLR = 8.52, occurs at 1991q1.]


Time Series: white noise

White noise means that a variable has no autocorrelation. In Stata use wntestq (white noise Q test) to check for serial correlation. The null is that there is no serial correlation (type help wntestq for more details):

. wntestq unemp

Portmanteau test for white noise

 Portmanteau (Q) statistic = 1044.6341
 Prob > chi2(40)           =    0.0000

. wntestq unemp, lags(10)

Portmanteau test for white noise

 Portmanteau (Q) statistic =  901.1399
 Prob > chi2(10)           =    0.0000

Both tests reject the null: the series is serially correlated, not white noise. If your variable is not white noise, see the page on correlograms to determine the order of the autocorrelation.


Time Series: testing for serial correlation

The Breusch-Godfrey and Durbin-Watson tests are used to test for serial correlation. The null in both tests is that there is no serial correlation (type help estat dwatson, help estat durbinalt and help estat bgodfrey for more details).

. regress unempd1 gdp

      Source |       SS       df       MS              Number of obs =     192
-------------+------------------------------           F(  1,   190) =    1.62
       Model |  .205043471     1  .205043471           Prob > F      =  0.2048
    Residual |  24.0656991   190  .126661574           R-squared     =  0.0084
-------------+------------------------------           Adj R-squared =  0.0032
       Total |  24.2707425   191   .12707195           Root MSE      =   .3559

     unempd1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   -.015947   .0125337    -1.27   0.205    -.0406701    .0087761
       _cons |   .0401123    .036596     1.10   0.274    -.0320743    .1122989

. estat dwatson

Durbin-Watson d-statistic(  2,   192) = .7562744

. estat durbinalt

Durbin's alternative test for autocorrelation

    lags(p)       chi2        df       Prob > chi2
       1        118.790        1         0.0000

H0: no serial correlation

. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation

    lags(p)       chi2        df       Prob > chi2
       1         74.102        1         0.0000

H0: no serial correlation

Both estat durbinalt and estat bgodfrey reject the null of no serial correlation, and the low Durbin-Watson statistic points the same way.


Time Series: correcting for serial correlation

Run a Cochrane-Orcutt regression using the prais command (type help prais for more details):

. prais unemp gdp, corc

Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.9556
Iteration 2:  rho = 0.9660
Iteration 3:  rho = 0.9661
Iteration 4:  rho = 0.9661

Cochrane-Orcutt AR(1) regression -- iterated estimates

      Source |       SS       df       MS              Number of obs =     191
-------------+------------------------------           F(  1,   189) =    2.48
       Model |  .308369041     1  .308369041           Prob > F      =  0.1167
    Residual |  23.4694088   189  .124176766           R-squared     =  0.0130
-------------+------------------------------           Adj R-squared =  0.0077
       Total |  23.7777778   190  .125146199           Root MSE      =  .35239

       unemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   -.020264   .0128591    -1.58   0.117    -.0456298    .0051018
       _cons |   6.105931   .7526023     8.11   0.000     4.621351    7.590511
-------------+----------------------------------------------------------------
         rho |    .966115

Durbin-Watson statistic (original)    0.087210
Durbin-Watson statistic (transformed) 0.758116
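A side note, not in the original handout: without the corc option, prais estimates the Prais-Winsten transformation instead, which retains the first observation rather than dropping it:

prais unemp gdp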



Useful links / Recommended books

• DSS Online Training Section: http://dss.princeton.edu/training/
• UCLA Resources to learn and use STATA: http://www.ats.ucla.edu/stat/stata/
• DSS help-sheets for STATA: http://dss/online_help/stats_packages/stata/stata.htm
• Introduction to Stata (PDF), Christopher F. Baum, Boston College, USA. "A 67-page description of Stata, its key features and benefits, and other useful information." http://fmwww.bc.edu/GStat/docs/StataIntro.pdf
• STATA FAQ website: http://stata.com/support/faqs/
• Princeton DSS Libguides: http://libguides.princeton.edu/dss

Books

• Introduction to Econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson Addison Wesley, 2007.
• Data Analysis Using Regression and Multilevel/Hierarchical Models / Andrew Gelman, Jennifer Hill. Cambridge; New York: Cambridge University Press, 2007.
• Econometric Analysis / William H. Greene. 6th ed., Upper Saddle River, N.J.: Prentice Hall, 2008.
• Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O. Keohane, Sidney Verba. Princeton University Press, 1994.
• Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King. Cambridge University Press, 1989.
• Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods / Sam Kachigan. New York: Radius Press, c1986.
• Statistics with Stata (Updated for Version 9) / Lawrence Hamilton. Thomson Books/Cole, 2006.
