Time Series - Data and Statistical Services - Princeton University
Time Series - Data and Statistical Services - Princeton University
Time Series - Data and Statistical Services - Princeton University
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>Time</strong> <strong>Series</strong><br />
(ver. 1.5)<br />
Oscar Torres-Reyna<br />
<strong>Data</strong> Consultant<br />
otorres@princeton.edu<br />
http://dss.princeton.edu/training/<br />
PU/DSS/OTR
If you have a format like ‘date1’ type<br />
-----STATA 10.x/11.x:<br />
gen datevar = date(date1,"DMY", 2012)<br />
format datevar %td /*For daily data*/<br />
-----STATA 9.x:<br />
gen datevar = date(date1,"dmy", 2012)<br />
format datevar %td /*For daily data*/<br />
If you have a format like ‘date2’ type<br />
-----STATA 10.x/11.x:<br />
gen datevar = date(date2,"MDY", 2012)<br />
format datevar %td /*For daily data*/<br />
------STATA 9.x:<br />
gen datevar = date(date2,"mdy", 2012)<br />
format datevar %td /*For daily data*/<br />
If you have a format like ‘date3’ type<br />
------STATA 10.x/11.x:<br />
tostring date3, gen(date3a)<br />
gen datevar=date(date3a,"YMD")<br />
format datevar %td /*For daily data*/<br />
------STATA 9.x:<br />
tostring date3, gen(date3a)<br />
gen year=substr(date3a,1,4)<br />
gen month=substr(date3a,5,2)<br />
gen day=substr(date3a,7,2)<br />
destring year month day, replace<br />
gen datevar1 = mdy(month,day,year)<br />
format datevar1 %td /*For daily data*/<br />
If you have a format like ‘date4’ type<br />
See http://www.princeton.edu/~otorres/Stata/<br />
Date variable<br />
2<br />
PU/DSS/OTR
If the original date variable is string (i.e. color red):<br />
gen week= weekly(stringvar,"wy")<br />
gen month= monthly(stringvar,"my")<br />
gen quarter= quarterly(stringvar,"qy")<br />
gen half = halfyearly(stringvar,"hy")<br />
gen year= yearly(stringvar,"y")<br />
If the components of the original date are in different numeric variables (i.e. color black):<br />
gen daily = mdy(month,day,year)<br />
gen week = yw(year, week)<br />
gen month = ym(year,month)<br />
gen quarter = yq(year,quarter)<br />
gen half = yh(year,half-year)<br />
To extract days of the week (Monday, Tuesday, etc.) use the function dow()<br />
gen dayofweek= dow(date)<br />
Date variable (cont.)<br />
Replace “date” with the date variable in your dataset. This will create the variable ‘dayofweek’ where 0 is ‘Sunday’, 1 is<br />
‘Monday’, etc. (type help dow for more details)<br />
To specify a range of dates (or integers in general) you can use the tin() <strong>and</strong> twithin() functions. tin() includes the<br />
first <strong>and</strong> last date, twithin() does not. Use the format of the date variable in your dataset.<br />
/* Make sure to set your data as time series before using tin/twithin */<br />
tsset date<br />
regress y x1 x2 if tin(01jan1995,01jun1995)<br />
regress y x1 x2 if twithin(01jan2000,01jan2001)<br />
NOTE: Remember to format the date variable accordingly. After creating it type:<br />
format datevar %t? /*Change ‘datevar’ with your date variable*/<br />
Change “?” with the correct format: w (week), m (monthly), q (quarterly), h (half), y (yearly).<br />
NOTE: Remember to format the date variable accordingly. After creating it type:<br />
format datevar %t? /*Change ‘datevar’ with your date variable*/<br />
Change “?” with the correct format: w (week), m (monthly), q (quarterly), h (half), y (yearly).<br />
3<br />
PU/DSS/OTR
<strong>Time</strong> series data is data collected over time for a single or a group of variables.<br />
For this kind of data the first thing to do is to check the variable that contains the time or date range <strong>and</strong> make<br />
sure is the one you need: yearly, monthly, quarterly, daily, etc.<br />
The next step is to verify it is in the correct format. In the example below the time variable is stored in “date”<br />
but it is a string variable not a date variable. In Stata you need to convert this string variable to a date<br />
variable.*<br />
A closer inspection of the variable, for the years 2000 the format<br />
changes, we need to create a new variable with a uniform format.<br />
Type the following:<br />
use http://dss.princeton.edu/training/tsdata.dta<br />
gen date1=substr(date,1,7)<br />
gen datevar=quarterly(date1,"yq")<br />
format datevar %tq<br />
browse date date1 datevar<br />
See next page.<br />
For more details type help date<br />
*<strong>Data</strong> source: Stock & Watson’s companion materials<br />
Date variable (example)<br />
4<br />
PU/DSS/OTR
The date variable is now stored in datevar as a date variable.<br />
Date variable (example, cont.)<br />
5<br />
PU/DSS/OTR
Setting as time series: tsset<br />
Once you have the date variable in a ‘date format’ you need to declare your data as time series in order to<br />
use the time series operators. In Stata type:<br />
tsset datevar<br />
. tsset datevar<br />
time variable: datevar, 1957q1 to 2005q1<br />
delta: 1 quarter<br />
If you have gaps in your time series, for example there may not be data available for weekends. This<br />
complicates the analysis using lags for those missing dates. In this case you may want to create a continuous<br />
time trend as follows:<br />
gen time = _n<br />
Then use it to set the time series:<br />
tsset time<br />
In the case of cross-sectional time series type:<br />
sort panel date<br />
by panel: gen time = _n<br />
xtset panel time<br />
6<br />
PU/DSS/OTR
Use the comm<strong>and</strong> tsfill to fill in the gap in the time series. You need to tset, tsset or xtset the data<br />
before using tsfill. In the example below:<br />
tset quarters<br />
tsfill<br />
Type help tsfill for more details.<br />
Filling gaps in time variables<br />
7<br />
PU/DSS/OTR
Subsetting tin/twithin<br />
With tsset (time series set) you can use two time series comm<strong>and</strong>s: tin (‘times in’, from a to b) <strong>and</strong><br />
twithin (‘times within’, between a <strong>and</strong> b, it excludes a <strong>and</strong> b). If you have yearly data just include the years.<br />
. list datevar unemp if tin(2000q1,2000q4)<br />
datevar unemp<br />
173. 2000q1 4.033333<br />
174. 2000q2 3.933333<br />
175. 2000q3 4<br />
176. 2000q4 3.9<br />
. list datevar unemp if twithin(2000q1,2000q4)<br />
datevar unemp<br />
174. 2000q2 3.933333<br />
175. 2000q3 4<br />
/* Make sure to set your data as time series before using tin/twithin */<br />
tsset date<br />
regress y x1 x2 if tin(01jan1995,01jun1995)<br />
regress y x1 x2 if twithin(01jan2000,01jan2001)<br />
8<br />
PU/DSS/OTR
See<br />
http://dss.princeton.edu/training/Merge101.pdf<br />
Merge/Append<br />
PU/DSS/OTR
Lag operators (lag)<br />
Another set of time series comm<strong>and</strong>s are the lags, leads, differences <strong>and</strong> seasonal operators.<br />
It is common to analyzed the impact of previous values on current ones.<br />
To generate values with past values use the “L” operator<br />
generate unempL1=L1.unemp<br />
generate unempL2=L2.unemp<br />
list datevar unemp unempL1 unempL2 in 1/5<br />
. generate unempL1=L1.unemp<br />
(1 missing value generated)<br />
. generate unempL2=L2.unemp<br />
(2 missing values generated)<br />
. list datevar unemp unempL1 unempL2 in 1/5<br />
datevar unemp unempL1 unempL2<br />
1. 1957q1 3.933333 . .<br />
2. 1957q2 4.1 3.933333 .<br />
3. 1957q3 4.233333 4.1 3.933333<br />
4. 1957q4 4.933333 4.233333 4.1<br />
5. 1958q1 6.3 4.933333 4.233333<br />
In a regression you could type:<br />
regress y x L1.x L2.x<br />
or regress y x L(1/5).x<br />
10<br />
PU/DSS/OTR
To generate forward or lead values use the “F” operator<br />
generate unempF1=F1.unemp<br />
generate unempF2=F2.unemp<br />
list datevar unemp unempF1 unempF2 in 1/5<br />
. generate unempF1=F1.unemp<br />
(1 missing value generated)<br />
. generate unempF2=F2.unemp<br />
(2 missing values generated)<br />
. list datevar unemp unempF1 unempF2 in 1/5<br />
datevar unemp unempF1 unempF2<br />
1. 1957q1 3.933333 4.1 4.233333<br />
2. 1957q2 4.1 4.233333 4.933333<br />
3. 1957q3 4.233333 4.933333 6.3<br />
4. 1957q4 4.933333 6.3 7.366667<br />
5. 1958q1 6.3 7.366667 7.333333<br />
Lag operators (forward)<br />
In a regression you could type:<br />
regress y x F1.x F2.x<br />
or regress y x F(1/5).x 11<br />
PU/DSS/OTR
Lag operators (difference)<br />
To generate the difference between current a previous values use the “D” operator<br />
generate unempD1=D1.unemp /* D1 = y t – y t-1 */<br />
generate unempD2=D2.unemp /* D2 = (y t – y t-1) – (y t-1 – y t-2) */<br />
list datevar unemp unempD1 unempD2 in 1/5<br />
. generate unempD1=D1.unemp<br />
(1 missing value generated)<br />
. generate unempD2=D2.unemp<br />
(2 missing values generated)<br />
. list datevar unemp unempD1 unempD2 in 1/5<br />
1. 1957q1 3.933333 . .<br />
2. 1957q2 4.1 .1666665 .<br />
3. 1957q3 4.233333 .1333332 -.0333333<br />
4. 1957q4 4.933333 .7000003 .5666671<br />
5. 1958q1 6.3 1.366667 .6666665<br />
In a regression you could type:<br />
datevar unemp unempD1 unempD2<br />
regress y x D1.x<br />
12<br />
PU/DSS/OTR
To generate seasonal differences use the “S” operator<br />
generate unempS1=S1.unemp /* S1 = y t – y t-1 */<br />
generate unempS2=S2.unemp /* S2 = (y t – y t-2) */<br />
list datevar unemp unempS1 unempS2 in 1/5<br />
. generate unempS1=S1.unemp<br />
(1 missing value generated)<br />
. generate unempS2=S2.unemp<br />
(2 missing values generated)<br />
. list datevar unemp unempS1 unempS2 in 1/5<br />
In a regression you could type:<br />
datevar unemp unempS1 unempS2<br />
regress y x S1.x<br />
Lag operators (seasonal)<br />
1. 1957q1 3.933333 . .<br />
2. 1957q2 4.1 .1666665 .<br />
3. 1957q3 4.233333 .1333332 .2999997<br />
4. 1957q4 4.933333 .7000003 .8333335<br />
5. 1958q1 6.3 1.366667 2.066667<br />
13<br />
PU/DSS/OTR
To explore autocorrelation, which is the correlation between a variable <strong>and</strong> its previous values,<br />
use the comm<strong>and</strong> corrgram. The number of lags depend on theory, AIC/BIC process or<br />
experience. The output includes autocorrelation coefficient <strong>and</strong> partial correlations coefficients<br />
used to specify an ARIMA model.<br />
corrgram unemp, lags(12)<br />
. corrgram unemp, lags(12)<br />
Correlograms: autocorrelation<br />
-1 0 1 -1 0 1<br />
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]<br />
1 0.9641 0.9650 182.2 0.0000<br />
2 0.8921 -0.6305 339.02 0.0000<br />
3 0.8045 0.1091 467.21 0.0000<br />
4 0.7184 0.0424 569.99 0.0000<br />
5 0.6473 0.0836 653.86 0.0000<br />
6 0.5892 -0.0989 723.72 0.0000<br />
7 0.5356 -0.0384 781.77 0.0000<br />
8 0.4827 0.0744 829.17 0.0000<br />
9 0.4385 0.1879 868.5 0.0000<br />
10 0.3984 -0.1832 901.14 0.0000<br />
11 0.3594 -0.1396 927.85 0.0000<br />
12 0.3219 0.0745 949.4 0.0000<br />
AC shows that the<br />
correlation between the<br />
current value of unemp <strong>and</strong><br />
its value three quarters ago<br />
is 0.8045. AC can be use to<br />
define the q in MA(q) only<br />
in stationary series<br />
PAC shows that the<br />
correlation between the<br />
current value of unemp <strong>and</strong><br />
its value three quarters ago<br />
is 0.1091 without the effect<br />
of the two previous lags.<br />
PAC can be used to define<br />
the p in AR(p) only in<br />
stationary series<br />
Box-Pierce’ Q statistic tests<br />
the null hypothesis that all<br />
correlation up to lag k are<br />
equal to 0. This series show<br />
significant autocorrelation<br />
as shown in the Prob>Q<br />
value which at any k are less<br />
than 0.05, therefore<br />
rejecting the null that all lags<br />
are not autocorrelated.<br />
Graphic view of AC<br />
which shows a slow<br />
decay in the trend,<br />
suggesting nonstationarity.<br />
See also the ac<br />
comm<strong>and</strong>.<br />
Graphic view of PAC<br />
which does not show<br />
spikes after the second<br />
lag which suggests that<br />
all other lags are mirrors<br />
of the second lag. See the<br />
pac comm<strong>and</strong>.<br />
14<br />
PU/DSS/OTR
The explore the relationship between two time series use the comm<strong>and</strong> xcorr. The graph below shows the correlation<br />
between GDP quarterly growth rate <strong>and</strong> unemployment. When using xcorr list the independent variable first <strong>and</strong> the<br />
dependent variable second. type<br />
xcorr gdp unemp, lags(10) xlabel(-10(1)10,grid)<br />
Cross-correlations of gdp <strong>and</strong> unemp<br />
-1.00 -0.50 0.00 0.50 1.00<br />
Cross-correlogram<br />
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10<br />
Lag<br />
Correlograms: cross correlation<br />
-1.00 -0.50 0.00 0.50 1.00<br />
. xcorr gdp unemp, lags(10) table<br />
-1 0 1<br />
LAG CORR [Cross-correlation]<br />
-10 -0.1080<br />
-9 -0.1052<br />
-8 -0.1075<br />
-7 -0.1144<br />
-6 -0.1283<br />
-5 -0.1412<br />
-4 -0.1501<br />
-3 -0.1578<br />
-2 -0.1425<br />
-1 -0.1437<br />
0 -0.1853<br />
1 -0.1828<br />
2 -0.1685<br />
3 -0.1177<br />
4 -0.0716<br />
5 -0.0325<br />
6 -0.0111<br />
7 -0.0038<br />
8 0.0168<br />
9 0.0393<br />
10 0.0419<br />
At lag 0 there is a negative immediate correlation between GDP growth rate <strong>and</strong> unemployment. This means that a drop<br />
in GDP causes an immediate increase in unemployment.<br />
15<br />
PU/DSS/OTR
xcorr interest unemp, lags(10) xlabel(-10(1)10,grid)<br />
Cross-correlations of interest <strong>and</strong> unemp<br />
-1.00 -0.50 0.00 0.50 1.00<br />
Cross-correlogram<br />
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10<br />
Lag<br />
Interest rates have a positive effect on future level of<br />
unemployment, reaching the highest point at lag 8 (four<br />
quarters or two years). In this case, interest rates are positive<br />
correlated with unemployment rates eight quarters later.<br />
-1.00 -0.50 0.00 0.50 1.00<br />
Correlograms: cross correlation<br />
. xcorr interest unemp, lags(10) table<br />
-1 0 1<br />
LAG CORR [Cross-correlation]<br />
-10 0.3297<br />
-9 0.3150<br />
-8 0.2997<br />
-7 0.2846<br />
-6 0.2685<br />
-5 0.2585<br />
-4 0.2496<br />
-3 0.2349<br />
-2 0.2323<br />
-1 0.2373<br />
0 0.2575<br />
1 0.3095<br />
2 0.3845<br />
3 0.4576<br />
4 0.5273<br />
5 0.5850<br />
6 0.6278<br />
7 0.6548<br />
8 0.6663<br />
9 0.6522<br />
10 0.6237<br />
16<br />
PU/DSS/OTR
. varsoc gdp cpi, maxlag(10)<br />
Selection-order criteria<br />
Sample: 1959q4 - 2005q1 Number of obs = 182<br />
lag LL LR df p FPE AIC HQIC SBIC<br />
0 -1294.75 5293.32 14.25 14.2642 14.2852<br />
1 -467.289 1654.9 4 0.000 .622031 5.20098 5.2438 5.30661<br />
2 -401.381 131.82 4 0.000 .315041 4.52067 4.59204 4.69672*<br />
3 -396.232 10.299 4 0.036 .311102 4.50804 4.60796 4.75451<br />
4 -385.514 21.435* 4 0.000 .288988* 4.43422* 4.56268* 4.7511<br />
5 -383.92 3.1886 4 0.527 .296769 4.46066 4.61766 4.84796<br />
6 -381.135 5.5701 4 0.234 .300816 4.47401 4.65956 4.93173<br />
7 -379.062 4.1456 4 0.387 .307335 4.49519 4.70929 5.02332<br />
8 -375.483 7.1585 4 0.128 .308865 4.49981 4.74246 5.09836<br />
9 -370.817 9.3311 4 0.053 .306748 4.4925 4.76369 5.16147<br />
10 -370.585 .46392 4 0.977 .319888 4.53391 4.83364 5.27329<br />
Endogenous: gdp cpi<br />
Exogenous: _cons<br />
Lag selection<br />
Too many lags could increase the error in the forecasts, too few could leave out relevant information*.<br />
Experience, knowledge <strong>and</strong> theory are usually the best way to determine the number of lags needed. There<br />
are, however, information criterion procedures to help come up with a proper number. Three commonly used<br />
are: Schwarz's Bayesian information criterion (SBIC), the Akaike's information criterion (AIC), <strong>and</strong> the<br />
Hannan <strong>and</strong> Quinn information criterion (HQIC). All these are reported by the comm<strong>and</strong> ‘varsoc’ in Stata.<br />
When all three agree, the selection is clear, but what happens when getting conflicting results? A paper from<br />
the CEPR suggests, in the context of VAR models, that AIC tends to be more accurate with monthly data,<br />
HQIC works better for quarterly data on samples over 120 <strong>and</strong> SBIC works fine with any sample size for<br />
quarterly data (on VEC models)**. In our example above we have quarterly data with 182 observations,<br />
HQIC suggest a lag of 4 (which is also suggested by AIC).<br />
* See Stock & Watson for more details <strong>and</strong> on how to estimate BIC <strong>and</strong> SIC<br />
** Ivanov, V. <strong>and</strong> Kilian, L. 2001. 'A Practitioner's Guide to Lag-Order Selection for Vector Autoregressions'. CEPR Discussion Paper no. 2685. London, Centre for Economic Policy<br />
Research. http://www.cepr.org/pubs/dps/DP2685.asp.<br />
17<br />
PU/DSS/OTR
Having a unit root in a series mean that there is more than one trend in the series.<br />
. regress unemp gdp if tin(1965q1,1981q4)<br />
Source SS df MS Number of obs = 68<br />
F( 1, 66) = 19.14<br />
Model 36.1635247 1 36.1635247 Prob > F = 0.0000<br />
Residual 124.728158 66 1.88982058 R-squared = 0.2248<br />
Adj R-squared = 0.2130<br />
Total 160.891683 67 2.4013684 Root MSE = 1.3747<br />
unemp Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
gdp -.4435909 .1014046 -4.37 0.000 -.6460517 -.2411302<br />
_cons 7.087789 .3672397 19.30 0.000 6.354572 7.821007<br />
. regress unemp gdp if tin(1982q1,2000q4)<br />
Source SS df MS Number of obs = 76<br />
F( 1, 74) = 3.62<br />
Model 8.83437339 1 8.83437339 Prob > F = 0.0608<br />
Residual 180.395848 74 2.43778172 R-squared = 0.0467<br />
Adj R-squared = 0.0338<br />
Total 189.230221 75 2.52306961 Root MSE = 1.5613<br />
unemp Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
gdp .3306551 .173694 1.90 0.061 -.0154377 .6767479<br />
_cons 5.997169 .2363599 25.37 0.000 5.526211 6.468126<br />
Unit roots<br />
18<br />
PU/DSS/OTR
Unemployment rate.<br />
line unemp datevar<br />
Unemployment Rate<br />
4 6 8 10 12<br />
Unit roots<br />
1957q1 1969q1 1981q1 1993q1 2005q1<br />
datevar<br />
19<br />
PU/DSS/OTR
The Dickey-Fuller test is one of the most commonly use tests for stationarity. The null<br />
hypothesis is that the series has a unit root. The test statistic shows that the unemployment<br />
series have a unit root, it lies within the acceptance region.<br />
One way to deal with stochastic trends (unit root) is by taking the first difference of the variable<br />
(second test below).<br />
. dfuller unemp, lag(5)<br />
MacKinnon approximate p-value for Z(t) = 0.0936<br />
Unit root test<br />
Augmented Dickey-Fuller test for unit root Number of obs = 187<br />
Unit root<br />
Interpolated Dickey-Fuller<br />
Test 1% Critical 5% Critical 10% Critical<br />
Statistic Value Value Value<br />
Z(t) -2.597 -3.481 -2.884 -2.574<br />
. dfuller unempD1, lag(5)<br />
Augmented Dickey-Fuller test for unit root Number of obs = 186<br />
No unit root<br />
Interpolated Dickey-Fuller<br />
Test 1% Critical 5% Critical 10% Critical<br />
Statistic Value Value Value<br />
Z(t) -5.303 -3.481 -2.884 -2.574<br />
MacKinnon approximate p-value for Z(t) = 0.0000<br />
20<br />
PU/DSS/OTR
Cointegration refers to the fact that two or more series share an stochastic trend (Stock &<br />
Watson). Engle <strong>and</strong> Granger (1987) suggested a two step process to test for cointegration (an<br />
OLS regression <strong>and</strong> a unit root test), the EG-ADF test.<br />
regress unemp gdp<br />
predict e, resid<br />
dfuller e, lags(10)<br />
Augmented Dickey-Fuller test for unit root Number of obs = 181<br />
Unit root*<br />
Run an OLS regression<br />
Get the residuals<br />
Run a unit root test on the residuals.<br />
Interpolated Dickey-Fuller<br />
Test 1% Critical 5% Critical 10% Critical<br />
Statistic Value Value Value<br />
Z(t) -2.535 -3.483 -2.885 -2.575<br />
MacKinnon approximate p-value for Z(t) = 0.1071<br />
Testing for cointegration<br />
Both variables are not cointegrated<br />
See Stock & Watson for a table of critical values for the unit root test <strong>and</strong> the theory behind.<br />
*Critical value for one independent variable in the OLS regression, at 5% is -3.41 (Stock & Watson)<br />
21<br />
PU/DSS/OTR
If you regress ‘y’ on lagged values of ‘y’ <strong>and</strong> ‘x’ <strong>and</strong> the coefficients of the lag of ‘x’ are<br />
statistically significantly different from 0, then you can argue that ‘x’ Granger-cause ‘y’, this is,<br />
‘x’ can be used to predict ‘y’ (see Stock & Watson -2007-, Green -2008).<br />
1<br />
2<br />
. regress unemp L(1/4).unemp L(1/4).gdp<br />
. test L1.gdp L2.gdp L3.gdp L4.gdp<br />
.<br />
Source SS df MS Number of obs = 188<br />
F( 8, 179) = 668.37<br />
Model 373.501653 8 46.6877066 Prob > F = 0.0000<br />
Residual 12.5037411 179 .069853302 R-squared = 0.9676<br />
Adj R-squared = 0.9662<br />
Total 386.005394 187 2.06419997 Root MSE = .2643<br />
unemp Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
unemp<br />
L1. 1.625708 .0763035 21.31 0.000 1.475138 1.776279<br />
L2. -.7695503 .1445769 -5.32 0.000 -1.054845 -.484256<br />
L3. .0868131 .1417562 0.61 0.541 -.1929152 .3665415<br />
L4. .0217041 .0726137 0.30 0.765 -.1215849 .1649931<br />
gdp<br />
L1. .0060996 .0136043 0.45 0.654 -.0207458 .0329451<br />
L2. -.0189398 .0128618 -1.47 0.143 -.0443201 .0064405<br />
L3. .0247494 .0130617 1.89 0.060 -.0010253 .0505241<br />
L4. .003637 .0129079 0.28 0.778 -.0218343 .0291083<br />
_cons .1702419 .096857 1.76 0.081 -.0208865 .3613704<br />
( 1) L.gdp = 0<br />
( 2) L2.gdp = 0<br />
( 3) L3.gdp = 0<br />
( 4) L4.gdp = 0<br />
F( 4, 179) = 1.67<br />
Prob > F = 0.1601<br />
Granger causality: using OLS<br />
You cannot reject the null hypothesis that all<br />
coefficients of lag of ‘x’ are equal to 0.<br />
Therefore ‘gdp’ does not Granger-cause<br />
‘unemp’.<br />
22<br />
PU/DSS/OTR
The following procedure uses VAR models to estimate Granger causality using the comm<strong>and</strong><br />
‘vargranger’<br />
1<br />
2<br />
. quietly var unemp gdp, lags(1/4)<br />
. vargranger<br />
Granger causality Wald tests<br />
Granger causality: using VAR<br />
Equation Excluded chi2 df Prob > chi2<br />
unemp gdp 6.9953 4 0.136<br />
unemp ALL 6.9953 4 0.136<br />
gdp unemp 6.8658 4 0.143<br />
gdp ALL 6.8658 4 0.143<br />
The null hypothesis is ‘var1 does not Granger-cause var2’. In both cases, we cannot reject<br />
the null that each variable does not Granger-cause the other<br />
23<br />
PU/DSS/OTR
The Chow test allows to test whether a particular date causes a break in the regression coefficients. It is named after Gregory<br />
Chow (1960)*.<br />
Step 1. Create a dummy variable where 1 if date > break date <strong>and</strong> 0 tq(1981q4))<br />
Step 2. Create interaction terms between the lags of the independent variables <strong>and</strong> the lag of the dependent variables. We will<br />
assume lag 1 for this example (the number of lags depends on your theory/data)<br />
generate break_unemp = break*l1.unemp<br />
generate break_gdp = break*l1.gdp<br />
Step 3. Run a regression between the outcome variables (in this case ‘unemp’) <strong>and</strong> the independent along with the interactions<br />
<strong>and</strong> the dummy for the break.<br />
reg unemp l1.unemp l1.gdp break break_unemp break_gdp<br />
Step 4. Run an F-test on the coefficients for the interactions <strong>and</strong> the dummy for the break<br />
test break break_unemp break_gdp<br />
. test break break_unemp break_gdp<br />
( 1) break = 0<br />
( 2) break_unemp = 0<br />
( 3) break_gdp = 0<br />
F( 3, 185) = 1.14<br />
Prob > F = 0.3351<br />
Chow test (testing for known breaks)<br />
Change “tq” with the correct date format: tw (week), tm (monthly), tq (quarterly), th (half), ty (yearly) <strong>and</strong> the<br />
corresponding date format in the parenthesis<br />
The null hypothesis is no break. If the p-value<br />
is < 0.05 reject the null in favor of the<br />
alternative that there is a break. In this<br />
example, we fail to reject the null <strong>and</strong><br />
conclude that the first quarter of 1982 does<br />
not cause a break in the regression<br />
coefficients.<br />
* See Stock & Watson for more details<br />
24<br />
PU/DSS/OTR
The Qu<strong>and</strong>t likelihood ratio (QLR test –Qu<strong>and</strong>t,1960) or sup-Wald statistic is a modified version of the Chow test used to identify<br />
break dates. The following is a modified procedure taken from Stock & Watson’s companion materials to their book Introduction<br />
to Econometrics, I strongly advise to read the corresponding chapter to better underst<strong>and</strong> the procedure <strong>and</strong> to check the critical<br />
values for the QLR statistic. Below we will check for breaks in a GDP per-capita series (quarterly).<br />
/* Replace the words in bold with your own variables, do not change anything else*/<br />
/* The log file ‘qlrtest.log’ will have the list for QLR statistics (use Word to read it)*/<br />
/* See next page for a graph*/<br />
/* STEP 1. Copy-<strong>and</strong>-paste-run the code below to a do-file, double-check the quotes (re-type them if necessary)*/<br />
log using qlrtest.log<br />
tset datevar<br />
sum datevar<br />
local time=r(max)-r(min)+1<br />
local i = round(`time'*.15)<br />
local f = round(`time'*.85)<br />
local var = "gdp"<br />
gen diff`var' = d.`var'<br />
gen chow`var' = .<br />
gen qlr`var' = .<br />
set more off<br />
while `i'
Testing for unknown breaks: graph<br />
/* Replace the words in bold with your own variables, do not change anything else*/<br />
/* The code will produce the graph shown in the next page*/<br />
/* The critical value 3.66 is for q=5 (constant <strong>and</strong> four lags) <strong>and</strong> 5% significance*/<br />
/* STEP 2. Copy-<strong>and</strong>-paste-run the code below to a do-file, double-check the quotes (re-type them if necessary)*/<br />
sum qlr`var'<br />
local maxvalue=r(max)<br />
gen maxdate=datevar if qlr`var'==`maxvalue'<br />
local maxvalue1=round(`maxvalue',0.01)<br />
local critical=3.66 /*Replace with the appropriate critical value (see Stock & Watson)*/<br />
sum datevar<br />
local mindate=r(min)<br />
sum maxdate<br />
local maxdate=r(max)<br />
gen break=datevar if qlr`var'>=`critical' & qlr`var'!=.<br />
dis "Below are the break dates..."<br />
list datevar qlr`var' if break!=.<br />
levelsof break, local(break1)<br />
twoway tsline qlr`var', title(Testing for breaks in GDP per-capita (1957-2005)) ///<br />
xlabel(`break1', angle(90) labsize(0.9) alternate) ///<br />
yline(`critical') ytitle(QLR statistic) xtitle(<strong>Time</strong>) ///<br />
ttext(`critical' `mindate' "Critical value 5% (`critical')", placement(ne)) ///<br />
ttext(`maxvalue' `maxdate' "Max QLR = `maxvalue1'", placement(e))<br />
datevar qlrgdp<br />
129. 1989q1 3.823702<br />
131. 1989q3 6.285852<br />
132. 1989q4 6.902882<br />
133. 1990q1 5.416068<br />
134. 1990q2 7.769114<br />
135. 1990q3 8.354294<br />
136. 1990q4 5.399252<br />
137. 1991q1 8.524492<br />
138. 1991q2 6.007093<br />
139. 1991q3 4.44151<br />
140. 1991q4 4.946689<br />
141. 1992q1 3.699911<br />
160. 1996q4 3.899656<br />
161. 1997q1 3.906271<br />
26<br />
PU/DSS/OTR
QLR statistic<br />
0 2 4 6 8<br />
Critical value 5% (3.66)<br />
Testing for unknown breaks: graph (cont.)<br />
Testing for breaks in GDP per-capita (1957-2005)<br />
<strong>Time</strong><br />
1989q1<br />
1989q3<br />
1989q4<br />
1990q1<br />
1990q2<br />
1990q3<br />
1990q4<br />
1991q1<br />
1991q2<br />
1991q3<br />
1991q4<br />
1992q1<br />
Max QLR = 8.52<br />
1996q4<br />
1997q1<br />
27<br />
PU/DSS/OTR
White noise refers to the fact that a variable does not have autocorrelation. In Stata use the<br />
wntestq (white noise Q test) to check for autocorrelation. The null is that there is no serial<br />
correlation (type help wntestq for more details):<br />
. wntestq unemp<br />
Portmanteau test for white noise<br />
Portmanteau (Q) statistic = 1044.6341<br />
Prob > chi2(40) = 0.0000<br />
. wntestq unemp, lags(10)<br />
Portmanteau test for white noise<br />
Portmanteau (Q) statistic = 901.1399<br />
Prob > chi2(10) = 0.0000<br />
<strong>Time</strong> <strong>Series</strong>: white noise<br />
Serial correlation<br />
If your variable is not white noise then see the page on correlograms to see the order of the<br />
autocorrelation.<br />
28<br />
PU/DSS/OTR
Breush-Godfrey <strong>and</strong> Durbin-Watson are used to test for serial correlation. The null in both tests<br />
is that there is no serial correlation (type help estat dwatson, help estat dubinalt<br />
<strong>and</strong> help estat bgodfrey for more details).<br />
. regress unempd gdp<br />
<strong>Time</strong> <strong>Series</strong>: Testing for serial correlation<br />
Source SS df MS Number of obs = 192<br />
F( 1, 190) = 1.62<br />
Model .205043471 1 .205043471 Prob > F = 0.2048<br />
Residual 24.0656991 190 .126661574 R-squared = 0.0084<br />
Adj R-squared = 0.0032<br />
Total 24.2707425 191 .12707195 Root MSE = .3559<br />
unempd1 Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
gdp -.015947 .0125337 -1.27 0.205 -.0406701 .0087761<br />
_cons .0401123 .036596 1.10 0.274 -.0320743 .1122989<br />
. estat dwatson<br />
Durbin-Watson d-statistic( 2, 192) = .7562744<br />
. estat durbinalt<br />
Durbin's alternative test for autocorrelation<br />
lags(p) chi2 df Prob > chi2<br />
1 118.790 1 0.0000<br />
. estat bgodfrey<br />
H0: no serial correlation<br />
Breusch-Godfrey LM test for autocorrelation<br />
lags(p) chi2 df Prob > chi2<br />
1 74.102 1 0.0000<br />
H0: no serial correlation<br />
Serial correlation<br />
29<br />
PU/DSS/OTR
Run a Cochrane-Orcutt regression using the prais comm<strong>and</strong> (type help prais for more<br />
details)<br />
. prais unemp gdp, corc<br />
Iteration 0: rho = 0.0000<br />
Iteration 1: rho = 0.9556<br />
Iteration 2: rho = 0.9660<br />
Iteration 3: rho = 0.9661<br />
Iteration 4: rho = 0.9661<br />
<strong>Time</strong> <strong>Series</strong>: Correcting for serial correlation<br />
Cochrane-Orcutt AR(1) regression -- iterated estimates<br />
Source SS df MS Number of obs = 191<br />
F( 1, 189) = 2.48<br />
Model .308369041 1 .308369041 Prob > F = 0.1167<br />
Residual 23.4694088 189 .124176766 R-squared = 0.0130<br />
Adj R-squared = 0.0077<br />
Total 23.7777778 190 .125146199 Root MSE = .35239<br />
unemp Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
gdp -.020264 .0128591 -1.58 0.117 -.0456298 .0051018<br />
_cons 6.105931 .7526023 8.11 0.000 4.621351 7.590511<br />
rho .966115<br />
Durbin-Watson statistic (original) 0.087210<br />
Durbin-Watson statistic (transformed) 0.758116<br />
30<br />
PU/DSS/OTR
Useful links / Recommended books<br />
• DSS Online Training Section http://dss.princeton.edu/training/<br />
• UCLA Resources to learn <strong>and</strong> use STATA http://www.ats.ucla.edu/stat/stata/<br />
• DSS help-sheets for STATA http://dss/online_help/stats_packages/stata/stata.htm<br />
• Introduction to Stata (PDF), Christopher F. Baum, Boston College, USA. “A 67-page description of Stata, its key<br />
features <strong>and</strong> benefits, <strong>and</strong> other useful information.” http://fmwww.bc.edu/GStat/docs/StataIntro.pdf<br />
• STATA FAQ website http://stata.com/support/faqs/<br />
• <strong>Princeton</strong> DSS Libguides http://libguides.princeton.edu/dss<br />
Books<br />
• Introduction to econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson Addison<br />
Wesley, 2007.<br />
• <strong>Data</strong> analysis using regression <strong>and</strong> multilevel/hierarchical models / Andrew Gelman, Jennifer Hill.<br />
Cambridge ; New York : Cambridge <strong>University</strong> Press, 2007.<br />
• Econometric analysis / William H. Greene. 6th ed., Upper Saddle River, N.J. : Prentice Hall, 2008.<br />
• Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O.<br />
Keohane, Sidney Verba, <strong>Princeton</strong> <strong>University</strong> Press, 1994.<br />
• Unifying Political Methodology: The Likelihood Theory of <strong>Statistical</strong> Inference / Gary King, Cambridge<br />
<strong>University</strong> Press, 1989<br />
• <strong>Statistical</strong> Analysis: an interdisciplinary introduction to univariate & multivariate methods / Sam<br />
Kachigan, New York : Radius Press, c1986<br />
• Statistics with Stata (updated for version 9) / Lawrence Hamilton, Thomson Books/Cole, 2006<br />
31<br />
PU/DSS/OTR