Homework 2
Prof. Bernd Fitzenberger, Ph.D. SoSe 2010<br />
Topics in Microeconometrics<br />
<strong>Homework</strong> 2<br />
Due date: 26 July 2010 (at chair before 16:00h, room 2309)<br />
1.) Let K(u) be a symmetric kernel function for which important properties are summarized by<br />
the characteristics<br />
$$\mu_j(K) = \int_{-\infty}^{+\infty} u^j K(u)\,du \quad\text{and}\quad \nu_j(K) = \int_{-\infty}^{+\infty} K(u)^j\,du ,$$<br />
where j is a nonnegative integer.<br />
Consider the rectangular kernel<br />
$$K(u) = \frac{1}{2} \cdot I(|u| \le 1)$$<br />
and the Gaussian kernel<br />
$$K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$$<br />
and compute the characteristic values µj(K) (j = 0, 1, 2) and νj(K) (j = 1, 2) for both kernels.<br />
Remark: You can assume for both kernels that they are probability density functions and use<br />
known results from statistics.<br />
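The characteristic values defined in this problem can be cross-checked numerically. The following is a minimal sketch in Python (not part of the original assignment, which asks for an analytical derivation); the helper names `integrate`, `mu`, and `nu`, the midpoint rule, and the truncation of the Gaussian integral to [-10, 10] are all assumptions made for illustration.<br />

```python
import math

def rectangular(u):
    """Rectangular kernel: K(u) = 1/2 for |u| <= 1 and 0 otherwise."""
    return 0.5 if abs(u) <= 1.0 else 0.0

def gaussian(u):
    """Gaussian kernel: the standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def integrate(f, lo, hi, n=20_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    step = (hi - lo) / n
    return step * sum(f(lo + (k + 0.5) * step) for k in range(n))

def mu(kernel, j, lo, hi):
    """mu_j(K): integral of u**j * K(u) over [lo, hi]."""
    return integrate(lambda u: u ** j * kernel(u), lo, hi)

def nu(kernel, j, lo, hi):
    """nu_j(K): integral of K(u)**j over [lo, hi]."""
    return integrate(lambda u: kernel(u) ** j, lo, hi)

# Integration ranges: the exact support for the rectangular kernel and a
# truncated range [-10, 10] for the Gaussian kernel (an assumption; the
# tails beyond it are negligible at this accuracy).
for name, kernel, lo, hi in [("rectangular", rectangular, -1.0, 1.0),
                             ("gaussian", gaussian, -10.0, 10.0)]:
    for j in (0, 1, 2):
        print(name, "mu", j, round(mu(kernel, j, lo, hi), 6))
    for j in (1, 2):
        print(name, "nu", j, round(nu(kernel, j, lo, hi), 6))
```

The printed values are only a plausibility check for the analytical results; they do not replace the derivation.<br />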
[ 3 credits ]<br />
2.) Assume the following nonparametric regression model<br />
$$Y_i = m(X_i) + \epsilon_i \quad \text{for the sample } i = 1, \ldots, N ,$$<br />
where $X_i$ is a scalar, continuously distributed i.i.d. random variable with probability density f(x),<br />
$E(\epsilon_i \mid X_i) = 0$, and $Var(\epsilon_i \mid X_i) = \sigma^2(X_i)$.<br />
a) Describe in your own words the local linear kernel regression estimator $\hat{m}_h(x)$. Explain the<br />
calculation of the estimator and interpret the expressions used in this calculation. What<br />
are the properties of the kernel function and why is it used here? What is the role of the<br />
bandwidth parameter?<br />
Remark: Do not discuss the distribution of the estimator here.<br />
b) Based on the notation used in the lecture, describe and explain in your own words the<br />
asymptotic distribution of the local linear kernel regression estimator at point x. Discuss<br />
the central role of the bandwidth parameter h.<br />
c) Simulate data in TSP using the following data generating process and implement the local<br />
linear regression based on the Gaussian kernel. Use Silverman’s rule of thumb and cross-validation<br />
to determine the bandwidth parameter h. For cross-validation, implement a grid<br />
search around the rule-of-thumb estimate. Compare the results.<br />
TSP–Code:<br />
options crt,mem=20,double,limwarn=0;<br />
supres smpl;<br />
options crt, mem=20;<br />
smpl 1 100; set nob = @nob;<br />
set seedin =14;<br />
random(seedin=14) x eps;<br />
y = 1+ x -x^2 + eps/3;<br />
sort(all) x;<br />
msd(noprint,all) x;<br />
SET H00h00=@Stddev;<br />
if @iqr/1.349
es2 = (y - @coef(1))**2;<br />
enddo;<br />
select 1;<br />
msd(noprint) res2;<br />
set sumres2 = @sum;<br />
print h sumres2;<br />
if (sumres2 .lt. sumres2cv); then; do;<br />
set sumres2cv = sumres2;<br />
set hmincv = h;<br />
enddo;<br />
enddo;<br />
title 'Minimum CV';<br />
print hmincv sumres2cv;<br />
llinreg y x hmincv mhat;<br />
graph(preview) x y mhat;<br />
proc llinreg y00 x00 h00 mhat00;<br />
?<br />
? Procedure for locally linear kernel regressions<br />
?<br />
local z00 xi00 w00 i dxi00;<br />
supres smpl;<br />
genr mhat00 = 0;<br />
? suppresses unnecessary calculations<br />
regopt(nocalc) regout auto het;<br />
do i = 1 to @nob;<br />
set xi00 = x00(i);<br />
genr w00 = norm((x00-xi00)/h00); ?msd(terse) w00;<br />
genr dxi00 = x00-xi00;<br />
? select w00 .gt. 1.d-8;<br />
olsq(silent,weight=w00) y00 c dxi00;<br />
set mhat00(i) = @coef(1);<br />
? select 1;<br />
enddo;<br />
? reinstall default options<br />
regopt; supres smpl;<br />
endproc;<br />
END;<br />
Remark: The solution to problem 2c) should include the TSP programs that you<br />
wrote to solve the problem, together with the output on paper. The discussion of the<br />
results should refer to the computer output.<br />
[ 7 credits ]<br />
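The logic of problem 2c) can be sketched outside TSP as follows. This is a minimal illustration in Python, not the required TSP solution. It fits the local linear estimator by a kernel-weighted regression of y on a constant and x − x0 (as the procedure llinreg does with olsq and weight=w00) and picks the bandwidth by leave-one-out cross-validation on a grid around a rule-of-thumb value. Several details are assumptions: the rule-of-thumb constant 0.9, the grid from 0.5 to 2 times the rule-of-thumb bandwidth, the fallback to a local constant fit when the slope is not identified, and Python's random number stream, which differs from TSP's even with the same seed.<br />

```python
import math
import random
import statistics

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def local_linear(x0, xs, ys, h, skip=None):
    """Local linear estimate of m(x0); observation index `skip` is left out."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == skip:
            continue
        d = xj - x0
        w = gaussian_kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yj
        t1 += w * d * yj
    if s0 == 0.0:
        raise ValueError("bandwidth too small: all kernel weights are zero")
    det = s0 * s2 - s1 * s1
    if det <= 1e-12 * s0 * s2:
        # Assumption: fall back to the local constant fit if the slope
        # cannot be identified from the weighted data.
        return t0 / s0
    # Intercept of the weighted regression of y on (1, x - x0).
    return (s2 * t0 - s1 * t1) / det

def rule_of_thumb(xs):
    """Rule-of-thumb bandwidth; the constant 0.9 is an assumption."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    spread = min(statistics.stdev(xs), (q3 - q1) / 1.349)
    return 0.9 * spread * len(xs) ** (-0.2)

def cv_score(h, xs, ys):
    """Sum of squared leave-one-out prediction errors for bandwidth h."""
    return sum((ys[i] - local_linear(xs[i], xs, ys, h, skip=i)) ** 2
               for i in range(len(xs)))

# Data generating process from the TSP code: y = 1 + x - x^2 + eps/3.
rng = random.Random(14)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)]
ys = [1.0 + x - x * x + rng.gauss(0.0, 1.0) / 3.0 for x in xs]

h_rot = rule_of_thumb(xs)
grid = [h_rot * (0.5 + 0.15 * k) for k in range(11)]  # 0.5 to 2.0 times h_rot
scores = [cv_score(h, xs, ys) for h in grid]
h_cv = grid[scores.index(min(scores))]
print("rule of thumb:", h_rot, "cross-validation:", h_cv)
```

Comparing `h_rot` and `h_cv`, and the fitted curves they produce, corresponds to the comparison the problem asks for.<br />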
3.) For a panel of length T = 2, recall that the first differences (FD) estimator for the outcome<br />
equation<br />
(0) $y_{it} = \theta_t + z_{it}\gamma + \delta_1 \mathit{prog}_{it} + c_i + u_{it}$<br />
yields the conditional difference–in–differences (CDiD) estimator of the treatment effect $\delta_1$ when<br />
the policy is introduced in period 2, i.e. $\mathit{prog}_{i1} = 0$, and participation takes place only in period<br />
2, i.e. $\mathit{prog}_{i2} \in \{0, 1\}$. This estimator accounts for the possibility that $\mathit{prog}_{it}$ is correlated with $c_i$.<br />
a) Motivate and describe the FD estimator above in detail. Why does it provide a consistent<br />
estimate of the treatment effect δ1?<br />
b) Discuss and describe a semiparametric matching estimator as an extension of the CDiD<br />
estimator for the model<br />
(S) $y_{it} = \theta_t + g(z_{it}) + \delta_1 \mathit{prog}_{it} + c_i + u_{it}$<br />
under the above setup, where the policy is introduced in period 2. Assume that $z_{it}$ is a<br />
scalar regressor. Be as specific as possible.<br />
[ 4 credits ]<br />
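The FD estimator in problem 3 can be illustrated with a small simulation. The sketch below is in Python and is not part of the assignment. It generates a two-period panel from equation (0) in which participation depends on the individual effect, takes first differences, and regresses the differenced outcome on a constant, the differenced regressor, and participation in period 2. All parameter values, the selection rule, and the helper names `solve`, `ols`, and `simulate_fd` are hypothetical choices for illustration.<br />

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def simulate_fd(n, delta1, gamma, sigma_u, seed):
    """Return FD coefficients [theta_2 - theta_1, gamma, delta_1]."""
    rng = random.Random(seed)
    rows, dy = [], []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)                       # individual effect c_i
        prog2 = 1.0 if c + rng.gauss(0.0, 1.0) > 0 else 0.0  # selection on c_i
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u1, u2 = rng.gauss(0.0, sigma_u), rng.gauss(0.0, sigma_u)
        y1 = 0.5 + gamma * z1 + c + u1                # theta_1 = 0.5, prog_i1 = 0
        y2 = 1.0 + gamma * z2 + delta1 * prog2 + c + u2  # theta_2 = 1.0
        rows.append([1.0, z2 - z1, prog2])            # c_i drops out of y2 - y1
        dy.append(y2 - y1)
    return ols(rows, dy)

print(simulate_fd(n=2000, delta1=2.0, gamma=0.7, sigma_u=1.0, seed=1))
```

Because the individual effect cancels in the difference, the last coefficient estimates the treatment effect even though participation is correlated with the individual effect by construction.<br />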
4.) (Sharp RDD with simulated Data, PC Pool Problem Set 3, Problem 3)<br />
Assume that a sample of N observations is simulated based on the following regression model<br />
$$y_i = 1 + \alpha \cdot D_i + x_{1i} - 0.1 x_{1i}^2 + x_{2i} - 0.2 x_{1i} x_{2i} + \epsilon_i ,$$<br />
where $x_{1i}$, $x_{2i}$, and $\epsilon_i$ are three independent random variables following a standard normal distribution.<br />
The treatment dummy is given by<br />
$$D_i = I(x_{1i} > 0.2) .$$<br />
a) First assume that the treatment effect α = 3 is a fixed constant. Show that the conditions<br />
for a sharp RDD are satisfied here, i.e. the RDD estimator identifies α. Consider the assumptions<br />
put forward in Van der Klaauw (2002) and check formally that they are satisfied<br />
in this case. Motivate the identification result. What is the control function k(S) in this<br />
context? Be as specific as possible.<br />
b) Why is it not necessary that the RDD estimator controls for x2?<br />
c) Is there a discontinuous jump in the distribution of $x_2$ at the RDD threshold $\bar{x}_1 = 0.2$?<br />
How could one implement a test for this?<br />
d) Now assume that the treatment effect α is random with E(α) = 3 and the distribution of<br />
α is independent of (x1, x2). Show that the conditions for a sharp RDD are satisfied here<br />
as well, i.e. the RDD estimator identifies α. Consider the assumptions put forward in Van<br />
der Klaauw (2002) and check formally that they are satisfied in this case. Motivate the<br />
identification result. What is the control function k(S) in this context? Be as specific as<br />
possible.<br />
Remark: This is a theoretical exercise. To solve this problem, it is not necessary to implement a<br />
TSP program for estimation purposes.<br />
[ 7 credits ]<br />
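Although problem 4 is theoretical, the setup can be illustrated by simulation. The following Python sketch is not required by the assignment. It draws data from the regression model with a constant treatment effect of 3 and estimates the jump at the threshold 0.2 by fitting a separate straight line on each side within a window of width h. The sample size, the seed, the window width h = 0.3, and the helper names are assumptions. The same function is applied to x2 as an informal check for a jump in a covariate.<br />

```python
import random

def boundary_intercept(points):
    """OLS of the outcome on (1, distance to cutoff); returns the intercept,
    i.e. the fitted value at the cutoff."""
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in points)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in points)
    return mean_y - (sxy / sxx) * mean_d

def rdd_jump(running, outcome, cutoff, h):
    """Difference of the right and left boundary fits at the cutoff.
    Treated units have running > cutoff, as in D = I(x1 > 0.2)."""
    right = [(s - cutoff, y) for s, y in zip(running, outcome)
             if cutoff < s < cutoff + h]
    left = [(s - cutoff, y) for s, y in zip(running, outcome)
            if cutoff - h < s <= cutoff]
    return boundary_intercept(right) - boundary_intercept(left)

rng = random.Random(0)
n, alpha, cutoff, h = 20_000, 3.0, 0.2, 0.3   # n, seed, and h are assumptions
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + alpha * (a > cutoff) + a - 0.1 * a * a + b - 0.2 * a * b + e
     for a, b, e in zip(x1, x2, eps)]

print("estimated jump in y :", rdd_jump(x1, y, cutoff, h))
print("estimated jump in x2:", rdd_jump(x1, x2, cutoff, h))
```

The estimate for y does not use x2 at all, which relates to part b); the second line is one simple way to look at the question in part c), though a formal test would also need a standard error.<br />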
5.) (Fuzzy RDD: Angrist and Lavy 1999, Maimonides Rule, PC Pool Problem Set 3, Problem 4)<br />
Use the programs and the data provided as part of the third PC Pool Problem Set. For this<br />
problem, analyze only math scores for the fourth grade as the outcome variable.<br />
a) Analyze whether the variable ’percent disadvantaged’ shows discontinuous jumps at the<br />
thresholds used to split classes. What do these results imply about<br />
whether it is necessary to control for the variable ’percent disadvantaged’ when estimating<br />
the RDD estimate of the treatment effect of class size?<br />
b) Implement the 2SLS estimate of the effect of class size on math scores in the fourth grade<br />
controlling for enrollment using an appropriate polynomial specification of enrollment and<br />
percent disadvantaged as control variables. Discuss the results.<br />
c) Implement the local Wald estimator for the RDD based on a local linear regression (using<br />
a rectangular kernel) of class size on expected class size on both sides of each threshold.<br />
Use these nonparametric estimates from the first stage to estimate in a second stage the<br />
fuzzy RDD estimate of the effect of class size<br />
$$\hat{\rho}_s = \lim_{\Delta \to 0} \frac{E(Y_i \mid \bar{S}_s < S < \bar{S}_s + \Delta) - E(Y_i \mid \bar{S}_s - \Delta < S < \bar{S}_s)}{E(D_i \mid \bar{S}_s < S < \bar{S}_s + \Delta) - E(D_i \mid \bar{S}_s - \Delta < S < \bar{S}_s)}$$<br />
for the s-th threshold $\bar{S}_s$ (using the notation in the lecture). Discuss the differences in<br />
methods and results in comparison to part b).<br />
Remark: The solutions to the problem should include the TSP programs that you wrote to<br />
solve the problem, together with the output on paper. The discussion of the results should<br />
refer to the computer output.<br />
[ 6 credits ]
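The local Wald estimator in problem 5c) can be sketched as a ratio of two estimated jumps at one threshold. The Python code below is an illustration only, not the required TSP solution. With a rectangular kernel, the local linear fit on each side reduces to an ordinary least squares line through the observations inside the window. The data are hypothetical and noise-free (they are not the Angrist and Lavy data), and the threshold 40, the window width 10, and the function names are assumptions.<br />

```python
def boundary_intercept(points):
    """OLS of the outcome on (1, distance to threshold); returns the
    intercept, i.e. the fitted value at the threshold."""
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in points)
    sxv = sum((d - mean_d) * (v - mean_v) for d, v in points)
    return mean_v - (sxv / sxx) * mean_d

def local_wald(s, y, d, cutoff, h):
    """Jump in the outcome y divided by the jump in the treatment d at the
    threshold. Windows use strict inequalities as in the formula above, so
    observations exactly at the threshold are excluded."""
    def jump(values):
        right = [(si - cutoff, vi) for si, vi in zip(s, values)
                 if cutoff < si < cutoff + h]
        left = [(si - cutoff, vi) for si, vi in zip(s, values)
                if cutoff - h < si < cutoff]
        return boundary_intercept(right) - boundary_intercept(left)
    return jump(y) / jump(d)

# Hypothetical data: classes are split once enrollment exceeds 40, and the
# score depends on class size with coefficient -0.3 (an illustrative value).
enrollment = list(range(31, 51))
class_size = [e if e <= 40 else e / 2 for e in enrollment]
score = [50.0 - 0.3 * c + 0.1 * e for e, c in zip(enrollment, class_size)]

# Should print a value close to -0.3, the coefficient used to build the data.
print(local_wald(enrollment, score, class_size, cutoff=40, h=10))
```

In contrast to the 2SLS specification in part b), this estimate uses only observations near one threshold and imposes no global polynomial in enrollment.<br />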