On the Identification of Misspecified Propensity Scores - School of ...
On the Identification of Misspecified Propensity Scores - School of ...
On the Identification of Misspecified Propensity Scores - School of ...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>On</strong> <strong>the</strong> <strong>Identification</strong> <strong>of</strong> <strong>Misspecified</strong> <strong>Propensity</strong> <strong>Scores</strong><br />
Azeem M. Shaikh<br />
Department <strong>of</strong> Economics<br />
Stanford University<br />
Edward J. Vytlacil<br />
Department <strong>of</strong> Economics<br />
Columbia University<br />
February 8, 2006<br />
Abstract<br />
Marianne Simonsen<br />
Department <strong>of</strong> Economics<br />
University <strong>of</strong> Aarhus<br />
Nese Yildiz<br />
Department <strong>of</strong> Economics<br />
University <strong>of</strong> Pittsburgh<br />
<strong>Propensity</strong> score matching has become a popular method for <strong>the</strong> evaluation <strong>of</strong><br />
treatment effects. In empirical applications, researchers almost always impose a<br />
parametric model for <strong>the</strong> propensity score. This raises <strong>the</strong> possibility that <strong>the</strong><br />
model is misspecified and <strong>the</strong>refore <strong>the</strong> propensity score matching estimator may<br />
be inconsistent. We show that <strong>the</strong> common practice <strong>of</strong> calculating estimates <strong>of</strong> <strong>the</strong><br />
densities <strong>of</strong> <strong>the</strong> propensity score conditional on <strong>the</strong> participation decision provides<br />
a means for examining whe<strong>the</strong>r <strong>the</strong> propensity score is misspecified. In particular,<br />
we derive a restriction between <strong>the</strong> density <strong>of</strong> <strong>the</strong> propensity score among participants<br />
versus <strong>the</strong> density among nonparticipants. We show that this restriction<br />
between <strong>the</strong> two conditional densities is equivalent to a particular orthogonality<br />
restriction and derive a formal test based upon it. The resulting test is shown to<br />
have dramatically greater power than competing tests for many alternatives. The<br />
principle disadvantage <strong>of</strong> this approach is loss <strong>of</strong> power against some alternatives.<br />
Acknowledgements:<br />
The authors wish to thank Jeff Smith for valuable comments and suggestions.
1 Introduction<br />
<strong>Propensity</strong> score matching is a widely used approach for <strong>the</strong> estimation <strong>of</strong> treatment effects.<br />
See Heckman, LaLonde, and Smith [7] for a detailed survey <strong>of</strong> its use in <strong>the</strong> treatment effect<br />
literature. The method is based on <strong>the</strong> following well-known result <strong>of</strong> Rosenbaum and Rubin<br />
[13]: if selection into <strong>the</strong> program is independent <strong>of</strong> <strong>the</strong> potential outcomes <strong>of</strong> interest con-<br />
ditional on a vector <strong>of</strong> covariates, <strong>the</strong>n selection into <strong>the</strong> program is also independent <strong>of</strong> <strong>the</strong><br />
potential outcomes conditional on <strong>the</strong> propensity score, where <strong>the</strong> propensity score denotes <strong>the</strong><br />
probability <strong>of</strong> selection into <strong>the</strong> program conditional on <strong>the</strong> same vector <strong>of</strong> covariates. The high<br />
dimensionality <strong>of</strong> <strong>the</strong> vector <strong>of</strong> observed covariates <strong>of</strong>ten forces researchers to adopt a finite-<br />
dimensional approximation to <strong>the</strong> propensity score, in lieu <strong>of</strong> a more flexible nonparametric<br />
model. Imposing such restrictions on <strong>the</strong> functional form for <strong>the</strong> propensity score raises <strong>the</strong><br />
possibility that <strong>the</strong> propensity score model is misspecified. If <strong>the</strong> propensity score model is mis-<br />
specified, <strong>the</strong>n in general <strong>the</strong> estimator <strong>of</strong> <strong>the</strong> propensity score will be inconsistent for <strong>the</strong> true<br />
propensity score and <strong>the</strong> resulting propensity score matching estimator may be inconsistent<br />
for <strong>the</strong> treatment effect <strong>of</strong> interest.<br />
First suggested by Heckman, Ichimura, Smith, and Todd [5], it is now common to calculate<br />
estimates <strong>of</strong> <strong>the</strong> densities <strong>of</strong> <strong>the</strong> propensity score conditional on <strong>the</strong> participation decision.<br />
These estimates are calculated in order to determine <strong>the</strong> region <strong>of</strong> common support on which<br />
to perform matching. We show that <strong>the</strong>se conditional densities provide a means for examining<br />
whe<strong>the</strong>r <strong>the</strong> propensity score is misspecified. In particular, we derive a restriction between<br />
<strong>the</strong> density <strong>of</strong> <strong>the</strong> propensity score model among participants and <strong>the</strong> density among nonpar-<br />
ticipants. Failure <strong>of</strong> this restriction to hold for <strong>the</strong> estimated conditional densities provides<br />
evidence that <strong>the</strong> propensity score model is misspecified. We show that this restriction between<br />
<strong>the</strong> two conditional densities is equivalent to a particular orthogonality restriction and derive<br />
a formal test based upon it. Unlike o<strong>the</strong>r tests for correct specification <strong>of</strong> <strong>the</strong> propensity score,<br />
<strong>the</strong> resulting test does not require nonparametric estimation <strong>of</strong> high dimensional objects. As<br />
a result, our test does not suffer from <strong>the</strong> curse <strong>of</strong> dimensionality and has dramatically greater<br />
power than competing tests for many alternatives. We provide striking evidence <strong>of</strong> this fact<br />
in a limited Monte Carlo study. The principle drawback <strong>of</strong> this approach is that our test does<br />
not have power against certain alternatives, but we argue that <strong>the</strong>se alternatives are ra<strong>the</strong>r<br />
exceptional.<br />
Following [12], applied researchers <strong>of</strong>ten employ heuristic balancing score tests for misspeci-<br />
fication <strong>of</strong> <strong>the</strong> propensity score to guide <strong>the</strong> choice <strong>of</strong> specification <strong>of</strong> <strong>the</strong> parametric propensity<br />
score. Specifically, [12] suggests examining whe<strong>the</strong>r <strong>the</strong> observable characteristics <strong>of</strong> <strong>the</strong> pop-<br />
ulation are independent <strong>of</strong> participation conditional on <strong>the</strong> propensity score. See e.g. [3], [8],<br />
[15], and [17] for applications <strong>of</strong> such balancing score tests. The formal misspecification test<br />
2
presented in this paper may be considered a complement to this practice.<br />
Our paper proceeds as follows. In Section 2, we first provide a summary <strong>of</strong> <strong>the</strong> method <strong>of</strong><br />
propensity score matching. We <strong>the</strong>n present in Section 3 <strong>the</strong> restriction upon which our test<br />
is based. In Section 4, we use this result to derive an asymptotic test <strong>of</strong> misspecification and<br />
examine its finite sample properties in a Monte Carlo study. Finally, Section 5 concludes.<br />
2 A Review <strong>of</strong> Matching<br />
Before proceeding, it is useful to review <strong>the</strong> key <strong>the</strong>oretical underpinnings <strong>of</strong> matching as a<br />
means <strong>of</strong> program evaluation. There are two groups <strong>of</strong> individuals, participants and nonpar-<br />
ticipants, in a program <strong>of</strong> interest. Participation in <strong>the</strong> program <strong>of</strong> interest is denoted by <strong>the</strong><br />
dummy variable D, with D = 1 if <strong>the</strong> individual chooses to participate and D = 0 o<strong>the</strong>rwise.<br />
Individuals in each <strong>of</strong> <strong>the</strong>se two groups are associated with observed characteristics X. Two<br />
commonly used metrics for evaluating <strong>the</strong> effect <strong>of</strong> participation in a program are <strong>the</strong> average<br />
treatment effect, given by E[Y1 −Y0], and <strong>the</strong> average treatment effect on <strong>the</strong> treated, given by<br />
E[Y1 − Y0|D = 1], where Y1 is <strong>the</strong> potential outcome in <strong>the</strong> case <strong>of</strong> participation and Y0 is <strong>the</strong><br />
potential outcome in <strong>the</strong> case <strong>of</strong> nonparticipation. The difficulty with estimation <strong>of</strong> this object<br />
lies with <strong>the</strong> following missing data problem: <strong>the</strong> econometrician never observes <strong>the</strong> counter-<br />
factual outcome Y1−D, which precludes direct estimation <strong>of</strong> E[Y0|D = 1] and E[Y1|D = 0].<br />
The method <strong>of</strong> matching resolves this difficulty by matching each participant with a non-<br />
participant that is similar in terms <strong>of</strong> observed characteristics X. As described in Rosenbaum<br />
and Rubin [13], matching formally requires that:<br />
A1. (Y0, Y1) ⊥ D | X.<br />
A2. 0 < P (x) < 1 where P (x) = Pr[D = 1|X = x] for all x ∈ supp(X).<br />
Note that a consequence <strong>of</strong> A2 is that E[Y0|D = 1, X = x] and E[Y1|D = 0, X = x] are<br />
well defined for all x ∈ supp(X). Hence, matching suggests estimation <strong>of</strong> E[Y0|D = 1] and<br />
E[Y1|D = 0] by a two-step procedure in which E[Y0|D = 1, X] and E[Y1|D = 0, X] are first<br />
estimated by exploiting A1, and <strong>the</strong>n integrated with respect to <strong>the</strong> empirical distribution <strong>of</strong><br />
X in order to obtain an estimate <strong>of</strong> E[Y1 − Y0] or integrated with respect to <strong>the</strong> empirical<br />
distribution <strong>of</strong> X conditional on D = 1 to obtain an estimate <strong>of</strong> E[Y1 − Y0|D = 1]. Following<br />
Heckman, Ichimura, Smith, and Todd [5], it is clear that one can relax <strong>the</strong> first condition to<br />
only require mean independence instead <strong>of</strong> full independence. Using this procedure, it is in<br />
3
principle possible to construct a √ n-normal estimator <strong>of</strong> <strong>the</strong> parameter <strong>of</strong> interest without<br />
imposing any finite-dimensional restrictions. 1<br />
An alternative approach for estimating E[Y0|D = 1] and E[Y1|D = 0] is propensity score<br />
matching, which relies upon a celebrated result <strong>of</strong> Rosenbaum and Rubin [13]. They show that<br />
<strong>the</strong> assumptions underlying matching conditional on X, A1 and A2, imply:<br />
A1*. (Y0, Y1) ⊥ D | P (X).<br />
A2*. 0 < Pr[D = 1|P (X) = p] < 1 for all p ∈ supp(P (X)).<br />
This result can be restated as follows: if matching on X is valid, so is matching based on <strong>the</strong><br />
propensity score P (X). This result motivates propensity score matching, in which one first<br />
estimates <strong>the</strong> propensity score in a first step and <strong>the</strong>n perform matching, as described above,<br />
using <strong>the</strong> estimated propensity score.<br />
In practice, both matching on X and matching on P (X) suffer from <strong>the</strong> so-called “curse<br />
<strong>of</strong> dimensionality”. While matching on X requires <strong>the</strong> researcher to estimate E[Y0|D = 1, X],<br />
matching on P (X) requires <strong>the</strong> researcher to estimate E[D|X], an equally high dimensional<br />
object. Thus, if X has more than a few dimensions, nonparametric procedures for estimating<br />
<strong>the</strong>se objects are undesirable due to sizable finite sample bias. This difficulty leads to <strong>the</strong><br />
common practice <strong>of</strong> implementing propensity score matching with a parametric specification<br />
for <strong>the</strong> propensity score. More specifically, in practice propensity score matching involves<br />
estimating a parametric model for <strong>the</strong> propensity score by maximum likelihood in a first step,<br />
and <strong>the</strong>n using kernel regression or some o<strong>the</strong>r comparable nonparametric procedure to perform<br />
matching on <strong>the</strong> basis <strong>of</strong> <strong>the</strong> estimated propensity scores. 2<br />
If <strong>the</strong> propensity score is estimated using a parametric model, <strong>the</strong>n it is possible that <strong>the</strong><br />
model is misspecified and that <strong>the</strong> resulting estimator <strong>of</strong> P (X) is inconsistent. Let Q(X)<br />
denote <strong>the</strong> probability limit <strong>of</strong> <strong>the</strong> estimator <strong>of</strong> P (X). If P (X) = Q(X), as will likely be<br />
<strong>the</strong> case if <strong>the</strong> model for P (X) is misspecified, <strong>the</strong>n in general we cannot expect (Y0, Y1) to<br />
be independent (or even mean independent) <strong>of</strong> D conditional on Q(X). In this case, <strong>the</strong><br />
propensity score matching will in general provide inconsistent estimates <strong>of</strong> both E[Y1 −Y0] and<br />
E[Y1 − Y0|D = 1].<br />
1 See Heckman, Ichimura, and Todd [6], Hahn [4], and Abadie and Imbens [1].<br />
2 Since binary variables are bounded, <strong>the</strong>re is potentially less room for misspecification <strong>of</strong> a<br />
parametric model for binary outcome as compared to a continous one, see [2].<br />
4
3 The Restriction<br />
In order to ensure that <strong>the</strong> appropriate conditional expectations exist, propensity score match-<br />
ing is only valid over <strong>the</strong> region <strong>of</strong> common support <strong>of</strong> <strong>the</strong> densities <strong>of</strong> <strong>the</strong> propensity score con-<br />
ditional on <strong>the</strong> participation decision. 3 Following <strong>the</strong> influential work <strong>of</strong> Heckman, Ichimura,<br />
Smith, and Todd [5], researchers <strong>the</strong>refore compute estimates <strong>of</strong> both <strong>of</strong> <strong>the</strong>se densities to de-<br />
termine <strong>the</strong> appropriate region over which to perform matching. The test for misspecification<br />
that we develop in <strong>the</strong> following section will exploit a restriction that must hold between <strong>the</strong>se<br />
two densities.<br />
Throughout <strong>the</strong> following we will assume that <strong>the</strong> propensity score P (X) has a density with<br />
respect to Lebesgue measure. Let f(p) denote this density, f1(p) <strong>the</strong> density <strong>of</strong> <strong>the</strong> propensity<br />
score conditional on participation, and f0(p) <strong>the</strong> density <strong>of</strong> <strong>the</strong> propensity score conditional on<br />
nonparticipation. Using this notation, we have <strong>the</strong> following result:<br />
Lemma 3.1. Let α = Pr[D=0]<br />
Pr[D=1]<br />
p ∈ supp(P ) we have that<br />
and assume 0 < Pr[D = 0] < 1. Then for all 0 < p < 1 and<br />
f1(p) = α p<br />
1 − p f0(p) .<br />
P: Consider 0 < p < 1 and p ∈ supp(P ). For such p, by <strong>the</strong> Law <strong>of</strong> Iterated Expectations,<br />
we have that<br />
Bayes’ Theorem implies fur<strong>the</strong>r that<br />
Similarly, we have that<br />
Pr[D = 1|P (X) = p] = E[E[D|X]|P (X) = p] = p .<br />
f1(p) Pr[D = 1] = Pr[D = 1|P (X) = p]f(p) = pf(p) > 0 .<br />
f0(p) Pr[D = 0] = (1 − p)f(p) > 0 .<br />
Combining <strong>the</strong>se two implications, we have that<br />
f1(p) p<br />
= α<br />
f0(p) 1 − p ,<br />
from which <strong>the</strong> asserted conclusion follows immediately. <br />
Lemma 3.1 suggests that to <strong>the</strong> extent that <strong>the</strong> parametric model for P (X) is correctly<br />
specified, we would expect <strong>the</strong> stated relationship between <strong>the</strong> estimates <strong>of</strong> <strong>the</strong> conditional<br />
densities <strong>of</strong> <strong>the</strong> propensity score to hold approximately. The property can be easily checked<br />
3 <strong>Propensity</strong> score matching is only valid for <strong>the</strong> common support for <strong>the</strong> population but<br />
in practice, all we can make conclusions about is <strong>the</strong> common support in <strong>the</strong> sample. As <strong>the</strong><br />
sample size gets larger, <strong>of</strong> course, <strong>the</strong> two sets <strong>of</strong> common support converge.<br />
5
given estimates ˆ f1,n(·) and ˆ f0,n(·) by graphing ˆ f1,n(p) along with <strong>the</strong> function ˆαn p<br />
1−p ˆ f0,n(p),<br />
where ˆαn is <strong>the</strong> sample analogue <strong>of</strong> α. Dramatic departures <strong>of</strong> one graph from ano<strong>the</strong>r should<br />
be taken as evidence <strong>of</strong> misspecification <strong>of</strong> <strong>the</strong> propensity score.<br />
We now provide an illustration <strong>of</strong> this heuristic approach to detecting misspecifcation <strong>of</strong><br />
<strong>the</strong> propensity score. Define<br />
X1 = Z1<br />
X2 = Z1 + Z2<br />
√<br />
2<br />
(1a)<br />
, (1b)<br />
where Z1 and Z2 are independent standard normal variables. Consider <strong>the</strong> model<br />
D ∗ = 1 + X1 + X2 + X1X2 − ǫi, ǫ ∼ N(0, σ 2 ) (2a)<br />
D = 1 [D ∗ > 0] , (2b)<br />
where ǫ ⊥ (X1, X2), and 1[·] is <strong>the</strong> logical indicator function. We consider <strong>the</strong> results <strong>of</strong> fitting<br />
a misspecified model<br />
Q(X, β) = Φ(β0 + β1X1 + β2X2) (3)<br />
that only differs from <strong>the</strong> true model by omitting <strong>the</strong> quadratic term.<br />
We generate a sample <strong>of</strong> 10, 000 observations (Di, X1i, X2i, ǫi) according to <strong>the</strong> model given<br />
by (2a) and (2b). We <strong>the</strong>n estimate using maximum likelihood <strong>the</strong> propensity score under <strong>the</strong><br />
incorrectly specified model (3). Estimates <strong>of</strong> <strong>the</strong> densities <strong>of</strong> <strong>the</strong> propensity score conditional<br />
on <strong>the</strong> participation decision are <strong>the</strong>n constructed using kernel density estimation. 4<br />
Figure 1 depicts <strong>the</strong> resulting estimates <strong>of</strong> ˆ f1,n(p) and ˆαn p<br />
1−p ˆ f0,n(p). We see that graphs<br />
<strong>of</strong> <strong>the</strong>se two objects deviate substantially from each o<strong>the</strong>r, which can be taken as evidence <strong>of</strong><br />
misspecification.<br />
[FIGURE 1 ABOUT HERE]<br />
[FIGURE 2 ABOUT HERE]<br />
We also consider <strong>the</strong> case where <strong>the</strong> true data generating process is given by<br />
D ∗ = 1 + X1 + X2 − ǫ, ǫ ∼ U(−1, 1) (4)<br />
and we again fit <strong>the</strong> misspecified model (3). The estimated densities from this exercise are<br />
displayed in Figure 2. We find that <strong>the</strong> two graphs lie very close to one ano<strong>the</strong>r, suggesting<br />
4 We employ <strong>the</strong> biweight kernel and choose <strong>the</strong> bandwidth using Silverman’s rule <strong>of</strong> thumb,<br />
see [16].<br />
6
that misspecification is not very severe. This feature is perhaps not too surprising when one<br />
considers <strong>the</strong> fact that <strong>the</strong> only source <strong>of</strong> misspecification is <strong>the</strong> distribution for ǫ and <strong>the</strong><br />
assumed distribution shares many features with <strong>the</strong> true distribution. In particular, both are<br />
symmetric about zero.<br />
These examples shows that misspecification <strong>of</strong> <strong>the</strong> propensity score may lead to noticeable<br />
departures from <strong>the</strong> restriction <strong>of</strong> Lemma 3.1. In order to develop a formal test based on this<br />
restriction, it will be useful to restate it in terms <strong>of</strong> an equivalent orthogonality condition. To<br />
this end, we have <strong>the</strong> following result.<br />
Lemma 3.2. Let α = Pr[D=0]<br />
Pr[D=1] and assume that 0 < Pr[D = 1] < 1. Let Q be a random<br />
variable on <strong>the</strong> unit interval with density with respect to Lebesgue measure. Denote by g1(·) <strong>the</strong><br />
density <strong>of</strong> Q conditional on D = 1 and by g0(·) <strong>the</strong> density <strong>of</strong> Q conditional on D = 0. Then,<br />
g1(q) = α q<br />
1−q g0(q) for all q ∈ supp(Q) such that 0 < q < 1 if and only if E[D − Q|Q = q] = 0<br />
for all q ∈ supp(Q) such that 0 < q < 1.<br />
P: First consider necessity. Consider q ∈ supp(Q) such that 0 < q < 1. E[D − Q|Q =<br />
q] = 0 is equivalent to Pr[D = 1|Q = q] = E[D|Q = q] = q. For such q, Pr[D = 1|Q = q] = q<br />
and Bayes’ Theorem imply toge<strong>the</strong>r that<br />
g1(q) Pr[D = 1] = Pr[D = 1|Q = q]g(q) = qg(q) > 0 ,<br />
where g(q) is <strong>the</strong> density <strong>of</strong> <strong>the</strong> Q. Similarly, we have that<br />
g0(q) Pr[D = 0] = (1 − q)g(q) > 0 .<br />
Combining <strong>the</strong>se two implications, we have that<br />
g1(q) q<br />
= α<br />
g0(q) 1 − q ,<br />
from which <strong>the</strong> asserted conclusion follows. Now consider sufficiency. Assume that g1(q) =<br />
α q<br />
1−q g0(q) for all q ∈ supp(Q) such that 0 < q < 1. By Bayes’ Theorem,<br />
E[D|Q = q] =<br />
g1(q) Pr[D = 1]<br />
= q<br />
Pr[D = 0]g0(q) + Pr[D = 1]g1(q)<br />
where <strong>the</strong> last equality follows from plugging in g1(q) = α q<br />
1−q g0(q). It follows that E[D−Q|Q =<br />
q] = q, as desired. <br />
It is important to observe that this characterization <strong>of</strong> <strong>the</strong> restriction only involves low<br />
dimensional conditional expections. We will show in <strong>the</strong> following section that a consequence<br />
<strong>of</strong> this fact is that, unlike more conventional nonparametric tests for correct specification <strong>of</strong><br />
7
<strong>the</strong> propensity score, our test will not suffer from a curse <strong>of</strong> dimensionality and will <strong>the</strong>refore<br />
have much greater power in finite samples against many alternatives. The principle drawback<br />
<strong>of</strong> this approach is that <strong>the</strong>re will be certain alternatives for which we will not have power.<br />
To examine this issue fur<strong>the</strong>r, consider <strong>the</strong> following example. Suppose X = (X1, X2), and<br />
E[D|X1, X2] = E[D|X1]. Let Q(X) = E[D|X1]. Thus, Q(X) = P (X), and yet Q(X) will<br />
satisfy E[D − Q|Q] = 0. More generally, this restriction will not be able to detect <strong>the</strong> omission<br />
<strong>of</strong> covariates provided that <strong>the</strong> conditional expectation <strong>of</strong> D given <strong>the</strong> included covariates is<br />
correctly specified. Of course, this example is ra<strong>the</strong>r exceptional, since Q(X) is in this case<br />
correctly specified for a subvector <strong>of</strong> X.<br />
4 A Formal Test<br />
We now propose and develop a formal test for misspecification based on Lemma 3.2, which<br />
allows us to draw upon <strong>the</strong> large literature on testing orthogonality constraints. In particular,<br />
we adapt <strong>the</strong> testing methodology <strong>of</strong> Zheng [18] for our problem. Zheng considers testing if<br />
<strong>the</strong> conditional expectation E[Y |X] is correctly specified. Under his null hypo<strong>the</strong>sis, E[Y |X]<br />
is assumed to belong to a parametric family <strong>of</strong> real valued functions Q(X, θ) on R m ×Θ, where<br />
Θ ⊆ R l . Concretely, his null and alternative hypo<strong>the</strong>ses are<br />
H0: ∃θ0 ∈ Θ s.t. Pr[E[Y |X] = Q(X, θ0)] = 1.<br />
H1: Pr[E[Y |X] = Q(X, θ)] < 1 ∀θ ∈ Θ.<br />
Note that any θ0 satisfying <strong>the</strong> null hypo<strong>the</strong>sis also solves minθ∈Θ E[Y − Q(X, θ)] 2 .<br />
Zheng’s test is based on <strong>the</strong> following idea. Let ǫ = Y − Q(X, θ0) and let fX(·) denote <strong>the</strong><br />
density <strong>of</strong> X. Then, under <strong>the</strong> null hypo<strong>the</strong>sis,<br />
while under <strong>the</strong> alternative hypo<strong>the</strong>sis<br />
E[ǫE[ǫ|X]fX(X)] = 0 ,<br />
E[ǫE[ǫ|X]fX(X)] = E[[E[ǫ|X]] 2 fX(X)] > 0 .<br />
The last inequality follows because under <strong>the</strong> alternative hypo<strong>the</strong>sis (E[ǫ|X]) 2 > 0 with positive<br />
probability. Based on this observation, he uses <strong>the</strong> sample analogue <strong>of</strong> E[ǫE[ǫ|X]fX(X)] to<br />
form his test. The test statistic is given by<br />
ˆVn :=<br />
1<br />
n(n − 1)<br />
n <br />
1<br />
h<br />
i=1 j=i<br />
m n<br />
8<br />
<br />
Xi − Xj<br />
K<br />
ˆεiˆεj ,<br />
hn
where ˆεi := Yi − Q(Xi, ˆ θn), ˆ θn is a √ n-consistent estimator <strong>of</strong> arg minθ∈Θ E[Y − Q(X, θ)] 2 , h<br />
is a smoothing parameter, and K(·) is a kernel.<br />
Zheng shows that under certain assumptions<br />
where Σ can be consistently estimated by<br />
ˆΣn :=<br />
2<br />
n(n − 1)<br />
nh m/2 ˆVn n<br />
d → N(0, Σ)<br />
n <br />
1<br />
h<br />
i=1 j=i<br />
m n<br />
K 2<br />
<br />
Xi − Xj<br />
hn<br />
<br />
ˆε 2 i ˆε 2 j .<br />
We would like to test whe<strong>the</strong>r E[D|Q] = Q, where Q(X) = Q(X, θ0). Our null and<br />
alternative hypo<strong>the</strong>ses are <strong>the</strong>refore given by<br />
H0: ∃θ0 ∈ Θ s.t. Pr[E[D|Q(X, θ0)] = Q(X, θ0)] = 1<br />
H1: Pr[E[D|Q(X, θ)] = Q(X, θ)] < 1 ∀θ ∈ Θ<br />
Note that this differs from <strong>the</strong> framework <strong>of</strong> Zheng [18] in that <strong>the</strong> variable that is conditioned<br />
on is not observed but has to be estimated. Never<strong>the</strong>less, by analogy with <strong>the</strong> test statistic<br />
above, we consider testing based upon<br />
n<br />
<br />
1 1 Q(Xi,<br />
ˆVn :=<br />
K<br />
n(n − 1)<br />
ˆ θn) − Q(Xj, ˆ <br />
θn)<br />
ˆεiˆεj<br />
hn<br />
i=1 j=i<br />
where ˆεi = Di − Q(Xi, ˆ θn). Unfortunately, <strong>the</strong> analysis <strong>of</strong> Zheng [18] does not apply directly<br />
because in our case <strong>the</strong> estimated Q appears in <strong>the</strong> kernel function as well. Thus, we require<br />
<strong>the</strong> following extension <strong>of</strong> Zheng’s analysis:<br />
Theorem 4.1. Make <strong>the</strong> following assumptions:<br />
(i) {Di, Xi} n i=1 is a random sample;<br />
(ii) Q is Lipschitz continuous in (x, θ). Q(·, θ0) is continuously differentiable. Moreover, it<br />
has density with respect to Lebesgue measure on [0,1] that is continuously differentiable;<br />
(iii) K(·) is a nonnegative, bounded, symmetric and Lipschitz continuous function such that<br />
K(u)du = 1;<br />
(iv) The bandwidth sequence and <strong>the</strong> sequence <strong>of</strong> positive numbers {an} are such that:<br />
9<br />
hn
(a) n √ hn → ∞, (nh 3/2<br />
n ) −1 → c < ∞;<br />
(b)<br />
1<br />
anh 2 n<br />
→ a < ∞, na−2 n<br />
log n → ∞, n√hn a2 n<br />
→ b < ∞.<br />
Let ˆ θn be a √ n-consistent estimator <strong>of</strong> arg minθ∈Θ E[D − Q(X, θ)] 2 , where Θ is a compact<br />
and convex subset <strong>of</strong> R l . Then,<br />
hn<br />
i=1 j=n<br />
n hn ˆVn d → N(0, Σ)<br />
where Σ can be consistently estimated by<br />
n 2 <br />
n(n − 1)<br />
1<br />
K 2<br />
<br />
Q(Xi, ˆ θn) − Q(Xj, ˆ <br />
θn)<br />
P: See Appendix A. <br />
Assumptions (i)-(iii) are almost <strong>the</strong> same as Assumption 1 and 5 <strong>of</strong> [18] adapted to<br />
our problem. The only difference is that here we require that Q and K are Lipschitz con-<br />
tinuous functions. In addition, <strong>the</strong> requirement that ˆ θn be a √ n-consistent estimator <strong>of</strong><br />
arg minθ∈Θ E[D − Q(X, θ)] 2 is guaranteed by Assumptions 2-4 in [18]. The extra assump-<br />
tions on <strong>the</strong> bandwidth sequence are <strong>the</strong>re because in our analysis Q(x, ˆ θ) appears in <strong>the</strong><br />
argument <strong>of</strong> <strong>the</strong> kernel function as well.<br />
As noted earlier, <strong>the</strong>re are alternatives against which Zheng’s test has power tending to<br />
1, but our test does not. However, Zheng’s procedure requires estimation <strong>of</strong> expectations<br />
conditional on X, which in practice is high dimensional, and <strong>the</strong>refore suffers from a curse <strong>of</strong><br />
dimensionality. Our procedure only requires estimation <strong>of</strong> expectations conditional on Q and<br />
thus avoids this difficulty. As a result, we would expect our test to perform noticeably better<br />
in finite samples against many alternatives <strong>of</strong> interest.<br />
We now examine <strong>the</strong> finite sample properties <strong>of</strong> this testing procedure in a limited Monte<br />
Carlo study. Our setup will follow Zheng [18] closely. As before, let X1 and X2 be given by<br />
(1a) and (1b). We define now several different data generating processes for D ∗ that will be<br />
used at different points in our Monte Carlo study.<br />
hn<br />
ˆε 2 i ˆε 2 j<br />
D ∗ = 1 + X1 + X2 − ǫ, ǫ ∼ N(0, σ 2 ) (5)<br />
D ∗ = 1 + X1 + X2 + X1X2 − ǫ, ǫ ∼ N(0, σ 2 ) (6)<br />
D ∗ = (1 + X1 + X2) 2 − ǫ, ǫ ∼ N(0, σ 2 ) (7)<br />
D ∗ = 1 + X1 + X2 − ǫ, ǫ ∼ χ 2 1<br />
D ∗ = 1 + X1 + X2 − ǫ, ǫ ∼ U(−1, 1) (9)<br />
For each <strong>of</strong> <strong>the</strong>se data generating processes, D = 1[D ∗ > 0] and ǫ ⊥ (X1, X2).<br />
10<br />
(8)
Throughout <strong>the</strong> simulations presented below, we use <strong>the</strong> normal kernel given by<br />
K(u) = 1<br />
<br />
−u2 √ exp .<br />
2π 2<br />
1<br />
− The bandwidth hn is chosen to be equal to cn 5 . In our simulations, we report results for values<br />
<strong>of</strong> c equal to 0.05, 0.10, and 0.15. We consider samples sizes n equal to 100, 200, 400, 500, 800,<br />
and 1000. The number <strong>of</strong> replications for each simulation is always 1000. The bandwidth is<br />
chosen to fulfill <strong>the</strong> requirements from Theorem 4.1. 5<br />
We first examine <strong>the</strong> size <strong>of</strong> our test. The data is generated according to (5). We use our<br />
procedure to test <strong>the</strong> null hypo<strong>the</strong>sis<br />
H0: ∃β0 ∈ R 3 s.t. Pr[E[D|Q(X, β0)] = Q(X, β0)] = 1<br />
H1: Pr[E[D|Q(X, β)] = Q(X, β)] < 1 ∀β ∈R 3 ,<br />
where Q(X, β) = Φ(β0+β1X1+β2X2). These results are summarized in Table 1. Our principle<br />
finding is that <strong>the</strong> actual size <strong>of</strong> <strong>the</strong> test is conservative and sensitive to <strong>the</strong> choice <strong>of</strong> c, but,<br />
as expected, <strong>the</strong> actual size approaches <strong>the</strong> asymptotic size as <strong>the</strong> sample size increases.<br />
We now go on to consider <strong>the</strong> power <strong>of</strong> our test against certain misspecifications <strong>of</strong> <strong>the</strong><br />
propensity score. In Tables 2 - 5, we consider four different scenarios in which <strong>the</strong> true data<br />
generating process is given by (6) - (9), respectively. For each scenario, <strong>the</strong> null hypo<strong>the</strong>sis is<br />
<strong>the</strong> same as <strong>the</strong> one above. The test performs admirably when <strong>the</strong> true data generating process<br />
is given by (6) - (8), showing high power for even moderately sized samples. Recall that (6)<br />
is precisely that <strong>of</strong> our heuristic from Lemma 3.1 presented in Figure 1. Given <strong>the</strong> noticeable<br />
departure <strong>of</strong> <strong>the</strong> two graphs in Figure 1, it is not surprising that our test performs well. The<br />
test performs less well, however, when <strong>the</strong> true data generating process is given by (9), which<br />
is again consonant with our earlier findings in Figure 2. This outcome is to be expected due to<br />
<strong>the</strong> shared features <strong>of</strong> <strong>the</strong> distributions for ǫ. <strong>On</strong> <strong>the</strong> o<strong>the</strong>r hand, as evidenced by Table 4, <strong>the</strong><br />
test performs much better when <strong>the</strong> distribution is assumed to be χ2 1 , which is not symmetric<br />
about zero.<br />
We consider next a scenario in which <strong>the</strong> true data generating process is (5) and <strong>the</strong> null<br />
hypo<strong>the</strong>sis is <strong>the</strong> same as <strong>the</strong> one above, but now Q(X, β) = β0 + β1X1 + β2X2. Naturally, we<br />
would expect any sensible test to have high power against such alternatives. Q(X, β) is not<br />
5 − The bandwidth employed in [18], c·n 2<br />
5 does not fulfill <strong>the</strong> assumptions required by <strong>the</strong>orem<br />
1<br />
− 4.1. However, using this bandwidth instead <strong>of</strong> c · n 5 does not change <strong>the</strong> conclusion from <strong>the</strong><br />
Monte Carlo analysis presented below. Results are available on request.<br />
11
even guaranteed to be between 0 and 1. Fortunately, we find that our test has high power for<br />
even very small sample sizes in this situation. These results are summarized in Table 6.<br />
Finally, we compare our test with <strong>the</strong> test proposed by Zheng [18]. First, we consider <strong>the</strong><br />
same setup <strong>of</strong> Table 2 above, where <strong>the</strong> data generating process is given by (6). The results are<br />
presented in Table 7. As noted earlier, Zheng’s test suffers from a curse <strong>of</strong> dimensionality and<br />
so we might expect that our test would perform better in finite samples for many alternatives.<br />
Indeed, this feature is borne out by our Monte Carlo study: our test rejects <strong>the</strong> null hypo<strong>the</strong>sis<br />
much more frequently for nearly all samples sizes. To investigate <strong>the</strong> severity <strong>of</strong> this problem,<br />
we consider a fur<strong>the</strong>r model which includes an additional covariate. Define<br />
X3 = Z1 + Z3<br />
√<br />
2<br />
where Z3 is a standard normal random variable independent <strong>of</strong> Z1 and Z2. The data generating<br />
process for D ∗ in this instance is given by<br />
D ∗ = 1 + X1 + X2 + X1X2 + X3 − ǫ, ǫ ∼ N(0, σ 2 ) ,<br />
where ǫ ⊥ (X1, X2, X3). The null hypo<strong>the</strong>sis is as before but now Q(X, β) = Φ(β0 + β1X1 +<br />
β2X2 + β3X3). The results <strong>of</strong> this comparison are presented in Tables 8 and 9. In this case,<br />
with <strong>the</strong> additional covariate, we find that our test retains high power at all sample sizes, but<br />
<strong>the</strong> highest power <strong>of</strong> Zheng’s test is approximately 20 percent and <strong>of</strong>ten much lower. Thus, we<br />
believe that our test will in practice be <strong>of</strong> much greater use than Zheng’s, especially when X<br />
is high dimensional. Of course, it should be emphasized again that this advantage <strong>of</strong> our test<br />
comes at <strong>the</strong> expense <strong>of</strong> power against certain alternatives.<br />
[TABLE 1 ABOUT HERE]<br />
[TABLE 2 ABOUT HERE]<br />
[TABLE 3 ABOUT HERE]<br />
[TABLE 4 ABOUT HERE]<br />
[TABLE 5 ABOUT HERE]<br />
[TABLE 6 ABOUT HERE]<br />
[TABLE 7 ABOUT HERE]<br />
12<br />
,
5 Conclusion<br />
[TABLE 8 ABOUT HERE]<br />
[TABLE 9 ABOUT HERE]<br />
In this paper, we have shown that <strong>the</strong> commonly computed estimates <strong>of</strong> <strong>the</strong> densities <strong>of</strong> <strong>the</strong><br />
propensity score conditional on participation provide a means <strong>of</strong> examining whe<strong>the</strong>r <strong>the</strong> para-<br />
metric model is correctly specified. In particular, correct specification <strong>of</strong> <strong>the</strong> propensity score<br />
implies a certain restriction between <strong>the</strong> estimated densities. We have shown fur<strong>the</strong>r that this<br />
restriction is equivalent to an orthogonality restriction, which can be used as <strong>the</strong> basis <strong>of</strong> a<br />
formal test for correct specification. While our test does not have power against all forms <strong>of</strong><br />
misspecification <strong>of</strong> <strong>the</strong> propensity score, we argue that for a large class <strong>of</strong> alternatives our test<br />
will perform better in finite samples than existing tests that have power against all forms <strong>of</strong><br />
misspecification. Our Monte Carlo study <strong>of</strong> <strong>the</strong> finite sample behavior <strong>of</strong> our test corroborates<br />
this hypo<strong>the</strong>sis. The study shows fur<strong>the</strong>r that our test performs well against many types <strong>of</strong><br />
misspecification. Since our test is also easily implemented, it is our hope that this work will<br />
persuade researchers to examine <strong>the</strong> specification <strong>of</strong> <strong>the</strong>ir model for <strong>the</strong> propensity score, as<br />
its validity is essential for consistency <strong>of</strong> <strong>the</strong>ir estimates <strong>of</strong> <strong>the</strong> treatment effect.<br />
A Pro<strong>of</strong> <strong>of</strong> Theorem 4.1:<br />
We start by proving <strong>the</strong> first claim, namely, that<br />
n hn ˆVn d → N(0, Σ). (10)<br />
We start proving this result by first showing that<br />
<br />
ˆVn − Vn0 = oP (1), (11)<br />
where<br />
ˆVn :=<br />
with ˆεi = Di − Q(Xi, ˆ θ), and<br />
Vn0 :=<br />
1<br />
n(n − 1)<br />
1<br />
n(n − 1)<br />
nh 1/2<br />
n<br />
n<br />
<br />
1 Q(Xi,<br />
K<br />
ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
ˆεiˆεj,<br />
h<br />
hn<br />
i=1 j=i<br />
n <br />
<br />
1 Q(Xi, θ0) − Q(Xj, θ0)<br />
K<br />
hn<br />
i=1 j=i<br />
13<br />
hn<br />
εi0εj0,
with εi0 = Di − Q(Xi, θ0). Then we use <strong>the</strong> analysis in [18] to argue that<br />
so that combining (11) and (12) gives us (10).<br />
n hnVn0 d → N(0, Σ), (12)<br />
To prove (11) we are going to rely on Lemma 3 given on page 286 <strong>of</strong> Heckman, Ichimura,<br />
and Todd [6]. To use that lemma, we define 6<br />
Θn := {θ<br />
<br />
∈ Θ : an||θ − θ0|| ≤ ǫθ}<br />
Ψn :=<br />
ψn0(Di,Xi,Dj,Xj) :=<br />
ˆψn(Di,Xi,Dj,Xj) :=<br />
so that n √ hn ˆVn = <br />
i<br />
ψn : ψn(Di, Xi, Dj, Xj) = εiεj<br />
(n−1) √ hn<br />
Q(Xi,θ0)−Q(Xj,θ0)<br />
εi0εj0<br />
(n−1) √ hn K<br />
ˆεi ˆεj<br />
(n−1) √ hn K<br />
hn <br />
Q(Xi, ˆ θ)−Q(Xj, ˆ θ)<br />
hn<br />
j=i ˆ ψn(i, j) and n √ hnVn0 = <br />
will verify two conditions: (i) that <strong>the</strong> U-process <br />
i<br />
<br />
i<br />
K<br />
Q(Xi,θ)−Q(Xj,θ)<br />
hn<br />
<br />
, θ ∈ Θn<br />
<br />
j=i ψn0(i, j). To show (11), we<br />
<br />
j=i ψn(i, j) is equicontinuous over Ψn<br />
in a neighborhood <strong>of</strong> ψn0, and (ii) that, with probability approaching to 1, ˆ ψn lies in that<br />
neighborhood.<br />
To show equicontinuity, we will rely on <strong>the</strong> equicontinuity lemma which requires a sym-<br />
metric and degenerate process. The process <br />
i<br />
<br />
j=i ψn(i, j) is symmetric, if <strong>the</strong> kernel is<br />
symmetric, but it is not degenerate. By repeatedly adding and subtracting certain terms,<br />
however, we see that <br />
i,j ψn(i, j) equals<br />
<br />
<br />
εiεj<br />
(n − 1) √ <br />
Q(Xi, θ) − Q(Xj, θ)<br />
εiεj<br />
K<br />
− E<br />
hn<br />
hn<br />
(n − 1) √ <br />
Q(Xi, θ) − Q(Xj, θ)<br />
K<br />
|Dj, Xj<br />
hn<br />
hn<br />
i<br />
+ n√ hn<br />
n−1<br />
j=i<br />
− <br />
j<br />
+ <br />
j<br />
<br />
εi0εj0 j E hn K<br />
n √ hnεj0<br />
n − 1 E<br />
Q(Xi ,θ)−Q(X j ,θ)<br />
hn<br />
hn<br />
<br />
|Dj,Xj<br />
<br />
<br />
εi0<br />
−[Q(Xj,θ)−Q(Xj,θ0)]E hn K<br />
hn<br />
Q(Xi ,θ)−Q(X j ,θ)<br />
<br />
[Q(Xi, θ) − Q(Xi, θ0)] Q(Xi, θ) − Q(Xj, θ)<br />
K<br />
|Dj, Xj<br />
n √ hn[Q(Xj, θ) − Q(Xj, θ0)]<br />
E<br />
n − 1<br />
hn<br />
hn<br />
hn<br />
<br />
|Dj,Xj<br />
<br />
(13)<br />
(14)<br />
<br />
[Q(Xi, θ) − Q(Xi, θ0)] Q(Xi, θ) − Q(Xj, θ)<br />
K<br />
|Dj, Xj<br />
By <strong>the</strong> law <strong>of</strong> iterated expectations, <strong>the</strong> expression on <strong>the</strong> second line is zero. Fur<strong>the</strong>rmore,<br />
<strong>the</strong> expressions in (13) and (14) are respectively second and first order symmetric degenerate<br />
U-processes, thus, <strong>the</strong>ir asymptotic properties can be analyzed using Lemma 3 <strong>of</strong> Heckman,<br />
6 an is an increasing sequence <strong>of</strong> numbers that satisfies certain conditions.<br />
14<br />
(15)
Ichimura, and Todd [6]. In addition, <strong>the</strong> expression in (15) can be fur<strong>the</strong>r broken into<br />
n √ hn<br />
n − 1<br />
<br />
<br />
[Q(Xj, θ) − Q(Xj, θ0)][Q(Xi, θ) − Q(Xi, θ0)] Q(Xi, θ) − Q(Xj, θ)<br />
E<br />
K<br />
|Dj, Xj<br />
j<br />
hn<br />
<br />
[Q(Xj, θ) − Q(Xj, θ0)][Q(Xi, θ) − Q(Xi, θ0)] Q(Xi, θ) − Q(Xj, θ)<br />
−E<br />
K<br />
hn<br />
(16)<br />
+n <br />
[Q(Xj, θ) − Q(Xj, θ0)][Q(Xi, θ) − Q(Xi, θ0)] Q(Xi, θ) − Q(Xj, θ)<br />
hnE<br />
K<br />
.<br />
hn<br />
hn<br />
(17)<br />
(16) is a degenerate order 1 process, and hence could be analyzed using <strong>the</strong> same lemma. We<br />
study (17) at <strong>the</strong> end.<br />
Our first step is verifying <strong>the</strong> conditions <strong>of</strong> <strong>the</strong> equicontinuity lemma for {ψn(i, j) −<br />
2<br />
E[ψn(i, j)|j]}. Note that since |D − Q(x, θ)| ≤ 1 for all x, θ,<br />
(n−1) √ <br />
Q(Xi,θ)−Q(Xj,θ) <br />
K<br />
is<br />
hn<br />
hn<br />
an envelope for this process. Moreover,<br />
<br />
lim sup<br />
4<br />
<br />
E K<br />
n→∞<br />
2<br />
<br />
Q(Xi, θ)− Q(Xj, θ)<br />
= lim sup<br />
<br />
Q(Xi, θ)− Q(Xj, θ)<br />
.<br />
n→∞<br />
i<br />
j<br />
(n − 1) 2 hn<br />
In addition,<br />
<br />
1<br />
E K<br />
hn<br />
2<br />
<br />
Q(Xi, θ)− Q(Xj, θ)<br />
=<br />
hn<br />
1<br />
<br />
E K<br />
hn<br />
2<br />
<br />
Q(Xi, θ)− Q(Xj, θ)<br />
hn<br />
<br />
1 Q(Xi, θ0)−Q(Xj, θ0)<br />
+E<br />
.<br />
K<br />
hn<br />
2<br />
hn<br />
hn<br />
hn<br />
<br />
4<br />
E K<br />
hn<br />
2<br />
−K 2<br />
hn<br />
hn<br />
<br />
Q(Xi, θ0)−Q(Xj, θ0)<br />
Since K and Q are Lipschitz, and K is bounded, for some constant C, <strong>the</strong> first expression is<br />
bounded by C/(anh 2 n ), which has a finite limit. Moreover, since Q(X, θ0) is assumed to have<br />
a Lebesgue density, denoted by fQ, we have<br />
<br />
Q(Xi, θ0)− Q(Xj, θ0)<br />
=<br />
<br />
1<br />
E K<br />
hn<br />
2<br />
hn<br />
hn<br />
K 2 (u)fQ(Q(Xj, θ0) + uhn)dufQ(Q(Xj, θ0))dQ.<br />
But boundedness <strong>of</strong> K implies that this last expression has a finite limit as well. Thus, <strong>the</strong> first<br />
condition <strong>of</strong> <strong>the</strong> equicontinuity lemma is satisfied for our process. To<br />
verify <strong>the</strong> second condi-<br />
1<br />
tion, we note that for each δ > 0, if <strong>the</strong> kernel function is bounded, 1<br />
(n−1) √ hn K hn<br />
equals 1 for only finitely many n because n √ hn → ∞. <strong>On</strong> <strong>the</strong> o<strong>the</strong>r hand, by <strong>the</strong> dominated<br />
convergence <strong>the</strong>orem, we have<br />
<br />
limn→∞ E 1 <br />
= E<br />
<br />
limn→∞<br />
(n−1) 2<br />
<br />
i<br />
i<br />
<br />
j<br />
<br />
j 1<br />
hn K2<br />
1<br />
(n−1) 2 hn K2<br />
Q(Xi,θ)−Q(Xj,θ)<br />
hn <br />
Q(Xi,θ)−Q(Xj,θ)<br />
hn<br />
15<br />
<br />
1 1<br />
(n−1) √ hn K<br />
<br />
1 1<br />
(n−1) √ hn K<br />
Q(Xi,θ)−Q(Xj,θ)<br />
hn <br />
Q(Xi,θ)−Q(Xj,θ)<br />
hn<br />
Q(Xi,θ)−Q(Xj,θ)<br />
<br />
<br />
> δ<br />
<br />
> δ<br />
= 0.<br />
<br />
<br />
> δ
Thus, <strong>the</strong> second condition <strong>of</strong> <strong>the</strong> equicontinuity lemma is also satisfied.<br />
Next, we verify <strong>the</strong> same conditions for <strong>the</strong> process given in (14). First, we observe that<br />
we can rewrite each term <strong>of</strong> that summation as<br />
n √ hnεj0<br />
E<br />
an(n − 1)hn<br />
<br />
an[Q(Xi, θ) − Q(Xi, θ0)]K<br />
<br />
Q(Xi, θ) − Q(Xj, θ)<br />
|Xj .<br />
Since Q is Lipschitz with Lipschitz constant LQ, each <strong>of</strong> <strong>the</strong>se terms is bounded by<br />
n √ hn|εj0|LQǫθ<br />
an(n − 1) E<br />
<br />
1 <br />
<br />
hn<br />
K <br />
Q(Xi, θ) − Q(Xj, θ) <br />
|Xj .<br />
hn<br />
Then using <strong>the</strong> Lipschitz continuity <strong>of</strong> K and Q, we have<br />
n √ hn|εj0|LQǫθ<br />
an(n − 1) E<br />
<br />
1 <br />
<br />
hn<br />
K <br />
Q(Xi, θ) − Q(Xj, θ) <br />
|Xj<br />
hn<br />
= n|εj0|LQǫθ<br />
an(n − 1) √ <br />
Q(Xi, θ) − Q(Xj, θ) Q(Xi, θ0) − Q(Xj, θ0) <br />
E K<br />
−K<br />
|Xj +<br />
hn<br />
hn<br />
hn<br />
n|εj0|LQǫθ<br />
an(n − 1) √ <br />
Q(Xi, θ0) − Q(Xj, θ0) <br />
E K<br />
|Xj<br />
hn<br />
hn<br />
nC<br />
≤<br />
a2 nh 3/2<br />
n (n − 1) |εj0| + n√hn|εj0|LQǫθ an(n − 1) E<br />
<br />
1 <br />
<br />
hn<br />
K <br />
Q(Xi, θ0) − Q(Xj, θ0) <br />
|Xj ,<br />
hn<br />
where C = 2LKL2 gǫ2 θ . Then using a change <strong>of</strong> variables operation, we see that <strong>the</strong> second term<br />
in <strong>the</strong> last expression equals<br />
n √ hn|εj0|LQǫθ<br />
an(n − 1)<br />
<br />
hn<br />
<br />
|K (u)| fQ Q(Xj, θ0) + uhn du.<br />
Since K is bounded <strong>the</strong> integrals in this last expression are all finite. Therefore,<br />
n √ hn<br />
an(n − 1) |εj0|<br />
<br />
1<br />
<br />
,<br />
anh2 C + C1<br />
n<br />
where C1 is <strong>the</strong> appropriate constant, is an envelope for (14). To verify <strong>the</strong> first condition <strong>of</strong><br />
<strong>the</strong> equicontinuity lemma, consider<br />
n<br />
lim sup<br />
n→∞<br />
j<br />
2hn a2 n(n − 1) 2 E ε 2 <br />
j0<br />
1<br />
anh2 2 nhn<br />
C + C1 ≤ lim sup<br />
n<br />
n→∞ a2 <br />
1<br />
n anh2 2 C + C1 .<br />
n<br />
Sufficient conditions for this to be finite are that limn→∞ nhna −2<br />
n < ∞, and limn→∞ 1<br />
anh 2 n<br />
< ∞,<br />
which are implied by our assumptions. Thus <strong>the</strong> first condition <strong>of</strong> <strong>the</strong> equicontinuity lemma<br />
is satisfied.<br />
16
To check <strong>the</strong> second condition <strong>of</strong> <strong>the</strong> lemma, let δ > 0,<br />
<br />
limn→∞ j<br />
n 2 hn<br />
a2 n (n−1)2<br />
<br />
1<br />
anh2 2 <br />
C+C1 E ε<br />
n<br />
2 j01 =limn→∞ hn<br />
a2 <br />
1<br />
n anh2 C+C1<br />
n<br />
≤limn→∞ hn<br />
a 2 n<br />
<br />
1<br />
anh2 C+C1<br />
n<br />
2 <br />
j E<br />
<br />
ε2 j01 2 <br />
j E<br />
<br />
n √ hn|εj0 |<br />
an(n−1)<br />
n √ hn|εj0 |<br />
an(n−1)<br />
ε2 j01 √<br />
n hn<br />
an(n−1)<br />
<br />
1<br />
anh2 <br />
C+C1 >δ<br />
n<br />
<br />
1<br />
anh2 C+C1<br />
n<br />
<br />
1<br />
anh2 C+C1<br />
n<br />
<br />
>δ<br />
<br />
>δ ,<br />
where we use <strong>the</strong> fact that |εj0| ≤ 1 to write <strong>the</strong> last inequality. Note that limn→∞[1/(anh2 n )] <<br />
∞ implies that 1/(a2 nh 3/2<br />
n ) → 0. This in turn implies that for each δ > 0 only finitely many<br />
terms in <strong>the</strong> last summation will be non-zero, and <strong>the</strong>refore <strong>the</strong> second condition <strong>of</strong> <strong>the</strong> lemma<br />
is satisfied as well.<br />
<strong>On</strong> <strong>the</strong> o<strong>the</strong>r hand, <strong>the</strong> same arguments show that <strong>the</strong> same conditions are satisfied for <strong>the</strong><br />
process<br />
<br />
<br />
n[Q(Xj, θ) − Q(Xj, θ0)] Q(Xi, θ) − Q(Xi, θ0) Q(Xi, θ) − Q(Xj, θ)<br />
E √ K<br />
|Xj<br />
n − 1<br />
hn<br />
hn<br />
j<br />
<br />
n[Q(Xj, θ) − Q(Xj, θ0)][Q(Xi, θ) − Q(Xi, θ0)]<br />
− E<br />
(n − 1) √ <br />
Q(Xi, θ) − Q(Xj, θ)<br />
K<br />
hn<br />
hn<br />
To verify <strong>the</strong> third condition note that since Θ is a compact subset <strong>of</strong> R l , its L 2 packing<br />
number, D2(τ, Θ), is less than or equal to 2lτ −l . If K(·) is bounded and Lipschitz, Q(·, ·) is<br />
Lipschitz, and n−1h −3/2<br />
n = O(1), <strong>the</strong>n D2(τ, Ψn) ≤ C · τ −l . Thus, <strong>the</strong> third condition <strong>of</strong> <strong>the</strong><br />
lemma is also satisfied.<br />
We know that ψn0 ∈ Ψn. In addition, if<br />
√ n( ˆ θ − θ0) d → N(0, Σθ),<br />
<strong>the</strong>n for any {an} sequence such that an/ √ n → 0, we will have an( ˆ θ − θ0) P → 0, and ˆ ψn will<br />
belong to Ψn with probability approaching to 1.<br />
<br />
<br />
<br />
<br />
j<br />
The arguments given so far allow us to argue that<br />
<br />
<br />
<br />
ˆψn(i, j) − E[<br />
i,j<br />
ˆ <br />
<br />
ψn(i, j)|j] − ψn0(i, j) − E[ψn0 (i, j)|j]<br />
<br />
<br />
=oP (1),<br />
<br />
<br />
<br />
√<br />
hnεj0<br />
<br />
<br />
n − 1 E<br />
<br />
(εi0 − ˆεi) 1<br />
<br />
Q(Xi,<br />
K<br />
hn<br />
ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
|Xj<br />
hn<br />
<br />
<br />
=oP (1), and<br />
n √ hn(εj0 −ˆε j )<br />
n−1<br />
E<br />
(εi0 −ˆε i )<br />
hn<br />
i,j<br />
K<br />
Q(Xi , ˆ θ)−Q(X j , ˆ θ)<br />
hn<br />
<br />
|Xj<br />
<br />
−E<br />
17<br />
n(εj0 −ε j )(ε i0 −ˆε i )<br />
(n−1) √ hn<br />
K<br />
Q(Xi , ˆ θ)−Q(X j , ˆ θ)<br />
hn<br />
=oP (1).
The last thing we need to show is that<br />
lim<br />
n→∞ E<br />
<br />
<br />
n(εj0 − ˆεj)(εi0 − ˆεi) Q(Xi,<br />
√ K<br />
hn<br />
ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
= 0.<br />
hn<br />
But this expression equals<br />
<br />
an[Q(Xj, ˆ θ) − Q(Xj, θ0)]an[Q(Xi, ˆ <br />
θ) − Q(Xi, θ0)] Q(Xi,<br />
K<br />
ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
.<br />
n √ hn<br />
a2 E<br />
n<br />
Since n√ hn<br />
a 2 n<br />
lim<br />
n→∞ E<br />
hn<br />
= O(1), <strong>the</strong> above expression will go to 0 if we can show that<br />
<br />
an[Q(Xj, ˆ θ) − Q(Xj, θ0)]an[Q(Xi, ˆ <br />
θ) − Q(Xi, θ0)] Q(Xi,<br />
K<br />
ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
= 0.<br />
hn<br />
To show (18) we will first argue that<br />
hn<br />
hn<br />
(18)<br />
an|| ˆ θ − θ0|| → 0 a.s. (19)<br />
Then using this almost sure convergence result and <strong>the</strong> Lipshitz continuity <strong>of</strong> Q and K we are<br />
going to argue that<br />
an[Q(Xj, ˆ θ) − Q(Xj, θ0)]an[Q(Xi, ˆ <br />
θ) − Q(Xi, θ0)] Q(Xi,<br />
K<br />
ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
hn<br />
is almost surely bounded by an integrable function. Then <strong>the</strong> result will follow from Theorem<br />
1.3.6 <strong>of</strong> [14].<br />
Since our estimator ˆ θ for θ0 is defined as <strong>the</strong> minimizer <strong>of</strong><br />
ˆϕn(θ) := 1<br />
n<br />
n<br />
[Dj − Q(Xj, θ)] 2<br />
j=1<br />
over a compact parameter space Θ, <strong>the</strong> first step towards showing (19) is to show that<br />
where<br />
1<br />
a −1<br />
n<br />
sup | ˆϕn(θ) − ϕ0(θ)| → 0 a.s., (20)<br />
θ∈Θ<br />
ϕ0(θ) := E[(Dj − Q(Xj, θ)) 2 ].<br />
To show (20) we rely on Theorem 37 from Pollard [10]. We apply that <strong>the</strong>orem on <strong>the</strong> following<br />
family <strong>of</strong> functions<br />
Φn := {(D − Q(x, θ)) 2 /M : θ ∈ Θ}<br />
18<br />
hn
where M = sup θ∈Θ[D−Q(x, θ)] 2 . Since Θ is a compact subset <strong>of</strong> a finite dimensional Euclidean<br />
space, <strong>the</strong> covering number condition needed to apply <strong>the</strong> <strong>the</strong>orem is satisfied by Theorem 4.8<br />
<strong>of</strong> Pollard [11], and applying Theorem 37 <strong>of</strong> Pollard [10] with δn = M and αn = a −1<br />
n<br />
<strong>the</strong> desired result (20) as long as nM2a −2<br />
n<br />
log n<br />
= M 2 na −2<br />
n<br />
log n<br />
gives us<br />
→ ∞. Then, a small modification <strong>of</strong> <strong>the</strong><br />
consistency pro<strong>of</strong> in [9] yields an|| ˆ θ − θ0|| → 0 a.s.. This in turn implies that for ǫθ > 0 and<br />
sufficiently large n, we almost surely have<br />
<br />
<br />
an[Q(Xj,<br />
<br />
<br />
ˆ θ) − Q(Xj, θ0)]an[Q(Xi, ˆ <br />
θ) − Q(Xi, θ0)] Q(Xi,<br />
K<br />
hn<br />
ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
<br />
hn<br />
<br />
<br />
<br />
<br />
K<br />
<br />
Q(Xi, ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
<br />
≤ ǫ3 θLKLQ <br />
<br />
<br />
K <br />
Q(Xi, θ0) − Q(Xj, θ0) <br />
.<br />
≤ ǫ2 θ<br />
hn<br />
Since lim 1<br />
anh 2 n<br />
[14]. Thus,<br />
hn<br />
< ∞ and E 1<br />
<br />
<br />
K hn<br />
anh 2 n<br />
Q(Xi,θ0)−Q(Xj,θ0)<br />
hn<br />
+ ǫ2 θ<br />
hn<br />
n <br />
hn<br />
ˆVn − Vn0 = oP (1).<br />
hn<br />
< ∞, (18) follows from Theorem 1.3.6 <strong>of</strong><br />
Combining this with Lemma 3.3a <strong>of</strong> [18] completes <strong>the</strong> pro<strong>of</strong> for <strong>the</strong> first part <strong>of</strong> Theorem 4.1.<br />
Next we argue that Σ = 2 K2 (u)du [E(D2 i |Qi0)] 2f 2 Q (Qi0)dQi0<br />
<br />
, and Σ can be consistently<br />
estimated by<br />
n 2 1<br />
K<br />
n(n − 1)<br />
2<br />
<br />
Q(Xi, ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
ˆε 2 i ˆε 2 j.<br />
hn<br />
i=1 j=n<br />
We argue this in two steps again. First, note that if K is symmetric, bounded and Lipschitz,<br />
<strong>the</strong>n so is K2 . Thus, under <strong>the</strong> same conditions as before <strong>the</strong> analysis above yields that,<br />
n<br />
<br />
2 1<br />
K<br />
n(n − 1)<br />
2<br />
<br />
Q(Xi, ˆ θ) − Q(Xj, ˆ <br />
θ)<br />
ˆε 2 i ˆε 2 j −K 2<br />
<br />
Q(Xi, θ0) − Q(Xj, θ0)<br />
ε 2 i ε 2 <br />
j = op(1).<br />
hn<br />
i=1 j=i<br />
hn<br />
Second, <strong>the</strong> arguments in <strong>the</strong> pro<strong>of</strong> <strong>of</strong> Lemma 3.3e <strong>of</strong> [18] show that<br />
hn<br />
i=1 j=i<br />
hn<br />
n 2 1<br />
K<br />
n(n − 1)<br />
2<br />
<br />
Q(Xi, θ0) − Q(Xj, θ0)<br />
Combining <strong>the</strong>se gives us <strong>the</strong> second result in Theorem 4.1. <br />
References<br />
hn<br />
hn<br />
<br />
ε 2 i ε 2 j = Σ + oP (1).<br />
[1] A, A., G. I (2004): “Simple and Bias-Corrected Matching Estimators<br />
for Average Treatment Effect,” mimeo, UC-Berkeley and Harvard University.<br />
19<br />
(21)
[2] B, D. A., J. A. S (2004): “How Robust is <strong>the</strong> Evidence on <strong>the</strong> Effects <strong>of</strong><br />
College Quality? Evidence from Matching,” Journal <strong>of</strong> Econometrics, 121, 99—124.<br />
[3] D, R. H., S. W (2002): “<strong>Propensity</strong> Score Matching Methods for Non-<br />
experimental Causal Studies,” The Review <strong>of</strong> Economics and Statistics, 84, 151—161.<br />
[4] H, J. (1998): “<strong>On</strong> <strong>the</strong> Role <strong>of</strong> <strong>the</strong> <strong>Propensity</strong> Score in Efficient Semiparametric Esti-<br />
mation <strong>of</strong> Average Treatment Effects,” Econometrica, 66, 315—331.<br />
[5] H, J., H. I, J. S, P. T (1998): “Characterizing Selection<br />
Bias Using Experimental Data,” Econometrica, 66(5), 1017=1098.<br />
[6] H, J., H. I, P. T (1998): “Matching as an Econometric Eval-<br />
uation Estimator,” Review <strong>of</strong> Economic Studies, 65(2), 261—204.<br />
[7] H, J., R. LL, J. S (1999): “The Economics and Ecconometrics<br />
<strong>of</strong> Active Labor Market Programs,” in Handbook <strong>of</strong> Labor Economics, ed. by A. Orley, and<br />
D. Card. Elsevier Science, North Holland, Amsterdam, New York and Oxford.<br />
[8] L, M. (2002): “Program Heterogeneity and <strong>Propensity</strong> Score Matching: an Appli-<br />
cation to <strong>the</strong> Evaluation <strong>of</strong> Active Labor Market Policies,” The Review <strong>of</strong> Economics and<br />
Statistics, 84, 205—220.<br />
[9] N, W., D. MF (1994): “Large Sample Estimation and Hypo<strong>the</strong>sis<br />
Testing,” in Handbook <strong>of</strong> Econometrics, ed. by R. Engle, and D. McFadden. North-Holland,<br />
Amsterdam.<br />
[10] P, D. (1984): Convergence <strong>of</strong> Stochastic Processes. Springer-Verlag, New York.<br />
[11] (1990): “Empirical Processes: Theory and Applications,” in NSF-CBMS Re-<br />
gional Conference Series in Probability and Statistics, vol. 2. Institute <strong>of</strong> Ma<strong>the</strong>matical<br />
Statistics, Hayward, American Statistical Association, Alexandria.<br />
[12] R, P. R., D. B. R (1985): “Constructing a Control Group Us-<br />
ing Multivariate Matched Sampling Methods That Incorporate <strong>the</strong> <strong>Propensity</strong> Score,” The<br />
American Statistician, 39, 33—38.<br />
[13] R, R., D. B. R (1983): “The Central Role <strong>of</strong> <strong>the</strong> <strong>Propensity</strong> Score<br />
in Observational Studies for Causal Effects,” Biometrika, 70, 41—55.<br />
[14] S, R. J. (1980): Approximation Theory <strong>of</strong> Ma<strong>the</strong>matical Statistics. Wiley, Hobo-<br />
ken, New Jersey.<br />
20
[15] S, B. (2004): “An Evaluation <strong>of</strong> <strong>the</strong> Swedish System <strong>of</strong> Active Labor Market<br />
Programs in <strong>the</strong> 1990s,” The Review <strong>of</strong> Economics and Statistics, 86, 133—155.<br />
[16] S, B. W. (1986): Density Estimation for Statistics and Data Analysis. Chap-<br />
man and Hall/CRC.<br />
[17] S, J. A., P. E. T (2005): “Rejoinder,” Journal <strong>of</strong> Econometrics, 125,<br />
365—375.<br />
[18] Z, J. X. (1996): “A Consistent Test <strong>of</strong> Functional Form via Nonparametric Esti-<br />
mation Techniques,” Journal <strong>of</strong> Econometrics, 75(2), 263—289.<br />
21
FIGURE 1<br />
E D P S<br />
Note: ––––- ˆαn p<br />
1−p ˆ f0,n(p), - - - - - - - - ˆ f1,n(p)<br />
DGP: D ∗ = 1 + X1 + X2 + X1X2 − ǫi, ǫ ∼ N(0, σ 2 ),<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2)<br />
Density estimated using <strong>the</strong> biweight kernel and<br />
Silverman’s rule <strong>of</strong> thumb, see Silverman[16]
FIGURE 2<br />
E D P S<br />
Note: ––––- ˆαn p<br />
1−p ˆ f0,n(p), - - - - - - - - ˆ f1,n(p)<br />
DGP: D ∗ = 1 + X1 + X2 + −ǫi, ǫ ∼ U(−1, 1),<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2)<br />
Density estimated using <strong>the</strong> biweight kernel and<br />
Silverman’s rule <strong>of</strong> thumb, see Silverman[16]
TABLE 1: S<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.011 0.046 0.089<br />
n=200 0.006 0.037 0.082<br />
n=400 0.003 0.031 0.081<br />
n=500 0.006 0.044 0.101<br />
n=800 0.006 0.040 0.088<br />
n=1000<br />
c=0.10<br />
0.005 0.038 0.086<br />
n=100 0.006 0.043 0.101<br />
n=200 0.002 0.024 0.070<br />
n=400 0.004 0.021 0.060<br />
n=500 0.004 0.027 0.074<br />
n=800 0.004 0.035 0.085<br />
n=1000<br />
c=0.15<br />
0.005 0.022 0.070<br />
n=100 0.005 0.027 0.079<br />
n=200 0.003 0.011 0.057<br />
n=400 0.003 0.008 0.049<br />
n=500 0.004 0.012 0.061<br />
n=800 0.008 0.024 0.070<br />
n=1000 0.005 0.019 0.053<br />
DGP: D ∗ = 1 + X1 + X2 − ǫi, ǫ ∼ N(0, σ 2 ),<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2)
TABLE 2: P<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.061 0.146 0.218<br />
n=200 0.273 0.432 0.518<br />
n=400 0.787 0.886 0.918<br />
n=500 0.916 0.964 0.975<br />
n=800 0.997 0.999 0.999<br />
n=1000<br />
c=0.10<br />
1.000 1.000 1.000<br />
n=100 0.094 0.189 0.237<br />
n=200 0.417 0.559 0.618<br />
n=400 0.910 0.962 0.978<br />
n=500 0.977 0.990 0.993<br />
n=800 1.000 1.000 1.000<br />
n=1000<br />
c=0.15<br />
1.000 1.000 1.000<br />
n=100 0.101 0.185 0.241<br />
n=200 0.459 0.597 0.664<br />
n=400 0.948 0.976 0.984<br />
n=500 0.984 0.993 0.996<br />
n=800 1.000 1.000 1.000<br />
n=1000 1.000 1.000 1.000<br />
DGP: D ∗ = 1 + X1 + X2 + X1X2 − ǫi, ǫ ∼ N(0, σ 2 ),<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2)
TABLE 3: P<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.358 0.479 0.537<br />
n=200 0.818 0.874 0.896<br />
n=400 0.991 0.993 0.994<br />
n=500 1.000 1.000 1.000<br />
n=800 1.000 1.000 1.000<br />
n=1000<br />
c=0.10<br />
1.000 1.000 1.000<br />
n=100 0.290 0.401 0.483<br />
n=200 0.821 0.876 0.902<br />
n=400 0.993 0.994 0.995<br />
n=500 1.000 1.000 1.000<br />
n=800 1.000 1.000 1.000<br />
n=1000<br />
c=0.15<br />
1.000 1.000 1.000<br />
n=100 0.193 0.283 0.343<br />
n=200 0.737 0.814 0.840<br />
n=400 0.990 0.994 0.994<br />
n=500 1.000 1.000 1.000<br />
n=800 1.000 1.000 1.000<br />
n=1000 1.000 1.000 1.000<br />
DGP: D ∗ = (1 + X1 + X2) 2 − ǫi, ǫ ∼ N(0, σ 2 ),<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2)
TABLE 4: P<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.023 0.082 0.150<br />
n=200 0.074 0.169 0.232<br />
n=400 0.321 0.490 0.587<br />
n=500 0.440 0.621 0.698<br />
n=800 0.838 0.917 0.942<br />
n=1000<br />
c=0.10<br />
0.929 0.977 0.986<br />
n=100 0.037 0.076 0.134<br />
n=200 0.131 0.242 0.303<br />
n=400 0.457 0.620 0.691<br />
n=500 0.642 0.776 0.848<br />
n=800 0.935 0.969 0.980<br />
n=1000<br />
c=0.15<br />
0.986 0.997 0.997<br />
n=100 0.039 0.076 0.124<br />
n=200 0.162 0.271 0.331<br />
n=400 0.557 0.689 0.751<br />
n=500 0.707 0.825 0.877<br />
n=800 0.957 0.978 0.986<br />
n=1000 0.993 0.998 0.998<br />
DGP: D ∗ = 1 + X1 + X2 − ǫi, ǫ ∼ χ 2 1 ,<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1 X1+β 2 X2)
TABLE 5: P<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.008 0.057 0.127<br />
n=200 0.007 0.032 0.078<br />
n=400 0.005 0.048 0.091<br />
n=500 0.009 0.045 0.087<br />
n=800 0.021 0.073 0.126<br />
n=1000<br />
c=0.10<br />
0.033 0.094 0.147<br />
n=100 0.008 0.048 0.117<br />
n=200 0.005 0.021 0.063<br />
n=400 0.009 0.039 0.074<br />
n=500 0.013 0.035 0.071<br />
n=800 0.035 0.075 0.124<br />
n=1000<br />
c=0.15<br />
0.048 0.114 0.169<br />
n=100 0.005 0.038 0.107<br />
n=200 0.007 0.021 0.057<br />
n=400 0.009 0.034 0.066<br />
n=500 0.014 0.035 0.074<br />
n=800 0.036 0.081 0.135<br />
n=1000 0.056 0.110 0.176<br />
DGP: D ∗ = 1 + X1 + X2 − ǫi, ǫ ∼ U(−1, 1)<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1 X1+β 2 X2)
TABLE 6: P<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.218 0.337 0.426<br />
n=200 0.637 0.781 0.826<br />
n=400 0.984 0.993 0.997<br />
n=500 1.000 1.000 1.000<br />
n=800 1.000 1.000 1.000<br />
n=1000<br />
c=0.10<br />
1.000 1.000 1.000<br />
n=100 0.371 0.505 0.592<br />
n=200 0.813 0.904 0.936<br />
n=400 0.998 1.000 1.000<br />
n=500 1.000 1.000 1.000<br />
n=800 1.000 1.000 1.000<br />
n=1000<br />
c=0.15<br />
1.000 1.000 1.000<br />
n=100 0.449 0.584 0.662<br />
n=200 0.891 0.952 0.970<br />
n=400 1.000 1.000 1.000<br />
n=500 1.000 1.000 1.000<br />
n=800 1.000 1.000 1.000<br />
n=1000 1.000 1.000 1.000<br />
DGP: D ∗ = 1 + X1 + X2 − ǫi, ǫ ∼ N(0, σ 2 ),<br />
Estimated model: Q(X, β) = β 0 +β 1X1+β 2X2
TABLE 7: P, Z[18]<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.000 0.011 0.053<br />
n=200 0.001 0.016 0.070<br />
n=400 0.005 0.033 0.079<br />
n=500 0.005 0.038 0.106<br />
n=800 0.012 0.059 0.125<br />
n=1000<br />
c=0.10<br />
0.009 0.067 0.131<br />
n=100 0.002 0.030 0.085<br />
n=200 0.005 0.052 0.110<br />
n=400 0.011 0.072 0.126<br />
n=500 0.014 0.082 0.157<br />
n=800 0.041 0.132 0.210<br />
n=1000<br />
c=0.15<br />
0.063 0.175 0.294<br />
n=100 0.002 0.046 0.106<br />
n=200 0.008 0.061 0.115<br />
n=400 0.035 0.109 0.176<br />
n=500 0.043 0.141 0.227<br />
n=800 0.107 0.271 0.370<br />
n=1000 0.166 0.373 0.494<br />
DGP: D ∗ = 1 + X1 + X2 + X1X2 − ǫi, ǫ ∼ N(0, σ 2 )<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2)
TABLE 8: P<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.034 0.079 0.127<br />
n=200 0.049 0.106 0.163<br />
n=400 0.162 0.288 0.364<br />
n=500 0.229 0.377 0.448<br />
n=800 0.495 0.620 0.696<br />
n=1000<br />
c=0.10<br />
0.594 0.751 0.807<br />
n=100 0.038 0.087 0.134<br />
n=200 0.081 0.143 0.190<br />
n=400 0.274 0.387 0.465<br />
n=500 0.356 0.480 0.560<br />
n=800 0.637 0.744 0.797<br />
n=1000<br />
c=0.15<br />
0.768 0.857 0.895<br />
n=100 0.042 0.078 0.128<br />
n=200 0.102 0.170 0.217<br />
n=400 0.320 0.448 0.508<br />
n=500 0.415 0.550 0.619<br />
n=800 0.700 0.789 0.837<br />
n=1000 0.829 0.895 0.924<br />
DGP: D ∗ = 1 + X1 + X2 + X1X2 + X3 − ǫi, ǫ ∼ N(0, σ 2 )<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2 + β3X3)
TABLE 9: P, Z[18]<br />
c=0.05<br />
Proportion <strong>of</strong> Rejections<br />
1% level 5% level 10% level<br />
n=100 0.000 0.003 0.009<br />
n=200 0.000 0.002 0.009<br />
n=400 0.000 0.002 0.015<br />
n=500 0.000 0.003 0.021<br />
n=800 0.000 0.013 0.041<br />
n=1000<br />
c=0.10<br />
0.000 0.012 0.052<br />
n=100 0.000 0.000 0.014<br />
n=200 0.000 0.009 0.044<br />
n=400 0.003 0.018 0.064<br />
n=500 0.001 0.019 0.079<br />
n=800 0.001 0.034 0.092<br />
n=1000<br />
c=0.15<br />
0.001 0.041 0.114<br />
n=100 0.001 0.011 0.041<br />
n=200 0.002 0.021 0.071<br />
n=400 0.002 0.036 0.096<br />
n=500 0.003 0.037 0.089<br />
n=800 0.003 0.055 0.110<br />
n=1000 0.012 0.064 0.128<br />
DGP: D ∗ = 1 + X1 + X2 + X1X2 + X3 − ǫi, ǫ ∼ N(0, σ 2 )<br />
Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2 + β3X3)