
On the Identification of Misspecified Propensity Scores

Azeem M. Shaikh
Department of Economics
Stanford University

Marianne Simonsen
Department of Economics
University of Aarhus

Edward J. Vytlacil
Department of Economics
Columbia University

Nese Yildiz
Department of Economics
University of Pittsburgh

February 8, 2006

Abstract

Propensity score matching has become a popular method for the evaluation of treatment effects. In empirical applications, researchers almost always impose a parametric model for the propensity score. This raises the possibility that the model is misspecified and therefore that the propensity score matching estimator may be inconsistent. We show that the common practice of calculating estimates of the densities of the propensity score conditional on the participation decision provides a means for examining whether the propensity score is misspecified. In particular, we derive a restriction between the density of the propensity score among participants and the density among nonparticipants. We show that this restriction between the two conditional densities is equivalent to a particular orthogonality restriction and derive a formal test based upon it. The resulting test is shown to have dramatically greater power than competing tests for many alternatives. The principal disadvantage of this approach is loss of power against some alternatives.

Acknowledgements: The authors wish to thank Jeff Smith for valuable comments and suggestions.


1 Introduction

Propensity score matching is a widely used approach for the estimation of treatment effects. See Heckman, LaLonde, and Smith [7] for a detailed survey of its use in the treatment effect literature. The method is based on the following well-known result of Rosenbaum and Rubin [13]: if selection into the program is independent of the potential outcomes of interest conditional on a vector of covariates, then selection into the program is also independent of the potential outcomes conditional on the propensity score, where the propensity score denotes the probability of selection into the program conditional on the same vector of covariates. The high dimensionality of the vector of observed covariates often forces researchers to adopt a finite-dimensional approximation to the propensity score, in lieu of a more flexible nonparametric model. Imposing such restrictions on the functional form for the propensity score raises the possibility that the propensity score model is misspecified. If the propensity score model is misspecified, then in general the estimator of the propensity score will be inconsistent for the true propensity score and the resulting propensity score matching estimator may be inconsistent for the treatment effect of interest.

First suggested by Heckman, Ichimura, Smith, and Todd [5], it is now common to calculate estimates of the densities of the propensity score conditional on the participation decision. These estimates are calculated in order to determine the region of common support on which to perform matching. We show that these conditional densities provide a means for examining whether the propensity score is misspecified. In particular, we derive a restriction between the density of the propensity score model among participants and the density among nonparticipants. Failure of this restriction to hold for the estimated conditional densities provides evidence that the propensity score model is misspecified. We show that this restriction between the two conditional densities is equivalent to a particular orthogonality restriction and derive a formal test based upon it. Unlike other tests for correct specification of the propensity score, the resulting test does not require nonparametric estimation of high dimensional objects. As a result, our test does not suffer from the curse of dimensionality and has dramatically greater power than competing tests for many alternatives. We provide striking evidence of this fact in a limited Monte Carlo study. The principal drawback of this approach is that our test does not have power against certain alternatives, but we argue that these alternatives are rather exceptional.

Following [12], applied researchers often employ heuristic balancing score tests for misspecification of the propensity score to guide the choice of specification of the parametric propensity score. Specifically, [12] suggests examining whether the observable characteristics of the population are independent of participation conditional on the propensity score. See e.g. [3], [8], [15], and [17] for applications of such balancing score tests. The formal misspecification test presented in this paper may be considered a complement to this practice.

Our paper proceeds as follows. In Section 2, we first provide a summary of the method of propensity score matching. We then present in Section 3 the restriction upon which our test is based. In Section 4, we use this result to derive an asymptotic test of misspecification and examine its finite sample properties in a Monte Carlo study. Finally, Section 5 concludes.

2 A Review of Matching

Before proceeding, it is useful to review the key theoretical underpinnings of matching as a means of program evaluation. There are two groups of individuals, participants and nonparticipants, in a program of interest. Participation in the program of interest is denoted by the dummy variable D, with D = 1 if the individual chooses to participate and D = 0 otherwise. Individuals in each of these two groups are associated with observed characteristics X. Two commonly used metrics for evaluating the effect of participation in a program are the average treatment effect, given by E[Y1 − Y0], and the average treatment effect on the treated, given by E[Y1 − Y0|D = 1], where Y1 is the potential outcome in the case of participation and Y0 is the potential outcome in the case of nonparticipation. The difficulty with estimation of these objects lies with the following missing data problem: the econometrician never observes the counterfactual outcome Y1−D, which precludes direct estimation of E[Y0|D = 1] and E[Y1|D = 0].

The method of matching resolves this difficulty by matching each participant with a nonparticipant that is similar in terms of observed characteristics X. As described in Rosenbaum and Rubin [13], matching formally requires that:

A1. (Y0, Y1) ⊥ D | X.

A2. 0 < P(x) < 1 for all x ∈ supp(X), where P(x) = Pr[D = 1|X = x].

Note that a consequence of A2 is that E[Y0|D = 1, X = x] and E[Y1|D = 0, X = x] are well defined for all x ∈ supp(X). Hence, matching suggests estimation of E[Y0|D = 1] and E[Y1|D = 0] by a two-step procedure in which E[Y0|D = 1, X] and E[Y1|D = 0, X] are first estimated by exploiting A1, and then integrated with respect to the empirical distribution of X in order to obtain an estimate of E[Y1 − Y0], or integrated with respect to the empirical distribution of X conditional on D = 1 to obtain an estimate of E[Y1 − Y0|D = 1]. Following Heckman, Ichimura, Smith, and Todd [5], it is clear that one can relax the first condition to require only mean independence instead of full independence. Using this procedure, it is in principle possible to construct a √n-consistent, asymptotically normal estimator of the parameter of interest without imposing any finite-dimensional restrictions.¹

An alternative approach for estimating E[Y0|D = 1] and E[Y1|D = 0] is propensity score matching, which relies upon a celebrated result of Rosenbaum and Rubin [13]. They show that the assumptions underlying matching conditional on X, A1 and A2, imply:

A1*. (Y0, Y1) ⊥ D | P(X).

A2*. 0 < Pr[D = 1|P(X) = p] < 1 for all p ∈ supp(P(X)).

This result can be restated as follows: if matching on X is valid, then so is matching on the propensity score P(X). This result motivates propensity score matching, in which one estimates the propensity score in a first step and then performs matching, as described above, using the estimated propensity score.

In practice, both matching on X and matching on P(X) suffer from the so-called "curse of dimensionality". While matching on X requires the researcher to estimate E[Y0|D = 1, X], matching on P(X) requires the researcher to estimate E[D|X], an equally high dimensional object. Thus, if X has more than a few dimensions, nonparametric procedures for estimating these objects are undesirable due to sizable finite sample bias. This difficulty leads to the common practice of implementing propensity score matching with a parametric specification for the propensity score. More specifically, in practice propensity score matching involves estimating a parametric model for the propensity score by maximum likelihood in a first step, and then using kernel regression or some other comparable nonparametric procedure to perform matching on the basis of the estimated propensity scores.²
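The following is a minimal sketch in Python of this two-step procedure for E[Y1 − Y0|D = 1], assuming a probit specification in the first step and a normal kernel in the second; the function names, the bandwidth value, and the use of SciPy's optimizer are illustrative choices rather than anything prescribed above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_probit(D, X):
    """Probit MLE for Pr[D = 1 | X]; X should already contain a constant column."""
    def negloglik(beta):
        p = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)
        return -np.sum(D * np.log(p) + (1 - D) * np.log(1 - p))
    return minimize(negloglik, np.zeros(X.shape[1]), method="BFGS").x

def att_kernel_matching(Y, D, p_hat, bandwidth=0.05):
    """Kernel-regression matching on the estimated propensity score: for each
    participant, E[Y0 | D = 1, p] is imputed as a kernel-weighted average of
    nonparticipant outcomes with nearby propensity scores."""
    treated = np.flatnonzero(D == 1)
    controls = np.flatnonzero(D == 0)
    gaps = []
    for i in treated:
        w = norm.pdf((p_hat[controls] - p_hat[i]) / bandwidth)
        if w.sum() > 0:  # drop treated units with no comparable controls
            gaps.append(Y[i] - np.average(Y[controls], weights=w))
    return np.mean(gaps)

# Estimated E[Y1 - Y0 | D = 1]:
# att = att_kernel_matching(Y, D, norm.cdf(X @ fit_probit(D, X)))
```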

If the propensity score is estimated using a parametric model, then it is possible that the model is misspecified and that the resulting estimator of P(X) is inconsistent. Let Q(X) denote the probability limit of the estimator of P(X). If P(X) ≠ Q(X), as will likely be the case if the model for P(X) is misspecified, then in general we cannot expect (Y0, Y1) to be independent (or even mean independent) of D conditional on Q(X). In this case, propensity score matching will in general provide inconsistent estimates of both E[Y1 − Y0] and E[Y1 − Y0|D = 1].

¹ See Heckman, Ichimura, and Todd [6], Hahn [4], and Abadie and Imbens [1].

² Since binary variables are bounded, there is potentially less room for misspecification of a parametric model for a binary outcome as compared to a continuous one; see [2].


3 The Restriction

In order to ensure that the appropriate conditional expectations exist, propensity score matching is only valid over the region of common support of the densities of the propensity score conditional on the participation decision.³ Following the influential work of Heckman, Ichimura, Smith, and Todd [5], researchers therefore compute estimates of both of these densities to determine the appropriate region over which to perform matching. The test for misspecification that we develop in the following section will exploit a restriction that must hold between these two densities.

Throughout the following we will assume that the propensity score P(X) has a density with respect to Lebesgue measure. Let f(p) denote this density, f1(p) the density of the propensity score conditional on participation, and f0(p) the density of the propensity score conditional on nonparticipation. Using this notation, we have the following result:

Lemma 3.1. Let α = Pr[D = 0]/Pr[D = 1] and assume 0 < Pr[D = 0] < 1. Then for all 0 < p < 1 and p ∈ supp(P) we have that

f_1(p) = \alpha \, \frac{p}{1-p} \, f_0(p) .

Proof: Consider 0 < p < 1 and p ∈ supp(P). For such p, by the Law of Iterated Expectations, we have that

\Pr[D = 1 \mid P(X) = p] = E\bigl[\,E[D \mid X] \mid P(X) = p\,\bigr] = p .

Bayes' Theorem implies further that

f_1(p)\,\Pr[D = 1] = \Pr[D = 1 \mid P(X) = p]\, f(p) = p\, f(p) > 0 .

Similarly, we have that

f_0(p)\,\Pr[D = 0] = (1 - p)\, f(p) > 0 .

Combining these two implications, we have that

\frac{f_1(p)}{f_0(p)} = \alpha\, \frac{p}{1-p} ,

from which the asserted conclusion follows immediately. ∎

Lemma 3.1 suggests that, to the extent that the parametric model for P(X) is correctly specified, we would expect the stated relationship between the estimates of the conditional densities of the propensity score to hold approximately. The property can be easily checked given estimates f̂1,n(·) and f̂0,n(·) by graphing f̂1,n(p) along with the function α̂n [p/(1 − p)] f̂0,n(p), where α̂n is the sample analogue of α. Dramatic departures of one graph from another should be taken as evidence of misspecification of the propensity score.

³ Propensity score matching is only valid for the common support for the population, but in practice all we can make conclusions about is the common support in the sample. As the sample size gets larger, of course, the two sets of common support converge.
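A minimal sketch of this check, assuming arrays of fitted propensity scores and participation indicators are at hand; the Gaussian kernel density estimator stands in for the biweight kernel used later in the paper, and the function name is ours.

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_check(p_hat, D, grid=None):
    """Compare f1_hat(p) with alpha_hat * p/(1-p) * f0_hat(p), as in Lemma 3.1.
    Large gaps between the two curves point toward misspecification."""
    if grid is None:
        grid = np.linspace(0.02, 0.98, 97)
    f1 = gaussian_kde(p_hat[D == 1])(grid)         # density of p_hat among participants
    f0 = gaussian_kde(p_hat[D == 0])(grid)         # density of p_hat among nonparticipants
    alpha_hat = np.mean(D == 0) / np.mean(D == 1)  # sample analogue of Pr[D=0]/Pr[D=1]
    implied_f1 = alpha_hat * grid / (1.0 - grid) * f0
    return grid, f1, implied_f1
```

Plotting the two returned curves over the grid reproduces the kind of graphical comparison shown in Figures 1 and 2 below.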

We now provide an illustration of this heuristic approach to detecting misspecification of the propensity score. Define

X_1 = Z_1 ,   (1a)
X_2 = \frac{Z_1 + Z_2}{\sqrt{2}} ,   (1b)

where Z1 and Z2 are independent standard normal random variables. Consider the model

D^* = 1 + X_1 + X_2 + X_1 X_2 - \epsilon, \qquad \epsilon \sim N(0, \sigma^2) ,   (2a)
D = 1[D^* > 0] ,   (2b)

where ǫ ⊥ (X1, X2) and 1[·] is the logical indicator function. We consider the results of fitting the misspecified model

Q(X, \beta) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2) ,   (3)

which differs from the true model only by omitting the interaction term X1X2.

We generate a sample of 10,000 observations (Di, X1i, X2i, ǫi) according to the model given by (2a) and (2b). We then estimate the propensity score by maximum likelihood under the incorrectly specified model (3). Estimates of the densities of the propensity score conditional on the participation decision are then constructed using kernel density estimation.⁴
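A compact sketch of this exercise, reusing the fit_probit and density_check functions from the earlier sketches; the seed and the value σ = 1 are illustrative, since the text does not report σ.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, sigma = 10_000, 1.0                                # sigma = 1 is an assumption

Z1, Z2 = rng.standard_normal(n), rng.standard_normal(n)
X1, X2 = Z1, (Z1 + Z2) / np.sqrt(2)                   # covariates (1a)-(1b)

# True model (2a)-(2b); the interaction X1*X2 is what the fitted model omits.
D = (1 + X1 + X2 + X1 * X2 - sigma * rng.standard_normal(n) > 0).astype(float)

# Misspecified probit (3), estimated by maximum likelihood with the
# fit_probit sketch from Section 2.
X_design = np.column_stack([np.ones(n), X1, X2])
p_hat = norm.cdf(X_design @ fit_probit(D, X_design))

grid, f1_hat, implied_f1 = density_check(p_hat, D)    # Lemma 3.1 comparison
```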

Figure 1 depicts the resulting estimates of f̂1,n(p) and α̂n [p/(1 − p)] f̂0,n(p). We see that the graphs of these two objects deviate substantially from each other, which can be taken as evidence of misspecification.

[FIGURE 1 ABOUT HERE]

[FIGURE 2 ABOUT HERE]

We also consider the case where the true data generating process is given by

D^* = 1 + X_1 + X_2 - \epsilon, \qquad \epsilon \sim U(-1, 1) ,   (4)

and we again fit the misspecified model (3). The estimated densities from this exercise are displayed in Figure 2. We find that the two graphs lie very close to one another, suggesting that misspecification is not very severe. This feature is perhaps not too surprising when one considers the fact that the only source of misspecification is the distribution for ǫ, and the assumed distribution shares many features with the true distribution. In particular, both are symmetric about zero.

⁴ We employ the biweight kernel and choose the bandwidth using Silverman's rule of thumb; see [16].

These examples show that misspecification of the propensity score may lead to noticeable departures from the restriction of Lemma 3.1. In order to develop a formal test based on this restriction, it will be useful to restate it in terms of an equivalent orthogonality condition. To this end, we have the following result.

Lemma 3.2. Let α = Pr[D = 0]/Pr[D = 1] and assume that 0 < Pr[D = 1] < 1. Let Q be a random variable on the unit interval with density with respect to Lebesgue measure. Denote by g1(·) the density of Q conditional on D = 1 and by g0(·) the density of Q conditional on D = 0. Then g1(q) = α [q/(1 − q)] g0(q) for all q ∈ supp(Q) such that 0 < q < 1 if and only if E[D − Q|Q = q] = 0 for all q ∈ supp(Q) such that 0 < q < 1.

Proof: First consider necessity. Consider q ∈ supp(Q) such that 0 < q < 1. The condition E[D − Q|Q = q] = 0 is equivalent to Pr[D = 1|Q = q] = E[D|Q = q] = q. For such q, Pr[D = 1|Q = q] = q and Bayes' Theorem together imply that

g_1(q)\,\Pr[D = 1] = \Pr[D = 1 \mid Q = q]\, g(q) = q\, g(q) > 0 ,

where g(q) is the density of Q. Similarly, we have that

g_0(q)\,\Pr[D = 0] = (1 - q)\, g(q) > 0 .

Combining these two implications, we have that

\frac{g_1(q)}{g_0(q)} = \alpha\, \frac{q}{1-q} ,

from which the asserted conclusion follows. Now consider sufficiency. Assume that g1(q) = α [q/(1 − q)] g0(q) for all q ∈ supp(Q) such that 0 < q < 1. By Bayes' Theorem,

E[D \mid Q = q] = \frac{g_1(q)\,\Pr[D = 1]}{\Pr[D = 0]\, g_0(q) + \Pr[D = 1]\, g_1(q)} = q ,

where the last equality follows from plugging in g1(q) = α [q/(1 − q)] g0(q). It follows that E[D − Q|Q = q] = 0, as desired. ∎

It is important to observe that this characterization of the restriction only involves low dimensional conditional expectations. We will show in the following section that a consequence of this fact is that, unlike more conventional nonparametric tests for correct specification of the propensity score, our test will not suffer from a curse of dimensionality and will therefore have much greater power in finite samples against many alternatives. The principal drawback of this approach is that there will be certain alternatives against which we will not have power. To examine this issue further, consider the following example. Suppose X = (X1, X2) and E[D|X1, X2] ≠ E[D|X1]. Let Q(X) = E[D|X1]. Thus Q(X) ≠ P(X), and yet Q(X) will satisfy E[D − Q|Q] = 0, as the calculation below shows. More generally, this restriction will not be able to detect the omission of covariates provided that the conditional expectation of D given the included covariates is correctly specified. Of course, this example is rather exceptional, since Q(X) is in this case correctly specified for a subvector of X.
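To see why the orthogonality restriction nevertheless holds in this example, note that Q = E[D|X1] is a function of X1 alone, so iterated expectations give

E[D \mid Q] = E\bigl[\,E[D \mid X_1] \mid Q\,\bigr] = E[\,Q \mid Q\,] = Q , \qquad \text{and hence} \qquad E[D - Q \mid Q] = 0 .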

4 A Formal Test

We now propose and develop a formal test for misspecification based on Lemma 3.2, which allows us to draw upon the large literature on testing orthogonality constraints. In particular, we adapt the testing methodology of Zheng [18] to our problem. Zheng considers testing whether the conditional expectation E[Y|X] is correctly specified. Under his null hypothesis, E[Y|X] is assumed to belong to a parametric family of real valued functions Q(X, θ) on R^m × Θ, where Θ ⊆ R^l. Concretely, his null and alternative hypotheses are

H0: ∃ θ0 ∈ Θ s.t. Pr[E[Y|X] = Q(X, θ0)] = 1.
H1: Pr[E[Y|X] = Q(X, θ)] < 1 for all θ ∈ Θ.

Note that any θ0 satisfying the null hypothesis also solves min_{θ∈Θ} E[Y − Q(X, θ)]².

Zheng's test is based on the following idea. Let ǫ = Y − Q(X, θ0) and let fX(·) denote the density of X. Then, under the null hypothesis,

E\bigl[\epsilon\, E[\epsilon \mid X]\, f_X(X)\bigr] = 0 ,

while under the alternative hypothesis

E\bigl[\epsilon\, E[\epsilon \mid X]\, f_X(X)\bigr] = E\bigl[(E[\epsilon \mid X])^2 f_X(X)\bigr] > 0 .

The last inequality follows because under the alternative hypothesis (E[ǫ|X])² > 0 with positive probability. Based on this observation, he uses the sample analogue of E[ǫ E[ǫ|X] fX(X)] to form his test. The test statistic is given by

\hat V_n := \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n^m} K\!\left(\frac{X_i - X_j}{h_n}\right) \hat\varepsilon_i \hat\varepsilon_j ,

where ε̂i := Yi − Q(Xi, θ̂n), θ̂n is a √n-consistent estimator of argmin_{θ∈Θ} E[Y − Q(X, θ)]², hn is a smoothing parameter, and K(·) is a kernel.

Zheng shows that under certain assumptions

n h_n^{m/2}\, \hat V_n \xrightarrow{d} N(0, \Sigma) ,

where Σ can be consistently estimated by

\hat\Sigma_n := \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n^m} K^2\!\left(\frac{X_i - X_j}{h_n}\right) \hat\varepsilon_i^2 \hat\varepsilon_j^2 .
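For later comparison, here is a minimal sketch of Zheng's standardized statistic, assuming the fitted values are available; the product normal kernel and the bandwidth constant are illustrative choices and not those used in [18].

```python
import numpy as np

def zheng_test(Y, X, Q_hat, c=0.10):
    """Standardized version of Zheng's statistic for H0: E[Y|X] = Q(X, theta_hat),
    with a product normal kernel; X is an (n, m) matrix of regressors."""
    n, m = X.shape
    h = c * n ** (-0.2)                                   # illustrative bandwidth choice
    eps = Y - Q_hat
    u = (X[:, None, :] - X[None, :, :]) / h               # pairwise differences, (n, n, m)
    K = np.prod(np.exp(-0.5 * u ** 2), axis=2) / (2 * np.pi) ** (m / 2)
    np.fill_diagonal(K, 0.0)                              # exclude i = j terms
    ee = np.outer(eps, eps)
    V = (K * ee).sum() / (n * (n - 1) * h ** m)
    Sigma = 2.0 * (K ** 2 * ee ** 2).sum() / (n * (n - 1) * h ** m)
    return n * h ** (m / 2) * V / np.sqrt(Sigma)          # approximately N(0,1) under H0
```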

We would like to test whether E[D|Q] = Q, where Q(X) = Q(X, θ0). Our null and alternative hypotheses are therefore given by

H0: ∃ θ0 ∈ Θ s.t. Pr[E[D|Q(X, θ0)] = Q(X, θ0)] = 1.
H1: Pr[E[D|Q(X, θ)] = Q(X, θ)] < 1 for all θ ∈ Θ.

Note that this differs from the framework of Zheng [18] in that the variable that is conditioned on is not observed but has to be estimated. Nevertheless, by analogy with the test statistic above, we consider testing based upon

\hat V_n := \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n} K\!\left(\frac{Q(X_i, \hat\theta_n) - Q(X_j, \hat\theta_n)}{h_n}\right) \hat\varepsilon_i \hat\varepsilon_j ,

where ε̂i = Di − Q(Xi, θ̂n). Unfortunately, the analysis of Zheng [18] does not apply directly because in our case the estimated Q appears in the kernel function as well. Thus, we require the following extension of Zheng's analysis:

Theorem 4.1. Make the following assumptions:

(i) {Di, Xi}_{i=1}^n is a random sample;

(ii) Q is Lipschitz continuous in (x, θ); Q(·, θ0) is continuously differentiable and, moreover, Q(X, θ0) has a density with respect to Lebesgue measure on [0, 1] that is continuously differentiable;

(iii) K(·) is a nonnegative, bounded, symmetric and Lipschitz continuous function such that ∫ K(u) du = 1;

(iv) the bandwidth sequence {hn} and the sequence of positive numbers {an} are such that:

 (a) n√hn → ∞ and (n h_n^{3/2})^{-1} → c < ∞;

 (b) 1/(an h_n²) → a < ∞, n a_n^{-2}/log n → ∞, and n√hn/a_n² → b < ∞.

Let θ̂n be a √n-consistent estimator of argmin_{θ∈Θ} E[D − Q(X, θ)]², where Θ is a compact and convex subset of R^l. Then

n \sqrt{h_n}\, \hat V_n \xrightarrow{d} N(0, \Sigma) ,

where Σ can be consistently estimated by

\hat\Sigma_n := \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n} K^2\!\left(\frac{Q(X_i, \hat\theta_n) - Q(X_j, \hat\theta_n)}{h_n}\right) \hat\varepsilon_i^2 \hat\varepsilon_j^2 .

Proof: See Appendix A. ∎

Assumptions (i)-(iii) are almost the same as Assumptions 1 and 5 of [18], adapted to our problem. The only difference is that here we require that Q and K be Lipschitz continuous. In addition, the requirement that θ̂n be a √n-consistent estimator of argmin_{θ∈Θ} E[D − Q(X, θ)]² is guaranteed by Assumptions 2-4 in [18]. The extra assumptions on the bandwidth sequence are needed because in our analysis Q(x, θ̂) appears in the argument of the kernel function as well.
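The test itself is simple to compute. Below is a minimal sketch, assuming the fitted values Q(Xi, θ̂n) are available as an array; the normal kernel and the bandwidth hn = c·n^{−1/5} anticipate the Monte Carlo design below, and rejection is for large positive values of the standardized statistic, in line with the inequality displayed above for the alternative hypothesis.

```python
import numpy as np

def specification_test(D, Q_hat, c=0.10):
    """Standardized statistic n*sqrt(h)*V_hat / sqrt(Sigma_hat) from Theorem 4.1.
    Q_hat holds the fitted values Q(X_i, theta_hat)."""
    n = len(D)
    h = c * n ** (-0.2)                                # bandwidth h_n = c * n**(-1/5)
    eps = D - Q_hat                                    # residuals D_i - Q(X_i, theta_hat)
    u = (Q_hat[:, None] - Q_hat[None, :]) / h          # pairwise kernel arguments
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)     # normal kernel
    np.fill_diagonal(K, 0.0)                           # drop i = j terms
    ee = np.outer(eps, eps)
    V = (K * ee).sum() / (n * (n - 1) * h)
    Sigma = 2.0 * (K ** 2 * ee ** 2).sum() / (n * (n - 1) * h)
    return n * np.sqrt(h) * V / np.sqrt(Sigma)         # approximately N(0,1) under H0
```

Comparing the returned value to a standard normal critical value (for example 1.645 at the 5 percent level) gives the one-sided decision.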

As noted earlier, there are alternatives against which Zheng's test has power tending to 1 but our test does not. However, Zheng's procedure requires estimation of expectations conditional on X, which in practice is high dimensional, and therefore suffers from a curse of dimensionality. Our procedure only requires estimation of expectations conditional on Q and thus avoids this difficulty. As a result, we would expect our test to perform noticeably better in finite samples against many alternatives of interest.

We now examine the finite sample properties of this testing procedure in a limited Monte Carlo study. Our setup follows Zheng [18] closely. As before, let X1 and X2 be given by (1a) and (1b). We now define several different data generating processes for D* that will be used at different points in our Monte Carlo study.

D^* = 1 + X_1 + X_2 - \epsilon, \qquad \epsilon \sim N(0, \sigma^2)   (5)
D^* = 1 + X_1 + X_2 + X_1 X_2 - \epsilon, \qquad \epsilon \sim N(0, \sigma^2)   (6)
D^* = (1 + X_1 + X_2)^2 - \epsilon, \qquad \epsilon \sim N(0, \sigma^2)   (7)
D^* = 1 + X_1 + X_2 - \epsilon, \qquad \epsilon \sim \chi^2_1   (8)
D^* = 1 + X_1 + X_2 - \epsilon, \qquad \epsilon \sim U(-1, 1)   (9)

For each of these data generating processes, D = 1[D* > 0] and ǫ ⊥ (X1, X2).


Throughout the simulations presented below, we use the normal kernel given by

K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(\frac{-u^2}{2}\right) .

The bandwidth hn is chosen to be equal to c·n^{-1/5}. In our simulations, we report results for values of c equal to 0.05, 0.10, and 0.15. We consider sample sizes n equal to 100, 200, 400, 500, 800, and 1000. The number of replications for each simulation is always 1000. The bandwidth is chosen to fulfill the requirements from Theorem 4.1.⁵
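A sketch of how such a rejection-frequency experiment can be assembled from the pieces above, here for the correctly specified design (5); the number of replications and the seed are illustrative, and the exact figures reported in the tables are not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def simulate_dgp5(n, sigma=1.0, rng=None):
    """Design (5): D* = 1 + X1 + X2 - eps, eps ~ N(0, sigma^2); returns D and the design matrix."""
    rng = np.random.default_rng() if rng is None else rng
    Z1, Z2 = rng.standard_normal(n), rng.standard_normal(n)
    X1, X2 = Z1, (Z1 + Z2) / np.sqrt(2)
    D = (1 + X1 + X2 - sigma * rng.standard_normal(n) > 0).astype(float)
    return D, np.column_stack([np.ones(n), X1, X2])

# Rejection-frequency loop in the spirit of Table 1 (fewer replications here);
# it reuses fit_probit and specification_test from the earlier sketches.
rng = np.random.default_rng(1)
reps, rejections = 200, 0
for _ in range(reps):
    D, X = simulate_dgp5(400, rng=rng)
    p_hat = norm.cdf(X @ fit_probit(D, X))
    rejections += specification_test(D, p_hat, c=0.10) > norm.ppf(0.95)
print("empirical size:", rejections / reps)
```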

We first examine the size of our test. The data are generated according to (5). We use our procedure to test the null hypothesis

H0: ∃ β0 ∈ R³ s.t. Pr[E[D|Q(X, β0)] = Q(X, β0)] = 1

against

H1: Pr[E[D|Q(X, β)] = Q(X, β)] < 1 for all β ∈ R³,

where Q(X, β) = Φ(β0 + β1X1 + β2X2). These results are summarized in Table 1. Our principal finding is that the actual size of the test is conservative and sensitive to the choice of c but, as expected, the actual size approaches the asymptotic size as the sample size increases.

We now go on to consider the power of our test against certain misspecifications of the propensity score. In Tables 2-5, we consider four different scenarios in which the true data generating process is given by (6)-(9), respectively. For each scenario, the null hypothesis is the same as the one above. The test performs admirably when the true data generating process is given by (6)-(8), showing high power even for moderately sized samples. Recall that (6) is precisely the design underlying our heuristic check from Lemma 3.1 presented in Figure 1. Given the noticeable departure of the two graphs in Figure 1, it is not surprising that our test performs well. The test performs less well, however, when the true data generating process is given by (9), which is again consonant with our earlier findings in Figure 2. This outcome is to be expected due to the shared features of the two distributions for ǫ. On the other hand, as evidenced by Table 4, the test performs much better when the true error distribution is χ²₁, which is not symmetric about zero.

We consider next a scenario in which the true data generating process is (5) and the null hypothesis is the same as the one above, but now Q(X, β) = β0 + β1X1 + β2X2. Naturally, we would expect any sensible test to have high power against such alternatives, as Q(X, β) is not even guaranteed to lie between 0 and 1. Fortunately, we find that our test has high power even for very small sample sizes in this situation. These results are summarized in Table 6.

⁵ The bandwidth employed in [18], c·n^{-2/5}, does not fulfill the assumptions required by Theorem 4.1. However, using this bandwidth instead of c·n^{-1/5} does not change the conclusions from the Monte Carlo analysis presented below. Results are available on request.

Finally, we compare our test with the test proposed by Zheng [18]. First, we consider the same setup as in Table 2 above, where the data generating process is given by (6). The results are presented in Table 7. As noted earlier, Zheng's test suffers from a curse of dimensionality, and so we might expect our test to perform better in finite samples for many alternatives. Indeed, this feature is borne out by our Monte Carlo study: our test rejects the null hypothesis much more frequently for nearly all sample sizes. To investigate the severity of this problem, we consider a further model which includes an additional covariate. Define

X_3 = \frac{Z_1 + Z_3}{\sqrt{2}} ,

where Z3 is a standard normal random variable independent of Z1 and Z2. The data generating process for D* in this instance is given by

D^* = 1 + X_1 + X_2 + X_1 X_2 + X_3 - \epsilon, \qquad \epsilon \sim N(0, \sigma^2) ,

where ǫ ⊥ (X1, X2, X3). The null hypothesis is as before, but now Q(X, β) = Φ(β0 + β1X1 + β2X2 + β3X3). The results of this comparison are presented in Tables 8 and 9. In this case, with the additional covariate, we find that our test retains high power at all sample sizes, whereas the highest power of Zheng's test is approximately 20 percent and often much lower. Thus, we believe that our test will in practice be of much greater use than Zheng's, especially when X is high dimensional. Of course, it should be emphasized again that this advantage of our test comes at the expense of power against certain alternatives.

[TABLE 1 ABOUT HERE]

[TABLE 2 ABOUT HERE]

[TABLE 3 ABOUT HERE]

[TABLE 4 ABOUT HERE]

[TABLE 5 ABOUT HERE]

[TABLE 6 ABOUT HERE]

[TABLE 7 ABOUT HERE]


[TABLE 8 ABOUT HERE]

[TABLE 9 ABOUT HERE]

5 Conclusion

In this paper, we have shown that the commonly computed estimates of the densities of the propensity score conditional on participation provide a means of examining whether the parametric model is correctly specified. In particular, correct specification of the propensity score implies a certain restriction between the estimated densities. We have shown further that this restriction is equivalent to an orthogonality restriction, which can be used as the basis of a formal test for correct specification. While our test does not have power against all forms of misspecification of the propensity score, we argue that for a large class of alternatives our test will perform better in finite samples than existing tests that have power against all forms of misspecification. Our Monte Carlo study of the finite sample behavior of our test corroborates this hypothesis. The study shows further that our test performs well against many types of misspecification. Since our test is also easily implemented, it is our hope that this work will persuade researchers to examine the specification of their model for the propensity score, as its validity is essential for consistency of their estimates of the treatment effect.

A Proof of Theorem 4.1

We start by proving the first claim, namely, that

n\sqrt{h_n}\, \hat V_n \xrightarrow{d} N(0, \Sigma) .   (10)

We start proving this result by first showing that

n\sqrt{h_n}\,\bigl(\hat V_n - V_{n0}\bigr) = o_P(1) ,   (11)

where

\hat V_n := \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n} K\!\left(\frac{Q(X_i, \hat\theta) - Q(X_j, \hat\theta)}{h_n}\right) \hat\varepsilon_i \hat\varepsilon_j ,

with ε̂i = Di − Q(Xi, θ̂), and

V_{n0} := \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n} K\!\left(\frac{Q(X_i, \theta_0) - Q(X_j, \theta_0)}{h_n}\right) \varepsilon_{i0} \varepsilon_{j0} ,

with εi0 = Di − Q(Xi, θ0). Then we use the analysis in [18] to argue that

n\sqrt{h_n}\, V_{n0} \xrightarrow{d} N(0, \Sigma) ,   (12)

so that combining (11) and (12) gives us (10).

To prove (11) we are going to rely on Lemma 3 given on page 286 of Heckman, Ichimura, and Todd [6]. To use that lemma, we define⁶

\Theta_n := \{\theta \in \Theta : a_n \|\theta - \theta_0\| \le \epsilon_\theta\} ,

\Psi_n := \left\{ \psi_n : \psi_n(D_i, X_i, D_j, X_j) = \frac{\varepsilon_i \varepsilon_j}{(n-1)\sqrt{h_n}} K\!\left(\frac{Q(X_i, \theta) - Q(X_j, \theta)}{h_n}\right),\ \theta \in \Theta_n \right\} ,

with εi = Di − Q(Xi, θ),

\psi_{n0}(D_i, X_i, D_j, X_j) := \frac{\varepsilon_{i0} \varepsilon_{j0}}{(n-1)\sqrt{h_n}} K\!\left(\frac{Q(X_i, \theta_0) - Q(X_j, \theta_0)}{h_n}\right) ,

\hat\psi_n(D_i, X_i, D_j, X_j) := \frac{\hat\varepsilon_i \hat\varepsilon_j}{(n-1)\sqrt{h_n}} K\!\left(\frac{Q(X_i, \hat\theta) - Q(X_j, \hat\theta)}{h_n}\right) ,

so that n√hn V̂n = Σ_i Σ_{j≠i} ψ̂n(i, j) and n√hn Vn0 = Σ_i Σ_{j≠i} ψn0(i, j). To show (11), we will verify two conditions: (i) that the U-process Σ_i Σ_{j≠i} ψn(i, j) is equicontinuous over Ψn in a neighborhood of ψn0, and (ii) that, with probability approaching 1, ψ̂n lies in that neighborhood.

To show equicontinuity, we will rely on the equicontinuity lemma, which requires a symmetric and degenerate process. The process Σ_i Σ_{j≠i} ψn(i, j) is symmetric if the kernel is symmetric, but it is not degenerate. By repeatedly adding and subtracting certain terms, however, we see that Σ_{i,j} ψn(i, j) equals

\sum_i \sum_{j\neq i} \left[ \frac{\varepsilon_i \varepsilon_j}{(n-1)\sqrt{h_n}} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) - E\!\left( \frac{\varepsilon_i \varepsilon_j}{(n-1)\sqrt{h_n}} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \Big| D_j, X_j \right) \right]   (13)

+ \frac{n\sqrt{h_n}}{n-1} \sum_j \bigl(\varepsilon_{j0} - [Q(X_j,\theta) - Q(X_j,\theta_0)]\bigr)\, E\!\left( \frac{\varepsilon_{i0}}{h_n} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \Big| D_j, X_j \right)

- \sum_j \frac{n\sqrt{h_n}\,\varepsilon_{j0}}{n-1}\, E\!\left( \frac{Q(X_i,\theta) - Q(X_i,\theta_0)}{h_n} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \Big| D_j, X_j \right)   (14)

+ \sum_j \frac{n\sqrt{h_n}\,[Q(X_j,\theta) - Q(X_j,\theta_0)]}{n-1}\, E\!\left( \frac{Q(X_i,\theta) - Q(X_i,\theta_0)}{h_n} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \Big| D_j, X_j \right) .   (15)

By the law of iterated expectations, the expression on the second line is zero. Furthermore, the expressions in (13) and (14) are respectively second and first order symmetric degenerate U-processes; thus, their asymptotic properties can be analyzed using Lemma 3 of Heckman, Ichimura, and Todd [6].

⁶ an is an increasing sequence of numbers that satisfies certain conditions.


In addition, the expression in (15) can be further broken into

\frac{n\sqrt{h_n}}{n-1} \sum_j \left\{ [Q(X_j,\theta) - Q(X_j,\theta_0)]\, E\!\left( \frac{Q(X_i,\theta) - Q(X_i,\theta_0)}{h_n} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \Big| D_j, X_j \right) - E\!\left( \frac{[Q(X_j,\theta) - Q(X_j,\theta_0)][Q(X_i,\theta) - Q(X_i,\theta_0)]}{h_n} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \right) \right\}   (16)

+ n\sqrt{h_n}\, E\!\left( \frac{[Q(X_j,\theta) - Q(X_j,\theta_0)][Q(X_i,\theta) - Q(X_i,\theta_0)]}{h_n} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \right) .   (17)

(16) is a degenerate order 1 process, and hence could be analyzed using the same lemma. We study (17) at the end.

Our first step is verifying the conditions of the equicontinuity lemma for {ψn(i, j) − E[ψn(i, j)|j]}. Note that since |D − Q(x, θ)| ≤ 1 for all x and θ,

\frac{2}{(n-1)\sqrt{h_n}}\, K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right)

is an envelope for this process. Moreover,

\limsup_{n\to\infty} \sum_i \sum_{j\neq i} E\!\left[ \frac{4}{(n-1)^2 h_n} K^2\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \right] = \limsup_{n\to\infty} \frac{4n}{(n-1) h_n}\, E\!\left[ K^2\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \right] .

In addition,

\frac{1}{h_n} E\!\left[ K^2\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \right] = \frac{1}{h_n} E\!\left[ K^2\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) - K^2\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) \right] + \frac{1}{h_n} E\!\left[ K^2\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) \right] .

Since K and Q are Lipschitz, and K is bounded, for some constant C the first expression is bounded by C/(a_n h_n^2), which has a finite limit. Moreover, since Q(X, θ0) is assumed to have a Lebesgue density, denoted by fQ, we have

\frac{1}{h_n} E\!\left[ K^2\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) \right] = \int\!\!\int K^2(u)\, f_Q\bigl(Q(X_j,\theta_0) + u h_n\bigr)\, du\, f_Q\bigl(Q(X_j,\theta_0)\bigr)\, dQ(X_j,\theta_0) .

But boundedness of K implies that this last expression has a finite limit as well. Thus, the first condition of the equicontinuity lemma is satisfied for our process.

To verify the second condition, we note that for each δ > 0, since the kernel function is bounded, the indicator

1\!\left[ \frac{1}{(n-1)\sqrt{h_n}}\, K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) > \delta \right]

equals 1 for only finitely many n, because n√hn → ∞. On the other hand, by the dominated convergence theorem, we have

\lim_{n\to\infty} E\!\left[ \sum_i \sum_{j\neq i} \frac{1}{(n-1)^2 h_n} K^2\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) 1\!\left[ \frac{1}{(n-1)\sqrt{h_n}} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) > \delta \right] \right]
= E\!\left[ \lim_{n\to\infty} \sum_i \sum_{j\neq i} \frac{1}{(n-1)^2 h_n} K^2\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) 1\!\left[ \frac{1}{(n-1)\sqrt{h_n}} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) > \delta \right] \right] = 0 .

Thus, the second condition of the equicontinuity lemma is also satisfied.

Next, we verify the same conditions for the process given in (14). First, we observe that we can rewrite each term of that summation as

\frac{n\sqrt{h_n}\,\varepsilon_{j0}}{a_n (n-1) h_n}\, E\!\left( a_n [Q(X_i,\theta) - Q(X_i,\theta_0)]\, K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \Big| X_j \right) .

Since Q is Lipschitz with Lipschitz constant L_Q, each of these terms is bounded by

\frac{n\sqrt{h_n}\,|\varepsilon_{j0}| L_Q \epsilon_\theta}{a_n (n-1)}\, E\!\left( \frac{1}{h_n} \left| K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \right| \Big| X_j \right) .

Then, using the Lipschitz continuity of K and Q, we have

\frac{n\sqrt{h_n}\,|\varepsilon_{j0}| L_Q \epsilon_\theta}{a_n (n-1)}\, E\!\left( \frac{1}{h_n} \left| K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \right| \Big| X_j \right)
= \frac{n |\varepsilon_{j0}| L_Q \epsilon_\theta}{a_n (n-1)\sqrt{h_n}}\, E\!\left( \left| K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) - K\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) \right| \Big| X_j \right) + \frac{n\sqrt{h_n}\,|\varepsilon_{j0}| L_Q \epsilon_\theta}{a_n (n-1)}\, E\!\left( \frac{1}{h_n} \left| K\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) \right| \Big| X_j \right)
\le \frac{n C}{a_n^2 h_n^{3/2} (n-1)}\, |\varepsilon_{j0}| + \frac{n\sqrt{h_n}\,|\varepsilon_{j0}| L_Q \epsilon_\theta}{a_n (n-1)}\, E\!\left( \frac{1}{h_n} \left| K\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) \right| \Big| X_j \right) ,

where C = 2 L_K L_Q^2 \epsilon_\theta^2. Then, using a change of variables, we see that the second term in the last expression equals

\frac{n\sqrt{h_n}\,|\varepsilon_{j0}| L_Q \epsilon_\theta}{a_n (n-1)} \int |K(u)|\, f_Q\bigl(Q(X_j,\theta_0) + u h_n\bigr)\, du .

Since K is bounded, the integrals in this last expression are all finite. Therefore,

\frac{n\sqrt{h_n}}{a_n (n-1)}\, |\varepsilon_{j0}| \left( \frac{C}{a_n h_n^2} + C_1 \right) ,

where C_1 is the appropriate constant, is an envelope for (14). To verify the first condition of the equicontinuity lemma, consider

\limsup_{n\to\infty} \sum_j \frac{n^2 h_n}{a_n^2 (n-1)^2}\, E\bigl[\varepsilon_{j0}^2\bigr] \left( \frac{C}{a_n h_n^2} + C_1 \right)^2 \le \limsup_{n\to\infty} \frac{n h_n}{a_n^2} \left( \frac{C}{a_n h_n^2} + C_1 \right)^2 .

Sufficient conditions for this to be finite are that lim_{n→∞} n hn a_n^{-2} < ∞ and lim_{n→∞} 1/(an h_n²) < ∞, which are implied by our assumptions. Thus the first condition of the equicontinuity lemma is satisfied.


To check the second condition of the lemma, let δ > 0. Then

\lim_{n\to\infty} \sum_j \frac{n^2 h_n}{a_n^2 (n-1)^2} \left( \frac{C}{a_n h_n^2} + C_1 \right)^2 E\!\left[ \varepsilon_{j0}^2\, 1\!\left[ \frac{n\sqrt{h_n}\,|\varepsilon_{j0}|}{a_n (n-1)} \left( \frac{C}{a_n h_n^2} + C_1 \right) > \delta \right] \right]
= \lim_{n\to\infty} \frac{h_n}{a_n^2} \left( \frac{C}{a_n h_n^2} + C_1 \right)^2 \sum_j E\!\left[ \varepsilon_{j0}^2\, 1\!\left[ \frac{n\sqrt{h_n}\,|\varepsilon_{j0}|}{a_n (n-1)} \left( \frac{C}{a_n h_n^2} + C_1 \right) > \delta \right] \right]
\le \lim_{n\to\infty} \frac{h_n}{a_n^2} \left( \frac{C}{a_n h_n^2} + C_1 \right)^2 \sum_j E\!\left[ 1\!\left[ \frac{n\sqrt{h_n}}{a_n (n-1)} \left( \frac{C}{a_n h_n^2} + C_1 \right) > \delta \right] \right] ,

where we use the fact that |εj0| ≤ 1 to write the last inequality. Note that lim_{n→∞} 1/(an h_n²) < ∞ implies that 1/(a_n² h_n^{3/2}) → 0. This in turn implies that for each δ > 0 only finitely many terms in the last summation will be non-zero, and therefore the second condition of the lemma is satisfied as well.

On the other hand, the same arguments show that the same conditions are satisfied for the process

\sum_j \left\{ \frac{n\sqrt{h_n}\,[Q(X_j,\theta) - Q(X_j,\theta_0)]}{n-1}\, E\!\left( \frac{Q(X_i,\theta) - Q(X_i,\theta_0)}{h_n} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \Big| X_j \right) - \frac{n\sqrt{h_n}}{n-1}\, E\!\left( \frac{[Q(X_j,\theta) - Q(X_j,\theta_0)][Q(X_i,\theta) - Q(X_i,\theta_0)]}{h_n} K\!\left(\frac{Q(X_i,\theta) - Q(X_j,\theta)}{h_n}\right) \right) \right\} .

To verify the third condition, note that since Θ is a compact subset of R^l, its L² packing number, D₂(τ, Θ), is less than or equal to 2lτ^{-l}. If K(·) is bounded and Lipschitz, Q(·, ·) is Lipschitz, and n^{-1} h_n^{-3/2} = O(1), then D₂(τ, Ψn) ≤ C·τ^{-l}. Thus, the third condition of the lemma is also satisfied.

We know that ψn0 ∈ Ψn. In addition, if

\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0, \Sigma_\theta) ,

then for any {an} sequence such that an/√n → 0, we will have a_n(θ̂ − θ0) →P 0, and ψ̂n will belong to Ψn with probability approaching 1.

The arguments given so far allow us to argue that

\left| \sum_{i,j} \Bigl\{ \bigl( \hat\psi_n(i,j) - E[\hat\psi_n(i,j) \mid j] \bigr) - \bigl( \psi_{n0}(i,j) - E[\psi_{n0}(i,j) \mid j] \bigr) \Bigr\} \right| = o_P(1) ,

\left| \sum_j \frac{\sqrt{h_n}\,\varepsilon_{j0}}{n-1}\, E\!\left( (\varepsilon_{i0} - \hat\varepsilon_i)\, \frac{1}{h_n} K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \Big| X_j \right) \right| = o_P(1) , \quad \text{and}

\left| \sum_{i,j} \left\{ \frac{n\sqrt{h_n}\,(\varepsilon_{j0} - \hat\varepsilon_j)}{n-1}\, E\!\left( \frac{\varepsilon_{i0} - \hat\varepsilon_i}{h_n} K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \Big| X_j \right) - E\!\left( \frac{n(\varepsilon_{j0} - \hat\varepsilon_j)(\varepsilon_{i0} - \hat\varepsilon_i)}{(n-1)\sqrt{h_n}} K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \right) \right\} \right| = o_P(1) .


The last thing we need to show is that

\lim_{n\to\infty} E\!\left[ \frac{n(\varepsilon_{j0} - \hat\varepsilon_j)(\varepsilon_{i0} - \hat\varepsilon_i)}{\sqrt{h_n}}\, K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \right] = 0 .

But this expression equals

\frac{n\sqrt{h_n}}{a_n^2}\, E\!\left[ \frac{a_n[Q(X_j,\hat\theta) - Q(X_j,\theta_0)]\, a_n[Q(X_i,\hat\theta) - Q(X_i,\theta_0)]}{h_n}\, K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \right] .

Since n√hn/a_n² = O(1), the above expression will go to 0 if we can show that

\lim_{n\to\infty} E\!\left[ \frac{a_n[Q(X_j,\hat\theta) - Q(X_j,\theta_0)]\, a_n[Q(X_i,\hat\theta) - Q(X_i,\theta_0)]}{h_n}\, K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \right] = 0 .   (18)

To show (18) we will first argue that

a_n \|\hat\theta - \theta_0\| \to 0 \quad \text{a.s.}   (19)

Then, using this almost sure convergence result and the Lipschitz continuity of Q and K, we are going to argue that

\frac{a_n[Q(X_j,\hat\theta) - Q(X_j,\theta_0)]\, a_n[Q(X_i,\hat\theta) - Q(X_i,\theta_0)]}{h_n}\, K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right)

is almost surely bounded by an integrable function. Then the result will follow from Theorem 1.3.6 of [14].

Since our estimator θ̂ for θ0 is defined as the minimizer of

\hat\varphi_n(\theta) := \frac{1}{n} \sum_{j=1}^{n} [D_j - Q(X_j, \theta)]^2

over a compact parameter space Θ, the first step towards showing (19) is to show that

a_n \sup_{\theta \in \Theta} |\hat\varphi_n(\theta) - \varphi_0(\theta)| \to 0 \quad \text{a.s.} ,   (20)

where

\varphi_0(\theta) := E\bigl[(D_j - Q(X_j, \theta))^2\bigr] .

To show (20) we rely on Theorem 37 from Pollard [10]. We apply that theorem to the following family of functions:

\Phi_n := \{ (D - Q(x, \theta))^2 / M : \theta \in \Theta \} ,

where M = sup_{θ∈Θ} [D − Q(x, θ)]². Since Θ is a compact subset of a finite dimensional Euclidean space, the covering number condition needed to apply the theorem is satisfied by Theorem 4.8 of Pollard [11], and applying Theorem 37 of Pollard [10] with δn = M and αn = a_n^{-1} gives us the desired result (20) as long as n M² a_n^{-2}/log n → ∞. Then, a small modification of the consistency proof in [9] yields a_n‖θ̂ − θ0‖ → 0 a.s. This in turn implies that, for ǫθ > 0 and sufficiently large n, we almost surely have

\left| \frac{a_n[Q(X_j,\hat\theta) - Q(X_j,\theta_0)]\, a_n[Q(X_i,\hat\theta) - Q(X_i,\theta_0)]}{h_n}\, K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \right|
\le \frac{\epsilon_\theta^2}{h_n}\, K\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right)
\le \frac{\epsilon_\theta^3 L_K L_Q}{a_n h_n^2} + \frac{\epsilon_\theta^2}{h_n}\, K\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) .

Since lim 1/(an h_n²) < ∞ and E[(1/hn) K((Q(Xi, θ0) − Q(Xj, θ0))/hn)] < ∞, (18) follows from Theorem 1.3.6 of [14]. Thus,

n\sqrt{h_n}\,\bigl(\hat V_n - V_{n0}\bigr) = o_P(1) .

Combining this with Lemma 3.3a of [18] completes the proof for the first part of Theorem 4.1.

Next we argue that

\Sigma = 2 \int K^2(u)\, du \int \bigl[E(\varepsilon_{i0}^2 \mid Q_{i0} = q)\bigr]^2 f_Q^2(q)\, dq ,

where Qi0 = Q(Xi, θ0), and that Σ can be consistently estimated by

\frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n} K^2\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \hat\varepsilon_i^2 \hat\varepsilon_j^2 .

We argue this in two steps again. First, note that if K is symmetric, bounded and Lipschitz, then so is K². Thus, under the same conditions as before, the analysis above yields that

\frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n} \left[ K^2\!\left(\frac{Q(X_i,\hat\theta) - Q(X_j,\hat\theta)}{h_n}\right) \hat\varepsilon_i^2 \hat\varepsilon_j^2 - K^2\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) \varepsilon_{i0}^2 \varepsilon_{j0}^2 \right] = o_P(1) .   (21)

Second, the arguments in the proof of Lemma 3.3e of [18] show that

\frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n} K^2\!\left(\frac{Q(X_i,\theta_0) - Q(X_j,\theta_0)}{h_n}\right) \varepsilon_{i0}^2 \varepsilon_{j0}^2 = \Sigma + o_P(1) .

Combining these gives us the second result in Theorem 4.1. ∎

References

[1] Abadie, A., and G. Imbens (2004): "Simple and Bias-Corrected Matching Estimators for Average Treatment Effects," mimeo, UC-Berkeley and Harvard University.

[2] Black, D. A., and J. A. Smith (2004): "How Robust is the Evidence on the Effects of College Quality? Evidence from Matching," Journal of Econometrics, 121, 99-124.

[3] Dehejia, R. H., and S. Wahba (2002): "Propensity Score Matching Methods for Nonexperimental Causal Studies," The Review of Economics and Statistics, 84, 151-161.

[4] Hahn, J. (1998): "On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects," Econometrica, 66, 315-331.

[5] Heckman, J., H. Ichimura, J. Smith, and P. Todd (1998): "Characterizing Selection Bias Using Experimental Data," Econometrica, 66(5), 1017-1098.

[6] Heckman, J., H. Ichimura, and P. Todd (1998): "Matching as an Econometric Evaluation Estimator," Review of Economic Studies, 65(2), 261-294.

[7] Heckman, J., R. LaLonde, and J. Smith (1999): "The Economics and Econometrics of Active Labor Market Programs," in Handbook of Labor Economics, ed. by O. Ashenfelter and D. Card. Elsevier Science, North Holland, Amsterdam, New York and Oxford.

[8] Lechner, M. (2002): "Program Heterogeneity and Propensity Score Matching: An Application to the Evaluation of Active Labor Market Policies," The Review of Economics and Statistics, 84, 205-220.

[9] Newey, W., and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, ed. by R. Engle and D. McFadden. North-Holland, Amsterdam.

[10] Pollard, D. (1984): Convergence of Stochastic Processes. Springer-Verlag, New York.

[11] Pollard, D. (1990): "Empirical Processes: Theory and Applications," NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 2. Institute of Mathematical Statistics, Hayward; American Statistical Association, Alexandria.

[12] Rosenbaum, P. R., and D. B. Rubin (1985): "Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score," The American Statistician, 39, 33-38.

[13] Rosenbaum, P. R., and D. B. Rubin (1983): "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41-55.

[14] Serfling, R. J. (1980): Approximation Theorems of Mathematical Statistics. Wiley, Hoboken, New Jersey.

[15] Sianesi, B. (2004): "An Evaluation of the Swedish System of Active Labor Market Programs in the 1990s," The Review of Economics and Statistics, 86, 133-155.

[16] Silverman, B. W. (1986): Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC.

[17] Smith, J. A., and P. E. Todd (2005): "Rejoinder," Journal of Econometrics, 125, 365-375.

[18] Zheng, J. X. (1996): "A Consistent Test of Functional Form via Nonparametric Estimation Techniques," Journal of Econometrics, 75(2), 263-289.


FIGURE 1: Estimated Densities of the Propensity Score
[figure omitted]
Note: solid line ––– $\hat\alpha_n\,\frac{p}{1-p}\,\hat f_{0,n}(p)$;  dashed line - - - $\hat f_{1,n}(p)$
DGP: D* = 1 + X1 + X2 + X1X2 − ǫ_i, ǫ ∼ N(0, σ²)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2)
Densities estimated using the biweight kernel and Silverman's rule of thumb; see Silverman [16].
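The two curves overlaid in Figures 1 and 2 can be reproduced from estimated propensity scores in a few lines. The sketch below is only illustrative: it takes the biweight kernel and Silverman's rule of thumb from the figure note, and it assumes that $\hat\alpha_n$ denotes the sample odds $\hat P(D=0)/\hat P(D=1)$, a notational convention that is not restated in this excerpt.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth from Silverman [16]: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(np.std(x, ddof=1), iqr / 1.34) * len(x) ** (-0.2)

def biweight_kde(x, grid, h):
    """Kernel density estimate with the biweight kernel K(u) = (15/16)(1 - u^2)^2, |u| <= 1."""
    x, grid = np.asarray(x, dtype=float), np.asarray(grid, dtype=float)
    u = (grid[:, None] - x[None, :]) / h
    K = np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u ** 2) ** 2, 0.0)
    return K.mean(axis=1) / h

def figure_curves(p_hat, D, grid):
    """Return the two curves plotted in Figures 1-2 on a grid strictly inside (0, 1):
    alpha_hat * p/(1-p) * f0_hat(p) (solid) and f1_hat(p) (dashed)."""
    p_hat, D = np.asarray(p_hat, dtype=float), np.asarray(D).astype(int)
    p1, p0 = p_hat[D == 1], p_hat[D == 0]
    f1 = biweight_kde(p1, grid, silverman_bandwidth(p1))
    f0 = biweight_kde(p0, grid, silverman_bandwidth(p0))
    alpha_hat = (D == 0).mean() / (D == 1).mean()  # assumed: sample odds P(D=0)/P(D=1)
    return alpha_hat * grid / (1.0 - grid) * f0, f1
```

Under a correctly specified propensity-score model the two curves should coincide up to sampling error; the DGPs in the figure notes are misspecified relative to the estimated probit, so any systematic gap between the curves is the kind of discrepancy the test is designed to detect.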


FIGURE 2: Estimated Densities of the Propensity Score
[figure omitted]
Note: solid line ––– $\hat\alpha_n\,\frac{p}{1-p}\,\hat f_{0,n}(p)$;  dashed line - - - $\hat f_{1,n}(p)$
DGP: D* = 1 + X1 + X2 − ǫ_i, ǫ ∼ U(−1, 1)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2)
Densities estimated using the biweight kernel and Silverman's rule of thumb; see Silverman [16].


TABLE 1: Size

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.011       0.046       0.089
  n=200       0.006       0.037       0.082
  n=400       0.003       0.031       0.081
  n=500       0.006       0.044       0.101
  n=800       0.006       0.040       0.088
  n=1000      0.005       0.038       0.086
c=0.10
  n=100       0.006       0.043       0.101
  n=200       0.002       0.024       0.070
  n=400       0.004       0.021       0.060
  n=500       0.004       0.027       0.074
  n=800       0.004       0.035       0.085
  n=1000      0.005       0.022       0.070
c=0.15
  n=100       0.005       0.027       0.079
  n=200       0.003       0.011       0.057
  n=400       0.003       0.008       0.049
  n=500       0.004       0.012       0.061
  n=800       0.008       0.024       0.070
  n=1000      0.005       0.019       0.053

DGP: D* = 1 + X1 + X2 − ǫ_i, ǫ ∼ N(0, σ²)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2)
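For readers who want to reproduce the flavor of these experiments, the following is a minimal sketch of one replication of the size design reported in Table 1. The covariate distribution (independent standard normals), the threshold-crossing rule D = 1{D* > 0}, and the default value of σ are assumptions made here for illustration; they are not spelled out in this excerpt.

```python
import numpy as np
import statsmodels.api as sm

def one_size_replication(n, sigma=1.0, seed=0):
    """One Monte Carlo draw for the correctly specified design behind Table 1:
    D* = 1 + X1 + X2 - eps with eps ~ N(0, sigma^2), D = 1{D* > 0},
    followed by a probit fit of Q(X, beta) = Phi(beta0 + beta1*X1 + beta2*X2)."""
    rng = np.random.default_rng(seed)
    X1, X2 = rng.standard_normal(n), rng.standard_normal(n)  # assumed covariate law
    eps = sigma * rng.standard_normal(n)
    D = (1.0 + X1 + X2 - eps > 0).astype(float)              # assumed threshold rule
    X = sm.add_constant(np.column_stack([X1, X2]))
    fit = sm.Probit(D, X).fit(disp=0)
    Q_hat = fit.predict(X)                                   # fitted propensity scores
    eps_hat = D - Q_hat                                      # residuals that feed the test statistic
    return Q_hat, eps_hat
```

Repeating such draws, computing the test statistic on each, and recording rejections at the 1%, 5%, and 10% levels yields rejection frequencies of the kind reported above; under the null of a correctly specified probit they should stay close to the nominal levels, which is what Table 1 shows.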


TABLE 2: Power

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.061       0.146       0.218
  n=200       0.273       0.432       0.518
  n=400       0.787       0.886       0.918
  n=500       0.916       0.964       0.975
  n=800       0.997       0.999       0.999
  n=1000      1.000       1.000       1.000
c=0.10
  n=100       0.094       0.189       0.237
  n=200       0.417       0.559       0.618
  n=400       0.910       0.962       0.978
  n=500       0.977       0.990       0.993
  n=800       1.000       1.000       1.000
  n=1000      1.000       1.000       1.000
c=0.15
  n=100       0.101       0.185       0.241
  n=200       0.459       0.597       0.664
  n=400       0.948       0.976       0.984
  n=500       0.984       0.993       0.996
  n=800       1.000       1.000       1.000
  n=1000      1.000       1.000       1.000

DGP: D* = 1 + X1 + X2 + X1X2 − ǫ_i, ǫ ∼ N(0, σ²)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2)


TABLE 3: Power

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.358       0.479       0.537
  n=200       0.818       0.874       0.896
  n=400       0.991       0.993       0.994
  n=500       1.000       1.000       1.000
  n=800       1.000       1.000       1.000
  n=1000      1.000       1.000       1.000
c=0.10
  n=100       0.290       0.401       0.483
  n=200       0.821       0.876       0.902
  n=400       0.993       0.994       0.995
  n=500       1.000       1.000       1.000
  n=800       1.000       1.000       1.000
  n=1000      1.000       1.000       1.000
c=0.15
  n=100       0.193       0.283       0.343
  n=200       0.737       0.814       0.840
  n=400       0.990       0.994       0.994
  n=500       1.000       1.000       1.000
  n=800       1.000       1.000       1.000
  n=1000      1.000       1.000       1.000

DGP: D* = (1 + X1 + X2)² − ǫ_i, ǫ ∼ N(0, σ²)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2)


TABLE 4: Power

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.023       0.082       0.150
  n=200       0.074       0.169       0.232
  n=400       0.321       0.490       0.587
  n=500       0.440       0.621       0.698
  n=800       0.838       0.917       0.942
  n=1000      0.929       0.977       0.986
c=0.10
  n=100       0.037       0.076       0.134
  n=200       0.131       0.242       0.303
  n=400       0.457       0.620       0.691
  n=500       0.642       0.776       0.848
  n=800       0.935       0.969       0.980
  n=1000      0.986       0.997       0.997
c=0.15
  n=100       0.039       0.076       0.124
  n=200       0.162       0.271       0.331
  n=400       0.557       0.689       0.751
  n=500       0.707       0.825       0.877
  n=800       0.957       0.978       0.986
  n=1000      0.993       0.998       0.998

DGP: D* = 1 + X1 + X2 − ǫ_i, ǫ ∼ χ²₁
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2)


TABLE 5: Power

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.008       0.057       0.127
  n=200       0.007       0.032       0.078
  n=400       0.005       0.048       0.091
  n=500       0.009       0.045       0.087
  n=800       0.021       0.073       0.126
  n=1000      0.033       0.094       0.147
c=0.10
  n=100       0.008       0.048       0.117
  n=200       0.005       0.021       0.063
  n=400       0.009       0.039       0.074
  n=500       0.013       0.035       0.071
  n=800       0.035       0.075       0.124
  n=1000      0.048       0.114       0.169
c=0.15
  n=100       0.005       0.038       0.107
  n=200       0.007       0.021       0.057
  n=400       0.009       0.034       0.066
  n=500       0.014       0.035       0.074
  n=800       0.036       0.081       0.135
  n=1000      0.056       0.110       0.176

DGP: D* = 1 + X1 + X2 − ǫ_i, ǫ ∼ U(−1, 1)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2)


TABLE 6: Power

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.218       0.337       0.426
  n=200       0.637       0.781       0.826
  n=400       0.984       0.993       0.997
  n=500       1.000       1.000       1.000
  n=800       1.000       1.000       1.000
  n=1000      1.000       1.000       1.000
c=0.10
  n=100       0.371       0.505       0.592
  n=200       0.813       0.904       0.936
  n=400       0.998       1.000       1.000
  n=500       1.000       1.000       1.000
  n=800       1.000       1.000       1.000
  n=1000      1.000       1.000       1.000
c=0.15
  n=100       0.449       0.584       0.662
  n=200       0.891       0.952       0.970
  n=400       1.000       1.000       1.000
  n=500       1.000       1.000       1.000
  n=800       1.000       1.000       1.000
  n=1000      1.000       1.000       1.000

DGP: D* = 1 + X1 + X2 − ǫ_i, ǫ ∼ N(0, σ²)
Estimated model: Q(X, β) = β_0 + β_1 X1 + β_2 X2


TABLE 7: Power, Zheng [18]

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.000       0.011       0.053
  n=200       0.001       0.016       0.070
  n=400       0.005       0.033       0.079
  n=500       0.005       0.038       0.106
  n=800       0.012       0.059       0.125
  n=1000      0.009       0.067       0.131
c=0.10
  n=100       0.002       0.030       0.085
  n=200       0.005       0.052       0.110
  n=400       0.011       0.072       0.126
  n=500       0.014       0.082       0.157
  n=800       0.041       0.132       0.210
  n=1000      0.063       0.175       0.294
c=0.15
  n=100       0.002       0.046       0.106
  n=200       0.008       0.061       0.115
  n=400       0.035       0.109       0.176
  n=500       0.043       0.141       0.227
  n=800       0.107       0.271       0.370
  n=1000      0.166       0.373       0.494

DGP: D* = 1 + X1 + X2 + X1X2 − ǫ_i, ǫ ∼ N(0, σ²)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2)


TABLE 8: Power

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.034       0.079       0.127
  n=200       0.049       0.106       0.163
  n=400       0.162       0.288       0.364
  n=500       0.229       0.377       0.448
  n=800       0.495       0.620       0.696
  n=1000      0.594       0.751       0.807
c=0.10
  n=100       0.038       0.087       0.134
  n=200       0.081       0.143       0.190
  n=400       0.274       0.387       0.465
  n=500       0.356       0.480       0.560
  n=800       0.637       0.744       0.797
  n=1000      0.768       0.857       0.895
c=0.15
  n=100       0.042       0.078       0.128
  n=200       0.102       0.170       0.217
  n=400       0.320       0.448       0.508
  n=500       0.415       0.550       0.619
  n=800       0.700       0.789       0.837
  n=1000      0.829       0.895       0.924

DGP: D* = 1 + X1 + X2 + X1X2 + X3 − ǫ_i, ǫ ∼ N(0, σ²)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2 + β_3 X3)


TABLE 9: Power, Zheng [18]

                 Proportion of Rejections
            1% level    5% level    10% level
c=0.05
  n=100       0.000       0.003       0.009
  n=200       0.000       0.002       0.009
  n=400       0.000       0.002       0.015
  n=500       0.000       0.003       0.021
  n=800       0.000       0.013       0.041
  n=1000      0.000       0.012       0.052
c=0.10
  n=100       0.000       0.000       0.014
  n=200       0.000       0.009       0.044
  n=400       0.003       0.018       0.064
  n=500       0.001       0.019       0.079
  n=800       0.001       0.034       0.092
  n=1000      0.001       0.041       0.114
c=0.15
  n=100       0.001       0.011       0.041
  n=200       0.002       0.021       0.071
  n=400       0.002       0.036       0.096
  n=500       0.003       0.037       0.089
  n=800       0.003       0.055       0.110
  n=1000      0.012       0.064       0.128

DGP: D* = 1 + X1 + X2 + X1X2 + X3 − ǫ_i, ǫ ∼ N(0, σ²)
Estimated model: Q(X, β) = Φ(β_0 + β_1 X1 + β_2 X2 + β_3 X3)
