On the Identification of Misspecified Propensity Scores - School of ...

On the Identification of Misspecified Propensity Scores 

Azeem M. Shaikh 

Department of Economics 

Stanford University 

Edward J. Vytlacil 


Columbia University 

February 8, 2006 

Abstract 

Marianne Simonsen 


University of Aarhus 

Nese Yildiz 


University of Pittsburgh 

Propensity score matching has become a popular method for the evaluation of 

treatment effects. In empirical applications, researchers almost always impose a 

parametric model for the propensity score. This raises the possibility that the 

model is misspecified and therefore the propensity score matching estimator may 

be inconsistent. We show that the common practice of calculating estimates of the 

densities of the propensity score conditional on the participation decision provides 

a means for examining whether the propensity score is misspecified. In particular, 

we derive a restriction between the density of the propensity score among participants 

versus the density among nonparticipants. We show that this restriction 

between the two conditional densities is equivalent to a particular orthogonality 

restriction and derive a formal test based upon it. The resulting test is shown to 

have dramatically greater power than competing tests for many alternatives. The 

principle disadvantage of this approach is loss of power against some alternatives. 

Acknowledgements: 

The authors wish to thank Jeff Smith for valuable comments and suggestions.

1 Introduction 

Propensity score matching is a widely used approach for the estimation of treatment effects. 

See Heckman, LaLonde, and Smith [7] for a detailed survey of its use in the treatment effect 

literature. The method is based on the following well-known result of Rosenbaum and Rubin 

[13]: if selection into the program is independent of the potential outcomes of interest con- 

ditional on a vector of covariates, then selection into the program is also independent of the 

potential outcomes conditional on the propensity score, where the propensity score denotes the 

probability of selection into the program conditional on the same vector of covariates. The high 

dimensionality of the vector of observed covariates often forces researchers to adopt a finite- 

dimensional approximation to the propensity score, in lieu of a more flexible nonparametric 

model. Imposing such restrictions on the functional form for the propensity score raises the 

possibility that the propensity score model is misspecified. If the propensity score model is mis- 

specified, then in general the estimator of the propensity score will be inconsistent for the true 

propensity score and the resulting propensity score matching estimator may be inconsistent 

for the treatment effect of interest. 

First suggested by Heckman, Ichimura, Smith, and Todd [5], it is now common to calculate 

estimates of the densities of the propensity score conditional on the participation decision. 

These estimates are calculated in order to determine the region of common support on which 

to perform matching. We show that these conditional densities provide a means for examining 

whether the propensity score is misspecified. In particular, we derive a restriction between 

the density of the propensity score model among participants and the density among nonpar- 

ticipants. Failure of this restriction to hold for the estimated conditional densities provides 

evidence that the propensity score model is misspecified. We show that this restriction between 

the two conditional densities is equivalent to a particular orthogonality restriction and derive 

a formal test based upon it. Unlike other tests for correct specification of the propensity score, 

the resulting test does not require nonparametric estimation of high dimensional objects. As 

a result, our test does not suffer from the curse of dimensionality and has dramatically greater 

power than competing tests for many alternatives. We provide striking evidence of this fact 

in a limited Monte Carlo study. The principle drawback of this approach is that our test does 

not have power against certain alternatives, but we argue that these alternatives are rather 

exceptional. 

Following [12], applied researchers often employ heuristic balancing score tests for misspeci- 

fication of the propensity score to guide the choice of specification of the parametric propensity 

score. Specifically, [12] suggests examining whether the observable characteristics of the pop- 

ulation are independent of participation conditional on the propensity score. See e.g. [3], [8], 

[15], and [17] for applications of such balancing score tests. The formal misspecification test 

2

presented in this paper may be considered a complement to this practice. 

Our paper proceeds as follows. In Section 2, we first provide a summary of the method of 

propensity score matching. We then present in Section 3 the restriction upon which our test 

is based. In Section 4, we use this result to derive an asymptotic test of misspecification and 

examine its finite sample properties in a Monte Carlo study. Finally, Section 5 concludes. 

2 A Review of Matching 

Before proceeding, it is useful to review the key theoretical underpinnings of matching as a 

means of program evaluation. There are two groups of individuals, participants and nonpar- 

ticipants, in a program of interest. Participation in the program of interest is denoted by the 

dummy variable D, with D = 1 if the individual chooses to participate and D = 0 otherwise. 

Individuals in each of these two groups are associated with observed characteristics X. Two 

commonly used metrics for evaluating the effect of participation in a program are the average 

treatment effect, given by E[Y1 −Y0], and the average treatment effect on the treated, given by 

E[Y1 − Y0|D = 1], where Y1 is the potential outcome in the case of participation and Y0 is the 

potential outcome in the case of nonparticipation. The difficulty with estimation of this object 

lies with the following missing data problem: the econometrician never observes the counter- 

factual outcome Y1−D, which precludes direct estimation of E[Y0|D = 1] and E[Y1|D = 0]. 

The method of matching resolves this difficulty by matching each participant with a non- 

participant that is similar in terms of observed characteristics X. As described in Rosenbaum 

and Rubin [13], matching formally requires that: 

A1. (Y0, Y1) ⊥ D | X. 

A2. 0 

Note that a consequence of A2 is that E[Y0|D = 1, X = x] and E[Y1|D = 0, X = x] are 

well defined for all x ∈ supp(X). Hence, matching suggests estimation of E[Y0|D = 1] and 

E[Y1|D = 0] by a two-step procedure in which E[Y0|D = 1, X] and E[Y1|D = 0, X] are first 

estimated by exploiting A1, and then integrated with respect to the empirical distribution of 

X in order to obtain an estimate of E[Y1 − Y0] or integrated with respect to the empirical 

distribution of X conditional on D = 1 to obtain an estimate of E[Y1 − Y0|D = 1]. Following 

Heckman, Ichimura, Smith, and Todd [5], it is clear that one can relax the first condition to 

only require mean independence instead of full independence. Using this procedure, it is in 

3

principle possible to construct a √ n-normal estimator of the parameter of interest without 

imposing any finite-dimensional restrictions. 1 

An alternative approach for estimating E[Y0|D = 1] and E[Y1|D = 0] is propensity score 

matching, which relies upon a celebrated result of Rosenbaum and Rubin [13]. They show that 

the assumptions underlying matching conditional on X, A1 and A2, imply: 

A1*. (Y0, Y1) ⊥ D | P (X). 

A2*. 0 < Pr[D = 1|P (X) = p] < 1 for all p ∈ supp(P (X)). 

This result can be restated as follows: if matching on X is valid, so is matching based on the 

propensity score P (X). This result motivates propensity score matching, in which one first 

estimates the propensity score in a first step and then perform matching, as described above, 

using the estimated propensity score. 

In practice, both matching on X and matching on P (X) suffer from the so-called “curse 

of dimensionality”. While matching on X requires the researcher to estimate E[Y0|D = 1, X], 

matching on P (X) requires the researcher to estimate E[D|X], an equally high dimensional 

object. Thus, if X has more than a few dimensions, nonparametric procedures for estimating 

these objects are undesirable due to sizable finite sample bias. This difficulty leads to the 

common practice of implementing propensity score matching with a parametric specification 

for the propensity score. More specifically, in practice propensity score matching involves 

estimating a parametric model for the propensity score by maximum likelihood in a first step, 

and then using kernel regression or some other comparable nonparametric procedure to perform 

matching on the basis of the estimated propensity scores. 2 

If the propensity score is estimated using a parametric model, then it is possible that the 

model is misspecified and that the resulting estimator of P (X) is inconsistent. Let Q(X) 

denote the probability limit of the estimator of P (X). If P (X) = Q(X), as will likely be 

the case if the model for P (X) is misspecified, then in general we cannot expect (Y0, Y1) to 

be independent (or even mean independent) of D conditional on Q(X). In this case, the 

propensity score matching will in general provide inconsistent estimates of both E[Y1 −Y0] and 

E[Y1 − Y0|D = 1]. 

1 See Heckman, Ichimura, and Todd [6], Hahn [4], and Abadie and Imbens [1]. 

2 Since binary variables are bounded, there is potentially less room for misspecification of a 

parametric model for binary outcome as compared to a continous one, see [2]. 

4

3 The Restriction 

In order to ensure that the appropriate conditional expectations exist, propensity score match- 

ing is only valid over the region of common support of the densities of the propensity score con- 

ditional on the participation decision. 3 Following the influential work of Heckman, Ichimura, 

Smith, and Todd [5], researchers therefore compute estimates of both of these densities to de- 

termine the appropriate region over which to perform matching. The test for misspecification 

that we develop in the following section will exploit a restriction that must hold between these 

two densities. 

Throughout the following we will assume that the propensity score P (X) has a density with 

respect to Lebesgue measure. Let f(p) denote this density, f1(p) the density of the propensity 

score conditional on participation, and f0(p) the density of the propensity score conditional on 

nonparticipation. Using this notation, we have the following result: 

Lemma 3.1. Let α = Pr[D=0] 

Pr[D=1] 

p ∈ supp(P ) we have that 

and assume 0 < Pr[D = 0] < 1. Then for all 0 

f1(p) = α p 

1 − p f0(p) . 

P: Consider 0 the Law of Iterated Expectations, 

we have that 

Bayes’ Theorem implies further that 

Similarly, we have that 

Pr[D = 1|P (X) = p] = E[E[D|X]|P (X) = p] = p . 

f1(p) Pr[D = 1] = Pr[D = 1|P (X) = p]f(p) = pf(p) > 0 . 

f0(p) Pr[D = 0] = (1 − p)f(p) > 0 . 

Combining these two implications, we have that 

f1(p) p 

= α 

f0(p) 1 − p , 

from which the asserted conclusion follows immediately. 

Lemma 3.1 suggests that to the extent that the parametric model for P (X) is correctly 

specified, we would expect the stated relationship between the estimates of the conditional 

densities of the propensity score to hold approximately. The property can be easily checked 

3 Propensity score matching is only valid for the common support for the population but 

in practice, all we can make conclusions about is the common support in the sample. As the 

sample size gets larger, of course, the two sets of common support converge. 

5

given estimates ˆ f1,n(·) and ˆ f0,n(·) by graphing ˆ f1,n(p) along with the function ˆαn p 

1−p ˆ f0,n(p), 

where ˆαn is the sample analogue of α. Dramatic departures of one graph from another should 

be taken as evidence of misspecification of the propensity score. 

We now provide an illustration of this heuristic approach to detecting misspecifcation of 

the propensity score. Define 

X1 = Z1 

X2 = Z1 + Z2 

√ 

2 

(1a) 

, (1b) 

where Z1 and Z2 are independent standard normal variables. Consider the model 

D ∗ = 1 + X1 + X2 + X1X2 − ǫi, ǫ ∼ N(0, σ 2 ) (2a) 

D = 1 [D ∗ > 0] , (2b) 

where ǫ ⊥ (X1, X2), and 1[·] is the logical indicator function. We consider the results of fitting 

a misspecified model 

Q(X, β) = Φ(β0 + β1X1 + β2X2) (3) 

that only differs from the true model by omitting the quadratic term. 

We generate a sample of 10, 000 observations (Di, X1i, X2i, ǫi) according to the model given 

by (2a) and (2b). We then estimate using maximum likelihood the propensity score under the 

incorrectly specified model (3). Estimates of the densities of the propensity score conditional 

on the participation decision are then constructed using kernel density estimation. 4 

Figure 1 depicts the resulting estimates of ˆ f1,n(p) and ˆαn p 

1−p ˆ f0,n(p). We see that graphs 

of these two objects deviate substantially from each other, which can be taken as evidence of 

misspecification. 

[FIGURE 1 ABOUT HERE] 

[FIGURE 2 ABOUT HERE] 

We also consider the case where the true data generating process is given by 

D ∗ = 1 + X1 + X2 − ǫ, ǫ ∼ U(−1, 1) (4) 

and we again fit the misspecified model (3). The estimated densities from this exercise are 

displayed in Figure 2. We find that the two graphs lie very close to one another, suggesting 

4 We employ the biweight kernel and choose the bandwidth using Silverman’s rule of thumb, 

see [16]. 

6

that misspecification is not very severe. This feature is perhaps not too surprising when one 

considers the fact that the only source of misspecification is the distribution for ǫ and the 

assumed distribution shares many features with the true distribution. In particular, both are 

symmetric about zero. 

These examples shows that misspecification of the propensity score may lead to noticeable 

departures from the restriction of Lemma 3.1. In order to develop a formal test based on this 

restriction, it will be useful to restate it in terms of an equivalent orthogonality condition. To 

this end, we have the following result. 

Lemma 3.2. Let α = Pr[D=0] 

Pr[D=1] and assume that 0 < Pr[D = 1] < 1. Let Q be a random 

variable on the unit interval with density with respect to Lebesgue measure. Denote by g1(·) the 

density of Q conditional on D = 1 and by g0(·) the density of Q conditional on D = 0. Then, 

g1(q) = α q 

1−q g0(q) for all q ∈ supp(Q) such that 0 < q < 1 if and only if E[D − Q|Q = q] = 0 

for all q ∈ supp(Q) such that 0 < q < 1. 

P: First consider necessity. Consider q ∈ supp(Q) such that 0 < q < 1. E[D − Q|Q = 

q] = 0 is equivalent to Pr[D = 1|Q = q] = E[D|Q = q] = q. For such q, Pr[D = 1|Q = q] = q 

and Bayes’ Theorem imply together that 

g1(q) Pr[D = 1] = Pr[D = 1|Q = q]g(q) = qg(q) > 0 , 

where g(q) is the density of the Q. Similarly, we have that 

g0(q) Pr[D = 0] = (1 − q)g(q) > 0 . 

Combining these two implications, we have that 

g1(q) q 

= α 

g0(q) 1 − q , 

from which the asserted conclusion follows. Now consider sufficiency. Assume that g1(q) = 

α q 

1−q g0(q) for all q ∈ supp(Q) such that 0 < q < 1. By Bayes’ Theorem, 

E[D|Q = q] = 

g1(q) Pr[D = 1] 

= q 

Pr[D = 0]g0(q) + Pr[D = 1]g1(q) 

where the last equality follows from plugging in g1(q) = α q 

1−q g0(q). It follows that E[D−Q|Q = 

q] = q, as desired. 

It is important to observe that this characterization of the restriction only involves low 

dimensional conditional expections. We will show in the following section that a consequence 

of this fact is that, unlike more conventional nonparametric tests for correct specification of 

7

the propensity score, our test will not suffer from a curse of dimensionality and will therefore 

have much greater power in finite samples against many alternatives. The principle drawback 

of this approach is that there will be certain alternatives for which we will not have power. 

To examine this issue further, consider the following example. Suppose X = (X1, X2), and 

E[D|X1, X2] = E[D|X1]. Let Q(X) = E[D|X1]. Thus, Q(X) = P (X), and yet Q(X) will 

satisfy E[D − Q|Q] = 0. More generally, this restriction will not be able to detect the omission 

of covariates provided that the conditional expectation of D given the included covariates is 

correctly specified. Of course, this example is rather exceptional, since Q(X) is in this case 

correctly specified for a subvector of X. 

4 A Formal Test 

We now propose and develop a formal test for misspecification based on Lemma 3.2, which 

allows us to draw upon the large literature on testing orthogonality constraints. In particular, 

we adapt the testing methodology of Zheng [18] for our problem. Zheng considers testing if 

the conditional expectation E[Y |X] is correctly specified. Under his null hypothesis, E[Y |X] 

is assumed to belong to a parametric family of real valued functions Q(X, θ) on R m ×Θ, where 

Θ ⊆ R l . Concretely, his null and alternative hypotheses are 

H0: ∃θ0 ∈ Θ s.t. Pr[E[Y |X] = Q(X, θ0)] = 1. 

H1: Pr[E[Y |X] = Q(X, θ)] < 1 ∀θ ∈ Θ. 

Note that any θ0 satisfying the null hypothesis also solves minθ∈Θ E[Y − Q(X, θ)] 2 . 

Zheng’s test is based on the following idea. Let ǫ = Y − Q(X, θ0) and let fX(·) denote the 

density of X. Then, under the null hypothesis, 

while under the alternative hypothesis 

E[ǫE[ǫ|X]fX(X)] = 0 , 

E[ǫE[ǫ|X]fX(X)] = E[[E[ǫ|X]] 2 fX(X)] > 0 . 

The last inequality follows because under the alternative hypothesis (E[ǫ|X]) 2 > 0 with positive 

probability. Based on this observation, he uses the sample analogue of E[ǫE[ǫ|X]fX(X)] to 

form his test. The test statistic is given by 

ˆVn := 

1 

n(n − 1) 

n 

1 

h 

i=1 j=i 

m n 

8 

 

Xi − Xj 

K 

ˆεiˆεj , 

hn

where ˆεi := Yi − Q(Xi, ˆ θn), ˆ θn is a √ n-consistent estimator of arg minθ∈Θ E[Y − Q(X, θ)] 2 , h 

is a smoothing parameter, and K(·) is a kernel. 

Zheng shows that under certain assumptions 

where Σ can be consistently estimated by 

ˆΣn := 

2 

n(n − 1) 

nh m/2 ˆVn n 

d → N(0, Σ) 

n 

1 

h 

i=1 j=i 

m n 

K 2 

 

Xi − Xj 

hn 

 

ˆε 2 i ˆε 2 j . 

We would like to test whether E[D|Q] = Q, where Q(X) = Q(X, θ0). Our null and 

alternative hypotheses are therefore given by 

H0: ∃θ0 ∈ Θ s.t. Pr[E[D|Q(X, θ0)] = Q(X, θ0)] = 1 

H1: Pr[E[D|Q(X, θ)] = Q(X, θ)] < 1 ∀θ ∈ Θ 

Note that this differs from the framework of Zheng [18] in that the variable that is conditioned 

on is not observed but has to be estimated. Nevertheless, by analogy with the test statistic 

above, we consider testing based upon 

n 

 

1 1 Q(Xi, 

ˆVn := 

K 

n(n − 1) 

ˆ θn) − Q(Xj, ˆ 

θn) 

ˆεiˆεj 

hn 

i=1 j=i 

where ˆεi = Di − Q(Xi, ˆ θn). Unfortunately, the analysis of Zheng [18] does not apply directly 

because in our case the estimated Q appears in the kernel function as well. Thus, we require 

the following extension of Zheng’s analysis: 

Theorem 4.1. Make the following assumptions: 

(i) {Di, Xi} n i=1 is a random sample; 

(ii) Q is Lipschitz continuous in (x, θ). Q(·, θ0) is continuously differentiable. Moreover, it 

has density with respect to Lebesgue measure on [0,1] that is continuously differentiable; 

(iii) K(·) is a nonnegative, bounded, symmetric and Lipschitz continuous function such that 

K(u)du = 1; 

(iv) The bandwidth sequence and the sequence of positive numbers {an} are such that: 

9 

hn

(a) n √ hn → ∞, (nh 3/2 

n ) −1 → c < ∞; 

(b) 

1 

anh 2 n 

→ a < ∞, na−2 n 

log n → ∞, n√hn a2 n 

→ b < ∞. 

Let ˆ θn be a √ n-consistent estimator of arg minθ∈Θ E[D − Q(X, θ)] 2 , where Θ is a compact 

and convex subset of R l . Then, 

hn 

i=1 j=n 

n hn ˆVn d → N(0, Σ) 

where Σ can be consistently estimated by 

n 2 

n(n − 1) 

1 

K 2 

 

Q(Xi, ˆ θn) − Q(Xj, ˆ 

θn) 

P: See Appendix A. 

Assumptions (i)-(iii) are almost the same as Assumption 1 and 5 of [18] adapted to 

our problem. The only difference is that here we require that Q and K are Lipschitz con- 

tinuous functions. In addition, the requirement that ˆ θn be a √ n-consistent estimator of 

arg minθ∈Θ E[D − Q(X, θ)] 2 is guaranteed by Assumptions 2-4 in [18]. The extra assump- 

tions on the bandwidth sequence are there because in our analysis Q(x, ˆ θ) appears in the 

argument of the kernel function as well. 

As noted earlier, there are alternatives against which Zheng’s test has power tending to 

1, but our test does not. However, Zheng’s procedure requires estimation of expectations 

conditional on X, which in practice is high dimensional, and therefore suffers from a curse of 

dimensionality. Our procedure only requires estimation of expectations conditional on Q and 

thus avoids this difficulty. As a result, we would expect our test to perform noticeably better 

in finite samples against many alternatives of interest. 

We now examine the finite sample properties of this testing procedure in a limited Monte 

Carlo study. Our setup will follow Zheng [18] closely. As before, let X1 and X2 be given by 

(1a) and (1b). We define now several different data generating processes for D ∗ that will be 

used at different points in our Monte Carlo study. 

hn 

ˆε 2 i ˆε 2 j 

D ∗ = 1 + X1 + X2 − ǫ, ǫ ∼ N(0, σ 2 ) (5) 

D ∗ = 1 + X1 + X2 + X1X2 − ǫ, ǫ ∼ N(0, σ 2 ) (6) 

D ∗ = (1 + X1 + X2) 2 − ǫ, ǫ ∼ N(0, σ 2 ) (7) 

D ∗ = 1 + X1 + X2 − ǫ, ǫ ∼ χ 2 1 

D ∗ = 1 + X1 + X2 − ǫ, ǫ ∼ U(−1, 1) (9) 

For each of these data generating processes, D = 1[D ∗ > 0] and ǫ ⊥ (X1, X2). 

10 

(8)

Throughout the simulations presented below, we use the normal kernel given by 

K(u) = 1 

 

−u2 √ exp . 

2π 2 

1 

− The bandwidth hn is chosen to be equal to cn 5 . In our simulations, we report results for values 

of c equal to 0.05, 0.10, and 0.15. We consider samples sizes n equal to 100, 200, 400, 500, 800, 

and 1000. The number of replications for each simulation is always 1000. The bandwidth is 

chosen to fulfill the requirements from Theorem 4.1. 5 

We first examine the size of our test. The data is generated according to (5). We use our 

procedure to test the null hypothesis 

H0: ∃β0 ∈ R 3 s.t. Pr[E[D|Q(X, β0)] = Q(X, β0)] = 1 

H1: Pr[E[D|Q(X, β)] = Q(X, β)] < 1 ∀β ∈R 3 , 

where Q(X, β) = Φ(β0+β1X1+β2X2). These results are summarized in Table 1. Our principle 

finding is that the actual size of the test is conservative and sensitive to the choice of c, but, 

as expected, the actual size approaches the asymptotic size as the sample size increases. 

We now go on to consider the power of our test against certain misspecifications of the 

propensity score. In Tables 2 - 5, we consider four different scenarios in which the true data 

generating process is given by (6) - (9), respectively. For each scenario, the null hypothesis is 

the same as the one above. The test performs admirably when the true data generating process 

is given by (6) - (8), showing high power for even moderately sized samples. Recall that (6) 

is precisely that of our heuristic from Lemma 3.1 presented in Figure 1. Given the noticeable 

departure of the two graphs in Figure 1, it is not surprising that our test performs well. The 

test performs less well, however, when the true data generating process is given by (9), which 

is again consonant with our earlier findings in Figure 2. This outcome is to be expected due to 

the shared features of the distributions for ǫ. On the other hand, as evidenced by Table 4, the 

test performs much better when the distribution is assumed to be χ2 1 , which is not symmetric 

about zero. 

We consider next a scenario in which the true data generating process is (5) and the null 

hypothesis is the same as the one above, but now Q(X, β) = β0 + β1X1 + β2X2. Naturally, we 

would expect any sensible test to have high power against such alternatives. Q(X, β) is not 

5 − The bandwidth employed in [18], c·n 2 

5 does not fulfill the assumptions required by theorem 

1 

− 4.1. However, using this bandwidth instead of c · n 5 does not change the conclusion from the 

Monte Carlo analysis presented below. Results are available on request. 

11

even guaranteed to be between 0 and 1. Fortunately, we find that our test has high power for 

even very small sample sizes in this situation. These results are summarized in Table 6. 

Finally, we compare our test with the test proposed by Zheng [18]. First, we consider the 

same setup of Table 2 above, where the data generating process is given by (6). The results are 

presented in Table 7. As noted earlier, Zheng’s test suffers from a curse of dimensionality and 

so we might expect that our test would perform better in finite samples for many alternatives. 

Indeed, this feature is borne out by our Monte Carlo study: our test rejects the null hypothesis 

much more frequently for nearly all samples sizes. To investigate the severity of this problem, 

we consider a further model which includes an additional covariate. Define 

X3 = Z1 + Z3 

√ 

2 

where Z3 is a standard normal random variable independent of Z1 and Z2. The data generating 

process for D ∗ in this instance is given by 

D ∗ = 1 + X1 + X2 + X1X2 + X3 − ǫ, ǫ ∼ N(0, σ 2 ) , 

where ǫ ⊥ (X1, X2, X3). The null hypothesis is as before but now Q(X, β) = Φ(β0 + β1X1 + 

β2X2 + β3X3). The results of this comparison are presented in Tables 8 and 9. In this case, 

with the additional covariate, we find that our test retains high power at all sample sizes, but 

the highest power of Zheng’s test is approximately 20 percent and often much lower. Thus, we 

believe that our test will in practice be of much greater use than Zheng’s, especially when X 

is high dimensional. Of course, it should be emphasized again that this advantage of our test 

comes at the expense of power against certain alternatives. 

[TABLE 1 ABOUT HERE] 







12 

,

5 Conclusion 



In this paper, we have shown that the commonly computed estimates of the densities of the 

propensity score conditional on participation provide a means of examining whether the para- 

metric model is correctly specified. In particular, correct specification of the propensity score 

implies a certain restriction between the estimated densities. We have shown further that this 

restriction is equivalent to an orthogonality restriction, which can be used as the basis of a 

formal test for correct specification. While our test does not have power against all forms of 

misspecification of the propensity score, we argue that for a large class of alternatives our test 

will perform better in finite samples than existing tests that have power against all forms of 

misspecification. Our Monte Carlo study of the finite sample behavior of our test corroborates 

this hypothesis. The study shows further that our test performs well against many types of 

misspecification. Since our test is also easily implemented, it is our hope that this work will 

persuade researchers to examine the specification of their model for the propensity score, as 

its validity is essential for consistency of their estimates of the treatment effect. 

A Proof of Theorem 4.1: 

We start by proving the first claim, namely, that 

n hn ˆVn d → N(0, Σ). (10) 

We start proving this result by first showing that 

 

ˆVn − Vn0 = oP (1), (11) 

where 

ˆVn := 

with ˆεi = Di − Q(Xi, ˆ θ), and 

Vn0 := 

1 

n(n − 1) 

1 

n(n − 1) 

nh 1/2 

n 

n 

 

1 Q(Xi, 

K 

ˆ θ) − Q(Xj, ˆ 

θ) 

ˆεiˆεj, 

h 

hn 

i=1 j=i 

n 

 

1 Q(Xi, θ0) − Q(Xj, θ0) 

K 

hn 

i=1 j=i 

13 

hn 

εi0εj0,

with εi0 = Di − Q(Xi, θ0). Then we use the analysis in [18] to argue that 

so that combining (11) and (12) gives us (10). 

n hnVn0 d → N(0, Σ), (12) 

To prove (11) we are going to rely on Lemma 3 given on page 286 of Heckman, Ichimura, 

and Todd [6]. To use that lemma, we define 6 

Θn := {θ 

 

∈ Θ : an||θ − θ0|| ≤ ǫθ} 

Ψn := 

ψn0(Di,Xi,Dj,Xj) := 

ˆψn(Di,Xi,Dj,Xj) := 

so that n √ hn ˆVn = 

i 

ψn : ψn(Di, Xi, Dj, Xj) = εiεj 

(n−1) √ hn 

Q(Xi,θ0)−Q(Xj,θ0) 

εi0εj0 

(n−1) √ hn K 

ˆεi ˆεj 

(n−1) √ hn K 

hn 

Q(Xi, ˆ θ)−Q(Xj, ˆ θ) 

hn 

j=i ˆ ψn(i, j) and n √ hnVn0 = 

will verify two conditions: (i) that the U-process 

i 

 

i 

K 

Q(Xi,θ)−Q(Xj,θ) 

hn 

 

, θ ∈ Θn 

 

j=i ψn0(i, j). To show (11), we 

 

j=i ψn(i, j) is equicontinuous over Ψn 

in a neighborhood of ψn0, and (ii) that, with probability approaching to 1, ˆ ψn lies in that 

neighborhood. 

To show equicontinuity, we will rely on the equicontinuity lemma which requires a sym- 

metric and degenerate process. The process 

i 

 

j=i ψn(i, j) is symmetric, if the kernel is 

symmetric, but it is not degenerate. By repeatedly adding and subtracting certain terms, 

however, we see that 

i,j ψn(i, j) equals 

 

 

εiεj 

(n − 1) √ 

Q(Xi, θ) − Q(Xj, θ) 

εiεj 

K 

− E 

hn 

hn 

(n − 1) √ 


K 

|Dj, Xj 

hn 

hn 

i 

+ n√ hn 

n−1 

j=i 

− 

j 

+ 

j 

 

εi0εj0 j E hn K 

n √ hnεj0 

n − 1 E 

Q(Xi ,θ)−Q(X j ,θ) 

hn 

hn 

 

|Dj,Xj 

 

 

εi0 

−[Q(Xj,θ)−Q(Xj,θ0)]E hn K 

hn 

Q(Xi ,θ)−Q(X j ,θ) 

 

[Q(Xi, θ) − Q(Xi, θ0)] Q(Xi, θ) − Q(Xj, θ) 

K 

|Dj, Xj 

n √ hn[Q(Xj, θ) − Q(Xj, θ0)] 

E 

n − 1 

hn 

hn 

hn 

 

|Dj,Xj 

 

(13) 

(14) 

 

[Q(Xi, θ) − Q(Xi, θ0)] Q(Xi, θ) − Q(Xj, θ) 

K 

|Dj, Xj 

By the law of iterated expectations, the expression on the second line is zero. Furthermore, 

the expressions in (13) and (14) are respectively second and first order symmetric degenerate 

U-processes, thus, their asymptotic properties can be analyzed using Lemma 3 of Heckman, 

6 an is an increasing sequence of numbers that satisfies certain conditions. 

14 

(15)

Ichimura, and Todd [6]. In addition, the expression in (15) can be further broken into 

n √ hn 

n − 1 

 

 

[Q(Xj, θ) − Q(Xj, θ0)][Q(Xi, θ) − Q(Xi, θ0)] Q(Xi, θ) − Q(Xj, θ) 

E 

K 

|Dj, Xj 

j 

hn 

 


−E 

K 

hn 

(16) 

+n 


hnE 

K 

. 

hn 

hn 

(17) 

(16) is a degenerate order 1 process, and hence could be analyzed using the same lemma. We 

study (17) at the end. 

Our first step is verifying the conditions of the equicontinuity lemma for {ψn(i, j) − 

2 

E[ψn(i, j)|j]}. Note that since |D − Q(x, θ)| ≤ 1 for all x, θ, 

(n−1) √ 

Q(Xi,θ)−Q(Xj,θ) 

K 

is 

hn 

hn 

an envelope for this process. Moreover, 

 

lim sup 

4 

 

E K 

n→∞ 

2 

 

Q(Xi, θ)− Q(Xj, θ) 

= lim sup 

 


. 

n→∞ 

i 

j 

(n − 1) 2 hn 

In addition, 

 

1 

E K 

hn 

2 

 


= 

hn 

1 

 

E K 

hn 

2 

 


hn 

 

1 Q(Xi, θ0)−Q(Xj, θ0) 

+E 

. 

K 

hn 

2 

hn 

hn 

hn 

 

4 

E K 

hn 

2 

−K 2 

hn 

hn 

 

Q(Xi, θ0)−Q(Xj, θ0) 

Since K and Q are Lipschitz, and K is bounded, for some constant C, the first expression is 

bounded by C/(anh 2 n ), which has a finite limit. Moreover, since Q(X, θ0) is assumed to have 

a Lebesgue density, denoted by fQ, we have 

 

Q(Xi, θ0)− Q(Xj, θ0) 

= 

 

1 

E K 

hn 

2 

hn 

hn 

K 2 (u)fQ(Q(Xj, θ0) + uhn)dufQ(Q(Xj, θ0))dQ. 

But boundedness of K implies that this last expression has a finite limit as well. Thus, the first 

condition of the equicontinuity lemma is satisfied for our process. To 

verify the second condi- 

1 

tion, we note that for each δ > 0, if the kernel function is bounded, 1 

(n−1) √ hn K hn 

equals 1 for only finitely many n because n √ hn → ∞. On the other hand, by the dominated 

convergence theorem, we have 

 

limn→∞ E 1 

= E 

 

limn→∞ 

(n−1) 2 

 

i 

i 

 

j 

 

j 1 

hn K2 

1 

(n−1) 2 hn K2 


hn 


hn 

15 

 

1 1 

(n−1) √ hn K 

 

1 1 

(n−1) √ hn K 


hn 


hn 


 

 

> δ 

 

> δ 

= 0. 

 

 

> δ

Thus, the second condition of the equicontinuity lemma is also satisfied. 

Next, we verify the same conditions for the process given in (14). First, we observe that 

we can rewrite each term of that summation as 

n √ hnεj0 

E 

an(n − 1)hn 

 

an[Q(Xi, θ) − Q(Xi, θ0)]K 

 


|Xj . 

Since Q is Lipschitz with Lipschitz constant LQ, each of these terms is bounded by 

n √ hn|εj0|LQǫθ 

an(n − 1) E 

 

1 

 

hn 

K 

Q(Xi, θ) − Q(Xj, θ) 

|Xj . 

hn 

Then using the Lipschitz continuity of K and Q, we have 


an(n − 1) E 

 

1 

 

hn 

K 

Q(Xi, θ) − Q(Xj, θ) 

|Xj 

hn 

= n|εj0|LQǫθ 

an(n − 1) √ 

Q(Xi, θ) − Q(Xj, θ) Q(Xi, θ0) − Q(Xj, θ0) 

E K 

−K 

|Xj + 

hn 

hn 

hn 

n|εj0|LQǫθ 

an(n − 1) √ 

Q(Xi, θ0) − Q(Xj, θ0) 

E K 

|Xj 

hn 

hn 

nC 

≤ 

a2 nh 3/2 

n (n − 1) |εj0| + n√hn|εj0|LQǫθ an(n − 1) E 

 

1 

 

hn 

K 

Q(Xi, θ0) − Q(Xj, θ0) 

|Xj , 

hn 

where C = 2LKL2 gǫ2 θ . Then using a change of variables operation, we see that the second term 

in the last expression equals 


an(n − 1) 

 

hn 

 

|K (u)| fQ Q(Xj, θ0) + uhn du. 

Since K is bounded the integrals in this last expression are all finite. Therefore, 

n √ hn 

an(n − 1) |εj0| 

 

1 

 

, 

anh2 C + C1 

n 

where C1 is the appropriate constant, is an envelope for (14). To verify the first condition of 

the equicontinuity lemma, consider 

n 

lim sup 

n→∞ 

j 

2hn a2 n(n − 1) 2 E ε 2 

j0 

1 

anh2 2 nhn 

C + C1 ≤ lim sup 

n 

n→∞ a2 

1 

n anh2 2 C + C1 . 

n 

Sufficient conditions for this to be finite are that limn→∞ nhna −2 

n < ∞, and limn→∞ 1 

anh 2 n 

< ∞, 

which are implied by our assumptions. Thus the first condition of the equicontinuity lemma 

is satisfied. 

16

To check the second condition of the lemma, let δ > 0, 

 

limn→∞ j 

n 2 hn 

a2 n (n−1)2 

 

1 

anh2 2 

C+C1 E ε 

n 

2 j01 =limn→∞ hn 

a2 

1 

n anh2 C+C1 

n 

≤limn→∞ hn 

a 2 n 

 

1 

anh2 C+C1 

n 

2 

j E 

 

ε2 j01 2 

j E 

 

n √ hn|εj0 | 

an(n−1) 

n √ hn|εj0 | 

an(n−1) 

ε2 j01 √ 

n hn 

an(n−1) 

 

1 

anh2 

C+C1 >δ 

n 

 

1 

anh2 C+C1 

n 

 

1 

anh2 C+C1 

n 

 

>δ 

 

>δ , 

where we use the fact that |εj0| ≤ 1 to write the last inequality. Note that limn→∞[1/(anh2 n )] < 

∞ implies that 1/(a2 nh 3/2 

n ) → 0. This in turn implies that for each δ > 0 only finitely many 

terms in the last summation will be non-zero, and therefore the second condition of the lemma 

is satisfied as well. 

On the other hand, the same arguments show that the same conditions are satisfied for the 

process 

 

 

n[Q(Xj, θ) − Q(Xj, θ0)] Q(Xi, θ) − Q(Xi, θ0) Q(Xi, θ) − Q(Xj, θ) 

E √ K 

|Xj 

n − 1 

hn 

hn 

j 

 

n[Q(Xj, θ) − Q(Xj, θ0)][Q(Xi, θ) − Q(Xi, θ0)] 

− E 

(n − 1) √ 


K 

hn 

hn 

To verify the third condition note that since Θ is a compact subset of R l , its L 2 packing 

number, D2(τ, Θ), is less than or equal to 2lτ −l . If K(·) is bounded and Lipschitz, Q(·, ·) is 

Lipschitz, and n−1h −3/2 

n = O(1), then D2(τ, Ψn) ≤ C · τ −l . Thus, the third condition of the 

lemma is also satisfied. 

We know that ψn0 ∈ Ψn. In addition, if 

√ n( ˆ θ − θ0) d → N(0, Σθ), 

then for any {an} sequence such that an/ √ n → 0, we will have an( ˆ θ − θ0) P → 0, and ˆ ψn will 

belong to Ψn with probability approaching to 1. 

 

 

 

 

j 

The arguments given so far allow us to argue that 

 

 

 

ˆψn(i, j) − E[ 

i,j 

ˆ 

 

ψn(i, j)|j] − ψn0(i, j) − E[ψn0 (i, j)|j] 

 

 

=oP (1), 

 

 

 

√ 

hnεj0 

 

 

n − 1 E 

 

(εi0 − ˆεi) 1 

 

Q(Xi, 

K 

hn 

ˆ θ) − Q(Xj, ˆ 

θ) 

|Xj 

hn 

 

 

=oP (1), and 

n √ hn(εj0 −ˆε j ) 

n−1 

E 

(εi0 −ˆε i ) 

hn 

i,j 

K 

Q(Xi , ˆ θ)−Q(X j , ˆ θ) 

hn 

 

|Xj 

 

−E 

17 

n(εj0 −ε j )(ε i0 −ˆε i ) 

(n−1) √ hn 

K 

Q(Xi , ˆ θ)−Q(X j , ˆ θ) 

hn 

=oP (1).

The last thing we need to show is that 

lim 

n→∞ E 

 

 

n(εj0 − ˆεj)(εi0 − ˆεi) Q(Xi, 

√ K 

hn 

ˆ θ) − Q(Xj, ˆ 

θ) 

= 0. 

hn 

But this expression equals 

 

an[Q(Xj, ˆ θ) − Q(Xj, θ0)]an[Q(Xi, ˆ 

θ) − Q(Xi, θ0)] Q(Xi, 

K 

ˆ θ) − Q(Xj, ˆ 

θ) 

. 

n √ hn 

a2 E 

n 

Since n√ hn 

a 2 n 

lim 

n→∞ E 

hn 

= O(1), the above expression will go to 0 if we can show that 

 


θ) − Q(Xi, θ0)] Q(Xi, 

K 

ˆ θ) − Q(Xj, ˆ 

θ) 

= 0. 

hn 

To show (18) we will first argue that 

hn 

hn 

(18) 

an|| ˆ θ − θ0|| → 0 a.s. (19) 

Then using this almost sure convergence result and the Lipshitz continuity of Q and K we are 

going to argue that 


θ) − Q(Xi, θ0)] Q(Xi, 

K 

ˆ θ) − Q(Xj, ˆ 

θ) 

hn 

is almost surely bounded by an integrable function. Then the result will follow from Theorem 

1.3.6 of [14]. 

Since our estimator ˆ θ for θ0 is defined as the minimizer of 

ˆϕn(θ) := 1 

n 

n 

[Dj − Q(Xj, θ)] 2 

j=1 

over a compact parameter space Θ, the first step towards showing (19) is to show that 

where 

1 

a −1 

n 

sup | ˆϕn(θ) − ϕ0(θ)| → 0 a.s., (20) 

θ∈Θ 

ϕ0(θ) := E[(Dj − Q(Xj, θ)) 2 ]. 

To show (20) we rely on Theorem 37 from Pollard [10]. We apply that theorem on the following 

family of functions 

Φn := {(D − Q(x, θ)) 2 /M : θ ∈ Θ} 

18 

hn

where M = sup θ∈Θ[D−Q(x, θ)] 2 . Since Θ is a compact subset of a finite dimensional Euclidean 

space, the covering number condition needed to apply the theorem is satisfied by Theorem 4.8 

of Pollard [11], and applying Theorem 37 of Pollard [10] with δn = M and αn = a −1 

n 

the desired result (20) as long as nM2a −2 

n 

log n 

= M 2 na −2 

n 

log n 

gives us 

→ ∞. Then, a small modification of the 

consistency proof in [9] yields an|| ˆ θ − θ0|| → 0 a.s.. This in turn implies that for ǫθ > 0 and 

sufficiently large n, we almost surely have 

 

 

an[Q(Xj, 

 

 

ˆ θ) − Q(Xj, θ0)]an[Q(Xi, ˆ 

θ) − Q(Xi, θ0)] Q(Xi, 

K 

hn 

ˆ θ) − Q(Xj, ˆ 

θ) 

 

hn 

 

 

 

 

K 

 

Q(Xi, ˆ θ) − Q(Xj, ˆ 

θ) 

 

≤ ǫ3 θLKLQ 

 

 

K 

Q(Xi, θ0) − Q(Xj, θ0) 

. 

≤ ǫ2 θ 

hn 

Since lim 1 

anh 2 n 

[14]. Thus, 

hn 

< ∞ and E 1 

 

 

K hn 

anh 2 n 

Q(Xi,θ0)−Q(Xj,θ0) 

hn 

+ ǫ2 θ 

hn 

n 

hn 

ˆVn − Vn0 = oP (1). 

hn 

< ∞, (18) follows from Theorem 1.3.6 of 

Combining this with Lemma 3.3a of [18] completes the proof for the first part of Theorem 4.1. 

Next we argue that Σ = 2 K2 (u)du [E(D2 i |Qi0)] 2f 2 Q (Qi0)dQi0 

 

, and Σ can be consistently 

estimated by 

n 2 1 

K 

n(n − 1) 

2 

 


θ) 

ˆε 2 i ˆε 2 j. 

hn 

i=1 j=n 

We argue this in two steps again. First, note that if K is symmetric, bounded and Lipschitz, 

then so is K2 . Thus, under the same conditions as before the analysis above yields that, 

n 

 

2 1 

K 

n(n − 1) 

2 

 


θ) 

ˆε 2 i ˆε 2 j −K 2 

 

Q(Xi, θ0) − Q(Xj, θ0) 

ε 2 i ε 2 

j = op(1). 

hn 

i=1 j=i 

hn 

Second, the arguments in the proof of Lemma 3.3e of [18] show that 

hn 

i=1 j=i 

hn 

n 2 1 

K 

n(n − 1) 

2 

 

Q(Xi, θ0) − Q(Xj, θ0) 

Combining these gives us the second result in Theorem 4.1. 

References 

hn 

hn 

 

ε 2 i ε 2 j = Σ + oP (1). 

[1] A, A., G. I (2004): “Simple and Bias-Corrected Matching Estimators 

for Average Treatment Effect,” mimeo, UC-Berkeley and Harvard University. 

19 

(21)

[2] B, D. A., J. A. S (2004): “How Robust is the Evidence on the Effects of 

College Quality? Evidence from Matching,” Journal of Econometrics, 121, 99—124. 

[3] D, R. H., S. W (2002): “Propensity Score Matching Methods for Non- 

experimental Causal Studies,” The Review of Economics and Statistics, 84, 151—161. 

[4] H, J. (1998): “On the Role of the Propensity Score in Efficient Semiparametric Esti- 

mation of Average Treatment Effects,” Econometrica, 66, 315—331. 

[5] H, J., H. I, J. S, P. T (1998): “Characterizing Selection 

Bias Using Experimental Data,” Econometrica, 66(5), 1017=1098. 

[6] H, J., H. I, P. T (1998): “Matching as an Econometric Eval- 

uation Estimator,” Review of Economic Studies, 65(2), 261—204. 

[7] H, J., R. LL, J. S (1999): “The Economics and Ecconometrics 

of Active Labor Market Programs,” in Handbook of Labor Economics, ed. by A. Orley, and 

D. Card. Elsevier Science, North Holland, Amsterdam, New York and Oxford. 

[8] L, M. (2002): “Program Heterogeneity and Propensity Score Matching: an Appli- 

cation to the Evaluation of Active Labor Market Policies,” The Review of Economics and 

Statistics, 84, 205—220. 

[9] N, W., D. MF (1994): “Large Sample Estimation and Hypothesis 

Testing,” in Handbook of Econometrics, ed. by R. Engle, and D. McFadden. North-Holland, 

Amsterdam. 

[10] P, D. (1984): Convergence of Stochastic Processes. Springer-Verlag, New York. 

[11] (1990): “Empirical Processes: Theory and Applications,” in NSF-CBMS Re- 

gional Conference Series in Probability and Statistics, vol. 2. Institute of Mathematical 

Statistics, Hayward, American Statistical Association, Alexandria. 

[12] R, P. R., D. B. R (1985): “Constructing a Control Group Us- 

ing Multivariate Matched Sampling Methods That Incorporate the Propensity Score,” The 

American Statistician, 39, 33—38. 

[13] R, R., D. B. R (1983): “The Central Role of the Propensity Score 

in Observational Studies for Causal Effects,” Biometrika, 70, 41—55. 

[14] S, R. J. (1980): Approximation Theory of Mathematical Statistics. Wiley, Hobo- 

ken, New Jersey. 

20

[15] S, B. (2004): “An Evaluation of the Swedish System of Active Labor Market 

Programs in the 1990s,” The Review of Economics and Statistics, 86, 133—155. 

[16] S, B. W. (1986): Density Estimation for Statistics and Data Analysis. Chap- 

man and Hall/CRC. 

[17] S, J. A., P. E. T (2005): “Rejoinder,” Journal of Econometrics, 125, 

365—375. 

[18] Z, J. X. (1996): “A Consistent Test of Functional Form via Nonparametric Esti- 

mation Techniques,” Journal of Econometrics, 75(2), 263—289. 

21

FIGURE 1 

E D P S 

Note: ––––- ˆαn p 

1−p ˆ f0,n(p), - - - - - - - - ˆ f1,n(p) 

DGP: D ∗ = 1 + X1 + X2 + X1X2 − ǫi, ǫ ∼ N(0, σ 2 ), 

Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2) 

Density estimated using the biweight kernel and 

Silverman’s rule of thumb, see Silverman[16]

FIGURE 2 

E D P S 

Note: ––––- ˆαn p 

1−p ˆ f0,n(p), - - - - - - - - ˆ f1,n(p) 

DGP: D ∗ = 1 + X1 + X2 + −ǫi, ǫ ∼ U(−1, 1), 

Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2) 

Density estimated using the biweight kernel and 

Silverman’s rule of thumb, see Silverman[16]

TABLE 1: S 

c=0.05 

Proportion of Rejections 

1% level 5% level 10% level 

n=100 0.011 0.046 0.089 

n=200 0.006 0.037 0.082 

n=400 0.003 0.031 0.081 

n=500 0.006 0.044 0.101 

n=800 0.006 0.040 0.088 

n=1000 

c=0.10 

0.005 0.038 0.086 

n=100 0.006 0.043 0.101 

n=200 0.002 0.024 0.070 

n=400 0.004 0.021 0.060 

n=500 0.004 0.027 0.074 

n=800 0.004 0.035 0.085 

n=1000 

c=0.15 

0.005 0.022 0.070 

n=100 0.005 0.027 0.079 

n=200 0.003 0.011 0.057 

n=400 0.003 0.008 0.049 

n=500 0.004 0.012 0.061 

n=800 0.008 0.024 0.070 

n=1000 0.005 0.019 0.053 

DGP: D ∗ = 1 + X1 + X2 − ǫi, ǫ ∼ N(0, σ 2 ), 

Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2)

TABLE 2: P 

c=0.05 



n=100 0.061 0.146 0.218 

n=200 0.273 0.432 0.518 

n=400 0.787 0.886 0.918 

n=500 0.916 0.964 0.975 

n=800 0.997 0.999 0.999 

n=1000 

c=0.10 

1.000 1.000 1.000 

n=100 0.094 0.189 0.237 

n=200 0.417 0.559 0.618 

n=400 0.910 0.962 0.978 

n=500 0.977 0.990 0.993 

n=800 1.000 1.000 1.000 

n=1000 

c=0.15 

1.000 1.000 1.000 

n=100 0.101 0.185 0.241 

n=200 0.459 0.597 0.664 

n=400 0.948 0.976 0.984 

n=500 0.984 0.993 0.996 

n=800 1.000 1.000 1.000 

n=1000 1.000 1.000 1.000 

DGP: D ∗ = 1 + X1 + X2 + X1X2 − ǫi, ǫ ∼ N(0, σ 2 ), 


TABLE 3: P 

c=0.05 



n=100 0.358 0.479 0.537 

n=200 0.818 0.874 0.896 

n=400 0.991 0.993 0.994 

n=500 1.000 1.000 1.000 

n=800 1.000 1.000 1.000 

n=1000 

c=0.10 

1.000 1.000 1.000 

n=100 0.290 0.401 0.483 

n=200 0.821 0.876 0.902 

n=400 0.993 0.994 0.995 

n=500 1.000 1.000 1.000 

n=800 1.000 1.000 1.000 

n=1000 

c=0.15 

1.000 1.000 1.000 

n=100 0.193 0.283 0.343 

n=200 0.737 0.814 0.840 

n=400 0.990 0.994 0.994 

n=500 1.000 1.000 1.000 

n=800 1.000 1.000 1.000 

n=1000 1.000 1.000 1.000 

DGP: D ∗ = (1 + X1 + X2) 2 − ǫi, ǫ ∼ N(0, σ 2 ), 


TABLE 4: P 

c=0.05 



n=100 0.023 0.082 0.150 

n=200 0.074 0.169 0.232 

n=400 0.321 0.490 0.587 

n=500 0.440 0.621 0.698 

n=800 0.838 0.917 0.942 

n=1000 

c=0.10 

0.929 0.977 0.986 

n=100 0.037 0.076 0.134 

n=200 0.131 0.242 0.303 

n=400 0.457 0.620 0.691 

n=500 0.642 0.776 0.848 

n=800 0.935 0.969 0.980 

n=1000 

c=0.15 

0.986 0.997 0.997 

n=100 0.039 0.076 0.124 

n=200 0.162 0.271 0.331 

n=400 0.557 0.689 0.751 

n=500 0.707 0.825 0.877 

n=800 0.957 0.978 0.986 

n=1000 0.993 0.998 0.998 

DGP: D ∗ = 1 + X1 + X2 − ǫi, ǫ ∼ χ 2 1 , 

Estimated model: Q(X, β) = Φ(β 0 +β 1 X1+β 2 X2)

TABLE 5: P 

c=0.05 



n=100 0.008 0.057 0.127 

n=200 0.007 0.032 0.078 

n=400 0.005 0.048 0.091 

n=500 0.009 0.045 0.087 

n=800 0.021 0.073 0.126 

n=1000 

c=0.10 

0.033 0.094 0.147 

n=100 0.008 0.048 0.117 

n=200 0.005 0.021 0.063 

n=400 0.009 0.039 0.074 

n=500 0.013 0.035 0.071 

n=800 0.035 0.075 0.124 

n=1000 

c=0.15 

0.048 0.114 0.169 

n=100 0.005 0.038 0.107 

n=200 0.007 0.021 0.057 

n=400 0.009 0.034 0.066 

n=500 0.014 0.035 0.074 

n=800 0.036 0.081 0.135 

n=1000 0.056 0.110 0.176 

DGP: D ∗ = 1 + X1 + X2 − ǫi, ǫ ∼ U(−1, 1) 

Estimated model: Q(X, β) = Φ(β 0 +β 1 X1+β 2 X2)

TABLE 6: P 

c=0.05 



n=100 0.218 0.337 0.426 

n=200 0.637 0.781 0.826 

n=400 0.984 0.993 0.997 

n=500 1.000 1.000 1.000 

n=800 1.000 1.000 1.000 

n=1000 

c=0.10 

1.000 1.000 1.000 

n=100 0.371 0.505 0.592 

n=200 0.813 0.904 0.936 

n=400 0.998 1.000 1.000 

n=500 1.000 1.000 1.000 

n=800 1.000 1.000 1.000 

n=1000 

c=0.15 

1.000 1.000 1.000 

n=100 0.449 0.584 0.662 

n=200 0.891 0.952 0.970 

n=400 1.000 1.000 1.000 

n=500 1.000 1.000 1.000 

n=800 1.000 1.000 1.000 

n=1000 1.000 1.000 1.000 

DGP: D ∗ = 1 + X1 + X2 − ǫi, ǫ ∼ N(0, σ 2 ), 

Estimated model: Q(X, β) = β 0 +β 1X1+β 2X2

TABLE 7: P, Z[18] 

c=0.05 



n=100 0.000 0.011 0.053 

n=200 0.001 0.016 0.070 

n=400 0.005 0.033 0.079 

n=500 0.005 0.038 0.106 

n=800 0.012 0.059 0.125 

n=1000 

c=0.10 

0.009 0.067 0.131 

n=100 0.002 0.030 0.085 

n=200 0.005 0.052 0.110 

n=400 0.011 0.072 0.126 

n=500 0.014 0.082 0.157 

n=800 0.041 0.132 0.210 

n=1000 

c=0.15 

0.063 0.175 0.294 

n=100 0.002 0.046 0.106 

n=200 0.008 0.061 0.115 

n=400 0.035 0.109 0.176 

n=500 0.043 0.141 0.227 

n=800 0.107 0.271 0.370 

n=1000 0.166 0.373 0.494 

DGP: D ∗ = 1 + X1 + X2 + X1X2 − ǫi, ǫ ∼ N(0, σ 2 ) 


TABLE 8: P 

c=0.05 



n=100 0.034 0.079 0.127 

n=200 0.049 0.106 0.163 

n=400 0.162 0.288 0.364 

n=500 0.229 0.377 0.448 

n=800 0.495 0.620 0.696 

n=1000 

c=0.10 

0.594 0.751 0.807 

n=100 0.038 0.087 0.134 

n=200 0.081 0.143 0.190 

n=400 0.274 0.387 0.465 

n=500 0.356 0.480 0.560 

n=800 0.637 0.744 0.797 

n=1000 

c=0.15 

0.768 0.857 0.895 

n=100 0.042 0.078 0.128 

n=200 0.102 0.170 0.217 

n=400 0.320 0.448 0.508 

n=500 0.415 0.550 0.619 

n=800 0.700 0.789 0.837 

n=1000 0.829 0.895 0.924 

DGP: D ∗ = 1 + X1 + X2 + X1X2 + X3 − ǫi, ǫ ∼ N(0, σ 2 ) 

Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2 + β3X3)

TABLE 9: P, Z[18] 

c=0.05 



n=100 0.000 0.003 0.009 

n=200 0.000 0.002 0.009 

n=400 0.000 0.002 0.015 

n=500 0.000 0.003 0.021 

n=800 0.000 0.013 0.041 

n=1000 

c=0.10 

0.000 0.012 0.052 

n=100 0.000 0.000 0.014 

n=200 0.000 0.009 0.044 

n=400 0.003 0.018 0.064 

n=500 0.001 0.019 0.079 

n=800 0.001 0.034 0.092 

n=1000 

c=0.15 

0.001 0.041 0.114 

n=100 0.001 0.011 0.041 

n=200 0.002 0.021 0.071 

n=400 0.002 0.036 0.096 

n=500 0.003 0.037 0.089 

n=800 0.003 0.055 0.110 

n=1000 0.012 0.064 0.128 

DGP: D ∗ = 1 + X1 + X2 + X1X2 + X3 − ǫi, ǫ ∼ N(0, σ 2 ) 

Estimated model: Q(X, β) = Φ(β 0 +β 1X1+β 2X2 + β3X3)

On the Identification of Misspecified Propensity Scores - School of ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?