On the Identification of Misspecified Propensity Scores - School of ...

More documents

Recommendations

Info

principle possible to construct a √ n-normal estimator of the parameter of interest without imposing any finite-dimensional restrictions. 1 An alternative approach for estimating E[Y0|D = 1] and E[Y1|D = 0] is propensity score matching, which relies upon a celebrated result of Rosenbaum and Rubin [13]. They show that the assumptions underlying matching conditional on X, A1 and A2, imply: A1*. (Y0, Y1) ⊥ D | P (X). A2*. 0 < Pr[D = 1|P (X) = p] < 1 for all p ∈ supp(P (X)). This result can be restated as follows: if matching on X is valid, so is matching based on the propensity score P (X). This result motivates propensity score matching, in which one first estimates the propensity score in a first step and then perform matching, as described above, using the estimated propensity score. In practice, both matching on X and matching on P (X) suffer from the so-called “curse of dimensionality”. While matching on X requires the researcher to estimate E[Y0|D = 1, X], matching on P (X) requires the researcher to estimate E[D|X], an equally high dimensional object. Thus, if X has more than a few dimensions, nonparametric procedures for estimating these objects are undesirable due to sizable finite sample bias. This difficulty leads to the common practice of implementing propensity score matching with a parametric specification for the propensity score. More specifically, in practice propensity score matching involves estimating a parametric model for the propensity score by maximum likelihood in a first step, and then using kernel regression or some other comparable nonparametric procedure to perform matching on the basis of the estimated propensity scores. 2 If the propensity score is estimated using a parametric model, then it is possible that the model is misspecified and that the resulting estimator of P (X) is inconsistent. Let Q(X) denote the probability limit of the estimator of P (X). If P (X) = Q(X), as will likely be the case if the model for P (X) is misspecified, then in general we cannot expect (Y0, Y1) to be independent (or even mean independent) of D conditional on Q(X). In this case, the propensity score matching will in general provide inconsistent estimates of both E[Y1 −Y0] and E[Y1 − Y0|D = 1]. 1 See Heckman, Ichimura, and Todd [6], Hahn [4], and Abadie and Imbens [1]. 2 Since binary variables are bounded, there is potentially less room for misspecification of a parametric model for binary outcome as compared to a continous one, see [2]. 4
3 The Restriction In order to ensure that the appropriate conditional expectations exist, propensity score matching is only valid over the region of common support of the densities of the propensity score conditional on the participation decision. 3 Following the influential work of Heckman, Ichimura, Smith, and Todd [5], researchers therefore compute estimates of both of these densities to de- termine the appropriate region over which to perform matching. The test for misspecification that we develop in the following section will exploit a restriction that must hold between these two densities. Throughout the following we will assume that the propensity score P (X) has a density with respect to Lebesgue measure. Let f(p) denote this density, f1(p) the density of the propensity score conditional on participation, and f0(p) the density of the propensity score conditional on nonparticipation. Using this notation, we have the following result: Lemma 3.1. Let α = Pr[D=0] Pr[D=1] p ∈ supp(P ) we have that and assume 0 < Pr[D = 0] < 1. Then for all 0 the Law of Iterated Expectations, we have that Bayes’ Theorem implies further that Similarly, we have that Pr[D = 1|P (X) = p] = E[E[D|X]|P (X) = p] = p . f1(p) Pr[D = 1] = Pr[D = 1|P (X) = p]f(p) = pf(p) > 0 . f0(p) Pr[D = 0] = (1 − p)f(p) > 0 . Combining these two implications, we have that f1(p) p = α f0(p) 1 − p , from which the asserted conclusion follows immediately. Lemma 3.1 suggests that to the extent that the parametric model for P (X) is correctly specified, we would expect the stated relationship between the estimates of the conditional densities of the propensity score to hold approximately. The property can be easily checked 3 Propensity score matching is only valid for the common support for the population but in practice, all we can make conclusions about is the common support in the sample. As the sample size gets larger, of course, the two sets of common support converge. 5
Page 1 and 2: On the Identification of Misspecifi
Page 3: presented in this paper may be cons
Page 7 and 8: that misspecification is not very s
Page 9 and 10: where ˆεi := Yi − Q(Xi, ˆ θn)
Page 11 and 12: Throughout the simulations presente
Page 13 and 14: 5 Conclusion [TABLE 8 ABOUT HERE] [
Page 15 and 16: Ichimura, and Todd [6]. In addition
Page 17 and 18: To check the second condition of th
Page 19 and 20: where M = sup θ∈Θ[D−Q(x, θ)]
Page 21 and 22: [15] S, B. (2004): “An Evaluation
Page 23 and 24: FIGURE 2 E D P S Note: ----- ˆα
Page 25 and 26: TABLE 2: P c=0.05 Proportion of Rej

On the Identification of Misspecified Propensity Scores - School of ...

Create successful ePaper yourself

Delete template?

Save as template?