On the Identification of Misspecified Propensity Scores - School of ...

(a) $n\sqrt{h_n} \to \infty$ and $(n h_n^{3/2})^{-1} \to c < \infty$; (b) $1/(a_n h_n^2) \to a < \infty$, $n a_n^{-2}/\log n \to \infty$, and $n\sqrt{h_n}/a_n^2 \to b < \infty$. Let $\hat\theta_n$ be a $\sqrt{n}$-consistent estimator of $\arg\min_{\theta\in\Theta} E[D - Q(X,\theta)]^2$, where $\Theta$ is a compact and convex subset of $\mathbb{R}^l$. Then

$$n\sqrt{h_n}\,\hat V_n \xrightarrow{d} N(0, \Sigma),$$

where $\Sigma$ can be consistently estimated by

$$\hat\Sigma_n = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \frac{1}{h_n} K^2\!\left(\frac{Q(X_i,\hat\theta_n) - Q(X_j,\hat\theta_n)}{h_n}\right) \hat\varepsilon_i^2\,\hat\varepsilon_j^2.$$

Proof: See Appendix A.

Assumptions (i)-(iii) are essentially Assumptions 1 and 5 of [18], adapted to our problem. The only difference is that here we require $Q$ and $K$ to be Lipschitz continuous functions. In addition, the requirement that $\hat\theta_n$ be a $\sqrt{n}$-consistent estimator of $\arg\min_{\theta\in\Theta} E[D - Q(X,\theta)]^2$ is guaranteed by Assumptions 2-4 in [18]. The extra assumptions on the bandwidth sequence are needed because in our analysis $Q(x,\hat\theta)$ also appears in the argument of the kernel function.

As noted earlier, there are alternatives against which Zheng's test has power tending to 1 but our test does not. However, Zheng's procedure requires estimating expectations conditional on $X$, which in practice is high dimensional, and it therefore suffers from a curse of dimensionality. Our procedure only requires estimating expectations conditional on the scalar $Q$ and thus avoids this difficulty. As a result, we would expect our test to perform noticeably better in finite samples against many alternatives of interest.

We now examine the finite sample properties of this testing procedure in a limited Monte Carlo study. Our setup closely follows Zheng [18]. As before, let $X_1$ and $X_2$ be given by (1a) and (1b). We now define several data generating processes for $D^*$ that will be used at different points in our Monte Carlo study:

$$D^* = 1 + X_1 + X_2 - \epsilon, \qquad \epsilon \sim N(0, \sigma^2) \tag{5}$$
$$D^* = 1 + X_1 + X_2 + X_1 X_2 - \epsilon, \qquad \epsilon \sim N(0, \sigma^2) \tag{6}$$
$$D^* = (1 + X_1 + X_2)^2 - \epsilon, \qquad \epsilon \sim N(0, \sigma^2) \tag{7}$$
$$D^* = 1 + X_1 + X_2 - \epsilon, \qquad \epsilon \sim \chi^2_1 \tag{8}$$
$$D^* = 1 + X_1 + X_2 - \epsilon, \qquad \epsilon \sim U(-1, 1) \tag{9}$$

For each of these data generating processes, $D = 1[D^* > 0]$ and $\epsilon \perp (X_1, X_2)$.
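As a minimal sketch, assuming Python with only the standard library, the designs (5)-(9) can be simulated as follows. The designs (1a) and (1b) for $X_1$ and $X_2$ are not reproduced in this section, so the helper below draws them as independent standard normals purely as a hypothetical stand-in; the function name `simulate_design` and the default $\sigma = 1$ are likewise our own illustrative choices, not the authors' code.

```python
import random


def simulate_design(n, design, sigma=1.0, seed=None):
    """Draw n observations (X1, X2, D) from design (5), (6), (7), (8) or (9).

    HYPOTHETICAL: X1 and X2 are drawn as independent N(0, 1) variables as a
    stand-in for (1a)-(1b), which are not reproduced in this section.
    sigma is the standard deviation of the normal error in (5)-(7).
    """
    if design not in (5, 6, 7, 8, 9):
        raise ValueError(f"unknown design: {design}")
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        index = 1.0 + x1 + x2
        # The error is drawn independently of (X1, X2) in every design.
        if design == 5:
            d_star = index - rng.gauss(0.0, sigma)
        elif design == 6:
            d_star = index + x1 * x2 - rng.gauss(0.0, sigma)
        elif design == 7:
            d_star = index ** 2 - rng.gauss(0.0, sigma)
        elif design == 8:
            # chi-square(1) error: square of a standard normal draw
            d_star = index - rng.gauss(0.0, 1.0) ** 2
        else:
            d_star = index - rng.uniform(-1.0, 1.0)
        d = 1 if d_star > 0 else 0  # D = 1[D* > 0]
        data.append((x1, x2, d))
    return data
```

Under (5), $\Pr[D = 1 \mid X] = \Phi((1 + X_1 + X_2)/\sigma)$ whatever the distribution of $X$, so the probit null holds with $\beta = (1, 1, 1)/\sigma$; this is what makes (5) a natural design for studying size.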
Throughout the simulations presented below, we use the normal kernel

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right).$$

The bandwidth is set to $h_n = c\,n^{-1/5}$. In our simulations, we report results for $c$ equal to 0.05, 0.10, and 0.15. We consider sample sizes $n$ equal to 100, 200, 400, 500, 800, and 1000. The number of replications for each simulation is always 1000. The bandwidth is chosen to fulfill the requirements of Theorem 4.1.⁵

We first examine the size of our test. The data are generated according to (5). We use our procedure to test the null hypothesis

$$H_0: \exists\, \beta_0 \in \mathbb{R}^3 \text{ s.t. } \Pr\big[E[D \mid Q(X,\beta_0)] = Q(X,\beta_0)\big] = 1$$

against the alternative

$$H_1: \Pr\big[E[D \mid Q(X,\beta)] = Q(X,\beta)\big] < 1 \quad \forall\, \beta \in \mathbb{R}^3,$$

where $Q(X,\beta) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2)$. The results are summarized in Table 1. Our principal finding is that the test is conservative and its actual size is sensitive to the choice of $c$ but, as expected, the actual size approaches the asymptotic (nominal) size as the sample size increases.

We now consider the power of our test against certain misspecifications of the propensity score. In Tables 2-5, we consider four scenarios in which the true data generating process is given by (6)-(9), respectively. For each scenario, the null hypothesis is the same as above. The test performs admirably when the true data generating process is given by (6)-(8), showing high power even for moderately sized samples. Recall that (6) is precisely the design used in our heuristic based on Lemma 3.1, presented in Figure 1. Given the noticeable gap between the two graphs in Figure 1, it is not surprising that our test performs well. The test performs less well, however, when the true data generating process is given by (9), which is again consistent with our earlier findings in Figure 2. This outcome is to be expected because the normal and uniform distributions for $\epsilon$ share key features, in particular symmetry about zero.
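The standardized statistic of Theorem 4.1, with the normal kernel and the bandwidth rule $h_n = c\,n^{-1/5}$, can be sketched in standard-library Python as follows. The exact definition of $\hat V_n$ appears earlier in the paper and is not reproduced in this section, so the form used here is an assumption: a Zheng-type U-statistic over the same pairs and scaling as $\hat\Sigma_n$, with residuals $\hat\varepsilon_i = D_i - Q(X_i, \hat\theta_n)$. The upper-tailed rejection rule is likewise assumed, following the usual convention for Zheng-type tests, and the fitted scores $Q(X_i, \hat\theta_n)$ must be supplied by the user. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_kernel(u: float) -> float:
    """Normal kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def bandwidth(n: int, c: float) -> float:
    """Bandwidth rule h_n = c * n^(-1/5) from the simulation design."""
    return c * n ** (-0.2)


def standardized_statistic(d, q, h):
    """Return T_n = n * sqrt(h) * V_n / sqrt(Sigma_n).

    d: binary outcomes D_i; q: fitted scores Q(X_i, theta_hat_n).
    ASSUMPTION: V_n is taken to be the Zheng-type U-statistic
        V_n = 1/(n(n-1)) * sum_{i != j} (1/h) K((q_i - q_j)/h) e_i e_j
    with residuals e_i = d_i - q_i. Sigma_n is the variance estimator of
    Theorem 4.1 (same pairs, K squared, squared residuals, factor 2).
    """
    n = len(d)
    e = [di - qi for di, qi in zip(d, q)]
    v_sum = 0.0
    s_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # U-statistic: skip the diagonal terms
            k = normal_kernel((q[i] - q[j]) / h)
            v_sum += k / h * e[i] * e[j]
            s_sum += k * k / h * e[i] ** 2 * e[j] ** 2
    pairs = n * (n - 1)
    v_n = v_sum / pairs
    sigma_n = 2.0 * s_sum / pairs
    return n * math.sqrt(h) * v_n / math.sqrt(sigma_n)


def reject(t_stat: float, alpha: float = 0.05) -> bool:
    """Upper-tailed test: reject H0 when T_n exceeds the N(0, 1) quantile."""
    return t_stat > NormalDist().inv_cdf(1.0 - alpha)


def rejection_rate(stats, alpha: float = 0.05) -> float:
    """Proportion of replications in which H0 is rejected."""
    return sum(reject(t, alpha) for t in stats) / len(stats)
```

In a Monte Carlo run, one would compute $T_n$ once per replication with `h = bandwidth(n, c)` and summarize the replications with `rejection_rate`, the kind of rejection proportion reported in Tables 1-5. The double loop is $O(n^2)$, which is adequate for the sample sizes considered here.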
On the other hand, as evidenced by Table 4, the test performs much better when $\epsilon$ follows a $\chi^2_1$ distribution, which is not symmetric about zero. We next consider a scenario in which the true data generating process is (5) and the null hypothesis is the same as above, but now $Q(X,\beta) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$. Naturally, we would expect any sensible test to have high power against such alternatives. $Q(X,\beta)$ is not

⁵ The bandwidth employed in [18], $c \cdot n^{-2/5}$, does not fulfill the assumptions required by Theorem 4.1. However, using this bandwidth instead of $c \cdot n^{-1/5}$ does not change the conclusions of the Monte Carlo analysis presented below. Results are available on request.
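The claim in footnote 5 can be checked by plugging polynomial rates into the bandwidth conditions of Theorem 4.1. The derivation below reads condition (b) as $1/(a_n h_n^2) \to a < \infty$, $n a_n^{-2}/\log n \to \infty$ and $n\sqrt{h_n}/a_n^2 \to b < \infty$; this reading of the conditions is our assumption, and the choice $a_n = n^{19/40}$ is ours, only one of many that work.

```latex
\begin{aligned}
&\textbf{Case } h_n = c\,n^{-1/5},\; a_n = n^{19/40}: \\
&\quad n\sqrt{h_n} = \sqrt{c}\,n^{9/10} \to \infty, \qquad
  (n h_n^{3/2})^{-1} = c^{-3/2}\,n^{-7/10} \to 0, \\
&\quad a_n h_n^2 = c^2\,n^{3/40} \to \infty \;\Rightarrow\; \frac{1}{a_n h_n^2} \to 0, \\
&\quad \frac{n a_n^{-2}}{\log n} = \frac{n^{1/20}}{\log n} \to \infty, \qquad
  \frac{n\sqrt{h_n}}{a_n^2} = \sqrt{c}\,n^{-1/20} \to 0. \\[4pt]
&\textbf{Case } h_n = c\,n^{-2/5}: \\
&\quad \frac{1}{a_n h_n^2} \to a < \infty \;\Rightarrow\; a_n \ge \kappa\,h_n^{-2} = \kappa\,c^{-2} n^{4/5}
  \text{ for some } \kappa > 0 \text{ and all large } n, \\
&\quad \Rightarrow\; \frac{n a_n^{-2}}{\log n} \le \frac{c^4}{\kappa^2}\,\frac{n^{-3/5}}{\log n} \to 0,
  \text{ contradicting } \frac{n a_n^{-2}}{\log n} \to \infty.
\end{aligned}
```

Under this reading, $c\,n^{-1/5}$ satisfies both (a) and (b), whereas $c\,n^{-2/5}$ satisfies (a) but violates (b) for every choice of $a_n$, in line with footnote 5.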