11.07.2015 Views

2DkcTXceO


514 Features of Big Data

[Figure: two histogram panels; vertical axis (counts) from 0 to 2500, horizontal axis (sample correlation) from -0.5 to 0.5]

FIGURE 43.3 Distributions of sample correlations. Left panel: Distributions of the sample correlation $\widehat{\mathrm{corr}}(X_j, Y)$ ($j = 1, \ldots, 47{,}292$). Right panel: Distribution of the sample correlation $\widehat{\mathrm{corr}}(X_j, \hat{\varepsilon})$, in which $\hat{\varepsilon}$ represents the residuals after the lasso fit.

How do we deal with endogeneity? Ideally, we hope to be able to select consistently $S_0$ under only the assumption that

$$Y = X_{S_0}^\top \beta_{S_0,0} + \varepsilon, \qquad E(\varepsilon X_{S_0}) = 0,$$

but this assumption is too weak to recover the set $S_0$. A stronger assumption is

$$Y = X_{S_0}^\top \beta_{S_0,0} + \varepsilon, \qquad E(\varepsilon \mid X_{S_0}) = 0. \tag{43.3}$$

Fan and Liao (2014) use over identification conditions such as

$$E(\varepsilon X_{S_0}) = 0 \quad \text{and} \quad E(\varepsilon X_{S_0}^2) = 0 \tag{43.4}$$

to distinguish endogenous and exogenous variables, which are weaker than the condition in (43.3). They introduce the Focused Generalized Method of Moments (FGMM) which uses the over identification conditions to select consistently the set of variables $S_0$. The readers can refer to their paper for technical details.

The left panel of Figure 43.4 shows the distribution of the correlations between the covariates and the residuals after the FGMM fit. Many of the correlations are still non-zero, but this is fine, as we assume only (43.4) and merely need to validate this assumption empirically. For this data set, FGMM
