40.4 Super learning

These oracle results for the cross-validation selector teach us that it is possible to construct an ensemble estimator that asymptotically outperforms any user-supplied library of candidate estimators. We called this estimator the super learner due to its theoretical properties: it is defined as a combination of all the candidate estimators, where the weights defining the combination (e.g., a convex combination) are selected based on cross-validation; see, e.g., van der Laan et al. (2007) and Polley et al. (2012). By using the super learner as a way to regularize the MLE, we obtain an estimator with a potentially much better rate of convergence to the true $Q_0$ than a simple regularization procedure such as the one based on binning discussed above. The bias of this super learner will converge to zero at the same rate as the rate of convergence of the super learner itself, and the bias of the plug-in estimator of $\psi_0$ based on this super learner will also converge at this rate. Unfortunately, if none of our candidate estimators in the library achieves the rate $1/\sqrt{n}$ (e.g., an MLE according to a correctly specified parametric model), then this bias will be larger than $1/\sqrt{n}$, so that this plug-in estimator will not converge at the desired $\sqrt{n}$ rate. To conclude, although the super learner has superior performance in estimation of $Q_0$, it still results in an overly biased estimator of the target $\Psi(Q_0)$ (an illustrative sketch of the super learner appears at the end of this section).

40.4.1 Under-smoothing fails as a general method

Our binning discussion above argues that, for typical definitions of adaptive estimators indexed by a fine-tuning parameter (e.g., bandwidth, number of basis functions), there is no value of the fine-tuning parameter that would result in a bias for $\psi_0$ of the order $1/\sqrt{n}$. This is due to the fact that the fine-tuning parameter needs to exceed a certain value in order to define an estimator in the parameter space of $Q_0$. So even if we had selected the estimator in our library of candidate estimators that minimizes the MSE with respect to $\psi_0$ (instead of the one minimizing the cross-validated risk), we would still have selected an estimator that is overly biased for $\psi_0$.

The problem is that our candidate estimators rely on a fine-tuning parameter that controls the overall bias of the estimator. Instead, we need candidate estimators that have an excellent overall fit of $Q_0$ but also rely on a tuning parameter that only controls the bias of the resulting plug-in estimator for $\psi_0$, and we need a way to fit this tuning parameter. For that purpose, we need to determine a submodel of fluctuations $\{Q_n(\epsilon) : \epsilon\}$ through a candidate estimator $Q_n$ at $\epsilon = 0$, indexed by an amount of fluctuation $\epsilon$, where fitting $\epsilon$ is locally equivalent with fitting $\psi_0$ in the actual semiparametric statistical model $\mathcal{M}$. It appears that the least-favorable submodel from efficiency theory can be utilized for this purpose, while $\epsilon$ can be fitted with the parametric MLE. This insight resulted in so-called targeted maximum likelihood estimation.
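To make the super learner of Section 40.4 concrete, the following is a minimal sketch for regression, not the authors' implementation: each candidate in a small library is fit on the training folds, its held-out predictions are collected, and convex-combination weights are chosen to minimize the cross-validated squared-error risk. The two-learner library, the simulated data, and the helper name `super_learner_weights` are illustrative assumptions, not part of the chapter.

```python
# Sketch of a regression super learner: cross-validated convex weights over a
# user-supplied library of candidate estimators (illustrative only).
import numpy as np
from scipy.optimize import minimize
from sklearn.base import clone
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

def super_learner_weights(X, y, library, n_splits=10, seed=0):
    """Return convex weights over `library` minimizing the cross-validated MSE."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    Z = np.zeros((len(y), len(library)))          # held-out predictions, one column per learner
    for train, valid in kf.split(X):
        for j, learner in enumerate(library):
            fit = clone(learner).fit(X[train], y[train])
            Z[valid, j] = fit.predict(X[valid])
    # minimize the cross-validated risk over the simplex: alpha >= 0, sum(alpha) = 1
    cv_risk = lambda alpha: np.mean((y - Z @ alpha) ** 2)
    k = len(library)
    res = minimize(cv_risk, x0=np.full(k, 1.0 / k),
                   bounds=[(0.0, 1.0)] * k,
                   constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0})
    return res.x

# Usage on simulated data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500)
library = [LinearRegression(),
           RandomForestRegressor(n_estimators=100, random_state=0)]
print(super_learner_weights(X, y, library))
```

The equal-weight starting point and the particular solver are arbitrary choices; any method for minimizing the cross-validated risk over the simplex would serve the same purpose.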
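The fluctuation idea in the last paragraph can be illustrated for one familiar target parameter, the treatment-specific mean $\psi_0 = E_0[Q_0(1, W)]$ with a binary outcome. This is only a sketch under that assumed setting, not the chapter's method in full generality: the least-favorable logistic fluctuation adds the "clever covariate" $H(A, W) = A/g_n(W)$ on the logit scale, and $\epsilon$ is fitted by a parametric MLE (a logistic regression with offset). The inputs `Q1`, `QA`, and `g1` are hypothetical initial estimates of $Q_0(1, W)$, $Q_0(A, W)$, and $P(A = 1 \mid W)$, e.g., obtained with a super learner.

```python
# Sketch of a single targeting step for psi_0 = E[Q_0(1, W)] with binary Y
# (assumed setting; illustrative only).
import numpy as np
import statsmodels.api as sm

def tmle_treated_mean(Y, A, Q1, QA, g1):
    """Fluctuate the initial fit along the least-favorable submodel and
    return the targeted plug-in estimate of E[Q_0(1, W)]."""
    logit = lambda p: np.log(p / (1.0 - p))
    expit = lambda x: 1.0 / (1.0 + np.exp(-x))
    H_A = A / g1                                  # clever covariate at the observed A
    # Parametric MLE for the fluctuation parameter eps: logistic regression of Y
    # on H_A with offset logit(QA) and no intercept.
    eps = sm.GLM(Y, H_A.reshape(-1, 1), family=sm.families.Binomial(),
                 offset=logit(QA)).fit().params[0]
    Q1_star = expit(logit(Q1) + eps / g1)         # updated Q_n(eps)(1, W), since H(1, W) = 1/g_n(W)
    return Q1_star.mean()                         # targeted plug-in estimator of psi_0
```

In this setting the resulting plug-in estimate solves the empirical efficient influence curve equation for $\psi_0$, which is the sense in which fitting $\epsilon$ targets the bias of the plug-in estimator rather than the overall fit of $Q_0$.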
