11.07.2015 Views

2DkcTXceO


314 A non-asymptotic walk

28.2.3 Empirical processes to the rescue

The reason for which empirical processes have something to do with the analysis of penalized model selection procedures is roughly the following; see Massart (2007) for further details. Consider the centered empirical risk process

$$\overline{\gamma}_n(t) = \gamma_n(t) - \mathbb{E}\{\gamma_n(t)\}.$$

Minimizing $\mathrm{crit}(m)$ is then equivalent to minimizing

$$\mathrm{crit}(m) - \gamma_n(s) = \ell(s, \hat{s}_m) - \{\overline{\gamma}_n(s) - \overline{\gamma}_n(\hat{s}_m)\} + \mathrm{pen}(m).$$

One can readily see from this formula that to mimic an oracle, the penalty $\mathrm{pen}(m)$ should ideally be of the same order of magnitude as $\overline{\gamma}_n(s) - \overline{\gamma}_n(\hat{s}_m)$. Guessing what is the exact order of magnitude for $\overline{\gamma}_n(s) - \overline{\gamma}_n(\hat{s}_m)$ is not an easy task in general, but one can at least try to compare the fluctuations of $\overline{\gamma}_n(s) - \overline{\gamma}_n(\hat{s}_m)$ to the quantity of interest, $\ell(s, \hat{s}_m)$. To do so, one can introduce the supremum of the weighted process

$$Z_m = \sup_{t \in S_m} \frac{\overline{\gamma}_n(s) - \overline{\gamma}_n(t)}{w\{\ell(s, t)\}},$$

where $w$ is a conveniently chosen non-decreasing weight function. For instance, if $w\{\ell(s, t)\} = 2\sqrt{\ell(s, t)}$ then, for every $\theta > 0$,

$$\ell(s, \hat{s}_m) - \{\overline{\gamma}_n(s) - \overline{\gamma}_n(\hat{s}_m)\} \geq (1 - \theta)\, \ell(s, \hat{s}_m) - Z_m^2/\theta.$$

(This follows from the definition of $Z_m$ and the inequality $2ab \leq \theta a^2 + b^2/\theta$, applied with $a = \sqrt{\ell(s, \hat{s}_m)}$ and $b = Z_m$.) Thus by choosing $\mathrm{pen}(m)$ in such a way that $Z_m^2 \leq \theta\, \mathrm{pen}(m)$ (with high probability), one can hope to compare the model selection procedure with the oracle.

We are at the very point where the theory of empirical processes comes in, because the problem is now to control the quantity $Z_m$, which is indeed the supremum of an empirical process, at least when the empirical risk is defined through (28.1). Lucien Birgé and I first used this idea in 1994, while preparing our contribution to the Festschrift for Lucien Le Cam to mark his 70th birthday. The corresponding paper, Birgé and Massart (1997), was published later and we generalized it in Barron et al. (1999).

In the context of least squares density estimation that we were investigating, the weight function $w$ to be considered is precisely of the form $w(x) = 2\sqrt{x}$. Thus if the model $S_m$ happens to be a finite-dimensional subspace of $L_2(\mu)$ generated by some orthonormal basis $\{\varphi_\lambda : \lambda \in \Lambda_m\}$, the quantity of interest

$$Z_m = \sup_{t \in S_m} \frac{\overline{\gamma}_n(s) - \overline{\gamma}_n(t)}{2\,\|s - t\|} \tag{28.2}$$

can easily be made explicit. Indeed, assuming that $s$ belongs to $S_m$ (this assumption is not really needed but makes the analysis much more illuminating),

$$Z_m = \sqrt{\sum_{\lambda \in \Lambda_m} (P_n - P)^2(\varphi_\lambda)}. \tag{28.3}$$
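The identity between (28.2) and (28.3) can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: the true density $s$ is uniform on $[0, 1)$, the model $S_m$ is spanned by a regular histogram basis with 8 bins (so that $s$ belongs to $S_m$), and the empirical risk is the usual least squares contrast $\gamma_n(t) = \|t\|^2 - 2P_n(t)$, which is a guess at what (28.1) states since that equation is not shown here. Under this contrast, $\overline{\gamma}_n(s) - \overline{\gamma}_n(t) = 2(P_n - P)(t - s)$, so for $t = s + \sum_\lambda a_\lambda \varphi_\lambda$ the ratio in (28.2) reduces to $\sum_\lambda a_\lambda \nu_\lambda / \|a\|$ with $\nu_\lambda = (P_n - P)(\varphi_\lambda)$. The function names are hypothetical.

```python
import math
import random


def centered_coefficients(sample, dim):
    """Return nu[k] = (P_n - P)(phi_k) for the histogram basis on [0, 1).

    phi_k = sqrt(dim) * indicator([k/dim, (k+1)/dim)) is orthonormal in L2([0, 1)).
    Assumption: the true density s is uniform on [0, 1), so P(phi_k) = 1/sqrt(dim).
    """
    n = len(sample)
    counts = [0] * dim
    for x in sample:
        counts[min(int(x * dim), dim - 1)] += 1
    scale = math.sqrt(dim)
    return [scale * (c / n - 1.0 / dim) for c in counts]


def closed_form_z(nu):
    """Right-hand side of (28.3): square root of the sum of squared coefficients."""
    return math.sqrt(sum(v * v for v in nu))


def weighted_ratio(nu, a):
    """Ratio inside the supremum of (28.2) for t = s + sum_k a[k] * phi_k.

    Under the assumed contrast the numerator is 2 * sum_k a[k] * nu[k] and the
    denominator is 2 * ||a||, because the basis is orthonormal.
    """
    inner = sum(x * y for x, y in zip(a, nu))
    norm = math.sqrt(sum(x * x for x in a))
    return inner / norm


rng = random.Random(0)
sample = [rng.random() for _ in range(500)]
nu = centered_coefficients(sample, dim=8)
z_m = closed_form_z(nu)

# Random directions never exceed the closed form (Cauchy-Schwarz) ...
best_random = max(
    weighted_ratio(nu, [rng.gauss(0.0, 1.0) for _ in nu]) for _ in range(2000)
)
# ... and the direction a = nu attains it.
attained = weighted_ratio(nu, nu)
print(z_m, best_random, attained)
```

The random search only approaches the supremum from below, while the direction $a = \nu$ reaches it exactly. This is the Cauchy-Schwarz argument behind (28.3), restated in coordinates.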
