
P. Massart 311

(at least for discrete models) is that if you allow model complexity to grow with sample size, you can then use minimum complexity penalization to build nonparametric estimators of a density which adapt to the smoothness.

Meanwhile, David Donoho, Iain Johnstone, Gérard Kerkyacharian and Dominique Picard were developing their approach to wavelet estimation. Their striking work showed that in a variety of problems, it is possible to build adaptive estimators of a regression function or a density through a remarkably simple procedure: thresholding of the empirical wavelet coefficients. Many papers could be cited here, but Donoho et al. (1995) is possibly the most useful review on the topic. Wavelet thresholding has an obvious model selection flavor to it, as it amounts to selecting a set of wavelet coefficients from the data.

At some point, it became clear to us that there was room for building a general theory to help reconcile Akaike's classical approach to model selection, the emerging results by Barron and Cover or Donoho et al. in which model selection is used to construct nonparametric adaptive estimators, and Vapnik's structural risk minimization approach for statistical learning; see Vapnik (1982).

28.2.1 The model choice paradigm

Assume that a random variable ξ^(n) is observed which depends on a parameter n. For concreteness, you may think of ξ^(n) as an n-sample from some unknown distribution. Consider the problem of estimating some quantity of interest, s, which is known to belong to some (large) set S. Consider an empirical risk criterion γ_n based on ξ^(n) such that the mapping

    t ↦ E{γ_n(t)}

achieves a minimum at the point s. One can then define a natural (nonnegative) loss function related to this criterion by setting, for all t ∈ S,

    l(s, t) = E{γ_n(t)} − E{γ_n(s)}.

When ξ^(n) = (ξ_1, ..., ξ_n), the empirical risk criterion γ_n is usually defined as some empirical mean

    γ_n(t) = P_n{γ(t, ·)} = (1/n) ∑_{i=1}^{n} γ(t, ξ_i)    (28.1)

of an adequate risk function γ.
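The definition in (28.1) can be illustrated with a small numerical sketch. The squared-error risk γ(t, x) = (x − t)² used below is an illustrative choice, not taken from this excerpt; with this choice the expected criterion E{γ_n(t)} is minimized at the mean of the distribution, which thus plays the role of s:

```python
import numpy as np

def empirical_risk(gamma, t, xi):
    """Empirical risk criterion gamma_n(t) = (1/n) * sum_i gamma(t, xi_i),
    i.e. the empirical mean P_n{gamma(t, .)} of the risk function gamma."""
    return float(np.mean([gamma(t, x) for x in xi]))

# Illustrative choice (not from the chapter): squared-error risk
# gamma(t, x) = (x - t)^2, whose expectation is minimized at the mean s.
gamma = lambda t, x: (x - t) ** 2

rng = np.random.default_rng(0)
xi = rng.normal(loc=2.0, scale=1.0, size=10_000)  # n-sample, true s = 2

# Minimize gamma_n over a grid of candidate values t
grid = np.linspace(0.0, 4.0, 81)
risks = [empirical_risk(gamma, t, xi) for t in grid]
t_hat = grid[int(np.argmin(risks))]
print(t_hat)  # close to 2.0
```

The minimizer of the empirical criterion lands near s, which is the basic fact that empirical risk minimization over a model relies on.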
Two typical examples are as follows.

Example 1 (Density estimation) Let ξ_1, ..., ξ_n be a random sample from an unknown density s with respect to a given measure µ. Taking γ(t, x) = −ln{t(x)} in (28.1) leads to the log-likelihood criterion. The corresponding loss function, l, is simply the Kullback–Leibler information between the probability measures sµ and tµ. Indeed, l(s, t) = ∫ s ln(s/t) dµ if sµ is absolutely continuous with respect to tµ, and l(s, t) = ∞ otherwise. However, if
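As a numerical check of Example 1 (a sketch under assumed Gaussian densities, not code from the chapter), one can estimate l(s, t) = E{γ_n(t)} − E{γ_n(s)} by Monte Carlo with γ(t, x) = −ln{t(x)} and compare it with the closed-form Kullback–Leibler information between two normal distributions:

```python
import numpy as np

def log_norm_pdf(x, mu, sigma):
    # Log of the N(mu, sigma^2) density
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def kl_normal(mu1, s1, mu2, s2):
    # Closed-form Kullback-Leibler information KL(N(mu1, s1^2) || N(mu2, s2^2))
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

rng = np.random.default_rng(1)
xi = rng.normal(0.0, 1.0, size=200_000)  # sample from s = N(0, 1)

# gamma(t, x) = -ln t(x): empirical log-likelihood criterion gamma_n(t)
gamma_n = lambda mu, sigma: float(np.mean(-log_norm_pdf(xi, mu, sigma)))

# Estimate l(s, t) = E{gamma_n(t)} - E{gamma_n(s)} for t = N(0.5, 1);
# it should match the Kullback-Leibler information between sµ and tµ
loss_est = gamma_n(0.5, 1.0) - gamma_n(0.0, 1.0)
print(loss_est, kl_normal(0.0, 1.0, 0.5, 1.0))  # both close to 0.125
```

Both numbers agree up to Monte Carlo error, and the loss vanishes at t = s, consistent with the nonnegativity of the loss function defined above.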
