11.07.2015 Views

2DkcTXceO



P. Massart

28.2.2 Non-asymptopia

The penalized empirical risk model selection procedure consists in considering an appropriate penalty function $\mathrm{pen}: \mathcal{M} \to \mathbb{R}_+$ and choosing $\hat{m}$ to minimize

$$\mathrm{crit}(m) = \gamma_n(\hat{s}_m) + \mathrm{pen}(m)$$

over $\mathcal{M}$. One can then define the selected model $S_{\hat{m}}$ and the penalized empirical risk estimator $\hat{s}_{\hat{m}}$.

Akaike's penalized log-likelihood criterion corresponds to the case where the penalty is taken as $D_m/n$, where $D_m$ denotes the number of parameters defining the regular parametric model $S_m$. As mentioned above, Akaike's heuristics heavily relies on the assumption that the dimension and the number of models are bounded with respect to $n$, as $n \to \infty$. Various penalized criteria have been designed according to this asymptotic philosophy; see, e.g., Daniel and Wood (1971).

In contrast a non-asymptotic approach to model selection allows both the number of models and the number of their parameters to depend on $n$. One can then choose a list of models which is suitable for approximation purposes, e.g., wavelet expansions, trigonometric or piecewise polynomials, or artificial neural networks. For example, the hard thresholding procedure turns out to be a penalized empirical risk procedure if the list of models depends on $n$.

To be specific, consider once again the white noise framework and consider an orthonormal system $\varphi_1, \ldots, \varphi_n$ of $L_2[0,1]$ that depends on $n$. For every subset $m$ of $\{1, \ldots, n\}$, define the model $S_m$ as the linear span of $\{\varphi_j : j \in m\}$. The complete variable selection problem requires the selection of a subset $m$ from the collection of all subsets of $\{1, \ldots, n\}$. Taking a penalty function of the form $\mathrm{pen}(m) = T^2 |m| / n$ leads to an explicit solution for the minimization of $\mathrm{crit}(m)$ because in this case, setting $\hat{\beta}_j = \int \varphi_j(x)\, d\xi^{(n)}(x)$, the penalized empirical criterion can be written as

$$\mathrm{crit}(m) = -\sum_{j \in m} \hat{\beta}_j^2 + \frac{T^2 |m|}{n} = \sum_{j \in m} \left( -\hat{\beta}_j^2 + \frac{T^2}{n} \right).$$

This criterion is obviously minimized at

$$\hat{m} = \left\{ j \in \{1, \ldots, n\} : \sqrt{n}\, |\hat{\beta}_j| \geq T \right\},$$

which is precisely the hard thresholding procedure.
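The equivalence between minimizing crit(m) over all subsets and hard thresholding can be checked numerically on a toy case. The following is a minimal sketch, not part of the original text: the function names (`crit`, `hard_threshold`, `brute_force_minimizer`) and the coefficient values are made up for illustration, indices are 0-based, and the exhaustive search is only feasible for very small n.

```python
from itertools import combinations


def crit(m, beta_hat, T, n):
    """Penalized criterion: -sum_{j in m} beta_hat_j^2 + T^2 * |m| / n."""
    return -sum(beta_hat[j] ** 2 for j in m) + T ** 2 * len(m) / n


def hard_threshold(beta_hat, T, n):
    """Set of (0-based) indices j with sqrt(n) * |beta_hat_j| >= T."""
    return {j for j, b in enumerate(beta_hat) if n ** 0.5 * abs(b) >= T}


def brute_force_minimizer(beta_hat, T, n):
    """Minimize crit over all 2^n subsets by exhaustive search."""
    indices = range(len(beta_hat))
    best, best_value = set(), crit((), beta_hat, T, n)
    for k in range(1, len(beta_hat) + 1):
        for m in combinations(indices, k):
            value = crit(m, beta_hat, T, n)
            if value < best_value:
                best, best_value = set(m), value
    return best


# Hypothetical empirical coefficients; n is both the noise-level
# parameter and the size of the orthonormal system, as in the text.
beta_hat = [0.9, -0.05, 0.4, 0.1, -0.7, 0.02]
n = len(beta_hat)
T = 1.0

selected = hard_threshold(beta_hat, T, n)
exhaustive = brute_force_minimizer(beta_hat, T, n)
print(sorted(selected), sorted(exhaustive))
```

Because the criterion is a sum of one term per index, each index can be decided independently, which is why the thresholding rule avoids the exponential search. When some coefficient sits exactly at the threshold, its term is zero and the minimizer is not unique; the example values avoid that case.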
Of course the crucial issue is to choose the level of thresholding, $T$.

More generally the question is: what kind of penalty should be recommended from a non-asymptotic perspective? The naive notion that Akaike's criterion could be used in this context fails in the sense that it may typically lead to under-penalization. In the preceding example, it would lead to the choice $T = \sqrt{2}$ while it stems from the work of Donoho et al. that the level of thresholding should be at least of order $\sqrt{2 \ln(n)}$ as $n \to \infty$.
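One way to see why a fixed threshold under-penalizes is to count how many pure-noise coordinates it lets through. The sketch below rests on an assumption not spelled out in this passage: that in the white noise framework, when the true coefficient is zero, sqrt(n) times the empirical coefficient is a standard Gaussian variable. Under that assumption the expected number of noise coordinates selected among n is n * P(|Z| >= T), which equals n * erfc(T / sqrt(2)). The function names are hypothetical, and the second threshold is taken exactly equal to sqrt(2 ln n) purely for illustration.

```python
import math


def expected_false_selections(n, T):
    """Expected count of pure-noise coordinates with sqrt(n)*|beta_hat_j| >= T.

    Assumes sqrt(n) * beta_hat_j ~ N(0, 1) for each of the n coordinates,
    i.e. the true signal is zero everywhere.
    """
    # For a standard Gaussian Z, P(|Z| >= T) = erfc(T / sqrt(2)).
    return n * math.erfc(T / math.sqrt(2))


def log_level_threshold(n):
    """Threshold of order sqrt(2 ln n), used here as an illustrative choice."""
    return math.sqrt(2 * math.log(n))


for n in (100, 10_000, 1_000_000):
    fixed = expected_false_selections(n, math.sqrt(2))
    growing = expected_false_selections(n, log_level_threshold(n))
    print(f"n={n}: T=sqrt(2) -> {fixed:.1f}, T=sqrt(2 ln n) -> {growing:.3f}")
```

With T = sqrt(2) a constant fraction of about erfc(1), roughly 0.157, of the noise coordinates is selected, so the count grows linearly in n. With the threshold growing like sqrt(2 ln n) the expected count stays below one for the values of n shown.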
