11.07.2015 Views

2DkcTXceO


$\gamma(t, x) = \|t\|^2 - 2t(x)$ in (28.1), where $\|\cdot\|$ denotes the norm in $L^2(\mu)$, one gets the least squares criterion and the loss function is given by $l(s, t) = \|s - t\|^2$ for every $t \in L^2(\mu)$.

Example 2 (Gaussian white noise) Consider the process $\xi^{(n)}$ on $[0, 1]$ defined by $d\xi^{(n)}(x) = s(x)\,dx + n^{-1/2}\,dW(x)$ with $\xi^{(n)}(0) = 0$, where $W$ denotes Brownian motion. The least squares criterion is defined by $\gamma_n(t) = \|t\|^2 - 2\int_0^1 t(x)\,d\xi^{(n)}(x)$, and the corresponding loss function $l$ is simply the squared $L^2$ distance defined for all $s, t \in L^2[0, 1]$ by $l(s, t) = \|s - t\|^2$.

Given a model $S$ (which is a subset of $\mathcal{S}$), the empirical risk minimizer is simply defined as a minimizer of $\gamma_n$ over $S$. It is a natural estimator of $s$ whose quality is directly linked to that of the model $S$. The question is then: How can one choose a suitable model $S$? It would be tempting to choose $S$ as large as possible. Taking $S$ as $\mathcal{S}$ itself or as a "big" subset of $\mathcal{S}$ is known to lead either to inconsistent estimators (Bahadur, 1958) or to suboptimal estimators (Birgé and Massart, 1993). In contrast, if $S$ is a "small" model (e.g., some parametric model involving one or two parameters), the behavior of the empirical risk minimizer on $S$ is satisfactory as long as $s$ is close enough to $S$, but the model can easily end up being completely wrong.

One of the ideas suggested by Akaike is to use the risk associated with the loss function $l$ as a quality criterion for a model. To illustrate this idea, it is convenient to consider a simple example for which everything is easily computable. Consider the white noise framework.
If $S$ is a linear space with dimension $D$, and if $\varphi_1, \ldots, \varphi_D$ denotes some orthonormal basis of $S$, the least squares estimator is merely a projection estimator, viz.

\[
\hat{s} = \sum_{j=1}^{D} \left\{ \int_0^1 \varphi_j(x)\,d\xi^{(n)}(x) \right\} \varphi_j
\]

and the expected quadratic risk of $\hat{s}$ is equal to

\[
E\left(\|s - \hat{s}\|^2\right) = d^2(s, S) + D/n.
\]

This formula for the quadratic risk reflects perfectly the model choice paradigm: if the model is to be chosen in such a way that the risk of the resulting least squares estimator remains under control, a balance must be struck between the bias term $d^2(s, S)$ and the variance term $D/n$.

More generally, given an empirical risk criterion $\gamma_n$, each model $S_m$ in an (at most countable and usually finite) collection $\{S_m : m \in \mathcal{M}\}$ can be represented by the corresponding empirical risk minimizer $\hat{s}_m$. One can use the minimum of $E\{l(s, \hat{s}_m)\}$ over $\mathcal{M}$ as a benchmark for model selection. Ideally one would like to choose $m(s)$ so as to minimize the risk $E\{l(s, \hat{s}_m)\}$ with respect to $m \in \mathcal{M}$. This is what Donoho and Johnstone called an oracle; see, e.g., Donoho and Johnstone (1994). The purpose of model selection is to design a data-driven choice $\hat{m}$ which mimics an oracle, in the sense that the risk of the selected estimator $\hat{s}_{\hat{m}}$ is not too far from the benchmark $\inf_{m \in \mathcal{M}} E\{l(s, \hat{s}_m)\}$.
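
The bias-variance balance and the notion of oracle can be checked numerically. The following is a minimal Python sketch, not part of the original text, and it rests on several assumptions made only for illustration: the models are nested, $S_D$ being spanned by the first $D$ functions of a fixed orthonormal basis; the target $s$ has finitely many nonzero coefficients $\beta_j = \langle s, \varphi_j \rangle$, here the hypothetical choice $\beta_j = 1/j$ for $j \le 50$; and the white noise model is written in coordinates, where each empirical coefficient $\int_0^1 \varphi_j\,d\xi^{(n)}$ is distributed as $\beta_j + n^{-1/2}\varepsilon_j$ with independent standard Gaussian $\varepsilon_j$. The function names are hypothetical.

```python
import math
import random


def theoretical_risk(coeffs, D, n):
    """Return d^2(s, S_D) + D/n for the model spanned by the first D basis functions."""
    bias = sum(b * b for b in coeffs[D:])  # squared distance from s to S_D
    return bias + D / n                    # plus the variance term


def simulated_risk(coeffs, D, n, n_rep=2000, seed=0):
    """Monte Carlo estimate of E||s - s_hat||^2 for the projection estimator on S_D."""
    rng = random.Random(seed)
    bias = sum(b * b for b in coeffs[D:])
    total = 0.0
    for _ in range(n_rep):
        loss = bias
        for beta in coeffs[:D]:
            # Empirical coefficient: true coefficient plus N(0, 1/n) noise.
            beta_hat = beta + rng.gauss(0.0, 1.0) / math.sqrt(n)
            loss += (beta_hat - beta) ** 2
        total += loss
    return total / n_rep


def oracle_dimension(coeffs, n):
    """Return the dimension D minimizing the theoretical risk over S_0, ..., S_len(coeffs)."""
    return min(range(len(coeffs) + 1),
               key=lambda D: theoretical_risk(coeffs, D, n))


if __name__ == "__main__":
    coeffs = [1.0 / j for j in range(1, 51)]  # hypothetical target coefficients
    n = 200
    D_star = oracle_dimension(coeffs, n)
    print("oracle dimension:", D_star)
    print("theoretical risk:", theoretical_risk(coeffs, D_star, n))
    print("simulated risk:  ", simulated_risk(coeffs, D_star, n))
```

The oracle dimension computed here depends on the unknown coefficients of $s$, which is precisely why it cannot be used as an estimator and only serves as a benchmark for a data-driven choice.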
