…ics, statistical mechanics, random matrix theory, high-dimensional geometry, and statistics.

The study of random fluctuations of suprema of empirical processes has been crucial in the application of concentration inequalities to statistics and machine learning. It also turned out to be a driving force behind the development of the theory. This is exactly what I would like to illustrate here while focusing on the impact on my own research in the 1990s and beyond.

28.2 Model selection

Model selection is a classical topic in statistics. The idea of selecting a model by penalizing some empirical criterion goes back to the early 1970s with the pioneering work of Mallows and Akaike. The classical parametric view on model selection as exposed in Akaike's seminal paper (Akaike, 1973) on penalized log-likelihood is asymptotic in essence. More precisely, Akaike's formula for the penalty depends on Wilks' theorem, i.e., on an asymptotic expansion of the log-likelihood.
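To fix ideas, Akaike's criterion can be written in its standard form as follows; the notation ($S_m$, $D_m$, $\widehat{L}_m$) is mine, not the chapter's. Given a collection of parametric models $\{S_m\}_{m \in \mathcal{M}}$, where model $S_m$ has $D_m$ parameters and maximized likelihood $\widehat{L}_m$, one selects

$$
\widehat{m} \in \operatorname*{arg\,min}_{m \in \mathcal{M}} \left\{ -\log \widehat{L}_m + D_m \right\},
$$

which is equivalent to minimizing $\mathrm{AIC}(m) = -2\log \widehat{L}_m + 2 D_m$. The penalty $D_m$ is justified by Wilks' theorem, which gives a $\chi^2$ approximation, with $D_m$ degrees of freedom, for twice the log-likelihood ratio; this is an asymptotic statement that holds for a fixed list of models as the sample size grows, which is precisely the regime questioned below.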
Lucien Birgé and I started to work on model selection criteria based on a non-asymptotic penalized log-likelihood early in the 1990s. We had in mind that in the usual asymptotic approach to model selection, it is often unrealistic to assume that the number of observations tends to infinity while the list of models and their size are fixed. Either the number of observations is not that large (a hundred, say) and, when playing with models with a moderate number of parameters (five or six), you cannot be sure that asymptotic results apply; or the number of observations is really large (as in signal de-noising, for instance) and you would like to take advantage of it by considering a potentially large list of models involving possibly large numbers of parameters.

From a non-asymptotic perspective, the number of observations and the list of models are what they are. The purpose of an ideal model selection procedure is to provide a data-driven choice of model that tends to optimize some criterion, e.g., minimum expected risk with respect to the quadratic loss or the Kullback–Leibler loss. This provides a well-defined mathematical formalization of the model selection problem, but it leaves open the search for a neat generic solution.
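This formalization can be sketched in symbols (again in my notation rather than the chapter's). If each model $S_m$ comes with an estimator $\widehat{s}_m$ of the unknown density $s$, and $\ell$ denotes the loss (quadratic or Kullback–Leibler), the ideal "oracle" choice would be

$$
m^{*} \in \operatorname*{arg\,min}_{m \in \mathcal{M}} \, \mathbb{E}\!\left[\ell\left(s, \widehat{s}_m\right)\right],
$$

which is unattainable since it depends on the unknown $s$. A data-driven rule $\widehat{m}$ is then judged by how close the risk of $\widehat{s}_{\widehat{m}}$ comes to that of the oracle, typically through an oracle inequality of the form $\mathbb{E}[\ell(s, \widehat{s}_{\widehat{m}})] \le C \inf_{m \in \mathcal{M}} \mathbb{E}[\ell(s, \widehat{s}_m)] + r_n$, for some constant $C$ and remainder term $r_n$.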
Fortunately for me, the early 1990s turned out to be a rich period for the development of mathematical statistics, and I came across the idea that letting the size of models go to infinity with the number of observations makes it possible to build adaptive nonparametric estimators. This idea can be traced back to at least two different sources: information theory and signal analysis. In particular, Lucien and I were very impressed by the beautiful paper of Andrew Barron and Tom Cover (Barron and Cover, 1991) on density estimation via minimum complexity model selection. The main message there…
