11.07.2015 Views

2DkcTXceO



where $P = s\mu$. In other words, $Z_m$ is simply the square root of a $\chi^2$-type statistic. Deriving exponential bounds for $Z_m$ from (28.3) does not look especially easier than starting from (28.2). However, it is clear from (28.3) that $E(Z_m^2)$ can be computed explicitly. As a result one can easily bound $E(Z_m)$ using Jensen’s inequality, viz.

$$E(Z_m) \le \sqrt{E(Z_m^2)}.$$

By shifting to concentration inequalities, we thus hoped to escape the heavy-duty “chaining” machinery, which was the main tool available at that time to control suprema of empirical processes.

It is important to understand that this is not merely a question of taste or elegance. The disadvantage with chaining inequalities is that even if you optimize them, at the end of the day the best you can hope for is to derive a bound with the right order of magnitude; the associated numerical constants are typically ridiculously large. When the goal is to validate (or invalidate) penalized criteria such as Mallows’ or Akaike’s criterion from a non-asymptotic perspective, constants do matter. This motivated my investigation of the fascinating topic of concentration inequalities.

28.3 Welcome to Talagrand’s wonderland

Motivated by the need to understand whether one can derive concentration inequalities for suprema of empirical processes, I intensified my readings on the concentration of measures. By suprema of empirical processes, I mean

$$Z = \sup_{t \in T} \sum_{i=1}^{n} X_{i,t},$$

where $T$ is a set and $X_{1,t}, \ldots, X_{n,t}$ are mutually independent random vectors taking values in $\mathbb{R}^T$.

For applications such as that which was described above, it is important to cover the case where $T$ is infinite, but for the purpose of building structural inequalities like concentration inequalities, $T$ finite is in fact the only case that matters because one can recover the general case from the finite case by applying monotone limit procedures, i.e., letting the size of the index set grow to infinity.
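For readers who like to see the object concretely, here is a minimal simulation sketch of $Z$ for a finite index set. It is an illustration only, not part of the original argument: the uniform distribution on $[-1, 1]$, the independence of coordinates within each vector, the sample sizes, and the function names `sup_of_sums`, `draw_sample` and `simulate` are all assumptions made here. It also checks the Jensen-type bound $E(Z) \le \sqrt{E(Z^2)}$ on the simulated values.

```python
import math
import random


def sup_of_sums(sample):
    """Return Z = max over t of sum_i sample[i][t] for a finite index set T.

    `sample` is a list of n vectors, each of length |T|.
    """
    n_indices = len(sample[0])
    return max(sum(row[t] for row in sample) for t in range(n_indices))


def draw_sample(n, n_indices, rng):
    """Draw n independent vectors in R^T.

    Assumption for illustration: coordinates are i.i.d. uniform on [-1, 1].
    The text only requires independence across i, not across t.
    """
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_indices)] for _ in range(n)]


def simulate(n=50, n_indices=20, reps=500, seed=0):
    """Monte Carlo estimates of E(Z) and sqrt(E(Z^2))."""
    rng = random.Random(seed)
    zs = [sup_of_sums(draw_sample(n, n_indices, rng)) for _ in range(reps)]
    mean_z = sum(zs) / reps
    root_mean_sq = math.sqrt(sum(z * z for z in zs) / reps)
    return mean_z, root_mean_sq


mean_z, root_mean_sq = simulate()
print(f"E(Z) estimate: {mean_z:.3f}, sqrt(E(Z^2)) estimate: {root_mean_sq:.3f}")
```

The empirical mean never exceeds the empirical root mean square, so the inequality holds exactly on any simulated sample. The simulation says nothing about the sharper question raised in the text, namely how tightly $Z$ concentrates and with what constants.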
Henceforth I will thus assume the set $T$ to be finite.

When I started investigating the issue in 1994, the literature was dominated by the Gaussian Concentration Theorem for Lipschitz functions of independent standard Gaussian random variables. This result was proved independently by Borell (1975) and by Cirel’son and Sudakov (1974). As a side remark, note that these authors actually established the concentration of Lipschitz functions around the median; the analogous result for the mean is due
