… that the error variance $\sigma^2$ is known. Suppose that we have run LAR for $j-1$ steps, yielding the active set of predictors $A$ at $\lambda = \lambda_j$. Now we take one more step, entering a new predictor $k(j)$ and producing estimates $\hat\beta(\lambda_{j+1})$ at $\lambda_{j+1}$. We wish to test whether the $k(j)$th component $\beta_{k(j)}$ is zero. We refit the lasso, keeping $\lambda = \lambda_{j+1}$ but using just the variables in $A$. This yields estimates $\hat\beta_A(\lambda_{j+1})$. Our proposed covariance test statistic is defined by
$$
T_j = \frac{1}{\sigma^2}\Bigl\{\langle y, X\hat\beta(\lambda_{j+1})\rangle - \langle y, X_A\hat\beta_A(\lambda_{j+1})\rangle\Bigr\}. \tag{42.1}
$$
Roughly speaking, this statistic measures how much of the covariance between the outcome and the fitted model can be attributed to the $k(j)$th predictor, which has just entered the model. (An illustrative computational sketch of this statistic is given after the conclusion below.)

Now something remarkable happens. Under the null hypothesis that all signal variables are in the model, as $p \to \infty$, $T_j$ converges to an exponential random variable with unit mean, $\mathrm{Exp}(1)$. The right panel of Figure 42.3 shows the same example, using the covariance statistic. This test works for testing the first variable to enter (as in the example), or for testing noise variables after all of the signal variables have entered. And it works under quite general conditions on the design matrix. This result properly accounts for the adaptive selection: the shrinkage in the $\ell_1$ fitting counteracts the inflation due to selection, in just the right way to make the degrees of freedom (mean) of the null distribution exactly equal to 1 asymptotically. This idea can be applied to a wide variety of models, and yields honest p-values that should be useful to statistical practitioners.

In a sense, the covariance test and its exponential distribution generalize the RSS test and its chi-squared distribution to the adaptive regression setting.

This work is very new, and is summarized in Lockhart et al. (2014). The proofs of the results are difficult, and use extreme-value theory and Gaussian processes. They suggest that the LAR knots $\lambda_k$ may be fundamental in understanding the effects of adaptivity in regression. On the practical side, regression software can now output honest p-values as predictors enter a model, p-values that properly account for the adaptive nature of the process. And all of this may be a result of the convexity of the $\ell_1$-penalized objective.

42.5 Conclusion

In this chapter I hope that I have conveyed my excitement for some recent developments in statistics, both in its theory and practice. I predict that convexity and sparsity will play an increasingly important role in the development of statistical methodology.
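To make (42.1) concrete, here is a minimal computational sketch of how the covariance statistic might be evaluated at a single knot. It is an illustration only, not the reference implementation of Lockhart et al. (2014): the function and variable names are hypothetical, it uses scikit-learn's Lasso solver, and it assumes that scikit-learn's penalty parameter maps to the objective $(1/2)\|y - X\beta\|^2 + \lambda\|\beta\|_1$ via $\alpha = \lambda/n$.

```python
import numpy as np
from sklearn.linear_model import Lasso

def covariance_test_statistic(X, y, active, k_new, lam_next, sigma2):
    """Illustrative sketch of the covariance statistic T_j in (42.1).

    active   : list of indices of the predictors in A (active set before step j)
    k_new    : index k(j) of the predictor entering at step j
    lam_next : the next LAR knot, lambda_{j+1}
    sigma2   : the (assumed known) error variance
    """
    n = X.shape[0]
    # scikit-learn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1,
    # so alpha = lambda/n is assumed to match the penalty lambda*||b||_1.
    alpha = lam_next / n

    # Lasso fit at lambda_{j+1} with the new predictor k(j) included.
    # At this knot the full lasso solution is assumed to have support A u {k(j)},
    # so restricting to those columns reproduces X * beta_hat(lambda_{j+1}).
    idx_full = list(active) + [k_new]
    beta_full = Lasso(alpha=alpha, fit_intercept=False).fit(X[:, idx_full], y).coef_
    fit_full = X[:, idx_full] @ beta_full

    # Lasso refit at the same lambda_{j+1}, using only the variables in A,
    # giving X_A * beta_hat_A(lambda_{j+1}); an empty A means a zero fit
    # (the case of testing the first variable to enter).
    if len(active) == 0:
        fit_A = np.zeros(n)
    else:
        beta_A = Lasso(alpha=alpha, fit_intercept=False).fit(X[:, list(active)], y).coef_
        fit_A = X[:, list(active)] @ beta_A

    # T_j = { <y, X beta_hat> - <y, X_A beta_hat_A> } / sigma^2
    return (y @ fit_full - y @ fit_A) / sigma2
```

Since $T_j$ is asymptotically $\mathrm{Exp}(1)$ under the null, an approximate p-value for the entering predictor is $\exp(-T_j)$.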
