Subsampling estimates of the Lasso distribution.

Chapter 1

Introduction

In this thesis we consider the linear regression model
\[
Y_i = x_i'\beta + \varepsilon_i, \qquad i = 1, \dots, n,
\]
and focus on the Least Absolute Shrinkage and Selection Operator, or Lasso, as estimation method for the parameter β. The Lasso, defined as

\[
\operatorname*{arg\,min}_{\phi \in \mathbb{R}^p} \; \sum_{i=1}^{n} (x_i'\phi - Y_i)^2 + \lambda \sum_{j=1}^{p} |\phi_j|,
\]

was introduced by Tibshirani (1996) and has become very popular in recent years. There are two main reasons for this gain in popularity. The first is that, in the context of high-dimensional data analysis, where the dimension of the parameter is very large, adding an l1-norm penalty to the squared loss allows the Lasso to induce sparsity in the estimated model while maintaining its prediction ability. That is, many coefficients are set exactly to zero, which is a desirable property if one believes that only a few parameters are relevant. The second reason is that the solution can be computed efficiently by convex optimization. Efron, Hastie, Johnstone, and Tibshirani (2004) introduced the LARS algorithm, which computes the entire solution path, that is, the estimates corresponding to all values of λ > 0, at the cost of a single least squares regression.
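As a minimal sketch of this computational convenience (not taken from the thesis), the following example fits a Lasso path on simulated data; it assumes the lars_path routine from scikit-learn, which implements the LARS algorithm with the Lasso modification, and uses purely hypothetical data and parameter values.

import numpy as np
from sklearn.linear_model import lars_path

# Hypothetical sparse regression data: only the first 3 of p = 20
# coefficients of beta are nonzero.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.standard_normal(n)

# Entire Lasso solution path in one call: coefs[:, k] is the Lasso
# estimate at penalty level alphas[k]; method="lasso" selects the
# Lasso variant of LARS (Efron et al., 2004).
alphas, active, coefs = lars_path(X, y, method="lasso")

# Sparsity: many coefficients are exactly zero along the path.
print("number of penalty values on the path:", len(alphas))
print("nonzero coefficients at the smallest penalty:",
      np.flatnonzero(coefs[:, -1]))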

For the sole purpose of prediction, the Lasso proves to be very efficient. Several authors have shown under various assumptions that it enjoys a so-called oracle property; see for instance Zou (2006), Bunea, Tsybakov, and Wegkamp (2007) or van de Geer and Bühlmann (2009). Roughly speaking, an oracle result states that prediction based on the Lasso solution will be as accurate as if the true model were known. The variable selection properties of the Lasso are by now also quite well understood, and it has been shown that consistency in variable selection can be characterized by a so-called irrepresentable condition; see for instance Zhao and Yu (2006) or Meinshausen and Bühlmann (2006).
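To fix ideas, a common formulation of this condition (a sketch following Zhao and Yu (2006), in the notation introduced above, with S = {j : β_j ≠ 0} the set of relevant variables and C = X'X/n the Gram matrix of the design) reads
\[
\bigl\| C_{S^c S}\, C_{S S}^{-1}\, \operatorname{sign}(\beta_S) \bigr\|_\infty \le 1 - \eta \quad \text{for some } \eta > 0,
\]
that is, the irrelevant variables may not be too strongly correlated with the relevant ones.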

Nevertheless, the Lasso theory still needs to bridge a major gap with traditional statistical inference. Indeed, it is to this date difficult to assign a measure of uncertainty to its estimates. One consequence is that a data analyst who goes beyond the goal of prediction and is interested in selecting the exact generating model will typically face many noise variables for which no statistical tests are available. Also, there are no confidence intervals for the estimated coefficients.

Assigning confidence intervals to an estimator or testing a null hypothesis typically involves understanding its distributional properties. In the situation where an analytic

