21.06.2014 Views

Subsampling estimates of the Lasso distribution.



3. If $\hat p_{(1)} < \alpha/s$ but $\hat p_{(2)} \ge \alpha/(s-1)$, accept $H_{(2)}, \dots, H_{(s)}$ and stop. If $\hat p_{(1)} < \alpha/s$ and $\hat p_{(2)} < \alpha/(s-1)$, reject $H_{(2)}$ in addition to $H_{(1)}$ and test the remaining $s-2$ hypotheses at level $\alpha/(s-2)$.

One can show that the Bonferroni-Holm procedure controls the FWER (Lehmann and Romano, 2005, Theorem 9.1.2).
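As an illustration, the step-down rule above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `holm_reject` and its arguments are hypothetical, and the strict inequality for rejection is taken from the steps as stated above.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans, True where the step-down rule rejects.

    Hypothetical helper for illustration; not part of the original text.
    """
    s = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(s), key=lambda j: p_values[j])
    reject = [False] * s
    for step, j in enumerate(order):
        # At (0-based) step k the ordered p-value is compared with alpha / (s - k).
        if p_values[j] < alpha / (s - step):
            reject[j] = True
        else:
            break  # accept this hypothesis and all remaining ones
    return reject


decisions = holm_reject([0.001, 0.04, 0.03, 0.2], alpha=0.05)
```

Here only the first hypothesis is rejected: 0.001 is below 0.05/4, but the next smallest p-value, 0.03, is not below 0.05/3, so the procedure stops.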

In our case, for an individual hypothesis $H_{0,j}: \beta^j = 0$, we propose the following p-value based on subsampling: for a realized statistic $\sqrt{n}\,\hat\beta^j_n$ computed on the whole sample, take as p-value the proportion, among $B$ subsamples, of values $\sqrt{r}(\hat\beta^j_{n,b,i} - \hat\beta^j_n)$ lying at or above it if it is non-negative, or at or below it if it is negative. More precisely:

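$$
\hat p_j =
\begin{cases}
B^{-1} \sum_{i=1}^{B} 1\{\sqrt{r}\,(\hat\beta^j_{n,b,i} - \hat\beta^j_n) \ge \sqrt{n}\,\hat\beta^j_n\} & \text{if } \hat\beta^j_n \ge 0, \\
B^{-1} \sum_{i=1}^{B} 1\{\sqrt{r}\,(\hat\beta^j_{n,b,i} - \hat\beta^j_n) \le \sqrt{n}\,\hat\beta^j_n\} & \text{if } \hat\beta^j_n < 0,
\end{cases}
\tag{6.1.3.2}
$$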

where $r = b/(1 - b/n)$ is the finite-sample corrected subsample size. Similar p-values are considered in Berg, McMurry, and Politis (2010). Here, the distinction between the positive and the negative case is made to take power considerations into account.
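To make the definition concrete, the p-value (6.1.3.2) for a single coefficient can be sketched as follows. The function name `subsampling_p_value` and its arguments are hypothetical; the sketch assumes that the subsample estimates have already been computed by whatever estimator is in use.

```python
import math


def subsampling_p_value(beta_full, beta_subs, n, b):
    """Sketch of the p-value (6.1.3.2) for one coefficient.

    beta_full : estimate of the coefficient on the whole sample of size n
    beta_subs : estimates of the same coefficient on B subsamples of size b
    (hypothetical names, introduced only for this illustration)
    """
    r = b / (1.0 - b / n)            # finite-sample corrected subsample size
    stat = math.sqrt(n) * beta_full  # realized statistic on the whole sample
    roots = [math.sqrt(r) * (bs - beta_full) for bs in beta_subs]
    if beta_full >= 0:
        hits = sum(1 for v in roots if v >= stat)
    else:
        hits = sum(1 for v in roots if v <= stat)
    return hits / len(roots)


# Toy numbers, not taken from the simulations: n = 100, b = 20, B = 4.
p = subsampling_p_value(0.5, [0.5, 2.0, 1.0, 0.0], n=100, b=20)
```

With these toy inputs, r = 25 and the realized statistic equals 5; only one of the four rescaled subsample deviations reaches 5, so the p-value is 0.25.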

Powers and family-wise error rates based on 500 replications are reported in Tables 6.4, 6.5 and 6.6. The procedure controls the FWER while leaving the power unaffected, which suggests that the proposed p-values are valid.
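For illustration, empirical error rates of this kind can be computed from simulated test decisions as sketched below. The function `fwer_and_power` is hypothetical, and the definition of power used here (the average fraction of false null hypotheses that are rejected) is an assumption; the text does not state which notion of power underlies the tables.

```python
def fwer_and_power(rejections, is_null):
    """Empirical FWER and average power from simulated test decisions.

    rejections : list of replications, each a list of booleans (True = rejected)
    is_null    : list of booleans, True where the null hypothesis really holds
    (hypothetical helper; the power definition is an assumption)
    """
    n_rep = len(rejections)
    n_alt = sum(1 for h in is_null if not h)
    false_rej = 0    # replications with at least one true null rejected
    power_sum = 0.0  # running sum of per-replication rejection fractions
    for rej in rejections:
        if any(r and h for r, h in zip(rej, is_null)):
            false_rej += 1
        if n_alt:
            correct = sum(1 for r, h in zip(rej, is_null) if r and not h)
            power_sum += correct / n_alt
    return false_rej / n_rep, power_sum / n_rep


# Toy decisions for three hypotheses over four replications.
toy_null = [False, False, True]
toy_rej = [
    [True, True, False],
    [True, False, True],
    [False, False, False],
    [True, True, False],
]
fwer, power = fwer_and_power(toy_rej, toy_null)
```

In this toy example one replication out of four contains a false rejection, so the empirical FWER is 0.25, and the average power is 0.625.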

6.2 High dimensional setting<br />

To assess the performance of subsampling in a high dimensional setting, we apply it to the adaptive Lasso presented in Chapter 3. We generate four data sets D, D’, E and E’, which differ in the correlation structure between covariates and in the dimension of the parameter vector. They result from the linear model

$$Y_i = x_i'\beta + \varepsilon_i, \qquad i = 1, \dots, n,$$

where

β = (1, −1.25, , −1.25, 1, −1.5, 0) ′

is a 400 × 1 vector in models D and D’, and an 800 × 1 vector in models E and E’. The errors $\varepsilon_i$ are i.i.d. standard normal. $X$ is multivariate normal with mean zero and covariance matrix $C$ given by

• $C_{i,j} = 0.8^{|i-j|}\, 1\{i, j \in \{1, \dots, 5\} \text{ or } i, j \in \{6, \dots, 400\}\}$ in model D,

• $C_{i,j} = 0.8^{|i-j|}$, $i, j = 1, \dots, 400$ in model D’,

• $C_{i,j} = 0.8^{|i-j|}\, 1\{i, j \in \{1, \dots, 5\} \text{ or } i, j \in \{6, \dots, 800\}\}$ in model E,
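The covariance structures listed so far can be built as sketched below. The function `covariance` and its parameters `rho` and `split` are hypothetical names introduced for this illustration, and the code uses 0-based indices, so the first five covariates are indices 0 to 4.

```python
def covariance(p, rho=0.8, split=None):
    """Build C with C[i][j] = rho ** |i - j| as a list of lists.

    If `split` is given, entries linking the first `split` covariates to the
    remaining ones are set to zero (the block structure of models D and E).
    Hypothetical helper; indices are 0-based.
    """
    def same_block(i, j):
        # Without a split, all covariates belong to a single block.
        return split is None or (i < split) == (j < split)

    return [
        [rho ** abs(i - j) if same_block(i, j) else 0.0 for j in range(p)]
        for i in range(p)
    ]


C_D = covariance(400, split=5)   # model D: blocks {1..5} and {6..400}
C_D_prime = covariance(400)      # model D': no block restriction
C_E = covariance(800, split=5)   # model E: blocks {1..5} and {6..800}
```

In the block models, covariates 5 and 6 are uncorrelated, whereas in model D’ their covariance is 0.8.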
