Subsampling estimates of the Lasso distribution.
3. If $\hat p_{(1)} < \alpha/s$ but $\hat p_{(2)} \ge \alpha/(s-1)$, accept $H_{(2)}, \dots, H_{(s)}$ and stop. If $\hat p_{(1)} < \alpha/s$ and
$\hat p_{(2)} < \alpha/(s-1)$, reject $H_{(2)}$ in addition to $H_{(1)}$ and test the remaining $s-2$ hypotheses
at level $\alpha/(s-2)$.
One can show that the Bonferroni-Holm procedure controls the FWER (Lehmann and
Romano, 2005, Theorem 9.1.2).
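The step-down rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `holm_reject` and its arguments are hypothetical, and the strict inequality follows the wording of the steps above.

```python
import numpy as np

def holm_reject(p_values, alpha=0.05):
    """Return a boolean array: True where the step-down rule rejects H_j.

    Hypothetical helper for illustration; `p_values` holds one p-value per
    hypothesis and `alpha` is the target FWER level.
    """
    p = np.asarray(p_values, dtype=float)
    s = p.size
    order = np.argsort(p)             # indices sorted by increasing p-value
    reject = np.zeros(s, dtype=bool)
    for k, idx in enumerate(order):   # k = 0 corresponds to p_(1)
        if p[idx] < alpha / (s - k):  # compare p_(k+1) with alpha / (s - k)
            reject[idx] = True
        else:
            break                     # accept this and all remaining hypotheses
    return reject

decisions = holm_reject([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

In this toy call only the first hypothesis is rejected: 0.001 lies below 0.05/4, but the next smallest p-value, 0.03, is not below 0.05/3, so the procedure stops.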
In our case, for an individual hypothesis $H_{0,j} : \beta^j = 0$, we propose the following p-value
based on subsampling: for a realized statistic $\sqrt{n}\,\hat\beta^j_n$ computed on the whole sample,
take as p-value the proportion, among $B$ subsamples, of values $\sqrt{r}\,(\hat\beta^j_{n,b,i} - \hat\beta^j_n)$ lying
above or below it, depending on whether it is positive or negative. More precisely:
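$$
\hat p_j =
\begin{cases}
B^{-1} \sum_{i=1}^{B} 1\{\sqrt{r}\,(\hat\beta^j_{n,b,i} - \hat\beta^j_n) \ge \sqrt{n}\,\hat\beta^j_n\} & \text{if } \hat\beta^j_n \ge 0, \\
B^{-1} \sum_{i=1}^{B} 1\{\sqrt{r}\,(\hat\beta^j_{n,b,i} - \hat\beta^j_n) \le \sqrt{n}\,\hat\beta^j_n\} & \text{if } \hat\beta^j_n < 0,
\end{cases}
\qquad (6.1.3.2)
$$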
where $r = b/(1 - b/n)$ is the finite-sample-corrected subsample size. Similar p-values
are considered in Berg, McMurry, and Politis (2010). Here, the distinction between the
positive and the negative case is made to account for power considerations.
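As a minimal sketch of how the p-value in (6.1.3.2) might be computed, assuming the full-sample estimate and the $B$ subsample estimates of one coefficient are already available, consider the following. The function name `subsampling_p_value` and its arguments are hypothetical and not taken from the source.

```python
import numpy as np

def subsampling_p_value(beta_full, beta_sub, n, b):
    """Subsampling p-value for H_0: beta^j = 0, following (6.1.3.2).

    beta_full : estimate of the j-th coefficient on the whole sample (size n)
    beta_sub  : array of B estimates, one per subsample of size b
    (Hypothetical names; how the estimates are obtained is left open.)
    """
    beta_sub = np.asarray(beta_sub, dtype=float)
    r = b / (1.0 - b / n)                        # finite-sample-corrected size
    roots = np.sqrt(r) * (beta_sub - beta_full)  # centred subsample values
    stat = np.sqrt(n) * beta_full                # realized statistic
    if beta_full >= 0:
        return float(np.mean(roots >= stat))     # proportion at or above
    return float(np.mean(roots <= stat))         # proportion at or below

# Toy numbers: n = 100 and b = 50 give r = 100.
p_pos = subsampling_p_value(0.5, [0.0, 0.5, 1.25, 1.5], n=100, b=50)
p_neg = subsampling_p_value(-0.5, [-1.5, -0.5, 0.0, 0.25], n=100, b=50)
```

With these toy inputs, two of the four centred values reach the statistic in the positive case and one of four in the negative case.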
Power and family-wise error rates (FWER) based on 500 replications are reported in Tables
6.4, 6.5 and 6.6. The procedure controls the FWER while leaving the power unaffected,
which indicates that the proposed p-values are valid.
6.2 High dimensional setting
To assess the performance of subsampling in a high dimensional setting, we apply it to the
adaptive Lasso presented in Chapter 3. We generate four data sets D, D’, E and E’, which
differ in the correlation structure between covariates and in the dimension of the parameter.
They result from the linear model
$$Y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \dots, n,$$
where
$$\beta = (1, -1.25, \ , -1.25, 1, -1.5, 0)'$$
is a $400 \times 1$ vector in models D and D’, and an $800 \times 1$ vector in models E and E’. The errors $\varepsilon_i$
are i.i.d. standard normal. $X$ is multivariate normal with mean zero and covariance
matrix $C$ given by
• $C_{i,j} = 0.8^{|i-j|}\, 1\{i, j \in \{1, \dots, 5\} \text{ or } i, j \in \{6, \dots, 400\}\}$ in model D,
• $C_{i,j} = 0.8^{|i-j|}$, $i, j = 1, \dots, 400$ in model D’,
• $C_{i,j} = 0.8^{|i-j|}\, 1\{i, j \in \{1, \dots, 5\} \text{ or } i, j \in \{6, \dots, 800\}\}$ in model E,