Subsampling estimates of the Lasso distribution.

The adaptive Lasso in a high dimensional setting

C.3 The minimum $\tilde b_{n1} = \min_{1 \le j \le k_n} |\eta_{nj}|$ satisfies
$$ k_n^{1/2} (1 + c_n)\, \tilde b_{n1}^{-1} r_n^{-1} \to 0 $$
for
$$ r_n = \frac{n^{1/2}}{\log(m_n)^{1/d}\, \log(n)^{1\{d=1\}}}. $$

Theorem 4.3.0.11 (Huang et al., 2008, Theorem 3). Suppose that assumptions C.1 to C.3 hold. Then $\tilde\beta_{nj}$ defined in 4.3.0.9 is $r_n$-consistent for $\eta_{nj}$ defined in 4.3.0.10, and the adaptive irrepresentable condition holds.

Proof. Let
$$ \mu_0 = E(Y) = \sum_{j=1}^{p_n} x_j \beta_{0j}. $$
Then
$$ \tilde\beta_{nj} = \frac{x_j' Y}{n} = \eta_{nj} + \frac{x_j' \varepsilon}{n}, $$
where $\eta_{nj} = x_j' \mu_0 / n$. The covariates are assumed to be standardized, that is $\|x_j\|^2 / n = 1$, so for arbitrary $\delta > 0$, Lemma 4.0.2.7 implies that
$$ P\left( r_n \max_{1 \le j \le k_n} |\tilde\beta_{nj} - \eta_{nj}| > \delta \right) = P\left( r_n \max_{1 \le j \le k_n} |x_j' \varepsilon| / n > \delta \right) \le p_n\, q_n^*\!\left( n^{1/2} r_n^{-1} \delta \right) = o(1). $$

Here, $r_n \log(p_n)^{1/d} \log(n)^{1\{d=1\}} n^{-1/2} = o(1)$ has been used in the last equality. This proves the first part of assumption B.2. The second part follows from assumption C.3:
$$ \sum_{j=1}^{k_n} \left( \frac{1}{\eta_{nj}^2} + \frac{M_{n2}^2}{\eta_{nj}^4} \right) \le \frac{k_n}{\tilde b_{n1}^2}\, (1 + c_n^2) = o(r_n^2). $$
For assumption B.3, note that

$$ \big\| X_{n1}' x_j \big\|^2 = \sum_{i=1}^{k_n} \left( x_i' x_j \right)^2 \le k_n n^2 \rho_n^2 $$
and
$$ |\eta_{nj}|\, \|s_{n1}\| \le k_n^{1/2} c_n $$
for every $k_n < j \le p_n$. Now, assumption C.2 yields
$$ n^{-1} |\eta_{nj}|\, \big| x_j' X_{n1} C_{n1}^{-1} s_{n1} \big| \le \tau_{n1}^{-1} c_n k_n \rho_n $$
for such $j$. This completes the proof.
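The decomposition $\tilde\beta_{nj} = \eta_{nj} + x_j'\varepsilon/n$ at the start of the proof is an exact algebraic identity, which a short simulation can confirm. The sketch below is purely illustrative; the sample size, dimension, and coefficient values are arbitrary choices, not taken from the text.

```python
import numpy as np

# Simulated design: n observations, p standardized covariates, k of them
# relevant, matching the setting of the proof (dimensions are illustrative).
rng = np.random.default_rng(0)
n, p, k = 200, 50, 5
X = rng.standard_normal((n, p))
X = X / np.sqrt((X ** 2).mean(axis=0))   # standardize so that ||x_j||^2 / n = 1
beta0 = np.zeros(p)
beta0[:k] = 2.0                          # k nonzero true coefficients
mu0 = X @ beta0                          # mu_0 = E(Y)
eps = rng.standard_normal(n)
Y = mu0 + eps

beta_marginal = X.T @ Y / n              # marginal estimates  beta~_nj = x_j' Y / n
eta = X.T @ mu0 / n                      # targets             eta_nj   = x_j' mu_0 / n
noise = X.T @ eps / n                    # noise terms         x_j' eps / n

# Decomposition from the proof: beta~_nj = eta_nj + x_j' eps / n, exactly.
assert np.allclose(beta_marginal, eta + noise)
```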

The general message of Theorem 4.3.0.11 is that marginal regressors can be used as initial estimates if correlations between relevant and noise variables are believed to be weak.
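To illustrate this message, one can plug the marginal estimates $\tilde\beta_{nj} = x_j'Y/n$ in as adaptive-Lasso weights $w_j = 1/|\tilde\beta_{nj}|$. The sketch below assumes an i.i.d. Gaussian design, so that correlations between relevant and noise variables are weak by construction, and implements the weighted fit via the standard column-rescaling reformulation with a plain coordinate-descent solver; the penalty level and all dimensions are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_norms = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]      # partial residual excluding x_j
            rho = X[:, j] @ r_j / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_norms[j]
    return b

rng = np.random.default_rng(1)
n, p = 200, 40
X = rng.standard_normal((n, p))                   # weakly correlated columns
X = X / np.sqrt((X ** 2).mean(axis=0))            # standardize: ||x_j||^2 / n = 1
beta0 = np.zeros(p)
beta0[:3] = [3.0, -2.0, 1.5]                      # three relevant variables
Y = X @ beta0 + rng.standard_normal(n)

beta_marginal = X.T @ Y / n                       # marginal initial estimates
w = 1.0 / np.abs(beta_marginal)                   # adaptive weights w_j = 1/|beta~_nj|

# Adaptive Lasso via column rescaling: fit the Lasso on X_j / w_j, map back.
beta_hat = lasso_cd(X / w, Y, alpha=0.1) / w
```

Because the marginal estimates of noise variables are near zero, their weights are large and those coefficients are shrunk to exactly zero, while the relevant variables receive small penalties and survive.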
