Subsampling estimates of the Lasso distribution.
For the second part of the claim, first note that
\[
\begin{aligned}
\max_{j>k_n} \big\|\, |\eta_{nj}|\,\tilde s_{n1} - |\eta_{nj}|\, s_{n1} \big\|^2
&= \max_{j>k_n} \sum_{i=1}^{k_n} \left| \frac{|\eta_{nj}|}{|\tilde\beta_{ni}|} - \frac{|\eta_{nj}|}{|\eta_{ni}|} \right|^2
\le M_{n2}^2 \sum_{i=1}^{k_n} \left| \frac{|\eta_{ni}| - |\tilde\beta_{ni}|}{|\tilde\beta_{ni}|\,|\eta_{ni}|} \right|^2 \\
&\le \Big( \max_{1\le j\le k_n} \big|\,|\eta_{nj}| - |\tilde\beta_{nj}|\,\big| \Big)^2 M_{n2}^2 \sum_{i=1}^{k_n} \frac{1}{|\eta_{ni}|^2 \big(1 + |\tilde\beta_{ni}|/|\eta_{ni}| - 1\big)^2} \\
&= \Big( \max_{1\le j\le k_n} \big|\,|\eta_{nj}| - |\tilde\beta_{nj}|\,\big| \Big)^2 M_{n2}^2 \sum_{i=1}^{k_n} \frac{1}{|\eta_{ni}|^2\,\big(1 + o_P(1)\big)} \\
&= o_P(1/r_n^2)\, O_P(1)^2\, M_{n1}^2 = o_P(1).
\end{aligned}
\]
Next, we have
\[
\max_{j>k_n} \big|\,|\tilde\beta_{nj}| - |\eta_{nj}|\,\big|\, \|\tilde s_{n1}\| = o_P(1/r_n)\,\big(1 + o_P(1)\big)\, M_{n1} = o_P(1).
\]
The second part of the claim now follows from the triangle inequality. □
4.1 Variable selection consistency
Theorem 4.1.0.9 (Huang et al., 2008, Theorem 1; variable selection consistency of the adaptive Lasso). Suppose that assumptions B.1 to B.5 are valid. Then
\[
P\big( \hat\beta_n =_s \beta_0 \big) \to 1.
\]
Proof. From subdifferential calculus, we know that a vector \(\hat\beta_n \in \mathbb{R}^{p_n}\) is a solution to the adaptive Lasso problem if the subdifferential \(\partial L_n(\hat\beta_n)\) of the objective function at \(\hat\beta_n\) contains the point zero, that is,
\[
\begin{cases}
x_j'(y - X\hat\beta_n) = \lambda_n w_{nj}\,\mathrm{sgn}(\hat\beta_{nj}), & \text{if } \hat\beta_{nj} \neq 0, \\[2pt]
\big| x_j'(y - X\hat\beta_n) \big| < \lambda_n w_{nj}, & \text{if } \hat\beta_{nj} = 0.
\end{cases}
\tag{4.1.0.5}
\]
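As an illustration (not part of the original argument), the conditions in (4.1.0.5) can be checked numerically. The sketch below is a minimal example under stated assumptions: a toy Gaussian design, an OLS first stage for the weights \(w_{nj} = 1/|\tilde\beta_{nj}|\), and a simple coordinate-descent solver for the weighted criterion \(\tfrac12\|y - X\beta\|^2 + \lambda_n \sum_j w_{nj}|\beta_j|\); all names and numerical choices are illustrative, not taken from the text.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sgn(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso_cd(X, y, lam, w, n_sweeps=500):
    """Coordinate descent for 0.5*||y - X b||^2 + lam * sum_j w_j * |b_j|."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual without coordinate j
            b[j] = soft_threshold(X[:, j] @ r_j, lam * w[j]) / col_sq[j]
    return b

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0])   # toy true coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_tilde = np.linalg.lstsq(X, y, rcond=None)[0]  # first-stage (OLS) estimate
w = 1.0 / np.abs(beta_tilde)                       # adaptive weights w_nj
lam = 5.0
b_hat = weighted_lasso_cd(X, y, lam, w)

# Verify the subgradient conditions (4.1.0.5) at the computed solution:
grad = X.T @ (y - X @ b_hat)                       # x_j'(y - X b_hat)
active = b_hat != 0
assert np.allclose(grad[active], lam * w[active] * np.sign(b_hat[active]), atol=1e-6)
assert np.all(np.abs(grad[~active]) < lam * w[~active])
```

Both assertions pass at convergence: the equality holds on the active set, and the strict inequality holds on its complement because the adaptive weights of the noise coordinates are large.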
Furthermore, if the family of vectors \(\{x_j : \hat\beta_{nj} \neq 0\}\) is linearly independent, then \(\hat\beta_n\) is the unique solution. Define
\[
\tilde s_{n1} = \big( |\tilde\beta_{n1}|^{-1}\,\mathrm{sgn}(\beta_{01}), \ldots, |\tilde\beta_{nk_n}|^{-1}\,\mathrm{sgn}(\beta_{0k_n}) \big)'
\]
and
\[
\hat\beta_{n1} = \big( X_{n1}' X_{n1} \big)^{-1} \big( X_{n1}' y - \lambda_n \tilde s_{n1} \big)
= \beta_{01} + \frac{1}{n}\, C_{n1}^{-1} \big( X_1' \varepsilon - \lambda_n \tilde s_{n1} \big).
\]
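The second equality above is pure algebra: with \(C_{n1} = X_{n1}'X_{n1}/n\) and \(y = X_{n1}\beta_{01} + \varepsilon\) (the remaining true coefficients being zero), the two expressions coincide for any vector in place of \(\tilde s_{n1}\). A quick numerical check of this identity, on toy data with all values illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3
X1 = rng.standard_normal((n, k))       # X_{n1}: columns of the true support
beta01 = np.array([1.0, -2.0, 0.5])    # toy true coefficients on the support
eps = rng.standard_normal(n)
y = X1 @ beta01 + eps                  # remaining true coefficients are zero
lam = 2.0
s1 = rng.standard_normal(k)            # stand-in for s~_{n1}; identity holds for any vector

# Left-hand side: (X1'X1)^{-1} (X1'y - lam * s1)
lhs = np.linalg.solve(X1.T @ X1, X1.T @ y - lam * s1)
# Right-hand side: beta01 + (1/n) C1^{-1} (X1'eps - lam * s1), with C1 = X1'X1 / n
C1 = X1.T @ X1 / n
rhs = beta01 + np.linalg.solve(C1, X1.T @ eps - lam * s1) / n
assert np.allclose(lhs, rhs)
```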
If \(\hat\beta_{n1} =_s \beta_{01}\), then the first line of (4.1.0.5) holds for \(\hat\beta_n = (\hat\beta_{n1}', 0')'\). For this particular \(\hat\beta_n\) we have \(X\hat\beta_n = X_1 \hat\beta_{n1}\). The family \(\{x_j,\ 1 \le j \le k_n\}\) is then linearly independent, hence \(X_{n1}'X_{n1}\) is regular, and we obtain \(\hat\beta_n =_s \beta_0\) if
\[
\begin{cases}
\hat\beta_{n1} =_s \beta_{01}, \\[2pt]
\big| x_j'(y - X_{n1}\hat\beta_{n1}) \big| < \lambda_n w_{nj}, & k_n < j \le p_n.
\end{cases}
\tag{4.1.0.6}
\]
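A hedged numerical sketch of this proof strategy, under assumptions not taken from the text (toy Gaussian design, OLS-based weights, illustrative \(\lambda_n\)): construct \(\hat\beta_{n1}\) from the closed form above on the true support and check both conditions in (4.1.0.6) on simulated data.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 200, 6, 2                                 # k = size of the true support
X = rng.standard_normal((n, p))
beta0 = np.concatenate([[3.0, -2.0], np.zeros(p - k)])
y = X @ beta0 + 0.1 * rng.standard_normal(n)

beta_tilde = np.linalg.lstsq(X, y, rcond=None)[0]   # first-stage (OLS) estimate
w = 1.0 / np.abs(beta_tilde)                        # adaptive weights w_nj
lam = 5.0

X1 = X[:, :k]
# s~_{n1} = (|beta~_{n1}|^{-1} sgn(beta_{01}), ..., |beta~_{nk}|^{-1} sgn(beta_{0k}))'
s1 = np.sign(beta0[:k]) / np.abs(beta_tilde[:k])
# Closed-form candidate on the true support:
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y - lam * s1)

resid = y - X1 @ b1
sign_ok = np.all(np.sign(b1) == np.sign(beta0[:k]))         # b1 =_s beta_01
dual_ok = np.all(np.abs(X[:, k:].T @ resid) < lam * w[k:])  # strict inequality off the support
print(sign_ok and dual_ok)
```

When both checks succeed, the candidate \((\hat\beta_{n1}', 0')'\) solves the full problem and recovers the sign pattern of \(\beta_0\), which is exactly the event whose probability Theorem 4.1.0.9 drives to one.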