Subsampling estimates of the Lasso distribution.

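Chapter 3. Application to the Lasso estimator

3.2 Limit in distribution

We also adopt the following partitioning of $C$, $W$, and $u$:

$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix},$$

where $C_{11}$ is an $r \times r$ matrix, $C_{22}$ is a $(p-r) \times (p-r)$ matrix and $C_{12} = C_{21}'$;

$$W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix}, \qquad u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix},$$

where $W_1$ and $u_1$ are $r$-vectors. Finally, denote $\hat u = \arg\min_u V(u)$.

Proposition 3.2.1.1. If $\lambda_n/\sqrt{n} \to \lambda_0 \ge 0$ and $C$ is nonsingular, then

(i)
$$P(\hat u_2 = 0) = P\left( -\frac{\lambda_0}{2}\mathbf{1} \le C_{21} C_{11}^{-1}\left(W_1 - \frac{\lambda_0}{2}\operatorname{sgn}(\beta)\right) - W_2 \le \frac{\lambda_0}{2}\mathbf{1} \right),$$
where $W_1 \sim N(0, \sigma^2 C_{11})$, $W_2 \sim N(0, \sigma^2 C_{22})$ and the inequality is to be interpreted componentwise.

(ii) $\hat u_1$ has a Gaussian distribution, and the distribution of every component of $\hat u_2$ is the mixture of a Gaussian distribution with a point mass at zero.

Proof. Since $\hat u$ minimizes $V$, the subdifferential $\partial V(\hat u)$ at $\hat u$ contains $0$, i.e.

$$-2\begin{pmatrix} W_1 \\ W_2 \end{pmatrix} + 2\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}\begin{pmatrix} \hat u_1 \\ \hat u_2 \end{pmatrix} + \lambda_0 \begin{pmatrix} \operatorname{sgn}(\beta) \\ \left( \operatorname{sgn}(\hat u_2^j)\, I(\hat u_2^j \ne 0) + e_j\, I(\hat u_2^j = 0) \right)_{j=1}^{p-r} \end{pmatrix} = 0$$

for some values $e_j \in [-1, 1]$, $j = 1, \dots, p-r$. In particular, the following relations between $C$, $W$, $\hat u_1$ and $\hat u_2$ always hold:

$$\hat u_1 = C_{11}^{-1}\left(W_1 - C_{12}\hat u_2 - \frac{\lambda_0}{2}\operatorname{sgn}(\beta)\right),$$

$$-\frac{\lambda_0}{2}\mathbf{1} \le -W_2 + C_{21}\hat u_1 + C_{22}\hat u_2 \le \frac{\lambda_0}{2}\mathbf{1}.$$

Together, they imply

$$-\frac{\lambda_0}{2}\mathbf{1} \le -W_2 + C_{21}C_{11}^{-1}\left(W_1 - \frac{\lambda_0}{2}\operatorname{sgn}(\beta)\right) + \left(C_{22} - C_{21}C_{11}^{-1}C_{12}\right)\hat u_2 \le \frac{\lambda_0}{2}\mathbf{1}. \tag{3.2.1.1}$$

Next, suppose that $\hat u_2 = 0$. Then it follows readily from (3.2.1.1) that

$$-\frac{\lambda_0}{2}\mathbf{1} \le -W_2 + C_{21}C_{11}^{-1}\left(W_1 - \frac{\lambda_0}{2}\operatorname{sgn}(\beta)\right) \le \frac{\lambda_0}{2}\mathbf{1}. \tag{3.2.1.2}$$

Conversely, suppose that (3.2.1.2) holds. We show that the minimizer necessarily satisfies $\hat u_2 = 0$. Indeed, if $\hat u_2 \ne 0$, then equality is attained in (3.2.1.1) at every row corresponding to a nonzero component of $\hat u_2$, that is,

$$\left( -W_2 + C_{21}C_{11}^{-1}\left(W_1 - \frac{\lambda_0}{2}\operatorname{sgn}(\beta)\right) + \left(C_{22} - C_{21}C_{11}^{-1}C_{12}\right)\hat u_2 \right)_j = \begin{cases} -\dfrac{\lambda_0}{2}, & \text{if } \hat u_2^j > 0, \\[4pt] \dfrac{\lambda_0}{2}, & \text{if } \hat u_2^j < 0. \end{cases}$$

In particular, it follows from (3.2.1.2) that

$$\left( \left(C_{22} - C_{21}C_{11}^{-1}C_{12}\right)\hat u_2 \right)_j \in \begin{cases} [-\lambda_0, 0], & \text{if } \hat u_2^j > 0, \\ [0, \lambda_0], & \text{if } \hat u_2^j < 0. \end{cases} \tag{3.2.1.3}$$

Note that $C_{22} - C_{21}C_{11}^{-1}C_{12}$, the Schur complement of $C_{11}$ in $C$, is symmetric positive definite (SPD) since $C$ is. Let $D$ be the matrix obtained from $C_{22} - C_{21}C_{11}^{-1}C_{12}$ by removing the rows and columns corresponding to the zero components of $\hat u_2$; $D$ is also SPD. Let $\hat u_2^{\ne 0}$ be the vector obtained from $\hat u_2$ by removing its zero components. Then (3.2.1.3) implies that

$$\left(\hat u_2^{\ne 0}\right)^T D\, \hat u_2^{\ne 0} \le 0,$$

because every term $\hat u_2^j \left( \left(C_{22} - C_{21}C_{11}^{-1}C_{12}\right)\hat u_2 \right)_j$ of this sum is nonpositive. This contradicts the fact that $D$ is SPD, which completes the argument.

Finally, we show that every component of the limit distribution has a discontinuity at the point zero only, and is otherwise Gaussian. Consider, after possibly permuting the last $p - r$ covariates, a further partitioning of $W_2$, $u_2$ and $C$ of the form

$$W_2 = \begin{pmatrix} W_{2,1} \\ W_{2,0} \end{pmatrix}, \qquad u_2 = \begin{pmatrix} u_{2,1} \\ u_{2,0} \end{pmatrix}, \qquad C = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix},$$

respectively, where $W_{2,1}$ and $u_{2,1}$ are $r'$-vectors for some $0 < r'$ [...] the event $\{ \operatorname{sgn}(u_{2,1}) = s' ;\ u_{2,0} = 0 \}$, $(u_1, u_{2,1})'$ has the following distribution:

$$\begin{pmatrix} u_1 \\ u_{2,1} \end{pmatrix} \sim N\left( -\frac{\lambda_0}{2}\tilde C^{-1}\begin{pmatrix} \operatorname{sgn}(\beta) \\ s' \end{pmatrix},\ \sigma^2 \tilde C \right).$$

This completes the proof.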
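Part (i) of the proposition can be checked numerically in the smallest nontrivial case. The following is a minimal sketch, not the thesis's own simulation code, under several assumptions that are not in the text: $p = 2$, $r = 1$, $C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, and an objective $V(u) = -2u'W + u'Cu + \lambda_0(\operatorname{sgn}(\beta)u_1 + |u_2|)$ reconstructed from the stationarity condition in the proof. All function names are hypothetical. In this case the condition in (3.2.1.2) reduces to $|a| \le \lambda_0/2$ with $a = -W_2 + \rho(W_1 - \lambda_0 \operatorname{sgn}(\beta)/2) \sim N(-\rho\lambda_0\operatorname{sgn}(\beta)/2,\ \sigma^2(1-\rho^2))$, so the probability has a closed form that a Monte Carlo estimate should approach.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def objective(u1, u2, w1, w2, rho, lam0, s):
    """Assumed form of V(u) for p = 2, r = 1 (reconstructed, see lead-in)."""
    quad = u1 * u1 + 2.0 * rho * u1 * u2 + u2 * u2
    return -2.0 * (u1 * w1 + u2 * w2) + quad + lam0 * (u1 * s + abs(u2))


def limit_minimizer(w1, w2, rho, lam0, s):
    """Closed-form minimizer of V for C = [[1, rho], [rho, 1]].

    s is the sign (+1 or -1) of the single nonzero coefficient.
    """
    schur = 1.0 - rho * rho                  # C22 - C21 C11^{-1} C12
    a = -w2 + rho * (w1 - lam0 * s / 2.0)    # middle term of (3.2.1.2)
    if abs(a) <= lam0 / 2.0:
        u2 = 0.0                             # (3.2.1.2) holds, so u2 = 0
    elif a > 0:
        u2 = -(a - lam0 / 2.0) / schur       # negative solution
    else:
        u2 = -(a + lam0 / 2.0) / schur       # positive solution
    u1 = w1 - rho * u2 - lam0 * s / 2.0      # first-block relation, C11 = 1
    return u1, u2


def prob_u2_zero(rho, lam0, s, sigma):
    """Closed-form P(u2 = 0) from part (i) in the 2 x 2 case."""
    sd = sigma * math.sqrt(1.0 - rho * rho)
    mean = -rho * lam0 * s / 2.0
    upper = normal_cdf((lam0 / 2.0 - mean) / sd)
    lower = normal_cdf((-lam0 / 2.0 - mean) / sd)
    return upper - lower


def simulate(rho, lam0, s, sigma, n_draws, seed=0):
    """Monte Carlo frequency of the event {u2 = 0}."""
    rng = random.Random(seed)
    zeros = 0
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # (w1, w2) ~ N(0, sigma^2 C) via a Cholesky factor of C.
        w1 = sigma * z1
        w2 = sigma * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        _, u2 = limit_minimizer(w1, w2, rho, lam0, s)
        zeros += (u2 == 0.0)
    return zeros / n_draws


if __name__ == "__main__":
    print("closed form:", prob_u2_zero(0.5, 1.0, 1, 1.0))
    print("Monte Carlo:", simulate(0.5, 1.0, 1, 1.0, 20000))
```

The closed-form minimizer is a soft-thresholding of $a$ at level $\lambda_0/2$, scaled by the Schur complement, which is exactly the case distinction used in the converse part of the proof.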