Subsampling estimates of the Lasso distribution.

Finally, assumption B.5 yields $P(B_{n2}) = o(1)$. For $B_{n3}$, first note that

$$\frac{1}{w_{nj}} = |\tilde\beta_{nj}| \le M_{n2} + O_P(1/r_n), \qquad j = k_n + 1, \dots, p_n.$$

Furthermore, $\|(x_j' H_n)'\| \le \sqrt{n}$, so it follows that

$$P(B_{n3}) \le P\Big( \bigcup_{j=k_n+1}^{p_n} \big\{ |x_j' H_n \varepsilon| \ge (1-\kappa-\delta)\,\lambda_n C^{-1} (M_{n2} + 1/r_n)^{-1} \big\} \Big) + o(1) \le m_n\, q_n^*\big( (1-\kappa-\delta)\,\lambda_n n^{-1/2} C^{-1} (M_{n2} + 1/r_n)^{-1} \big) + o(1)$$

for $C$ large enough. Lemma 4.0.2.7 and assumption B.4 now imply that $P(B_{n3}) \to 0$.

For the last set, $B_{n4}$, recall that $\|x_j\|^2/n = 1$ and that the spectral norm of $n^{-1/2} X_1 C_{n1}^{-1}$ equals $\tau_{n1}^{-1/2}$. Lemma 4.0.2.6 and assumption B.5 together imply

$$\max_{k_n < j \le p_n} \left| \frac{|x_j' X_1 C_{n1}^{-1} \tilde s_{n1}|}{n w_{nj}} - \frac{|\eta_{nj}\, x_j' X_1 C_{n1}^{-1} s_{n1}|}{n} \right| \le \max_{k_n < j \le p_n} \left( \left\| \frac{x_j' X_1 C_{n1}^{-1}}{n} \right\| \, \big\| |\tilde\beta_{nj}|\, \tilde s_{n1} - \eta_{nj}\, s_{n1} \big\| \right) \le \tau_{n1}^{-1/2}\, o_P(1) = o_P(1).$$

Now, by assumption B.3, it holds that $|\eta_{nj}\, x_j' X_1 C_{n1}^{-1} s_{n1}|/n \le \kappa$, so we indeed obtain $P(B_{n4}) \to 0$. This completes the proof.

4.2 Partial asymptotic normality

The proof of the following result on partial asymptotic normality builds on the fact that, with probability tending to one as $n \to \infty$, the estimates of the relevant coefficients stay away from zero. This is indeed a consequence of Theorem 4.1.0.9, which asserts variable selection consistency. Note, however, that this result is stronger than needed here: a consistency result in an $\ell_q$-norm with $q > 1$ would actually suffice.

Theorem 4.2.0.10. (Huang et al., 2008, Theorem 2) (Asymptotic normality for nonzero coefficients) Suppose that assumptions B.1 to B.6 hold. For an arbitrary $k_n \times 1$ vector $\alpha_n$ with $\|\alpha_n\| \le 1$, let

$$s_n^2 = \sigma^2\, \alpha_n' C_{n1}^{-1} \alpha_n.$$

If $M_{n1} \lambda_n n^{-1/2} \to 0$, then

$$n^{1/2} s_n^{-1} \alpha_n' (\hat\beta_{n1} - \beta_{10}) = n^{-1/2} s_n^{-1} \sum_{i=1}^n \varepsilon_i\, \alpha_n' C_{n1}^{-1} u_i + o_P(1) \xrightarrow{d} N(0, 1),$$

where $o_P(1)$ denotes a term that converges to zero in probability uniformly with respect to $\alpha_n$.

Proof. Under assumptions B.1 to B.5, Theorem 4.1.0.9 yields variable selection consistency; in particular, $\hat\beta_{n1}$ has no zero component on a set whose probability converges to one.
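On this set, one has $\partial L_n(\hat\beta_{n1}, \hat\beta_{n2}) / \partial \beta_1 = 0$, that is,

$$-2 \sum_{i=1}^n \big( y_i - u_i' \hat\beta_{n1} - z_i' \hat\beta_{n2} \big) u_i + 2 \lambda_n \psi_n = 0, \tag{4.2.0.8}$$

where $\psi_n = \big( w_{nj} \operatorname{sgn}(\hat\beta_{n1j}) \big)_{1 \le j \le k_n}$. Since $\beta_{20} = 0$ and $\varepsilon_i = y_i - u_i' \beta_{10}$, this can be written as

$$\sum_{i=1}^n \big( \varepsilon_i - u_i' (\hat\beta_{n1} - \beta_{10}) - z_i' \hat\beta_{n2} \big) u_i - \lambda_n \psi_n = 0.$$

Using $C_{n1} = n^{-1} X_{n1}' X_{n1} = n^{-1} \sum_{i=1}^n u_i u_i'$, we first obtain

$$C_{n1} (\hat\beta_{n1} - \beta_{10}) = \frac{1}{n} \sum_{i=1}^n \varepsilon_i u_i - \frac{\lambda_n}{n} \psi_n - \frac{1}{n} \sum_{i=1}^n z_i' \hat\beta_{n2}\, u_i,$$

and then

$$n^{1/2} \alpha_n' (\hat\beta_{n1} - \beta_{10}) = n^{-1/2} \sum_{i=1}^n \varepsilon_i\, \alpha_n' C_{n1}^{-1} u_i - n^{-1/2} \lambda_n\, \alpha_n' C_{n1}^{-1} \psi_n - n^{-1/2} \sum_{i=1}^n z_i' \hat\beta_{n2}\, \alpha_n' C_{n1}^{-1} u_i.$$

By Theorem 4.1.0.9, the event $\{\hat\beta_{n2} = 0\}$ has probability tending to one, so the last term converges to zero in probability. By the spectral theorem and assumption B.5, one has $\|C_{n1}^{-1}\| = \tau_{n1}^{-1} \le \tau_1^{-1}$. The Cauchy-Schwarz inequality then yields

$$\big| n^{-1/2} \lambda_n\, \alpha_n' C_{n1}^{-1} \psi_n \big| \le n^{-1/2} \lambda_n \|\alpha_n\|\, \|C_{n1}^{-1}\|\, \|\tilde s_{n1}\| \le n^{-1/2} \lambda_n\, \tau_1^{-1} (1 + o_P(1)) M_{n1} = o_P(1).$$

Here we used Lemma 4.0.2.8 in the first inequality and the assumption $n^{-1/2} \lambda_n M_{n1} \to 0$ in the final equality. So we obtain, as claimed,

$$n^{1/2} s_n^{-1} \alpha_n' (\hat\beta_{n1} - \beta_{10}) = n^{-1/2} s_n^{-1} \sum_{i=1}^n \varepsilon_i\, \alpha_n' C_{n1}^{-1} u_i + o_P(1).$$

To prove asymptotic normality, we verify that the Lindeberg-Feller condition holds. Set

$$v_i = n^{-1/2} s_n^{-1} \alpha_n' C_{n1}^{-1} u_i \quad \text{and} \quad \xi_i = \varepsilon_i v_i.$$

Then we have

$$\operatorname{var}\Big( \sum_{i=1}^n \xi_i \Big) = \sigma^2 \sum_{i=1}^n v_i^2 = \sigma^2 s_n^{-2}\, \alpha_n' C_{n1}^{-1} \Big( \frac{1}{n} \sum_{i=1}^n u_i u_i' \Big) C_{n1}^{-1} \alpha_n = \sigma^2 s_n^{-2}\, \alpha_n' C_{n1}^{-1} \alpha_n = 1.$$

For arbitrary $\delta > 0$,

$$\sum_{i=1}^n E\big( \xi_i^2\, 1\{|\xi_i| > \delta\} \big) = \sum_{i=1}^n v_i^2\, E\big( \varepsilon_i^2\, 1\{|\varepsilon_i v_i| > \delta\} \big)$$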
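The normalization used in the Lindeberg-Feller step can be checked numerically. The following Python sketch is only an illustration under assumptions that are not part of the thesis: a simulated design whose rows $u_i$ are standard normal vectors in $\mathbb{R}^3$, a fixed unit vector $\alpha_n$, and centred exponential errors with variance $\sigma^2$. The helpers `solve`, `lindeberg_weights` and `simulate_sum` are hypothetical names introduced here, not functions from the thesis's R code. The sketch forms $C_{n1} = n^{-1}\sum_i u_i u_i'$, $s_n$ and the weights $v_i$ as defined in the proof, confirms that $\sigma^2 \sum_i v_i^2 = 1$, and draws repeated realizations of $\sum_i \xi_i$ to show that their sample mean and variance are close to $0$ and $1$ even though the errors are skewed.

```python
import math
import random


def solve(a, b):
    """Solve the k x k system a x = b by Gaussian elimination with partial pivoting."""
    k = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - s) / m[r][r]
    return x


def lindeberg_weights(u, alpha, sigma2):
    """Return s_n and the weights v_i for the rows u_i of X_n1 (hypothetical helper).

    C_n1 = n^{-1} sum_i u_i u_i',  s_n^2 = sigma2 * alpha' C_n1^{-1} alpha,
    v_i = n^{-1/2} s_n^{-1} alpha' C_n1^{-1} u_i.
    """
    n, k = len(u), len(alpha)
    c = [[sum(row[p] * row[q] for row in u) / n for q in range(k)] for p in range(k)]
    # w = C_n1^{-1} alpha; C_n1 is symmetric, so alpha' C_n1^{-1} u_i = w' u_i
    w = solve(c, alpha)
    s_n = math.sqrt(sigma2 * sum(a_j * w_j for a_j, w_j in zip(alpha, w)))
    v = [sum(w_j * x_j for w_j, x_j in zip(w, row)) / (math.sqrt(n) * s_n) for row in u]
    return s_n, v


def simulate_sum(v, sigma2, reps, rng):
    """Draw reps realizations of sum_i eps_i v_i with skewed (assumed) errors
    eps_i = sigma * (E_i - 1), E_i ~ Exp(1), so E eps_i = 0 and var eps_i = sigma2."""
    sigma = math.sqrt(sigma2)
    return [sum(sigma * (rng.expovariate(1.0) - 1.0) * v_i for v_i in v) for _ in range(reps)]


rng = random.Random(1)
n, sigma2 = 200, 2.0
u = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]  # simulated design rows u_i
alpha = [0.6, 0.0, -0.8]  # ||alpha_n|| = 1
s_n, v = lindeberg_weights(u, alpha, sigma2)

print("sigma^2 * sum v_i^2 =", sigma2 * sum(x * x for x in v))  # 1 up to rounding
# with i.i.d. finite-variance errors, max_i |v_i| -> 0 suffices for Lindeberg
print("max |v_i| =", max(abs(x) for x in v))

draws = simulate_sum(v, sigma2, reps=1000, rng=rng)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print("Monte Carlo mean and variance of sum xi_i:", mean, var)  # roughly 0 and 1
```

Because the errors are i.i.d. with finite variance and $\sigma^2 \sum_i v_i^2 = 1$, the Lindeberg condition holds as soon as $\max_i |v_i| \to 0$; the printed maximum weight is therefore the quantity to watch when the design or $n$ changes.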