Subsampling estimates of the Lasso distribution.

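The adaptive Lasso in a high dimensional setting

[...] exists a constant $K_d$, depending only on $d$, such that

\[
\begin{aligned}
\left\| \sum_{i=1}^{n} a_i \varepsilon_i \right\|_{\psi_d}
&\le K_d \left[ E\left| \sum_{i=1}^{n} a_i \varepsilon_i \right| + \left[ \sum_{i=1}^{n} \| a_i \varepsilon_i \|_{\psi_d}^{d'} \right]^{1/d'} \right] \\
&= K_d \left[ E\left( \left| \sum_{i=1}^{n} a_i \varepsilon_i \right| \cdot 1 \right) + \left[ \sum_{i=1}^{n} |a_i|^{d'} \| \varepsilon_i \|_{\psi_d}^{d'} \right]^{1/d'} \right] \\
&\le K_d \left[ \left( E\left( \sum_{i=1}^{n} a_i \varepsilon_i \right)^{2} \right)^{1/2} + (1+K)^{1/d} C^{-1/d} \left[ \sum_{i=1}^{n} |a_i|^{d'} \right]^{1/d'} \right] \\
&\le K_d \left[ \left( E \sum_{i=1}^{n} a_i^2 \varepsilon_i^2 \right)^{1/2} + (1+K)^{1/d} C^{-1/d} \left[ \sum_{i=1}^{n} |a_i|^{d'} \right]^{1/d'} \right] \\
&= K_d \left[ \sigma + (1+K)^{1/d} C^{-1/d} \left[ \sum_{i=1}^{n} |a_i|^{d'} \right]^{1/d'} \right].
\end{aligned}
\]

Here, Hölder's inequality and Lemma 4.0.2.4 have been used in the second inequality. For $1 < d \le 2$, $d' = d/(d-1) \ge 2$, which implies

\[
\sum_{i=1}^{n} |a_i|^{d'} \le \left( \sum_{i=1}^{n} |a_i|^{2} \right)^{d'/2} = 1.
\]

It follows that

\[
\left\| \sum_{i=1}^{n} a_i \varepsilon_i \right\|_{\psi_d} \le K_d \left( \sigma + (1+K)^{1/d} C^{-1/d} \right).
\]

For $d = 1$, by Lemma 4.0.2.7, there exists some constant $K_1$ such that

\[
\begin{aligned}
\left\| \sum_{i=1}^{n} a_i \varepsilon_i \right\|_{\psi_1}
&\le K_1 \left[ E\left| \sum_{i=1}^{n} a_i \varepsilon_i \right| + \left\| \max_{1 \le i \le n} |a_i \varepsilon_i| \right\|_{\psi_1} \right] \\
&\le K_1 \left[ \sigma + K' \log(n) \max_{1 \le i \le n} \| a_i \varepsilon_i \|_{\psi_1} \right] \\
&\le K_1 \left[ \sigma + K' (1+K) C^{-1} \log(n) \max_{1 \le i \le n} |a_i| \right] \\
&\le K_1 \left[ \sigma + K' (1+K) C^{-1} \log(n) \right],
\end{aligned}
\]

where Hölder's inequality and Lemma 4.0.2.6 have been used in the second inequality. Finally, note that for an arbitrary random variable $X$ the following holds:

\[
P\left( X > t \| X \|_{\psi_d} \right) \le \left( 1 + \psi_d(t) \right)^{-1} \left( 1 + E\, \psi_d\!\left( |X| / \| X \|_{\psi_d} \right) \right) \le 2 \exp(-t^d)
\]

for all $t > 0$, by Markov's inequality and by the definition of the $\psi_d$-Orlicz norm.

Lemma 4.0.2.8. Let $\tilde{s}_{n1} = \left( |\tilde\beta_{nj}|^{-1} \operatorname{sgn}(\beta_{0j}) \right)'_{j \in J_{n1}}$ and $s_{n1} = \left( |\eta_{nj}|^{-1} \operatorname{sgn}(\beta_{0j}) \right)'_{j \in J_{n1}}$. Suppose that assumption B.2 holds. Then

\[
\| \tilde{s}_{n1} \| = (1 + o_P(1))\, M_{n1}
\]

and

\[
\max_{j \notin J_{n1}} \left\| \, |\tilde\beta_{nj}|\, \tilde{s}_{n1} - |\eta_{nj}|\, s_{n1} \right\| = o_P(1).
\]

Proof. By assumption B.2, we have

\[
\max_{1 \le j \le k_n} \left| \frac{|\tilde\beta_{nj}|}{|\eta_{nj}|} - 1 \right|
= \max_{1 \le j \le k_n} \frac{1}{|\eta_{nj}|} \Big|\, |\tilde\beta_{nj}| - |\eta_{nj}| \,\Big|
\le O_P(1/r_n)\, M_{n1} = o_P(1).
\]

We also have $\max_{1 \le j \le k_n} \left| \frac{|\eta_{nj}|}{|\tilde\beta_{nj}|} - 1 \right| = o_P(1)$. Indeed, since

\[
\frac{1}{|\tilde\beta_{nj}|}
= \frac{1}{|\tilde\beta_{nj}| - |\eta_{nj}| + |\eta_{nj}|}
\le \frac{1}{|\eta_{nj}|} \cdot \frac{1}{\left| 1 + \left( \frac{|\tilde\beta_{nj}|}{|\eta_{nj}|} - 1 \right) \right|}
\le M_{n1} \frac{1}{|1 + o_P(1)|}
= M_{n1}\, O_P(1) = o_P(r_n)
\]

for every $1 \le j \le k_n$, it follows that

\[
\begin{aligned}
\max_{1 \le j \le k_n} \left| \frac{|\eta_{nj}|}{|\tilde\beta_{nj}|} - 1 \right|
&= \max_{1 \le j \le k_n} \frac{1}{|\tilde\beta_{nj}|} \Big|\, |\eta_{nj}| - |\tilde\beta_{nj}| \,\Big| \\
&\le o_P(r_n) \max_{1 \le j \le k_n} \Big|\, |\eta_{nj}| - |\tilde\beta_{nj}| \,\Big| \\
&\le o_P(r_n)\, O_P(1/r_n) = o_P(1).
\end{aligned}
\]

Now we can prove the first part of the claim:

\[
\begin{aligned}
\| \tilde{s}_{n1} \|
&= \sqrt{ \sum_{j=1}^{k_n} \frac{1}{\tilde\beta_{nj}^{2}} }
= \sqrt{ \sum_{j=1}^{k_n} \left( \frac{1}{|\eta_{nj}|} \left( 1 + \left( \frac{|\eta_{nj}|}{|\tilde\beta_{nj}|} - 1 \right) \right) \right)^{2} } \\
&\le \sqrt{ \sum_{j=1}^{k_n} \left( \frac{1}{|\eta_{nj}|} (1 + o_P(1)) \right)^{2} }
\le (1 + o_P(1)) \sqrt{ \sum_{j=1}^{k_n} \frac{1}{|\eta_{nj}|^{2}} }
\le (1 + o_P(1))\, M_{n1}.
\end{aligned}
\]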
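The tail bound $P(X > t\|X\|_{\psi_d}) \le 2\exp(-t^d)$ stated at the end of the first proof can be checked numerically in a case where the Orlicz norm has a closed form. The sketch below is illustrative only and is not part of the thesis code: it assumes $\psi_2(x) = \exp(x^2) - 1$, takes $X = |Z|$ with $Z$ standard normal, and uses the identity $E\exp(Z^2/c^2) = (1 - 2/c^2)^{-1/2}$ for $c^2 > 2$, which gives $\|X\|_{\psi_2} = \sqrt{8/3}$. All function and variable names are hypothetical.

```python
import math

# Assumption: psi_2(x) = exp(x^2) - 1 and X = |Z| with Z ~ N(0, 1).
# Solving E[exp(Z^2 / c^2)] = 2 gives c^2 = 8/3, the psi_2-Orlicz norm squared.
PSI2_NORM = math.sqrt(8.0 / 3.0)


def expected_exp_sq(c):
    """E[exp(Z^2 / c^2)] for standard normal Z, valid for c^2 > 2."""
    return (1.0 - 2.0 / c**2) ** -0.5


def normal_abs_tail(x):
    """P(|Z| > x) for standard normal Z, via the complementary error function."""
    return math.erfc(x / math.sqrt(2.0))


def orlicz_tail_bound(t, d=2):
    """Right-hand side 2 * exp(-t^d) of the tail bound."""
    return 2.0 * math.exp(-(t**d))


# Compare the exact tail with the bound on a small grid of t values.
rows = []
for t in (0.25, 0.5, 1.0, 2.0, 3.0):
    tail = normal_abs_tail(t * PSI2_NORM)
    bound = orlicz_tail_bound(t)
    rows.append((t, tail, bound))
    print(f"t={t:4.2f}  P(X > t*norm)={tail:.3e}  bound={bound:.3e}")
```

In this Gaussian case the exact tail lies well below the bound, which is consistent with the bound being a Markov-type inequality that holds for arbitrary $X$ rather than a sharp estimate.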
