Subsampling estimates of the Lasso distribution.

Chapter 4 The adaptive Lasso in a high dimensional setting

In a high dimensional setting, where the sparsity-inducing property of the Lasso makes its use most justified, there are to date no asymptotic results similar to those of Knight and Fu (2000). Hence we turn to a variant, the adaptive Lasso, and present the results of Huang et al. (2008), who studied it under sparsity assumptions and a further partial orthogonality assumption between relevant and noise covariates. Their approach has the advantage of providing an asymptotic normality result for the estimates corresponding to nonzero coefficients.

One typically resorts to a triangular array to model high dimensionality in linear models, that is, one assumes that

$$Y_i = x_i'\beta_{n0} + \varepsilon_i, \quad i = 1, \dots, n, \tag{4.0.2.1}$$

where the parameter $\beta_{n0}$ has a dimension $p_n$ allowed to grow faster than $n$. For observations $(Y_i, x_i)$, $i = 1, \dots, n$, drawn from (4.0.2.1), the adaptive Lasso is defined as

$$\arg\min_{\phi \in \mathbb{R}^{p_n}} L_n(\phi) = \arg\min_{\phi \in \mathbb{R}^{p_n}} \sum_{i=1}^{n} (Y_i - x_i'\phi)^2 + \lambda_n \sum_{j=1}^{p_n} w_{nj} |\phi_j|, \tag{4.0.2.2}$$

where $\lambda_n$ is the usual penalty parameter and the $\{w_{nj}\}_j$ are in general strictly positive weights. Here, however, we assume that they are computed from an initial estimator $\tilde\beta_{nj}$ of $\beta_{nj}$, that is,

$$w_{nj} = |\tilde\beta_{nj}|^{-1}, \tag{4.0.2.3}$$

for $j = 1, \dots, p_n$. For notational convenience, in the remainder we drop the subscript $n$ from $\beta_{n0}$, although dependence on $n$ is implicitly assumed.

Next, we introduce further notation. First, assume without loss of generality that $\beta_0$ takes the form

$$\beta_0 = (\beta_1', \beta_2')'$$
where $\beta_1$ is a $k_n \times 1$ vector with all components different from zero and $\beta_2$ is an $m_n \times 1$ vector equal to zero. Let $x_i = (x_{i1}, \dots, x_{ip_n})'$ be the covariate vector of the $i$-th observation, denote by $u_i$ its first part, corresponding to the non-zero coefficients, and by $z_i$ the part corresponding to the zero coefficients, i.e.

$$x_i = (u_i', z_i')'.$$

The empirical covariance matrix

$$C_n = \frac{1}{n} X_n' X_n$$

is also partitioned according to zero and non-zero coefficients. For $X_{n1} = (u_1, \dots, u_n)'$, set

$$C_{n1} = \frac{1}{n} X_{n1}' X_{n1},$$

a $k_n \times k_n$ matrix. Finally, let $\rho_{n1}$ and $\tau_{n1}$ be the smallest and the largest eigenvalue of $C_{n1}$, respectively.

Results on the variable selection consistency and the partial normality of the adaptive Lasso will be derived under the following assumptions:

Assumption B.

B.1 $\{\varepsilon_i\}_i$ is a sequence of i.i.d. random variables with mean zero, and there are constants $1 \le d \le 2$, $C > 0$ and $K > 0$ such that the tail probability satisfies

$$P(|\varepsilon_i| > x) \le K \exp(-C x^d)$$

for every $x \ge 0$.

B.2 The initial estimators $\tilde\beta_{nj}$ are $r_n$-consistent for the estimation of unknown constants $\eta_{nj}$ depending on $\beta$, that is,

$$r_n \max_{1 \le j \le p_n} |\tilde\beta_{nj} - \eta_{nj}| = O_P(1), \quad r_n \to \infty.$$

The constants $\eta_{nj}$ satisfy max $k_n$ …
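
As an illustration of the criterion (4.0.2.2) with the weights (4.0.2.3), the following minimal sketch minimizes the penalized sum of squares by cyclic coordinate descent. The solver is an assumption made for this example, not a method taken from the text, and so are the function names and the initial estimator, which is taken here to be the marginal regression slope of $Y$ on each covariate. For a single coordinate the minimizer is a soft-thresholded value: with $\rho_j = \sum_i x_{ij} r_i^{(-j)}$, where $r^{(-j)}$ is the residual computed without covariate $j$, one gets $\phi_j = S(\rho_j, \lambda_n w_{nj}/2) / \sum_i x_{ij}^2$.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def marginal_initial_estimates(X, y):
    """Assumed initial estimator: slope of y regressed on each covariate alone."""
    n, p = len(X), len(X[0])
    estimates = []
    for j in range(p):
        sxy = sum(X[i][j] * y[i] for i in range(n))
        sxx = sum(X[i][j] ** 2 for i in range(n))
        estimates.append(sxy / sxx)
    return estimates


def adaptive_lasso(X, y, lam, beta_init, n_sweeps=200, tol=1e-10):
    """Minimize sum_i (y_i - x_i' phi)^2 + lam * sum_j w_j |phi_j|
    with w_j = 1 / |beta_init_j|, by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    inf = float("inf")
    # A zero initial estimate gives an infinite weight: that coefficient stays at 0.
    weights = [1.0 / abs(b) if b != 0 else inf for b in beta_init]
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    phi = [0.0] * p
    resid = list(y)  # residual y - X phi, kept up to date after each update
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if weights[j] == inf or col_ss[j] == 0.0:
                new = 0.0
            else:
                # Inner product of column j with the residual that excludes covariate j.
                rho = sum(X[i][j] * (resid[i] + X[i][j] * phi[j]) for i in range(n))
                new = soft_threshold(rho, lam * weights[j] / 2.0) / col_ss[j]
            delta = new - phi[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                phi[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return phi


if __name__ == "__main__":
    # Toy orthogonal design: the first covariate is relevant, the second is nearly noise.
    X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
    y = [2.1, 1.9, 2.1, 1.9]
    init = marginal_initial_estimates(X, y)
    print(adaptive_lasso(X, y, lam=1.0, beta_init=init))
```

In this toy example the small initial estimate of the second coefficient yields a large weight, so that coefficient is set exactly to zero, while the first is only mildly shrunk. This is the mechanism through which the data-dependent weights favor variable selection.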
