Subsampling estimates of the Lasso distribution.

3. If $\hat p_{(1)} < \alpha/s$ but $\hat p_{(2)} \ge \alpha/(s-1)$, accept $H_{(2)}, \dots, H_{(s)}$ and stop. If $\hat p_{(1)} < \alpha/s$ and $\hat p_{(2)} < \alpha/(s-1)$, reject $H_{(2)}$ in addition to $H_{(1)}$ and test the remaining $s-2$ hypotheses at level $\alpha/(s-2)$.

One can show that the Bonferroni-Holm procedure controls the FWER (Lehmann and Romano, 2005, Theorem 9.1.2). In our case, for an individual hypothesis $H_{0,j}: \beta_j = 0$, we propose the following p-value based on subsampling: for a realized statistic $\sqrt{n}\,\hat\beta^j_n$ computed on the whole sample, choose as p-value the proportion among $B$ subsamples of values $\sqrt{r}\,(\hat\beta^j_{n,b,i} - \hat\beta^j_n)$ lying beyond or beneath it, depending on whether it is positive or negative. More precisely:

$$
\hat p_j =
\begin{cases}
B^{-1} \sum_{i=1}^{B} 1\{\sqrt{r}\,(\hat\beta^j_{n,b,i} - \hat\beta^j_n) \ge \sqrt{n}\,\hat\beta^j_n\} & \text{if } \hat\beta^j_n \ge 0, \\
B^{-1} \sum_{i=1}^{B} 1\{\sqrt{r}\,(\hat\beta^j_{n,b,i} - \hat\beta^j_n) \le \sqrt{n}\,\hat\beta^j_n\} & \text{if } \hat\beta^j_n < 0,
\end{cases}
\tag{6.1.3.2}
$$

where $r = b/(1 - b/n)$ is the finite sample corrected subsample size. Similar p-values are considered in Berg, McMurry, and Politis (2010). Here, the distinction between the positive and the negative case is made to take power issues into consideration. Power and family-wise error rates based on 500 replications are reported in Tables 6.4, 6.5 and 6.6. The procedure controls the FWER while leaving the power unaffected, indicating that the proposed p-values are correct.

6.2 High dimensional setting

To assess the performance of subsampling in a high dimensional setting, we apply it to the adaptive Lasso presented in Chapter 3. We generate four data sets D, D', E and E' which differ in the correlation structure between covariates and in the dimension of the parameter. They result from the linear model

$$
Y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \dots, n,
$$

where β = (1, −1.25, , −1.25, 1, −1.5, 0)′ is a 400 × 1 vector in models D and D', and an 800 × 1 vector in models E and E'. The errors $\varepsilon_i$ are i.i.d. standard normal. X is multivariate normal with mean zero and covariance matrix C given by

• $C_{i,j} = 0.8^{|i-j|}\, 1\{i, j \in \{1, \dots, 5\} \text{ or } i, j \in \{6, \dots, 400\}\}$ in model D,
• $C_{i,j} = 0.8^{|i-j|}$, $i, j = 1, \dots, 400$ in model D',
• $C_{i,j} = 0.8^{|i-j|}\, 1\{i, j \in \{1, \dots, 5\} \text{ or } i, j \in \{6, \dots, 800\}\}$ in model E,
• $C_{i,j} = 0.8^{|i-j|}$, $i, j = 1, \dots, 800$ in model E'.

Models D and E respect the partial orthogonality condition between relevant and irrelevant parameters while D' and E' violate it.

Estimation procedure

For a sample $(Y_i, x_i)_{i=1}^{n}$ of simulated observations, we proceed as follows to construct confidence intervals for the coefficients:

(i) Center and scale the observations, that is, consider

$$
\tilde Y_i = Y_i - n^{-1} \sum_{l=1}^{n} Y_l, \quad i = 1, \dots, n,
$$

$$
\tilde x_{ij} = \Big(x_{ij} - n^{-1} \sum_{l=1}^{n} x_{lj}\Big) \Big(n^{-1} \sum_{l=1}^{n} \Big(x_{lj} - n^{-1} \sum_{k=1}^{n} x_{kj}\Big)^2\Big)^{-1/2}, \quad j = 1, \dots, p_n, \ i = 1, \dots, n.
$$

(ii) Set weights $w_j$ as

$$
w_j = n^{-1} \sum_{i=1}^{n} \tilde Y_i \tilde x_{ij}, \quad j = 1, \dots, p_n.
$$

(iii) Compute the Lasso path for the data set with scaled covariates, that is, for $(\tilde Y_i, \mathrm{diag}(w_1, \dots, w_{p_n})\tilde x_i)_{i=1}^{n}$. Then choose the penalization parameter by K-fold cross validation (we choose K = 10) based on $(\tilde Y_i, \mathrm{diag}(w_1, \dots, w_{p_n})\tilde x_i)_{i=1}^{n}$. Denote it by $\lambda_{n,CV}$. Let $\tilde\beta_n$ be the Lasso solution corresponding to $(\tilde Y_i, \mathrm{diag}(w_1, \dots, w_{p_n})\tilde x_i)_{i=1}^{n}$ and to the penalization parameter $\lambda_{n,CV}$. Finally, set the adaptive Lasso solution to $\hat\beta_n = \mathrm{diag}(w_1^{-1}, \dots, w_{p_n}^{-1})\tilde\beta_n$.

(iv) Repeat the following steps for $m = 1, \dots, B$:

(a) Generate a random subsample $I_m \subset \{1, \dots, n\}$ of size $b$ by drawing without replacement.

(b) Center and scale the observations with index $i \in I_m$ to obtain a data set $(\tilde Y_i^{(m)}, x_i^{(m)})_{i \in I_m}$.

(c) Compute the Lasso solution path for the data set $(\tilde Y_i^{(m)}, \mathrm{diag}(w_1, \dots, w_{p_n}) x_i^{(m)})_{i \in I_m}$ with covariates scaled by the weights obtained in step (ii). Let $\tilde\beta_b^{(m)}$ be the Lasso solution corresponding to $(\tilde Y_i^{(m)}, \mathrm{diag}(w_1, \dots, w_{p_n}) x_i^{(m)})_{i \in I_m}$ and to the rescaled penalization parameter $\lambda_{b,CV} = \lambda_{n,CV}\,(b/n)^{0.4}$.
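The preprocessing parts of steps (i), (ii), (iv)(a) and the penalty rescaling in (iv)(c) can be sketched as follows. This is only an illustration in plain Python, not the thesis code (which is in R): the function names are hypothetical, scaling by the empirical standard deviation with a 1/n normalisation is an assumption, and the Lasso fit and cross validation are deliberately left out.

```python
import math
import random


def center_and_scale(y, X):
    """Step (i): center y; center each column of X and divide it by its
    empirical standard deviation (1/n normalisation, assumed here).
    Columns with zero variance are not handled in this sketch."""
    n, p = len(y), len(X[0])
    y_bar = sum(y) / n
    y_tilde = [yi - y_bar for yi in y]
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
           for j in range(p)]
    X_tilde = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    return y_tilde, X_tilde


def marginal_weights(y_tilde, X_tilde):
    """Step (ii): w_j = n^{-1} * sum_i y_tilde_i * x_tilde_ij."""
    n, p = len(y_tilde), len(X_tilde[0])
    return [sum(y_tilde[i] * X_tilde[i][j] for i in range(n)) / n
            for j in range(p)]


def draw_subsample(n, b, rng):
    """Step (iv)(a): b distinct indices from {0, ..., n-1}, drawn without
    replacement (0-based, unlike the 1-based indices in the text)."""
    return sorted(rng.sample(range(n), b))


def rescaled_penalty(lam_n_cv, b, n, exponent=0.4):
    """Step (iv)(c): lambda_{b,CV} = lambda_{n,CV} * (b / n) ** 0.4."""
    return lam_n_cv * (b / n) ** exponent
```

A caller would pass `random.Random(seed)` as `rng` so that the B subsamples are reproducible, then apply `center_and_scale` again to the rows selected by each index set.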
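The covariance structures of models D, D', E and E' listed in Section 6.2 differ only in the dimension and in whether the first five covariates are decoupled from the rest. A minimal sketch of that construction follows; the function name and the `block_split` parameter are hypothetical, and indices are 0-based, so `block_split=5` corresponds to the blocks {1, ..., 5} and {6, ..., p} in the text.

```python
def design_covariance(p, block_split=None, rho=0.8):
    """Return C with C[i][j] = rho ** |i - j|.

    If block_split = q is given, entries linking the first q covariates
    with the remaining p - q are set to zero (models D and E with q = 5).
    With block_split=None the full matrix is returned (models D' and E').
    """
    def same_block(i, j):
        return block_split is None or ((i < block_split) == (j < block_split))

    return [[rho ** abs(i - j) if same_block(i, j) else 0.0
             for j in range(p)]
            for i in range(p)]
```

Under these assumptions, `design_covariance(400, block_split=5)` would give the matrix of model D and `design_covariance(800)` that of model E'.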
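The subsampling p-value (6.1.3.2) and the Holm step-down rule it feeds into can be sketched in the same way. The function names are hypothetical and the inputs are assumed to be already computed: `beta_full` is the estimate of one coefficient on the whole sample and `beta_subs` the list of its B subsample estimates.

```python
import math


def corrected_size(b, n):
    """Finite sample corrected subsample size r = b / (1 - b/n)."""
    return b / (1.0 - b / n)


def subsampling_p_value(beta_full, beta_subs, n, b):
    """p-value (6.1.3.2) for H_{0,j}: beta_j = 0."""
    r = corrected_size(b, n)
    stat = math.sqrt(n) * beta_full
    roots = [math.sqrt(r) * (bs - beta_full) for bs in beta_subs]
    if beta_full >= 0:
        hits = sum(1 for v in roots if v >= stat)
    else:
        hits = sum(1 for v in roots if v <= stat)
    return hits / len(beta_subs)


def holm_reject(p_values, alpha):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (s - k) and stop at the first non-rejection.
    Returns one boolean per hypothesis, True meaning rejected."""
    s = len(p_values)
    order = sorted(range(s), key=lambda j: p_values[j])
    rejected = [False] * s
    for k, j in enumerate(order):
        if p_values[j] < alpha / (s - k):
            rejected[j] = True
        else:
            break
    return rejected
```

The strict inequality mirrors the rejection rule stated in the procedure; the sign split in `subsampling_p_value` follows the two cases of (6.1.3.2).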
