Subsampling estimates of the Lasso distribution.

Chapter 5 Subsampling

Already in his seminal paper, Tibshirani (1996) noted that bootstrap variance estimates for the Lasso could take the value zero. Knight and Fu (2000) showed heuristically that residual bootstrap estimates of the Lasso distribution are inconsistent. Formal arguments for this inconsistency were finally given by Chatterjee and Lahiri (2010), who proved that the conditional residual bootstrap distribution given the data converges to a random measure. In a subsequent paper, Chatterjee and Lahiri (2011) propose a consistent modification of the Lasso which also allows for consistency of residual bootstrap estimates. However, their modification, which consists in setting small estimates to zero according to a threshold value, can be problematic in finite samples in the presence of small parameters; furthermore, it requires choosing the value of the threshold parameter.

In general, proving consistency of the bootstrap is a difficult task. It typically involves proving that the distribution of the root of interest is continuous in the sampling distribution, usually the empirical distribution. In this chapter we introduce subsampling, a resampling method without replacement which has the advantage of being consistent under very weak assumptions.

In the remainder of this chapter we denote by $X^{(n)}$ a sample of $n$ i.i.d. random variables drawn from some distribution $P$ belonging to a family $\mathcal{P}$. For a real-valued parameter function $\theta : \mathcal{P} \to \mathbb{R}$ and a root $R_n(X^{(n)}, \theta(P))$, denote the distribution of $R_n$ under $P$ by $J_n(\cdot, P)$.

5.1 Pointwise consistency for distribution estimation

In this section, we restrict ourselves to the case
\[ R_n(X^{(n)}, \theta(P)) = \tau_n\bigl(\hat\theta_n(X^{(n)}) - \theta(P)\bigr), \]
where $\hat\theta_n$ is an estimator of the parameter $\theta$ and $\tau_n$ is a scaling factor tending to infinity. For a sample $X^{(n)}$ of size $n$ and a value $b = b(n) < n$, denote by $\{X_{n,b,i}\}_i$ the set of all $N_n := \binom{n}{b}$ subsamples of size $b$.
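The idea is to approximate the distribution of $\tau_b(\hat\theta_b - \theta(P))$ instead, using the subsamples $X_{n,b,i}$ at hand (which are true samples of size $b$ drawn from $P$) and replacing $\theta$ by $\hat\theta_n$. Writing $\hat\theta_{n,b,i}$ for the estimator computed on the subsample $X_{n,b,i}$, this results in the following estimator:
\[ \hat L_{n,b}(x) := \frac{1}{N_n} \sum_{i=1}^{N_n} 1\bigl\{\tau_b(\hat\theta_{n,b,i} - \hat\theta_n) \le x\bigr\}. \tag{5.1.0.1} \]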
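The estimator (5.1.0.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the thesis: the estimator defaults to the sample mean and the rate to $\tau_b = \sqrt{b}$ (both placeholders for the Lasso setting), and the names `subsampling_roots`, `l_hat` and `max_subsamples` are hypothetical. When $\binom{n}{b}$ is too large to enumerate, the sketch falls back to randomly drawn subsamples, which is an approximation to the estimator rather than the estimator itself.

```python
import bisect
import itertools
import math
import random
import statistics


def subsampling_roots(sample, b, estimator=statistics.fmean, rate=math.sqrt,
                      max_subsamples=None, seed=0):
    """Sorted values tau_b * (theta_hat_{n,b,i} - theta_hat_n) over subsamples.

    Assumptions (not from the source): `estimator` defaults to the sample
    mean and `rate` to sqrt, i.e. tau_b = sqrt(b).
    """
    n = len(sample)
    if not 0 < b < n:
        raise ValueError("subsample size b must satisfy 0 < b < n")
    theta_n = estimator(sample)
    if max_subsamples is None or math.comb(n, b) <= max_subsamples:
        # Enumerate all N_n = C(n, b) subsamples drawn without replacement.
        subsets = itertools.combinations(sample, b)
    else:
        # Approximation: a random selection of subsamples instead of all N_n.
        rng = random.Random(seed)
        subsets = (rng.sample(sample, b) for _ in range(max_subsamples))
    return sorted(rate(b) * (estimator(s) - theta_n) for s in subsets)


def l_hat(roots, x):
    """Evaluate L_hat_{n,b}(x): the fraction of subsample roots <= x."""
    return bisect.bisect_right(roots, x) / len(roots)


roots = subsampling_roots([1.0, 2.0, 3.0, 4.0], b=2)
print(len(roots), l_hat(roots, 0.0))
```

Sorting the roots once makes each later evaluation of the empirical distribution function a binary search, which is convenient when it is evaluated on a grid of points.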
The consistency of $\hat L_{n,b}$ as an estimator of $J_n(\cdot, P)$ is derived under the following assumption.

Assumption D. There exists a limit law $J(P)$ such that $J_n(P)$ converges weakly to $J(P)$ as $n \to \infty$.

Theorem 5.1.0.12. (Politis et al., 1999, Theorem 2.2.1) Assume that Assumption D holds. Also assume $\tau_b/\tau_n \to 0$, $b \to \infty$, and $b/n \to 0$ as $n \to \infty$. Then the following are true:

(i) If $x$ is a continuity point of $J(\cdot, P)$, then
\[ \hat L_{n,b}(x) \xrightarrow{P} J(x, P). \]

(ii) If $J(\cdot, P)$ is continuous, then
\[ \sup_{x \in \mathbb{R}} \bigl| \hat L_{n,b}(x) - J_n(x, P) \bigr| \xrightarrow{P} 0. \]

(iii) Let
\[ c_{n,b}(1-\alpha) = \inf\{x \in \mathbb{R} \mid \hat L_{n,b}(x) \ge 1-\alpha\}. \]
Correspondingly, define
\[ c(1-\alpha, P) = \inf\{x \in \mathbb{R} \mid J(x, P) \ge 1-\alpha\}. \]
Then
\[ 1 - \alpha - \Delta J(c(1-\alpha, P)) \le \liminf_{n\to\infty} P\bigl(\tau_n(\hat\theta_n - \theta(P)) \le c_{n,b}(1-\alpha)\bigr) \le \limsup_{n\to\infty} P\bigl(\tau_n(\hat\theta_n - \theta(P)) \le c_{n,b}(1-\alpha)\bigr) \le 1 - \alpha. \]
If $J(\cdot, P)$ is continuous at $c(1-\alpha, P)$, then
\[ \lim_{n\to\infty} P\bigl(\tau_n(\hat\theta_n - \theta(P)) \le c_{n,b}(1-\alpha)\bigr) = 1 - \alpha. \]

(iv) Assume that $\tau_b(\hat\theta_n - \theta(P)) \to 0$ almost surely and that
\[ \sum_{n=1}^{\infty} \exp\{-d(n/b)\} < \infty \]
for every $d > 0$. Then the convergences in (i) and (ii) hold with probability one.

Proof. Define
\[ U_n(x) = U_n(x, P) = N_n^{-1} \sum_{i=1}^{N_n} 1\bigl\{\tau_b(\hat\theta_{n,b,i} - \theta(P)) \le x\bigr\}. \]
To prove (i), we show that $U_n(x)$ converges in probability to $J(x, P)$ for every continuity point $x$ of $J(\cdot, P)$. Note that
\[ \hat L_{n,b}(x) = N_n^{-1} \sum_{i=1}^{N_n} 1\bigl\{\tau_b(\hat\theta_{n,b,i} - \theta(P)) + \tau_b(\theta(P) - \hat\theta_n) \le x\bigr\}. \]
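As an illustration of part (iii) of the theorem, the critical value $c_{n,b}(1-\alpha)$ is an order statistic of the subsample roots, and inverting the event $\tau_n(\hat\theta_n - \theta(P)) \le c_{n,b}(1-\alpha)$ gives a one-sided lower confidence bound for $\theta(P)$. The following Python sketch rests on assumptions that are not in the text: the estimator is the sample mean, $\tau_n = \sqrt{n}$, and all function names are hypothetical.

```python
import itertools
import math
import statistics


def roots_for_mean(sample, b):
    """All sqrt(b) * (subsample mean - full-sample mean); assumed toy setting."""
    theta_n = statistics.fmean(sample)
    return sorted(math.sqrt(b) * (statistics.fmean(s) - theta_n)
                  for s in itertools.combinations(sample, b))


def critical_value(roots, alpha):
    """c_{n,b}(1 - alpha) = inf{x : L_hat_{n,b}(x) >= 1 - alpha}.

    The infimum is attained at the k-th smallest root, k = ceil((1 - alpha) N).
    The small tolerance guards against floating-point error in (1 - alpha) * N.
    """
    ordered = sorted(roots)
    k = max(1, math.ceil((1 - alpha) * len(ordered) - 1e-12))
    return ordered[k - 1]


def lower_confidence_bound(sample, b, alpha):
    """Bound theta_hat_n - c_{n,b}(1 - alpha) / tau_n with tau_n = sqrt(n)."""
    c = critical_value(roots_for_mean(sample, b), alpha)
    return statistics.fmean(sample) - c / math.sqrt(len(sample))


print(lower_confidence_bound([1.0, 2.0, 3.0, 4.0], b=2, alpha=0.2))
```

The asymptotic coverage $1-\alpha$ of such a bound is what the theorem gives when $J(\cdot, P)$ is continuous at $c(1-\alpha, P)$; with a sample as small as the one in the usage line, the number is only meant to show the mechanics.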