Subsampling estimates of the Lasso distribution.
and
$$\hat{L}_{n,b}(c_U(1-\alpha)+\varepsilon) \xrightarrow{P} J(c_U(1-\alpha)+\varepsilon, P).$$
Hence, the sets
$$\bigl\{\hat{L}_{n,b}(c_L(1-\alpha)-\varepsilon) < 1-\alpha \le \hat{L}_{n,b}(c_U(1-\alpha)+\varepsilon)\bigr\} \subseteq \bigl\{c_L(1-\alpha)-\varepsilon < \hat{L}_{n,b}^{-1}(1-\alpha) \le c_U(1-\alpha)+\varepsilon\bigr\}$$
have probability tending to one as $n \to \infty$. It follows that
$$P\bigl(\tau_n(\hat{\theta}_n - \theta(P)) \le \hat{c}_{n,b}(1-\alpha)\bigr) \le J_n(c_U(1-\alpha)+\varepsilon, P) + o(1)$$
and
$$P\bigl(\tau_n(\hat{\theta}_n - \theta(P)) \le \hat{c}_{n,b}(1-\alpha)\bigr) \ge J_n(c_L(1-\alpha)-\varepsilon, P) + o(1).$$
Letting $n$ tend to infinity first and then $\varepsilon$ tend to zero yields, together with the Portmanteau theorem, the desired inequalities.
Finally, (iv) can be proved similarly to (i) and (ii) using the Borel-Cantelli lemma.
Remark.
(i) Note that point (iii) also holds for the root $U_{n,b}$. Indeed, the proof for $\hat{L}_{n,b}(\cdot)$ rests solely on the convergence in probability of $\hat{L}_{n,b}(x)$ to $J(x, P)$ at every continuity point $x$ of $J(\cdot, P)$. As seen in the proof of (i), this property is shared by $U_{n,b}(\cdot)$ as well, without even requiring $\tau_b/\tau_n \to 0$; the assumption $b/n \to 0$ is sufficient. The price to pay, of course, is larger confidence intervals.
(ii) The conclusion of point (iii) can also be stated for two-sided confidence intervals, with the obvious changes in the assumptions.
In the regular situation where $\tau_n = \sqrt{n}$, the choice $b = n^\delta$ for some $0 < \delta < 1$ satisfies the conditions of Theorem 5.1.0.12.
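The resulting procedure can be sketched in code. The following is a minimal illustration, not the thesis' exact construction: it estimates the distribution of the root $\tau_n(\hat{\theta}_n - \theta(P))$ by the empirical distribution of $\tau_b(\hat{\theta}_{n,b} - \hat{\theta}_n)$ over random subsamples of size $b = n^\delta$ (here with $\tau_n = \sqrt{n}$, a Monte Carlo approximation over `n_sub` subsamples instead of all $\binom{n}{b}$, and `np.mean` as a stand-in estimator; all function and parameter names are illustrative).

```python
import numpy as np

def subsampling_ci(x, theta_hat_fn, alpha=0.05, delta=0.5, n_sub=1000, rng=None):
    """Sketch of a one-sided subsampling confidence bound.

    Approximates the law of sqrt(n) * (theta_hat_n - theta) by the
    empirical law of sqrt(b) * (theta_hat_b - theta_hat_n) over random
    subsamples of size b = n**delta, drawn without replacement.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    b = max(2, int(n ** delta))        # b = n^delta with 0 < delta < 1
    theta_n = theta_hat_fn(x)
    roots = np.empty(n_sub)
    for i in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)
        roots[i] = np.sqrt(b) * (theta_hat_fn(x[idx]) - theta_n)
    # 1 - alpha quantile of the subsampling distribution L_hat_{n,b}
    c_hat = np.quantile(roots, 1 - alpha)
    # one-sided interval [theta_n - c_hat / sqrt(n), infinity)
    return theta_n - c_hat / np.sqrt(n), c_hat
```

For instance, `subsampling_ci(data, np.mean)` returns the lower confidence bound together with the estimated quantile $\hat{c}_{n,b}(1-\alpha)$.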
In view of our goal of constructing confidence intervals for Lasso estimates, the message conveyed by Theorem 5.1.0.12 is that when the $1-\alpha$ quantile happens to be a discontinuity point (and this can indeed happen if the corresponding parameter is equal to zero, cf. Theorem 3.2.1.1), the subsampling confidence interval asymptotically carries an error which is, in the worst case, equal to the jump height at the quantile.
However, as we will see in the next section, this conclusion is too pessimistic: it turns out that some form of uniform convergence is what we need to achieve consistency.
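The atom responsible for such a discontinuity can be visualized in a toy model. The sketch below assumes the standard orthonormal-design simplification (not necessarily the exact setting of Theorem 3.2.1.1): for a true parameter equal to zero and tuning such that $\sqrt{n}\,\lambda_n \to \lambda_0$, the limit of $\sqrt{n}\,\hat{\theta}_n$ is the soft-thresholded normal $\mathrm{soft}(Z, \lambda_0)$ with $Z \sim N(0,1)$, whose distribution function jumps at $0$ by $P(|Z| \le \lambda_0)$.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator, the Lasso solution in the orthonormal case."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Toy illustration (assumed simplified model): draw from the limit law
# of sqrt(n) * theta_hat_n at a zero parameter and measure the atom at 0.
rng = np.random.default_rng(0)
lam0 = 1.0
z = rng.normal(size=100_000)
limit_draws = soft_threshold(z, lam0)
atom_mass = np.mean(limit_draws == 0.0)   # empirical jump height at 0
# theoretical mass: P(|Z| <= 1) = 2 * Phi(1) - 1, about 0.683
```

Any $1-\alpha$ level falling inside this jump of the limit distribution is exactly the problematic case described above.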
5.2 Uniform consistency for quantile approximation
The present section focuses on the use of subsampling for the construction of confidence intervals only, in contrast to the previous one, where the estimation of the distribution function in a uniform sense was also considered. We will see that asymptotically valid or conservative confidence intervals can be achieved if the distribution functions satisfy some uniformity or monotonicity condition in the limit.
All results of this section, apart from the Dvoretzky-Kiefer-Wolfowitz inequality, are due to Romano and Shaikh (2010), who stated their results in a uniform sense for a family of probability measures; we follow their exposition.