
…by the $P \times 1$ vector

(3.1)  $\eta_n(p) \;=\; \begin{pmatrix} \theta[p] + (X[p]'X[p])^{-1} X[p]'X[\neg p]\,\theta[\neg p] \\ (0,\dots,0)' \end{pmatrix}$

with the conventions that $\eta_n(0) = (0,\dots,0)' \in \mathbb{R}^P$ and $\eta_n(P) = \theta$. Furthermore, let $\Phi_{n,p}(t)$, $t \in \mathbb{R}^k$, denote the cdf of $\sqrt{n}\,A(\tilde\theta(p) - \eta_n(p))$; i.e., $\Phi_{n,p}(t)$ is the cdf of a centered Gaussian random vector with covariance matrix $\sigma^2 A[p](X[p]'X[p]/n)^{-1}A[p]'$ in case $p > 0$, and the cdf of point-mass at zero in $\mathbb{R}^k$ in case $p = 0$. If $p > 0$ and if the matrix $A[p]$ has rank $k$, then $\Phi_{n,p}(t)$ has a density with respect to Lebesgue measure, and we shall denote this density by $\phi_{n,p}(t)$. We note that $\eta_n(p)$ depends on $\theta$ and that $\Phi_{n,p}(t)$ depends on $\sigma$ (in case $p > 0$), although these dependencies are not shown explicitly in the notation.
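To make the notation concrete, the following is a minimal numerical sketch of $\eta_n(p)$ from (3.1) and of the covariance matrix underlying $\Phi_{n,p}$. It assumes the restricted model of order $p$ uses the first $p$ columns of the design matrix $X$ and the first $p$ coordinates of $\theta$; the function names (`eta_n`, `phi_np_cov`) and this column convention are illustrative assumptions, not part of the paper.

```python
import numpy as np

def eta_n(theta, X, p):
    """Sketch of the P x 1 vector eta_n(p) in (3.1): the first p coordinates are
    theta[p] plus the omitted-variable bias term of the restricted least-squares
    fit, the remaining P - p coordinates are zero."""
    P = X.shape[1]
    if p == 0:
        return np.zeros(P)
    if p == P:
        return theta.copy()
    Xp, Xrest = X[:, :p], X[:, p:]
    head = theta[:p] + np.linalg.solve(Xp.T @ Xp, Xp.T @ Xrest @ theta[p:])
    return np.concatenate([head, np.zeros(P - p)])

def phi_np_cov(A, X, p, sigma):
    """Covariance matrix sigma^2 A[p] (X[p]'X[p]/n)^{-1} A[p]' of the centered
    Gaussian whose cdf is Phi_{n,p} (for p > 0)."""
    n = X.shape[0]
    Xp, Ap = X[:, :p], A[:, :p]
    return sigma**2 * Ap @ np.linalg.inv(Xp.T @ Xp / n) @ Ap.T
```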

For $p > 0$, the conditional distribution of $\sqrt{n}\,\tilde\theta_p(p)$ given $\sqrt{n}\,A(\tilde\theta(p) - \eta_n(p)) = z$ is a Gaussian distribution with mean $\sqrt{n}\,\eta_{n,p}(p) + b_{n,p} z$ and variance $\sigma^2 \zeta^2_{n,p}$, where

(3.2)  $b_{n,p} = C_n^{(p)\prime}\,\big(A[p](X[p]'X[p]/n)^{-1}A[p]'\big)^{-}$,  and

(3.3)  $\zeta^2_{n,p} = \xi^2_{n,p} - b_{n,p}\,C_n^{(p)}$.

In the displays above, $C_n^{(p)}$ stands for $A[p](X[p]'X[p]/n)^{-1} e_p$, with $e_p$ denoting the $p$-th standard basis vector in $\mathbb{R}^p$, and $(A[p](X[p]'X[p]/n)^{-1}A[p]')^{-}$ denotes a generalized inverse of the matrix indicated (cf. Note 3(v) in Section 8a.2 of Rao [7]). Note that, in general, the quantity $b_{n,p} z$ depends on the choice of generalized inverse in (3.2); however, for $z$ in the column-space of $A[p]$, $b_{n,p} z$ is invariant under the choice of inverse; cf. Lemma A.2 in Leeb [1]. Since $\sqrt{n}\,A(\tilde\theta(p) - \eta_n(p))$ lies in the column-space of $A[p]$, the conditional distribution of $\sqrt{n}\,\tilde\theta_p(p)$ given $\sqrt{n}\,A(\tilde\theta(p) - \eta_n(p)) = z$ is thus well-defined by the above. Observe that the vector of covariances between $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ is given by $\sigma^2 n^{-1} C_n^{(p)}$. In particular, note that $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are uncorrelated if and only if $\zeta^2_{n,p} = \xi^2_{n,p}$ (or, equivalently, if and only if $b_{n,p} z = 0$ for all $z$ in the column-space of $A[p]$); again, see Lemma A.2 in Leeb [1].
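The quantities in (3.2) and (3.3) can be computed directly from the design; the sketch below does so under two stated assumptions: $\xi^2_{n,p}$ is taken to be the $p$-th diagonal element of $(X[p]'X[p]/n)^{-1}$ (its definition appears earlier in the paper and should be checked there), and the Moore–Penrose pseudoinverse is used as one admissible choice of generalized inverse, which by the remark above does not affect $b_{n,p}z$ for $z$ in the column-space of $A[p]$. The function name is hypothetical.

```python
import numpy as np

def conditional_moments(A, X, p, sigma):
    """Sketch of C_n^(p), b_{n,p}, and the conditional variance sigma^2 * zeta^2_{n,p}
    from (3.2)-(3.3), for p > 0.
    Assumption: xi^2_{n,p} = e_p'(X[p]'X[p]/n)^{-1} e_p (defined earlier in the paper)."""
    n = X.shape[0]
    Xp, Ap = X[:, :p], A[:, :p]
    Q_inv = np.linalg.inv(Xp.T @ Xp / n)            # (X[p]'X[p]/n)^{-1}
    e_p = np.zeros(p); e_p[-1] = 1.0                # p-th standard basis vector in R^p
    C = Ap @ Q_inv @ e_p                            # C_n^(p)
    b = C @ np.linalg.pinv(Ap @ Q_inv @ Ap.T)       # b_{n,p}, one choice of generalized inverse
    xi2 = Q_inv[-1, -1]                             # assumed xi^2_{n,p}
    zeta2 = xi2 - b @ C                             # zeta^2_{n,p} as in (3.3)
    return C, b, sigma**2 * zeta2                   # conditional variance is sigma^2 * zeta^2_{n,p}
```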

Finally, for $M$ denoting a univariate Gaussian random variable with zero mean and variance $s^2 \ge 0$, we abbreviate the probability $P(|M - a| < b)$ by $\Delta_s(a, b)$, $a \in \mathbb{R} \cup \{-\infty, \infty\}$, $b \in \mathbb{R}$. Note that $\Delta_s(\cdot,\cdot)$ is symmetric around zero in its first argument, and that $\Delta_s(-\infty, b) = \Delta_s(\infty, b) = 0$ holds. In case $s = 0$, $M$ is to be interpreted as being equal to zero, such that $\Delta_0(a, b)$ equals one if $|a| < b$ and zero otherwise; i.e., $\Delta_0(a, b)$ reduces to an indicator function.
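For $s > 0$, $\Delta_s(a,b)$ is just $\Phi\big((a+b)/s\big) - \Phi\big((a-b)/s\big)$ with $\Phi$ the standard Gaussian cdf. A minimal sketch, assuming SciPy is available, that also covers the conventions for $a = \pm\infty$ and $s = 0$:

```python
import numpy as np
from scipy.stats import norm

def delta(s, a, b):
    """Delta_s(a, b) = P(|M - a| < b) for M ~ N(0, s^2), with the text's conventions:
    the value is 0 for a = +/-inf, and for s = 0 it is the indicator of |a| < b."""
    if b <= 0 or np.isinf(a):
        return 0.0
    if s == 0:
        return 1.0 if abs(a) < b else 0.0
    # P(a - b < M < a + b) for a centered Gaussian with standard deviation s
    return norm.cdf((a + b) / s) - norm.cdf((a - b) / s)
```

The symmetry in the first argument noted above follows from the symmetry of the centered Gaussian: $\Delta_s(-a, b) = \Delta_s(a, b)$.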

3.1. The known-variance case<br />

The cdf $G^*_{n,\theta,\sigma}(t)$ can be expanded as a weighted sum of conditional cdfs, conditional on the outcome of the model selection step, where the weights are given by the corresponding model selection probabilities. To this end, let $G^*_{n,\theta,\sigma}(t\,|\,p)$ denote the conditional cdf of $\sqrt{n}\,A(\tilde\theta^* - \theta)$ given that $\hat p^*$ equals $p$, for $O \le p \le P$; that is, $G^*_{n,\theta,\sigma}(t\,|\,p) = P_{n,\theta,\sigma}\big(\sqrt{n}\,A(\tilde\theta^* - \theta) \le t \,\big|\, \hat p^* = p\big)$, with $t \in \mathbb{R}^k$. Moreover, let $\pi^*_{n,\theta,\sigma}(p) = P_{n,\theta,\sigma}(\hat p^* = p)$ denote the corresponding model selection probability. Then the unconditional cdf $G^*_{n,\theta,\sigma}(t)$ can be written as

(3.4)  $G^*_{n,\theta,\sigma}(t) \;=\; \sum_{p=O}^{P} G^*_{n,\theta,\sigma}(t\,|\,p)\,\pi^*_{n,\theta,\sigma}(p).$
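Since (3.4) is nothing more than the law of total probability applied to the events $\{\hat p^* = p\}$, it can be checked numerically for any model selector. The following hedged Monte Carlo sketch does so on simulated draws; all names (`samples`, `selected_models`, `model_orders`) are illustrative and it assumes `model_orders` covers every value that $\hat p^*$ can take.

```python
import numpy as np

def unconditional_cdf_estimate(samples, selected_models, t, model_orders):
    """Monte Carlo illustration of (3.4): the unconditional cdf at t equals the sum,
    over model orders p, of the conditional cdf given {p_hat* = p} times the
    selection probability pi*(p).
    `samples`: draws of sqrt(n) A (theta_tilde* - theta), shape (n_sim, k);
    `selected_models`: the corresponding realizations of p_hat*, length n_sim."""
    total = 0.0
    for p in model_orders:
        mask = (selected_models == p)
        pi_p = mask.mean()                                    # estimate of pi*_{n,theta,sigma}(p)
        if pi_p == 0.0:
            continue
        cond_cdf = np.all(samples[mask] <= t, axis=1).mean()  # estimate of G*(t | p)
        total += cond_cdf * pi_p
    # By (3.4), the weighted sum agrees with the plain empirical cdf of the whole sample
    # (exactly so for the empirical measure, up to floating-point rounding).
    assert np.isclose(total, np.all(samples <= t, axis=1).mean())
    return total
```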
