Theory of Statistics - George Mason University


1.3 Sequences of Events and of Random Variables

large. It may be useful to magnify the difference $X_n - b_n$ by use of some normalizing sequence of constants $a_n$:
$$Y_n = a_n(X_n - b_n). \tag{1.190}$$
While the distribution of the sequence $\{X_n - b_n\}$ may be degenerate, the sequence $\{a_n(X_n - b_n)\}$ may have a distribution that is nondegenerate, and this asymptotic distribution may be useful in statistical inference. (This approach is called "asymptotic inference".) We may note that even though we are using the asymptotic distribution of $\{a_n(X_n - b_n)\}$, for a reasonable choice of a sequence of normalizing constants $\{a_n\}$, we sometimes refer to it as the asymptotic distribution of $\{X_n\}$ itself, but we must remember that it is the distribution of the normalized sequence $\{a_n(X_n - b_n)\}$.

The shift constants $b_n$ generally serve to center the distribution, especially if the limiting distribution is symmetric. Although linear transformations are often most useful, we could consider sequences of more general transformations of $X_n$; instead of $\{a_n(X_n - b_n)\}$, we might consider $\{h_n(X_n)\}$ for some sequence of functions $\{h_n\}$.
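As a concrete illustration (the normal example and the specific constants here are assumptions for illustration, not from the text): if $X_n$ is the sample mean of $n$ iid $N(\mu, \sigma^2)$ variables, then with shift constants $b_n = \mu$ the sequence $\{X_n - b_n\}$ is degenerate at $0$, while with normalizing constants $a_n = \sqrt{n}$ the sequence $\{a_n(X_n - b_n)\}$ converges in distribution to $N(0, \sigma^2)$ by the central limit theorem. A quick simulation sketch:

```python
import numpy as np

# Hypothetical illustration: X_n is the sample mean of n iid N(mu, sigma^2)
# draws, with shift constants b_n = mu and normalizing constants a_n = sqrt(n).
rng = np.random.default_rng(42)
mu, sigma = 2.0, 3.0
n, reps = 2_000, 2_000

# reps independent samples of size n; one sample mean per row
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# {X_n - b_n} is degenerate: its variance sigma^2 / n shrinks toward 0
print(np.var(xbar - mu))

# {a_n (X_n - b_n)} is nondegenerate: by the CLT its variance stays near sigma^2
y = np.sqrt(n) * (xbar - mu)
print(np.var(y))
```

The first printed variance is near $\sigma^2/n$ (essentially zero), while the second stays near $\sigma^2$, which is exactly the distinction the normalization in (1.190) is designed to capture.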

The Asymptotic Distribution of $\{g(X_n)\}$

Applications often involve a differentiable Borel scalar function $g$, and we may be interested in the convergence of $\{g(X_n)\}$. (The same general ideas apply when $g$ is a vector function, but the higher-order derivatives quickly become almost unmanageable.) When we have $\{X_n\}$ converging in distribution to $X + b$, what we can say about the convergence of $\{g(X_n)\}$ depends on the differentiability of $g$ at $b$.

Theorem 1.46
Let $X$ and $\{X_n\}$ be random variables ($k$-vectors) such that
$$a_n(X_n - b_n) \xrightarrow{d} X, \tag{1.191}$$
where $b_1, b_2, \ldots$ is a sequence of constants such that $\lim_{n\to\infty} b_n = b < \infty$, and $a_1, a_2, \ldots$ is a sequence of constant scalars such that $\lim_{n\to\infty} a_n = \infty$ or such that $\lim_{n\to\infty} a_n = a > 0$. Now let $g$ be a Borel function from $\mathbb{R}^k$ to $\mathbb{R}$ that is continuously differentiable at each $b_n$. Then
$$a_n(g(X_n) - g(b_n)) \xrightarrow{d} (\nabla g(b))^{\mathrm{T}} X. \tag{1.192}$$
Proof. This follows from a Taylor series expansion of $g(X_n)$ and Slutsky's theorem.
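A numerical sanity check of (1.192) in the scalar case may help fix ideas (the choices $g(x) = x^2$, $a_n = \sqrt{n}$, $b_n = \mu$, and the normal data below are assumptions for illustration): with $X_n$ the sample mean of iid $N(\mu, \sigma^2)$ draws, the limit $X$ is $N(0, \sigma^2)$ and the theorem predicts that $\sqrt{n}\,(g(\bar{X}_n) - g(\mu))$ has asymptotic variance $(g'(\mu))^2 \sigma^2$:

```python
import numpy as np

# Illustrative check of Theorem 1.46 with g(x) = x^2 (assumed example values).
rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.0
n, reps = 2_000, 2_000

# xbar plays the role of X_n; a_n = sqrt(n), b_n = mu, so X ~ N(0, sigma^2)
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

def g(x):
    return x ** 2          # g'(mu) = 2 * mu

y = np.sqrt(n) * (g(xbar) - g(mu))

# Theorem 1.46 predicts variance (g'(mu))^2 * sigma^2 = (2 * 2)^2 * 1 = 16
print(np.var(y))
```

The empirical variance of the replicated values of $\sqrt{n}\,(g(\bar{X}_n) - g(\mu))$ lands near the predicted $(2\mu)^2 \sigma^2 = 16$.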

A common application of Theorem 1.46 arises from the simple corollary for the case in which $X$ in expression (1.191) has the multivariate normal distribution $N_k(0, \Sigma)$ and $\nabla g(b) \neq 0$:

Theory of Statistics © 2000–2013 James E. Gentle
