
Orthogonality Does Not Imply Asymptotic Normality

DA Freedman

Statistics 215, October 2007

Consider random variables which are orthogonal, with mean 0 and variance 1, and uniformly bounded fourth moments. The CLT need not hold (i.e., the sum need not be asymptotically normal) because independence is not assumed. (The CLT, being a theorem, remains true.) We present several examples, normalizing the sum $S_n = X_1 + \cdots + X_n$ by $\sqrt{n}$, or by

$$D_n = \sqrt{\sum_{j=1}^n X_j^2}.$$

The examples are relevant to general forms of the OLS model $Y = M\beta + \epsilon$ that require only $\operatorname{cov}(\epsilon \mid M) = \sigma^2 I_{n \times n}$ rather than IID errors; normalization by $D_n$ is akin to normalizing regression statistics by $\hat\sigma$. The punchline: the usual asymptotics need not hold for the OLS estimator $\hat\beta$ without the assumption of IID errors given $M$. (We write $M$ for the design matrix to avoid confusion with the random variables $X_j$.)

(∗) Conditions. The $X_j$ have $E(X_j) = 0$ and $E(X_j^2) = 1$. Furthermore, $E(X_j^4)$ is uniformly bounded, and $E(X_j X_k) = 0$ for $j \neq k$.

Example 1. Let $Z$ and $\{Y_j\}$ be independent. Suppose the $Y_j$ are independent, $E(Y_j) = 0$, and $E(Z^2) = E(Y_j^2) = 1$. Finally, suppose $E(Z^4)$ and $E(Y_j^4)$ are finite, the latter uniformly in $j$. Let $X_j = Z Y_j$. These random variables satisfy the conditions (∗), but $S_n/\sqrt{n}$ isn't asymptotically normal: the limiting distribution is normal, multiplied by $Z$. Normalizing by $D_n$ does give asymptotic normality, because $Z$ cancels.
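A minimal simulation sketch of Example 1; the choice of $Z$ standard normal, the $Y_j$ as fair coin tosses, and the sample sizes are our illustrative assumptions, not the note's:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2_000, 2_000

# Z ~ N(0,1) once per replication; Y_j IID +/-1 coin tosses, independent of Z.
Z = rng.standard_normal(reps)
Y = rng.choice([-1.0, 1.0], size=(reps, n))
X = Z[:, None] * Y                     # X_j = Z * Y_j: mean 0, variance 1, orthogonal

S = X.sum(axis=1)
D = np.sqrt((X ** 2).sum(axis=1))      # D_n = sqrt(sum_j X_j^2)

# S_n/sqrt(n) is Z times an asymptotically normal variable (kurtosis ~ 9);
# in S_n/D_n the factor Z cancels up to sign, leaving something ~ N(0,1).
for name, t in [("S/sqrt(n)", S / np.sqrt(n)), ("S/D", S / D)]:
    k = ((t - t.mean()) ** 4).mean() / t.var() ** 2
    print(f"{name:10s} var = {t.var():.2f}  kurtosis = {k:.1f}")
```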

Example 2. Let $U_j = 0$ or $\sqrt{2}$ be a sequence of random variables constructed as follows. With probability 1/2, the sequence consists of a long block of 0's, followed by a very long block of $\sqrt{2}$'s, followed by a very very long block of 0's, etc. With the remaining probability 1/2, the 0's and $\sqrt{2}$'s are interchanged. The $Y_j$ are IID $\pm 1$ with probability 1/2, independent of $\{U_j\}$. Let $X_j = U_j Y_j$. Again, these random variables satisfy the conditions (∗). Clearly, $\max_j U_j^2 = 2$ and $D_n^2 \to \infty$, so $\max_{j \le n} U_j^2 = o(D_n^2)$. Furthermore, $\operatorname{var}(S_n \mid U) = D_n^2$. Thus, $S_n/D_n$ is asymptotically normal. With rapidly increasing block lengths, $D_n^2/n$ oscillates between $0+$ and $2-$, so $S_n/\sqrt{n}$ isn't asymptotically normal.
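The block construction is easy to mimic numerically; the specific block lengths below (powers of 10) are our assumption, standing in for "rapidly increasing":

```python
import numpy as np

rng = np.random.default_rng(1)

# Alternating constant blocks of 0 and sqrt(2), block k having length 10^k;
# a fair coin decides which value comes first.
lengths = [10 ** k for k in range(1, 7)]
start_high = rng.random() < 0.5
U = np.concatenate([
    np.full(L, np.sqrt(2.0) if ((i % 2 == 0) == start_high) else 0.0)
    for i, L in enumerate(lengths)
])
Y = rng.choice([-1.0, 1.0], size=U.size)
X = U * Y                              # X_j = U_j * Y_j

ratio = np.cumsum(X ** 2) / np.arange(1, U.size + 1)   # D_n^2 / n
for end in np.cumsum(lengths) - 1:     # inspect at block boundaries
    print(f"n = {end + 1:>8d}   D_n^2 / n = {ratio[end]:.2f}")
# With these lengths the ratio swings between ~0.2 and ~1.8; faster-growing
# blocks push the oscillation toward 0+ and 2-, so S_n/sqrt(n) has no limit law.
```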

Our next example involves $\xi_j = \sin(j\theta)$, where $\theta$ is uniform on the circle $[0, 2\pi)$ and $j$ is an integer; the main interest is $j = 1, 2, \ldots$. If $z_j = \exp(ij\theta)$ with $i = \sqrt{-1}$ and $\exp(z) = e^z$, then $\xi_j$ is the imaginary part of $z_j = \cos(j\theta) + i \sin(j\theta)$. The next lemma follows by computing moments.

Lemma 2. The $z_j$ all have the same distribution; furthermore, for each $j$, as $n \to \infty$, the joint distribution of $z_n$ and $z_j$ converges weak-star, the two variables becoming independent.
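The note leaves the moment computation implicit; the following sketch (our expansion) spells it out. On the circle, the joint law of $(z_n, z_j)$ is determined by its Fourier coefficients, indexed by integers $p, q$, with $\bar z = z^{-1}$ since $|z| = 1$:

```latex
E\bigl(z_n^{\,p} z_j^{\,q}\bigr) = E\bigl(e^{i(pn+qj)\theta}\bigr)
  = \begin{cases} 1 & \text{if } pn + qj = 0,\\ 0 & \text{otherwise.} \end{cases}
% For fixed j and fixed (p,q) != (0,0), we have pn + qj != 0 once n > |q|j,
% so every mixed coefficient tends to the product E(z^p) E(z_j^q) of the
% coefficients of two independent uniform variables on the circle.
```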



Lemma 3.
(i) $E(\xi_j) = 0$; in fact, all odd moments vanish.
(ii) $E(\xi_j^2) = 1/2$ and $E(\xi_j^4) = 3/8$.
(iii) $E(\xi_j \xi_k) = 0$ for $j \neq k$.
(iv) $\sum_{j=1}^n \cos(j\theta)$ is the real part of
$$\frac{e^{(n+1)i\theta} - e^{i\theta}}{e^{i\theta} - 1},$$
and $\sum_{j=1}^n \sin(j\theta)$ is the imaginary part.
(v) $\sum_{j=1}^n \sin^2(j\theta) = \frac{1}{2}(n - q_n)$, where $q_n = \sum_{j=1}^n \cos(2j\theta)$ is the real part of the same expression with $\theta$ replaced by $2\theta$.
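Parts (ii) and (v) come from double-angle identities; a short sketch (our expansion):

```latex
\sin^2 x = \tfrac12\,(1 - \cos 2x), \qquad
\sin^4 x = \tfrac18\,(3 - 4\cos 2x + \cos 4x).
% Since E(\cos 2j\theta) = E(\cos 4j\theta) = 0 for j >= 1, taking x = j\theta
% gives E(\xi_j^2) = 1/2 and E(\xi_j^4) = 3/8; summing the first identity over
% j = 1, ..., n gives (v).
```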

Example 3. Let $X_j = \sqrt{2}\,\xi_j$. Conditions (∗) are satisfied. However, $S_n$ converges in distribution, and $D_n^2 = n - q_n$ is of order $n$ by Lemma 3(v). Whether we normalize $S_n$ by $\sqrt{n}$ or by $D_n$, or not at all, there is no asymptotic normality.
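A quick numerical check (our sketch; sample size and seed are arbitrary): the quantiles of $S_n$ stabilize as $n$ grows, while those of $S_n/\sqrt{n}$ collapse to 0.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, size=20_000)

def S(n):
    # Closed form from Lemma 3(iv):
    # sum_{j=1}^n sin(j*theta) = Im[(e^{i(n+1)theta} - e^{i theta}) / (e^{i theta} - 1)]
    num = np.exp(1j * (n + 1) * theta) - np.exp(1j * theta)
    return np.sqrt(2.0) * (num / (np.exp(1j * theta) - 1.0)).imag

for n in (10**3, 10**4, 10**5):
    s = S(n)
    print(n, "S_n deciles:", np.percentile(s, [10, 50, 90]).round(2),
          "  S_n/sqrt(n):", np.percentile(s / np.sqrt(n), [10, 90]).round(3))
```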

Sourav Chatterjee suggested that examples could be based on U-statistics. For $l = 1, 2, \ldots$, let the $U_l$ be IID, with $P(U_l = \pm 1) = 1/2$. Let

$$Q_n = \sum_{1 \le j \ne k \le n} U_j U_k = \Big(\sum_{l=1}^n U_l\Big)^2 - \sum_{l=1}^n U_l^2,$$

whose distribution is asymptotic to $n(\chi_1^2 - 1)$. Note that $Q_{n+1} - Q_n = 2\big(\sum_{l=1}^n U_l\big) U_{n+1}$.
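A sanity check on the $n(\chi_1^2 - 1)$ claim (our sketch; parameters arbitrary): since $U_l^2 = 1$, $Q_n = \big(\sum_l U_l\big)^2 - n$, and the sum of $n$ fair signs is $2\,\mathrm{Binomial}(n, 1/2) - n$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10_000, 100_000

T = 2.0 * rng.binomial(n, 0.5, size=reps) - n   # T = U_1 + ... + U_n
W = (T ** 2 - n) / n                            # Q_n / n, should look like chi^2_1 - 1
print("mean:", W.mean().round(3), "  var:", W.var().round(3))  # ~ 0 and ~ 2
```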

Example 4. Let $T_j = \sum_{l=1}^j U_l / \sqrt{j}$ and let $X_{j+1} = T_j U_{j+1}$ for $j = 1, 2, \ldots$. Let $X_1 = U_1$. Conditions (∗) are easily verified; for the rest, we rely on simulations. To begin with, $S_n$ is very skewed to the right, so cannot be asymptotically normal. On the other hand, $S_n/D_n$, although not far from normal, has a negative mean. We can replace $T_j$ by $f(T_j)$ for suitable functions $f$, although $\operatorname{var} f(T_j)$ may then depend a little on $j$ and $n$. If $f(x) = x^6$, then $S_n$ itself has a much longer tail than the normal; indeed, $S_n$ is roughly like a symmetrized log normal variable. By contrast, $S_n/D_n$ is short-tailed and bimodal. (Numerator and denominator are somewhat dependent.) Neither $S_n/\sqrt{n}$ nor $S_n/D_n$ is asymptotically normal.
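The simulations the note relies on are easy to reproduce in outline. A sketch for the base case $f(x) = x$; the sample sizes, seed, and summary statistics are our choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1_000, 4_000

U = rng.choice([-1.0, 1.0], size=(reps, n))
T = np.cumsum(U, axis=1) / np.sqrt(np.arange(1, n + 1))   # T_j = (U_1+...+U_j)/sqrt(j)

X = np.empty_like(U)
X[:, 0] = U[:, 0]                    # X_1 = U_1
X[:, 1:] = T[:, :-1] * U[:, 1:]      # X_{j+1} = T_j * U_{j+1}

S = X.sum(axis=1)
D = np.sqrt((X ** 2).sum(axis=1))

skew = lambda t: (((t - t.mean()) / t.std()) ** 3).mean()
print("skewness of S_n/sqrt(n):", round(skew(S / np.sqrt(n)), 2))  # well above 0
print("mean of S_n/D_n:", round((S / D).mean(), 2))                # below 0
```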

Back-of-the-envelope arguments suggest

$$\frac{1}{\sqrt{n}} \sum_{j=1}^n f\Big(\frac{1}{\sqrt{j}} \sum_{l=1}^j U_l\Big) U_{j+1} \to \int_0^1 f(B_t/\sqrt{t})\, dB_t, \tag{1}$$

$$\frac{1}{n} D_n^2 = \frac{1}{n} \sum_{j=1}^n f\Big(\frac{1}{\sqrt{j}} \sum_{l=1}^j U_l\Big)^2 \to \int_0^1 f(B_t/\sqrt{t})^2\, dt, \tag{2}$$


where $B$ is Brownian motion. Because the summands are uncorrelated,

$$\operatorname{var}\Big\{\frac{1}{\sqrt{n}} \sum_{j=1}^{n/K} f\Big(\frac{1}{\sqrt{j}} \sum_{l=1}^j U_l\Big) U_{j+1}\Big\} = \frac{1}{n} \sum_{j=1}^{n/K} \operatorname{var}\Big\{f\Big(\frac{1}{\sqrt{j}} \sum_{l=1}^j U_l\Big)\Big\} = O\Big(\frac{1}{K}\Big). \tag{3}$$

Likewise,

$$E\Big\{\frac{1}{n} \sum_{j=1}^{n/K} \Big[f\Big(\frac{1}{\sqrt{j}} \sum_{l=1}^j U_l\Big)\Big]^2\Big\} = \frac{1}{n} \sum_{j=1}^{n/K} E\Big\{\Big[f\Big(\frac{1}{\sqrt{j}} \sum_{l=1}^j U_l\Big)\Big]^2\Big\} = O\Big(\frac{1}{K}\Big). \tag{4}$$

The singularity near 0 therefore seems unimportant.

Example 5. (Chatterjee.) Let the summands be $U_j U_k$, with $j < k \le n$; normalized, the sum is asymptotically distributed as $(\chi_1^2 - 1)/\sqrt{2}$ rather than $N(0,1)$. For instance, $P\{(\chi_1^2 - 1)/\sqrt{2} > 2.6\} = P\{\chi_1^2 > 1 + 2.6\sqrt{2}\} = .03$, while $P\{|Z| > 2.6\} = .009$, where $Z$ is $N(0,1)$. The tail area is off by a factor of 3, and it gets worse further out. On the other hand, annoyingly,

$$P\{(\chi_1^2 - 1)/\sqrt{2} > 2\} = P\Big\{|Z| > \sqrt{1 + 2\sqrt{2}}\Big\} \doteq P\{|Z| > 1.96\} \doteq .05.$$

The first probability is one-sided:

$$P\{(\chi_1^2 - 1)/\sqrt{2} < -2\} = P\{Z^2 < 1 - 2\sqrt{2}\} = 0,$$

but the symmetric tail area is very close.
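These tail areas are quick to verify numerically (our sketch; the note itself does not use scipy):

```python
from scipy.stats import chi2, norm

print(chi2.sf(1 + 2.6 * 2 ** 0.5, df=1))  # P{(chi^2_1 - 1)/sqrt(2) > 2.6} ~ .03
print(2 * norm.sf(2.6))                   # P{|Z| > 2.6} ~ .009
print(chi2.sf(1 + 2.0 * 2 ** 0.5, df=1))  # P{(chi^2_1 - 1)/sqrt(2) > 2}   ~ .05
print(2 * norm.sf(1.96))                  # P{|Z| > 1.96} ~ .05
```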

Steve Evans has another construction, which gives a sequence $X_1, X_2, \ldots$ of uncorrelated random variables having mean 0 and variance 1, with subsequences of $\mathcal{L}\{(X_1 + \cdots + X_n)/\sqrt{n}\}$ close to any distribution with mean 0 and variance 1.

For regression asymptotics assuming independent errors, see
http://www.stat.berkeley.edu/users/census/Ftest.pdf
