
An exponential family that is not of full rank may also be degenerate, meaning that there exists a vector a and a constant r such that

\[
\int_{a^{\mathrm{T}} x = r} p_\theta(x)\,\mathrm{d}x = 1.
\]

(The term “degenerate” in this sense is also applied to any distribution, whether in an exponential family or not.) The support of a degenerate distribution within IR^d is effectively within IR^k for k < d. An example of a nonfull rank exponential family that is also a degenerate family is the family of multinomial distributions (page 830). A continuous degenerate distribution is also called a singular distribution.
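For the multinomial family mentioned above, for example, the degeneracy is immediate: the cell counts in d categories sum to the number of trials n with probability 1, so in the notation of the condition above one may take

\[
a = (1, 1, \ldots, 1)^{\mathrm{T}}, \qquad r = n, \qquad
\Pr\!\left( \sum_{i=1}^{d} X_i = n \right) = 1,
\]

and the distribution, nominally on IR^d, is effectively concentrated on a (d−1)-dimensional subset.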

An example of a family of distributions that is a nonfull rank exponential family is the normal family N(µ, µ²). A nonfull rank exponential family is also called a curved exponential family.
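To see why N(µ, µ²) is not of full rank, one can write it in exponential-family form with T(x) = (x, x²)^T; the natural parameter is then

\[
\eta(\mu) = \left( \frac{\mu}{\sigma^2},\; -\frac{1}{2\sigma^2} \right)\bigg|_{\sigma^2 = \mu^2}
         = \left( \frac{1}{\mu},\; -\frac{1}{2\mu^2} \right),
\]

so as µ varies, η(µ) traces a one-dimensional curve in the two-dimensional natural parameter space (the second component is determined by the first through η₂ = −η₁²/2) rather than filling an open set.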

2.4.7 Properties of Exponential Families

Exponential families have a number of useful properties. First of all, we note that an exponential family satisfies the Fisher information regularity conditions. This means that we can interchange the operations of differentiation and integration, a fact that we will exploit below. Other implications of the Fisher information regularity conditions allow us to derive optimal statistical inference procedures, a fact that we will exploit in later chapters.

In the following, we will use the usual form of the PDF,

\[
f_\theta(x) = \exp\!\left( \eta(\theta)^{\mathrm{T}} T(x) - \xi(\theta) \right) h(x),
\]

and we will assume that it is of full rank.

We first of all differentiate both sides of the identity

\[
\int f_\theta(x)\,\mathrm{d}x = 1 \tag{2.17}
\]

wrt θ. Carrying the differentiation on the left side under the integral, we have

\[
\int \left( J_\eta(\theta)\, T(x) - \nabla\xi(\theta) \right)
\exp\!\left( \eta(\theta)^{\mathrm{T}} T(x) - \xi(\theta) \right) h(x)\,\mathrm{d}x = 0.
\]

Hence, because by assumption J_η(θ) is of full rank, by rearranging terms under the integral and integrating out terms not involving x, we get the useful fact

\[
\mathrm{E}(T(X)) = \left( J_\eta(\theta) \right)^{-1} \nabla\xi(\theta). \tag{2.18}
\]
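As a simple check of (2.18), consider for example the Poisson(θ) family, written in the form above:

\[
f_\theta(x) = \exp\!\left( x \log\theta - \theta \right) \frac{1}{x!}, \qquad
\eta(\theta) = \log\theta, \quad T(x) = x, \quad \xi(\theta) = \theta, \quad h(x) = \frac{1}{x!}.
\]

Then J_η(θ) = 1/θ and ∇ξ(θ) = 1, so (2.18) gives E(T(X)) = E(X) = θ, as expected.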

We now consider V(T(X)). As it turns out, this is a much more difficult situation. Differentiation yields more complicated objects. (See Gentle (2007), page 152, for derivatives of a matrix wrt a vector.) Let us first consider the scalar case; that is, η(θ) and T(x) are scalars, so η(θ)^T T(x) is just η(θ)T(x).
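In the scalar case the computation can be sketched directly (a brief outline, with primes denoting derivatives wrt the scalar θ). Differentiating (2.17) twice under the integral gives

\[
\int \left( \eta''(\theta) T(x) - \xi''(\theta)
+ \left( \eta'(\theta) T(x) - \xi'(\theta) \right)^{2} \right) f_\theta(x)\,\mathrm{d}x = 0,
\]

and substituting E(T(X)) = ξ'(θ)/η'(θ), the scalar form of (2.18), and rearranging yields

\[
\mathrm{V}(T(X)) = \frac{\xi''(\theta)}{(\eta'(\theta))^{2}}
- \frac{\xi'(\theta)\,\eta''(\theta)}{(\eta'(\theta))^{3}}.
\]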
