Student Notes To Accompany MS4214: STATISTICAL INFERENCE
2.8 Optimality Properties of the MLE
Suppose that an experiment consists of measuring random variables $X_1, X_2, \ldots, X_n$ which are iid with probability distribution depending on a parameter $\theta$. Let $\hat\theta$ be the MLE of $\theta$. Define
$$W_1 = \sqrt{E[I(\theta)]}\,(\hat\theta - \theta),$$
$$W_2 = \sqrt{I(\theta)}\,(\hat\theta - \theta),$$
$$W_3 = \sqrt{E[I(\hat\theta)]}\,(\hat\theta - \theta),$$
$$W_4 = \sqrt{I(\hat\theta)}\,(\hat\theta - \theta).$$
Then $W_1$, $W_2$, $W_3$ and $W_4$ are all random variables and, as $n \to \infty$, the probabilistic behaviour of each is well approximated by that of a $N(0,1)$ random variable. In particular, since $E[W_1] \approx 0$, we have $E[\hat\theta] \approx \theta$, and so $\hat\theta$ is approximately unbiased. Also, $\operatorname{Var}[W_1] \approx 1$ implies that $\operatorname{Var}[\hat\theta] \approx (E[I(\theta)])^{-1}$, and so $\hat\theta$ is approximately efficient.
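The approximate standard normality of these pivots can be checked by simulation. The sketch below (not from the notes; the Bernoulli model, sample sizes and seed are illustrative choices) repeatedly draws $n$ Bernoulli($\theta$) trials, for which $\hat\theta = \bar x$, the expected information is $E[I(\theta)] = n/(\theta(1-\theta))$, and the observed information is $I(\theta) = s/\theta^2 + (n-s)/(1-\theta)^2$ for $s$ observed successes. It then standardises $\hat\theta$ as in $W_1$ and $W_4$:

```python
import numpy as np

# Illustrative simulation (assumed setup, not from the notes):
# Bernoulli(theta) trials, MLE theta_hat = sample proportion.
rng = np.random.default_rng(0)
theta, n, reps = 0.3, 200, 5000

w1 = np.empty(reps)  # W1: expected information at the true theta
w4 = np.empty(reps)  # W4: observed information evaluated at the MLE

for r in range(reps):
    x = rng.binomial(1, theta, size=n)
    s = x.sum()
    theta_hat = s / n                           # MLE of theta
    exp_info = n / (theta * (1 - theta))        # E[I(theta)]
    obs_info = s / theta_hat**2 + (n - s) / (1 - theta_hat)**2  # I(theta_hat)
    w1[r] = np.sqrt(exp_info) * (theta_hat - theta)
    w4[r] = np.sqrt(obs_info) * (theta_hat - theta)

# Both standardised quantities should have mean near 0 and sd near 1.
print(w1.mean(), w1.std())
print(w4.mean(), w4.std())
```

With $n = 200$ both sample means come out near 0 and both sample standard deviations near 1, consistent with the $N(0,1)$ approximation above.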
Let the data $X$ have probability distribution $g(X; \theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_m)$ is a vector of $m$ unknown parameters. Let $I(\theta)$ be the $m \times m$ information matrix as defined above, and let $E[I(\theta)]$ be the $m \times m$ matrix obtained by replacing the elements of $I(\theta)$ by their expected values. Let $\hat\theta$ be the MLE of $\theta$, and let $\mathrm{CRLB}_r$ be the $r$th diagonal element of $[E[I(\theta)]]^{-1}$. For $r = 1, 2, \ldots, m$, define
$$W_{1r} = (\hat\theta_r - \theta_r)/\sqrt{\mathrm{CRLB}_r}.$$
Then, as $n \to \infty$, $W_{1r}$ behaves like a standard normal random variable.
Suppose we define $W_{2r}$ by replacing $\mathrm{CRLB}_r$ by the $r$th diagonal element of the matrix $[I(\theta)]^{-1}$, $W_{3r}$ by replacing $\mathrm{CRLB}_r$ by the $r$th diagonal element of the matrix $[E[I(\hat\theta)]]^{-1}$, and $W_{4r}$ by replacing $\mathrm{CRLB}_r$ by the $r$th diagonal element of the matrix $[I(\hat\theta)]^{-1}$. Then it can be shown that, as $n \to \infty$, $W_{2r}$, $W_{3r}$ and $W_{4r}$ all behave like standard normal random variables.
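A concrete two-parameter case can illustrate this. For $X_i$ iid $N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$, the MLEs are $\hat\mu = \bar x$ and $\hat\sigma^2 = n^{-1}\sum (x_i - \bar x)^2$, and $E[I(\theta)] = \mathrm{diag}(n/\sigma^2,\; n/(2\sigma^4))$, giving $\mathrm{CRLB}_1 = \sigma^2/n$ and $\mathrm{CRLB}_2 = 2\sigma^4/n$. The simulation below (an illustrative sketch; the parameter values and seed are arbitrary choices, not from the notes) forms $W_{11}$ and $W_{12}$:

```python
import numpy as np

# Illustrative simulation (assumed setup): N(mu, sigma^2) samples,
# standardised component-wise by the diagonal of [E[I(theta)]]^{-1}.
rng = np.random.default_rng(1)
mu, sigma2, n, reps = 5.0, 4.0, 300, 4000

w_mu = np.empty(reps)   # W_11: standardised mu_hat
w_s2 = np.empty(reps)   # W_12: standardised sigma^2_hat

for r in range(reps):
    x = rng.normal(mu, np.sqrt(sigma2), size=n)
    mu_hat = x.mean()
    s2_hat = ((x - mu_hat) ** 2).mean()   # MLE of sigma^2 (divisor n, not n-1)
    w_mu[r] = (mu_hat - mu) / np.sqrt(sigma2 / n)             # sqrt(CRLB_1)
    w_s2[r] = (s2_hat - sigma2) / np.sqrt(2 * sigma2**2 / n)  # sqrt(CRLB_2)

# Each component should be approximately N(0, 1) for large n.
print(w_mu.mean(), w_mu.std())
print(w_s2.mean(), w_s2.std())
```

Note that $\hat\sigma^2$ carries a small finite-sample bias of order $1/n$ (it uses divisor $n$), so $W_{12}$ is centred near, but not exactly at, zero for moderate $n$.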
2.9 Data Reduction<br />
Definition 2.11 (Sufficiency). Consider a statistic $T = t(X)$ that summarises the data so that no information about $\theta$ is lost. Then we call $t(X)$ a sufficient statistic. $\square$
Example 2.12. $T = t(X) = \bar X$ is sufficient for $\mu$ when $X_i \sim$ iid $N(\mu, \sigma^2)$. $\square$
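One way to see why $\bar X$ carries all the information about $\mu$ (taking $\sigma^2$ as known) is to split the sum of squares in the likelihood about $\bar x$; the following is a sketch of the standard factorisation argument, not a step taken in the notes themselves:

```latex
f(x; \mu)
  = (2\pi\sigma^2)^{-n/2}
    \exp\!\Big(-\tfrac{1}{2\sigma^2}\textstyle\sum_{i=1}^{n}(x_i-\mu)^2\Big).
% Using  sum (x_i - mu)^2 = sum (x_i - xbar)^2 + n (xbar - mu)^2 :
f(x; \mu)
  = \underbrace{\exp\!\Big(-\tfrac{n(\bar x-\mu)^2}{2\sigma^2}\Big)}_{%
      \text{depends on the data only through } \bar x}
    \times
    \underbrace{(2\pi\sigma^2)^{-n/2}
      \exp\!\Big(-\tfrac{1}{2\sigma^2}\textstyle\sum_{i=1}^{n}(x_i-\bar x)^2\Big)}_{%
      \text{free of } \mu}.
```

Since the only factor involving $\mu$ depends on the data through $\bar x$ alone, no information about $\mu$ is lost by reducing the sample to $\bar X$.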
To better understand the motivation behind the concept of sufficiency, consider three independent Binomial trials where $\theta = P(X = 1)$.