Student Notes To Accompany MS4214: STATISTICAL INFERENCE
Definition 2.8 (Score function). For the (possibly vector valued) observation X = x
to be informative about θ, the density must vary with θ. If f(x|θ) is smooth and
differentiable, this change is quantified to first order by the score function
\[
S(\theta) = \frac{\partial}{\partial\theta} \ln f(x \mid \theta) \equiv \frac{f'(x \mid \theta)}{f(x \mid \theta)} .
\]
Under suitable regularity conditions (differentiation wrt θ and integration wrt x can
be interchanged), we have
\[
E\{S(\theta)\} = \int \frac{f'(x \mid \theta)}{f(x \mid \theta)} \, f(x \mid \theta) \, dx = \int f'(x \mid \theta) \, dx
= \frac{\partial}{\partial\theta} \left\{ \int f(x \mid \theta) \, dx \right\} = \frac{\partial}{\partial\theta} 1 = 0 .
\]
Thus the score function has expectation zero.
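The zero-expectation property can be checked numerically. The following is a minimal sketch, assuming a Poisson model with mean θ (a model chosen here for illustration, not taken from the notes); because the Poisson is discrete, a sum over x replaces the integral. The function names and the truncation point `x_max` are hypothetical.

```python
import math


def poisson_pmf(x: int, theta: float) -> float:
    """Poisson probability mass function f(x | theta), computed on the log scale."""
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))


def score(x: int, theta: float) -> float:
    """Score for the Poisson model: d/dtheta ln f(x | theta) = x / theta - 1."""
    return x / theta - 1.0


def expected_score(theta: float, x_max: int = 200) -> float:
    """Approximate E{S(theta)} by summing over the truncated support 0..x_max."""
    return sum(score(x, theta) * poisson_pmf(x, theta) for x in range(x_max + 1))


theta = 3.5  # illustrative parameter value (an assumption)
print(expected_score(theta))  # should be zero up to floating-point error
```

The truncation at `x_max` is harmless for moderate θ because the omitted tail probability is negligible; a larger θ would need a larger cut-off.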
True frequentism evaluates the properties of estimators based on their “long-run”<br />
behaviour. The value of x will vary from sample to sample so we have treated the score<br />
function as a random variable and looked at its average across all possible samples.<br />
Lemma 2.7 (Fisher information). The variance of S(θ) is the expected Fisher
information about θ
\[
I(\theta) = E\{S(\theta)^2\} \equiv E\left\{ \left( \frac{\partial}{\partial\theta} \ln f(x \mid \theta) \right)^{2} \right\} .
\]
Proof. Using the chain rule
\[
\frac{\partial^2}{\partial\theta^2} \ln f
= \frac{\partial}{\partial\theta} \left( \frac{1}{f} \frac{\partial f}{\partial\theta} \right)
= -\frac{1}{f^2} \left( \frac{\partial f}{\partial\theta} \right)^{2} + \frac{1}{f} \frac{\partial^2 f}{\partial\theta^2}
= -\left( \frac{\partial \ln f}{\partial\theta} \right)^{2} + \frac{1}{f} \frac{\partial^2 f}{\partial\theta^2} .
\]
If integration and differentiation can be interchanged
\[
E\left\{ \frac{1}{f} \frac{\partial^2 f}{\partial\theta^2} \right\}
= \int_X \frac{\partial^2 f}{\partial\theta^2} \, dx
= \frac{\partial^2}{\partial\theta^2} \int_X f \, dx
= \frac{\partial^2}{\partial\theta^2} 1 = 0,
\]
thus
\[
-E\left\{ \frac{\partial^2}{\partial\theta^2} \ln f(x \mid \theta) \right\}
= E\left\{ \left( \frac{\partial}{\partial\theta} \ln f(x \mid \theta) \right)^{2} \right\}
= I(\theta). \qquad (2.1)
\]
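Identity (2.1) can also be checked numerically. The sketch below again assumes a Poisson model with mean θ (an illustrative choice, not from the notes), for which ln f = −θ + x ln θ − ln x!, so the score is x/θ − 1 and the second derivative of ln f is −x/θ². Both sides of (2.1) should then equal 1/θ. The function names and the cut-off `x_max` are hypothetical.

```python
import math


def poisson_pmf(x: int, theta: float) -> float:
    """Poisson probability mass function f(x | theta), computed on the log scale."""
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))


def fisher_information_both_ways(theta: float, x_max: int = 200) -> tuple[float, float]:
    """Return (E{S^2}, -E{d^2/dtheta^2 ln f}) by summing over x = 0..x_max."""
    e_score_sq = 0.0
    neg_e_second_deriv = 0.0
    for x in range(x_max + 1):
        p = poisson_pmf(x, theta)
        e_score_sq += (x / theta - 1.0) ** 2 * p   # squared score, weighted by f
        neg_e_second_deriv += (x / theta ** 2) * p  # minus the second derivative of ln f
    return e_score_sq, neg_e_second_deriv


theta = 3.5  # illustrative parameter value (an assumption)
lhs, rhs = fisher_information_both_ways(theta)
print(lhs, rhs, 1.0 / theta)  # the three values should agree closely
```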
Variance measures lack of knowledge, so it is reasonable that the reciprocal of the
variance should be defined as the amount of information carried by the (possibly
vector valued) observation x about θ.
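As a worked illustration of this reciprocal relationship, consider a single observation X from a normal distribution with unknown mean θ and known variance σ². This model is an assumption chosen for illustration and is not taken from the notes.

```latex
\begin{align*}
\ln f(x \mid \theta) &= -\tfrac{1}{2} \ln(2\pi\sigma^2) - \frac{(x - \theta)^2}{2\sigma^2}, \\
S(\theta) &= \frac{\partial}{\partial\theta} \ln f(x \mid \theta) = \frac{x - \theta}{\sigma^2},
  \qquad E\{S(\theta)\} = \frac{E(X) - \theta}{\sigma^2} = 0, \\
I(\theta) &= E\{S(\theta)^2\} = \frac{E\{(X - \theta)^2\}}{\sigma^4} = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}, \\
-E\left\{ \frac{\partial^2}{\partial\theta^2} \ln f(x \mid \theta) \right\} &= -E\left\{ -\frac{1}{\sigma^2} \right\} = \frac{1}{\sigma^2} .
\end{align*}
```

In this case the information I(θ) = 1/σ² is exactly the reciprocal of Var(X) = σ², and both expressions in (2.1) give the same value.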