Stat 5101 Lecture Notes - School of Statistics

Theorem 3.5 (Conditional Parallel Axis Theorem). If Y ∈ L^2, then

    E{[Y − a(X)]^2 | X} = var(Y | X) + [a(X) − E(Y | X)]^2        (3.35)

The argument is exactly the same as that given for the unconditional version, except for the need to use Axiom CE1 instead of Axiom E2 to pull a function of the conditioning variable out of the conditional expectation. Otherwise, only the notation changes.

If we take the unconditional expectation of both sides of (3.35), we get

    E(E{[Y − a(X)]^2 | X}) = E{var(Y | X)} + E{[a(X) − E(Y | X)]^2}

and by the iterated expectation axiom, the left hand side is the unconditional expectation, that is,

    E{[Y − a(X)]^2} = E{var(Y | X)} + E{[a(X) − E(Y | X)]^2}        (3.36)

This relation has no special name, but it has two very important special cases. The first is the prediction theorem.

Theorem 3.6. For predicting a random variable Y given the value of another random variable X, the predictor function a(X) that minimizes the expected squared prediction error

    E{[Y − a(X)]^2}

is the conditional expectation a(X) = E(Y | X).

The proof is extremely simple. The expected squared prediction error is the left hand side of (3.36). On the right hand side of (3.36), the first term does not contain a(X). The second term is the expectation of the square of a(X) − E(Y | X). Since a square is nonnegative and the expectation of a nonnegative random variable is nonnegative (Axiom E1), the second term is always nonnegative and hence is minimized when it is zero. By Theorem 2.32, that happens if and only if a(X) = E(Y | X) with probability one. (Yet another place where redefinition on a set of probability zero changes nothing of importance.)

Example 3.5.1 (Best Prediction). Suppose X and Y have the unnormalized joint density

    h(x, y) = (x + y)e^{−x−y},        x > 0, y > 0.

What function of Y is the best predictor of X in the sense of minimizing expected squared prediction error?

The predictor that minimizes expected squared prediction error is the regression function

    a(Y) = E(X | Y) = (2 + Y)/(1 + Y)

found in Example 3.4.5.
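For readers without Example 3.4.5 at hand, here is a sketch of the calculation behind this regression function (the steps are reconstructed, not copied from that example). Integrating out x gives the unnormalized marginal

    h_Y(y) = ∫_0^∞ (x + y)e^{−x−y} dx = (1 + y)e^{−y},        y > 0,

so the conditional density of X given Y = y is

    f(x | y) = h(x, y)/h_Y(y) = (x + y)e^{−x}/(1 + y),        x > 0.

Using ∫_0^∞ x^k e^{−x} dx = k!, the conditional mean is

    E(X | Y = y) = (1/(1 + y)) ∫_0^∞ x(x + y)e^{−x} dx = (2 + y)/(1 + y).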
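The prediction theorem can also be checked by simulation. The following is a minimal sketch, not part of the notes; it uses NumPy, and the sampling trick relies on the fact that h integrates to 2, so the normalized density f(x, y) = (x + y)e^{−x−y}/2 is an equal mixture of Gamma(2) × Exp(1) and Exp(1) × Gamma(2). The mean squared prediction error of the regression function should beat that of any competitor, here the best constant predictor E(X).

import numpy as np

rng = np.random.default_rng(42)
n = 10**6

# (x + y)e^{-x-y} = [x e^{-x}] e^{-y} + e^{-x} [y e^{-y}], so after
# normalizing by 2, draw each pair from one of the two mixture
# components with probability 1/2 each.
first = rng.random(n) < 0.5
x = np.where(first, rng.gamma(2.0, 1.0, n), rng.exponential(1.0, n))
y = np.where(first, rng.exponential(1.0, n), rng.gamma(2.0, 1.0, n))

best = (2 + y) / (1 + y)                   # regression function E(X | Y)
mse_best = np.mean((x - best) ** 2)        # estimates E{var(X | Y)}
mse_const = np.mean((x - x.mean()) ** 2)   # best constant predictor, MSE = var(X)

print(f"regression predictor MSE: {mse_best:.4f}")
print(f"constant   predictor MSE: {mse_const:.4f}")

By (3.36), the gap mse_const − mse_best estimates E{[E(X) − E(X | Y)]^2}, which is strictly positive here because E(X | Y) actually varies with Y, so the regression function wins.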
