Stat 5101 Lecture Notes - School of Statistics

Theorem 3.5 (Conditional Parallel Axis Theorem). If Y ∈ L^2, then

    E{[Y − a(X)]^2 | X} = var(Y | X) + [a(X) − E(Y | X)]^2        (3.35)

The argument is exactly the same as that given for the unconditional version, except for the need to use Axiom CE1 instead of Axiom E2 to pull a function of the conditioning variable out of the conditional expectation. Otherwise, only the notation changes.

If we take the unconditional expectation of both sides of (3.35), we get

    E(E{[Y − a(X)]^2 | X}) = E{var(Y | X)} + E{[a(X) − E(Y | X)]^2}

and by the iterated expectation axiom, the left hand side is the unconditional expectation, that is,

    E{[Y − a(X)]^2} = E{var(Y | X)} + E{[a(X) − E(Y | X)]^2}        (3.36)

This relation has no special name, but it has two very important special cases. The first is the prediction theorem.

Theorem 3.6. For predicting a random variable Y given the value of another random variable X, the predictor function a(X) that minimizes the expected squared prediction error

    E{[Y − a(X)]^2}

is the conditional expectation a(X) = E(Y | X).

The proof is extremely simple. The expected squared prediction error is the left hand side of (3.36). On the right hand side of (3.36), the first term does not contain a(X). The second term is the expectation of the square of a(X) − E(Y | X). Since a square is nonnegative and the expectation of a nonnegative random variable is nonnegative (Axiom E1), the second term is always nonnegative and hence is minimized when it is zero. By Theorem 2.32, that happens if and only if a(X) = E(Y | X) with probability one. (Yet another place where redefinition on a set of probability zero changes nothing of importance.)

Example 3.5.1 (Best Prediction). Suppose X and Y have the unnormalized joint density

    h(x, y) = (x + y)e^{−x−y},        x > 0, y > 0.

What function of Y is the best predictor of X in the sense of minimizing expected squared prediction error?

The predictor that minimizes expected squared prediction error is the regression function

    a(Y) = E(X | Y) = (2 + Y)/(1 + Y)

found in Example 3.4.5.
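For readers without Example 3.4.5 at hand, here is a sketch of the calculation behind this regression function (the steps are reconstructed, not copied from that example). Integrating out x gives the unnormalized marginal

    h_Y(y) = ∫_0^∞ (x + y)e^{−x−y} dx = (1 + y)e^{−y},        y > 0,

so the conditional density of X given Y = y is

    f(x | y) = h(x, y)/h_Y(y) = (x + y)e^{−x}/(1 + y),        x > 0.

Using ∫_0^∞ x^k e^{−x} dx = k!, the conditional mean is

    E(X | Y = y) = (1/(1 + y)) ∫_0^∞ x(x + y)e^{−x} dx = (2 + y)/(1 + y).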
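The prediction theorem can also be checked by simulation. The following is a minimal sketch, not part of the notes; it uses NumPy, and the sampling trick relies on the fact that h integrates to 2, so the normalized density f(x, y) = (x + y)e^{−x−y}/2 is an equal mixture of Gamma(2) × Exp(1) and Exp(1) × Gamma(2). The mean squared prediction error of the regression function should beat that of any competitor, here the best constant predictor E(X).

import numpy as np

rng = np.random.default_rng(42)
n = 10**6

# (x + y)e^{-x-y} = [x e^{-x}] e^{-y} + e^{-x} [y e^{-y}], so after
# normalizing by 2, draw each pair from one of the two mixture
# components with probability 1/2 each.
first = rng.random(n) < 0.5
x = np.where(first, rng.gamma(2.0, 1.0, n), rng.exponential(1.0, n))
y = np.where(first, rng.exponential(1.0, n), rng.gamma(2.0, 1.0, n))

best = (2 + y) / (1 + y)                   # regression function E(X | Y)
mse_best = np.mean((x - best) ** 2)        # estimates E{var(X | Y)}
mse_const = np.mean((x - x.mean()) ** 2)   # best constant predictor, MSE = var(X)

print(f"regression predictor MSE: {mse_best:.4f}")
print(f"constant   predictor MSE: {mse_const:.4f}")

By (3.36), the gap mse_const − mse_best estimates E{[E(X) − E(X | Y)]^2}, which is strictly positive here because E(X | Y) actually varies with Y, so the regression function wins.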
