(a) Let $\hat{\theta}$ be unbiased. Find $E(\hat{\theta} - \theta)^2$ (the average squared difference between the estimator and the true value of $\theta$), in terms of marginal moments of $\hat{\theta}$ and $\theta$.

Hint: condition on $\theta$.

(b) Repeat (a), except in this part suppose that $\hat{\theta}$ is the Bayes procedure rather than assuming that it is unbiased.

Hint: condition on $X$.

(c) Show that it is impossible for $\hat{\theta}$ to be both the Bayes procedure and unbiased, except in silly problems where we get to know $\theta$ perfectly by observing $X$.

Hint: if $Y$ is a nonnegative r.v. with mean 0, then $P(Y = 0) = 1$.

5. The surprise of learning that an event with probability $p$ happened is defined as $\log_2(1/p)$ (where the log is base 2 so that if we learn that an event of probability $1/2$ happened, the surprise is 1, which corresponds to having received 1 bit of information).

Let $X$ be a discrete r.v. whose possible values are $a_1, a_2, \ldots, a_n$ (distinct real numbers), with $P(X = a_j) = p_j$ (where $p_1 + p_2 + \cdots + p_n = 1$).

The entropy of $X$ is defined to be the average surprise of learning the value of $X$, i.e.,
$$H(X) = \sum_{j=1}^{n} p_j \log_2(1/p_j).$$
This concept was used by Shannon to create the field of information theory, which is used to quantify information and has become essential for communication and compression (e.g., MP3s and cell phones).

(a) Explain why $H(X^3) = H(X)$, and give an example where $H(X^2) \neq H(X)$.

(b) Show that the maximum possible entropy for $X$ is achieved when its distribution is uniform over $a_1, a_2, \ldots, a_n$, i.e., $P(X = a_j) = 1/n$.
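As a numerical illustration of Problem 5 (not part of the assigned problem), the following Python sketch computes entropy for a small pmf and shows why cubing preserves entropy while squaring need not. The function name `entropy` and the example distribution are my own choices.

```python
import math

def entropy(pmf):
    """H(X) = sum_j p_j * log2(1/p_j), for a pmf given as {value: probability}."""
    return sum(p * math.log2(1.0 / p) for p in pmf.values() if p > 0)

# Example: X takes values -1 and 1 with probability 1/2 each.
X = {-1: 0.5, 1: 0.5}

# Cubing is one-to-one on the reals, so X^3 has the same probabilities
# (attached to the cubed values) and hence the same entropy.
X3 = {v ** 3: p for v, p in X.items()}

# Squaring collapses -1 and 1 to the same value, so X^2 is a constant here.
X2 = {}
for v, p in X.items():
    X2[v ** 2] = X2.get(v ** 2, 0) + p

print(entropy(X))   # 1.0 (one bit)
print(entropy(X3))  # 1.0 (same as H(X))
print(entropy(X2))  # 0.0 (a constant carries no surprise)
```

The same one-to-one argument works for any strictly monotone function of $X$, which is the content of part (a).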
(This should make sense intuitively: learning the value of $X$ conveys the most information on average when $X$ is equally likely to take any of its values, and the least possible information if $X$ is a constant.)

Hint: this can be done by Jensen's inequality, without any need for calculus. To do so, consider an r.v. $Y$ whose possible values are the probabilities $p_1, \ldots, p_n$, and show why $E(\log_2(1/Y)) \leq \log_2(E(1/Y))$ and how to interpret it.

6. In a national survey, a random sample of people are chosen and asked whether they support a certain policy. Assume that everyone in the population is equally likely to be surveyed at each step, and that the sampling is with replacement (sampling without replacement is typically more realistic, but with replacement will be a good approximation if the sample size is small compared to the population size). Let $n$ be the sample size, and let $\hat{p}$ and $p$ be the proportion of people who support the policy in the sample and in the entire population, respectively. Show that for every $c > 0$,
$$P(|\hat{p} - p| > c) \leq \frac{1}{4nc^2}.$$
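As a sanity check on the bound in Problem 6 (not part of the assigned problem; the intended proof is Chebyshev's inequality with $\mathrm{Var}(\hat{p}) = p(1-p)/n \leq 1/(4n)$), here is a small Monte Carlo sketch. The parameter values and function name are my own illustrative choices.

```python
import random

def exceed_prob(p, n, c, trials=20_000, seed=0):
    """Estimate P(|p_hat - p| > c) by simulating with-replacement sampling."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))
        if abs(successes / n - p) > c:
            exceed += 1
    return exceed / trials

p, n, c = 0.3, 100, 0.1
estimate = exceed_prob(p, n, c)
bound = 1 / (4 * n * c ** 2)  # the bound 1/(4nc^2) from the problem
print(estimate, bound)        # the simulated probability stays below the bound
```

The bound is far from tight here (for these values it equals 0.25 while the true probability is a few percent), which is typical of Chebyshev-style bounds: they hold for every $p$ and $c$, at the cost of slack in any particular case.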
Stat 110 Penultimate Homework Solutions, Fall 2011
Prof. Joe Blitzstein (Department of Statistics, Harvard University)

1. Judit plays in a total of $N \sim \mathrm{Geom}(s)$ chess tournaments in her career. Suppose that in each tournament she has probability $p$ of winning the tournament, independently. Let $T$ be the number of tournaments she wins in her career.

(a) Find the mean and variance of $T$.

We have $T \mid N \sim \mathrm{Bin}(N, p)$. By Adam's Law,
$$E(T) = E(E(T \mid N)) = E(Np) = p(1-s)/s.$$
By Eve's Law,
$$\begin{aligned}
\mathrm{Var}(T) &= E(\mathrm{Var}(T \mid N)) + \mathrm{Var}(E(T \mid N)) \\
&= E(Np(1-p)) + \mathrm{Var}(Np) \\
&= p(1-p)(1-s)/s + p^2(1-s)/s^2 \\
&= \frac{p(1-s)(s + (1-s)p)}{s^2}.
\end{aligned}$$

(b) Find the MGF of $T$. What is the name of this distribution (with its parameters)?

Let $I_j \sim \mathrm{Bern}(p)$ be the indicator of Judit winning the $j$th tournament. Then
$$\begin{aligned}
E(e^{tT}) &= E(E(e^{tT} \mid N)) \\
&= E((pe^t + q)^N) \quad \text{(with $q = 1 - p$)} \\
&= s \sum_{n=0}^{\infty} (pe^t + 1 - p)^n (1-s)^n \\
&= \frac{s}{1 - (1-s)(pe^t + 1 - p)}.
\end{aligned}$$
This is reminiscent of the Geometric MGF used on HW 9. The famous discrete distributions we have studied whose possible values are $0, 1, 2, \ldots$ are the Poisson, Geometric, and Negative Binomial, and $T$ is clearly not Poisson since its variance doesn't equal its mean. If $T \sim \mathrm{Geom}(\theta)$, we have $\theta = \frac{s}{s + (1-s)p}$, as found by setting $E(T) = \frac{1-\theta}{\theta} = \frac{p(1-s)}{s}$ or by finding $\mathrm{Var}(T)/E(T)$. Writing the MGF of $T$ as
$$E(e^{tT}) = \frac{s}{s + (1-s)p - (1-s)pe^t} = \frac{s}{s + (1-s)p} \cdot \frac{1}{1 - \frac{(1-s)p}{s + (1-s)p}\, e^t},$$
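The mean, variance, and Geometric identification derived above can be checked by simulation. This is a sketch I am adding for verification, not part of the original solutions; the parameter values $s = 0.4$, $p = 0.6$ are arbitrary choices.

```python
import random

def simulate_T(s, p, trials=100_000, seed=1):
    """Simulate T = number of wins in N ~ Geom(s) tournaments, each won with
    probability p. Here Geom(s) counts failures before the first success, so
    N has support {0, 1, 2, ...}."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        n = 0
        while rng.random() >= s:  # draw N ~ Geom(s)
            n += 1
        samples.append(sum(rng.random() < p for _ in range(n)))
    return samples

s, p = 0.4, 0.6
samples = simulate_T(s, p)
mean = sum(samples) / len(samples)
var = sum((t - mean) ** 2 for t in samples) / len(samples)

# Formulas from part (a):
mean_formula = p * (1 - s) / s                          # E(T) = p(1-s)/s
var_formula = p * (1 - s) * (s + (1 - s) * p) / s ** 2  # Var(T)

# Part (b): T ~ Geom(theta) with theta = s/(s + (1-s)p),
# whose mean (1-theta)/theta should agree with E(T) above.
theta = s / (s + (1 - s) * p)
print(mean, mean_formula, (1 - theta) / theta)
print(var, var_formula)
```

The simulated mean and variance should land close to the closed-form values, and $(1-\theta)/\theta$ matches $p(1-s)/s$ exactly, consistent with the Geometric identification.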