Student Notes To Accompany MS4214: STATISTICAL INFERENCE


When $\theta \in \mathbb{R}^1$ we can define the score function as the first derivative of the log-likelihood,
$$S(\theta) = \frac{\partial}{\partial\theta} \ln L(\theta).$$
The maximum likelihood estimate (MLE) $\hat\theta$ of $\theta$ is the solution to the score equation $S(\theta) = 0$. At the maximum, the second derivative of the log-likelihood is negative, so we define the curvature at $\hat\theta$ as $I(\hat\theta)$, where
$$I(\theta) = -\frac{\partial^2}{\partial\theta^2} \ln L(\theta).$$
We can check that a solution $\hat\theta$ of the equation $S(\theta) = 0$ is actually a maximum by checking that $I(\hat\theta) > 0$. A large curvature $I(\hat\theta)$ is associated with a tight or strong peak, intuitively indicating less uncertainty about $\theta$. In likelihood theory $I(\theta)$ is a key quantity called the observed Fisher information, and $I(\hat\theta)$ is the observed Fisher information evaluated at the MLE $\hat\theta$. Although $I(\theta)$ is a function, $I(\hat\theta)$ is a scalar.

The likelihood function $L(\theta|x)$ supplies an order of preference or plausibility among possible values of $\theta$ based on the observed $x$. It ranks the plausibility of possible values of $\theta$ by how probable they make the observed $x$. If $P(x|\theta = \theta_1) > P(x|\theta = \theta_2)$, then the observed $x$ makes $\theta = \theta_1$ more plausible than $\theta = \theta_2$, and consequently, from (2.2), $L(\theta_1|x) > L(\theta_2|x)$. The likelihood ratio $L(\theta_1|x)/L(\theta_2|x) = f(x|\theta_1)/f(x|\theta_2)$ is a measure of the plausibility of $\theta_1$ relative to $\theta_2$ based on the observed $x$. A likelihood ratio $L(\theta_1|x)/L(\theta_2|x) = k$ means that the observed value $x$ will occur $k$ times more frequently in repeated samples from the population defined by the value $\theta_1$ than in repeated samples from the population defined by $\theta_2$.

Since only ratios of likelihoods are meaningful, it is convenient to standardize the likelihood with respect to its maximum. Define the relative likelihood as
$$R(\theta|x) = \frac{L(\theta|x)}{L(\hat\theta|x)}.$$
The relative likelihood varies between 0 and 1. The MLE $\hat\theta$ is the most plausible value of $\theta$ in that it makes the observed sample most probable, and the relative likelihood measures the plausibility of any particular value of $\theta$ relative to that of $\hat\theta$.

When the random variables $X_1, \ldots, X_n$ are mutually independent, we can write the joint density as
$$f_X(x) = \prod_{j=1}^{n} f_{X_j}(x_j),$$
where $x = (x_1, \ldots, x_n)'$ is a realization of the random vector $X = (X_1, \ldots, X_n)'$, and the likelihood function becomes
$$L_X(\theta|x) = \prod_{j=1}^{n} f_{X_j}(x_j|\theta).$$
When the densities $f_{X_j}(x_j)$ are identical, we unambiguously write $f(x_j)$.
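The score equation, the observed information and the relative likelihood can be put together in a short numerical sketch. It rests on a few assumptions. The model is an exponential model with mean $\mu$, $f(t|\mu) = e^{-t/\mu}/\mu$. The six lifetimes are hypothetical data. The helper names `newton_mle`, `obs_info` and `relative_likelihood` are illustrative and do not come from the notes.

The solver uses Newton-Raphson. Since $\partial S/\partial\theta = -I(\theta)$, each update is $\theta_{k+1} = \theta_k + S(\theta_k)/I(\theta_k)$. Independence makes the log-likelihood a sum of log-densities.

```python
import math
from typing import Callable

def newton_mle(score: Callable[[float], float],
               info: Callable[[float], float],
               theta0: float,
               tol: float = 1e-10,
               max_iter: int = 100) -> float:
    """Solve the score equation S(theta) = 0 by Newton-Raphson.

    Because dS/dtheta = -I(theta), the Newton update is theta + S(theta) / I(theta).
    """
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / info(theta)
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical i.i.d. exponential lifetimes with mean mu: f(t | mu) = exp(-t / mu) / mu
times = [1.2, 0.4, 3.1, 2.2, 0.9, 4.2]
n, total = len(times), sum(times)

def loglik(mu: float) -> float:
    # Independence: ln L(mu) is the sum of the log-densities
    return sum(-math.log(mu) - t / mu for t in times)

def score(mu: float) -> float:
    # S(mu) = d/dmu ln L(mu) = -n / mu + sum(t) / mu^2
    return -n / mu + total / mu**2

def obs_info(mu: float) -> float:
    # I(mu) = -d^2/dmu^2 ln L(mu) = -n / mu^2 + 2 sum(t) / mu^3
    return -n / mu**2 + 2 * total / mu**3

def relative_likelihood(mu: float, mu_hat: float) -> float:
    # R(mu | x) = L(mu | x) / L(mu_hat | x), computed on the log scale for stability
    return math.exp(loglik(mu) - loglik(mu_hat))

mu_hat = newton_mle(score, obs_info, theta0=1.0)
print(mu_hat, total / n)                  # MLE agrees with the sample mean
print(obs_info(mu_hat) > 0)               # positive curvature: a maximum
print(relative_likelihood(1.0, mu_hat))   # plausibility of mu = 1 relative to mu_hat
```

The starting value matters for Newton-Raphson. For this model $I(\mu) < 0$ once $\mu > 2\bar t$, so a start far to the right of the sample mean may not converge. A bracketing method such as bisection on $S$ would be a more robust, if slower, alternative.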
Example 2.2 (Bernoulli Trials). Consider $n$ independent Bernoulli trials. The $j$th observation is either a "success" or a "failure", coded $x_j = 1$ and $x_j = 0$ respectively, and
$$P(X_j = x_j) = \theta^{x_j}(1-\theta)^{1-x_j}, \quad j = 1, \ldots, n.$$
The vector of observations $y = (x_1, x_2, \ldots, x_n)^T$ is a sequence of ones and zeros, and is a realization of the random vector $Y = (X_1, X_2, \ldots, X_n)^T$. As the Bernoulli outcomes are assumed to be independent, we can write the joint probability mass function of $Y$ as the product of the marginal probabilities, that is
$$L(\theta) = \prod_{j=1}^{n} P(X_j = x_j) = \prod_{j=1}^{n} \theta^{x_j}(1-\theta)^{1-x_j} = \theta^{\sum x_j}(1-\theta)^{n-\sum x_j} = \theta^{r}(1-\theta)^{n-r},$$
where $r = \sum_{j=1}^{n} x_j$ is the number of observed successes (1's) in the vector $y$. The log-likelihood function is then
$$\ell(\theta) = r\ln\theta + (n-r)\ln(1-\theta),$$
and the score function is
$$S(\theta) = \frac{\partial}{\partial\theta}\ell(\theta) = \frac{r}{\theta} - \frac{n-r}{1-\theta}.$$
Solving $S(\hat\theta) = 0$ we get $\hat\theta = r/n$. The information function is
$$I(\theta) = \frac{r}{\theta^2} + \frac{n-r}{(1-\theta)^2} > 0 \quad \forall\, \theta,$$
guaranteeing that $\hat\theta$ is the MLE.

Each $X_i$ is a Bernoulli random variable with expected value $E(X_i) = \theta$ and variance $\mathrm{Var}(X_i) = \theta(1-\theta)$. The MLE $\hat\theta(y)$ is itself a random variable, with expected value
$$E(\hat\theta) = E\left(\frac{r}{n}\right) = E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n}\theta = \theta,$$
implying that $\hat\theta(y)$ is an unbiased estimator of $\theta$. Using the independence of the $X_i$, the variance of $\hat\theta(y)$ is
$$\mathrm{Var}(\hat\theta) = \mathrm{Var}\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{1}{n^2}\sum_{i=1}^{n}\theta(1-\theta) = \frac{\theta(1-\theta)}{n}.$$
Finally, since $E(r) = n\theta$, the expected (Fisher) information is
$$\mathcal{I}(\theta) = E[I(\theta)] = \frac{E(r)}{\theta^2} + \frac{n - E(r)}{(1-\theta)^2} = \frac{n}{\theta} + \frac{n}{1-\theta} = \frac{n}{\theta(1-\theta)} = \left\{\mathrm{Var}[\hat\theta]\right\}^{-1},$$
and so $\hat\theta$ attains the Cramér-Rao lower bound (CRLB).
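The conclusions of Example 2.2 can be checked empirically with a small Monte Carlo sketch. It tests that $\hat\theta = r/n$ is unbiased and that its variance matches $\theta(1-\theta)/n = 1/\mathcal{I}(\theta)$. The settings $\theta = 0.3$, $n = 50$, the number of replications, the random seed and the function names are all hypothetical choices made for illustration; none of them come from the notes.

The sketch also confirms a related identity on one hypothetical sample. At the MLE, the observed information $I(\hat\theta) = r/\hat\theta^2 + (n-r)/(1-\hat\theta)^2$ simplifies to $n/(\hat\theta(1-\hat\theta))$.

```python
import random
import statistics
from typing import Sequence

def bernoulli_mle(x: Sequence[int]) -> float:
    # theta_hat = r / n, where r is the number of successes (1's)
    return sum(x) / len(x)

def observed_info(theta: float, r: int, n: int) -> float:
    # I(theta) = r / theta^2 + (n - r) / (1 - theta)^2
    return r / theta**2 + (n - r) / (1 - theta)**2

def expected_info(theta: float, n: int) -> float:
    # Fisher information E[I(theta)] = n / (theta (1 - theta))
    return n / (theta * (1 - theta))

# Single hypothetical sample: r = 3 successes out of n = 10
sample = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
theta_hat = bernoulli_mle(sample)
print(theta_hat, observed_info(theta_hat, sum(sample), len(sample)))

# Hypothetical simulation settings (assumptions, not from the notes)
theta, n, reps = 0.3, 50, 10_000
rng = random.Random(4214)

estimates = []
for _ in range(reps):
    x = [1 if rng.random() < theta else 0 for _ in range(n)]
    estimates.append(bernoulli_mle(x))

mean_hat = statistics.fmean(estimates)   # should be close to theta (unbiasedness)
var_hat = statistics.pvariance(estimates)
crlb = 1 / expected_info(theta, n)       # = theta (1 - theta) / n

print(f"mean of theta_hat:     {mean_hat:.4f}  (theta = {theta})")
print(f"variance of theta_hat: {var_hat:.6f}  (CRLB = {crlb:.6f})")
```

The simulated mean and variance will differ slightly from $\theta$ and the CRLB because of Monte Carlo error. Increasing `reps` shrinks that gap.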