
6.1 The Likelihood Function

is a sum rather than a product. We often denote the log-likelihood without the "L" subscript. The notation for the likelihood and the log-likelihood varies with authors. My own choice of an uppercase "L" for the likelihood and a lowercase "l" for the log-likelihood is long-standing, and not based on any notational optimality consideration. Because of the variation in the notation for the log-likelihood, I will often use the "$l_L$" notation because this expression is suggestive of the meaning.
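
To make the additive form concrete: for a sample $x_1, \ldots, x_n$ of independent and identically distributed observations with common PDF $f(x\,;\,\theta)$ (a standard setup assumed here for illustration, not taken from this excerpt), the likelihood is a product and the log-likelihood is the corresponding sum,

\[
l_L(\theta\,;\,x) = \log L(\theta\,;\,x) = \log \prod_{i=1}^{n} f(x_i\,;\,\theta) = \sum_{i=1}^{n} \log f(x_i\,;\,\theta).
\]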

We will often work with either the likelihood or the log-likelihood as if there is only one observation.

Likelihood Principle

According to the likelihood principle in statistical inference, all of the information that the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of those hypotheses and the data; that is, if for x and y,

\[
\frac{L(\theta\,;\,x)}{L(\theta\,;\,y)} = c(x, y) \quad \forall\,\theta, \tag{6.4}
\]

where c(x, y) is constant for given x and y, then any inference about θ based on x should be in agreement with any inference about θ based on y.

Although at first glance we may think that the likelihood principle is so obviously the right way to make decisions, Example 6.1 may cause us to think more critically about this principle.

The likelihood principle asserts that for making inferences about a probability distribution, the overall data-generating process need not be considered; only the observed data are relevant.

Example 6.1 The likelihood principle in sampling from a Bernoulli distribution

In Example 3.12 we considered the problem of making inferences on the parameter $\pi$ in a family of Bernoulli distributions.

One approach was to take a random sample of size $n$, $X_1, \ldots, X_n$, from the Bernoulli($\pi$), and then use $T = \sum_{i=1}^{n} X_i$, which has a binomial distribution with parameters $n$ and $\pi$.

Another approach was to take a sequential sample, $X_1, X_2, \ldots$, until a fixed number $t$ of 1's have occurred. The size of the sample $N$ is random, and the random variable $N$ has a negative binomial distribution with parameters $t$ and $\pi$.
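
For reference, the two PMFs invoked below (presumably equations 3.43 and 3.44 of the text, which are not reproduced in this excerpt) have the standard forms

\[
\Pr(T = t) = \binom{n}{t} \pi^{t} (1 - \pi)^{n - t}, \quad t = 0, 1, \ldots, n,
\]

for the binomial and

\[
\Pr(N = n) = \binom{n - 1}{t - 1} \pi^{t} (1 - \pi)^{n - t}, \quad n = t, t + 1, \ldots,
\]

for the negative binomial, here parameterized by the number of trials $N$ required to obtain $t$ 1's.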

Now, suppose we take the first approach with $n = n_0$ and we observe $T = t_0$; and then we take the second approach with $t = t_0$ and we observe $N = n_0$. Using the PDFs in equations 3.43 and 3.44 we get the likelihoods

\[
L_B(\pi) = \binom{n_0}{t_0} \pi^{t_0} (1 - \pi)^{n_0 - t_0} \tag{6.5}
\]

and

\[
L_{NB}(\pi) = \binom{n_0 - 1}{t_0 - 1} \pi^{t_0} (1 - \pi)^{n_0 - t_0}. \tag{6.6}
\]
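
To see the likelihood principle at work in this example, the following sketch (a minimal illustration added here, not from the text; the function names and the values of $n_0$ and $t_0$ are hypothetical) evaluates the likelihoods in equations 6.5 and 6.6 over a grid of values of $\pi$ and confirms that their ratio does not depend on $\pi$, so that by equation 6.4 the binomial and negative binomial experiments support the same inferences about $\pi$.

    from math import comb

    def binomial_likelihood(pi, n0, t0):
        # L_B(pi) = C(n0, t0) * pi^t0 * (1 - pi)^(n0 - t0), equation 6.5
        return comb(n0, t0) * pi**t0 * (1 - pi)**(n0 - t0)

    def neg_binomial_likelihood(pi, n0, t0):
        # L_NB(pi) = C(n0 - 1, t0 - 1) * pi^t0 * (1 - pi)^(n0 - t0), equation 6.6
        return comb(n0 - 1, t0 - 1) * pi**t0 * (1 - pi)**(n0 - t0)

    n0, t0 = 12, 3  # hypothetical data: 12 trials, 3 ones observed
    for pi in [0.1, 0.25, 0.5, 0.75, 0.9]:
        ratio = binomial_likelihood(pi, n0, t0) / neg_binomial_likelihood(pi, n0, t0)
        print(f"pi = {pi:4.2f}   L_B/L_NB = {ratio:.6f}")
    # The ratio is C(n0, t0) / C(n0 - 1, t0 - 1) = n0/t0 = 4 for every pi,
    # so c(x, y) in equation 6.4 is free of pi and the two likelihoods
    # carry the same information about pi.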
