13.11.2012 Views

Introduction to Categorical Data Analysis


models for continuous data, the normal distribution plays a central role. This section presents the key distributions for categorical data: the binomial and multinomial distributions.

1.2.1 Binomial Distribution

Often, categorical data result from n independent and identical trials with two possible outcomes for each, referred to as “success” and “failure.” These are generic labels, and the “success” outcome need not be a preferred result. Identical trials means that the probability of success is the same for each trial. Independent trials means the response outcomes are independent random variables. In particular, the outcome of one trial does not affect the outcome of another. These are often called Bernoulli trials. Let π denote the probability of success for a given trial. Let Y denote the number of successes out of the n trials.

Under the assumption of n independent, identical trials, Y has the binomial distribution with index n and parameter π. You are probably familiar with this distribution, but we review it briefly here. The probability of outcome y for Y equals

$$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}, \qquad y = 0, 1, 2, \ldots, n \tag{1.1}$$

To illustrate, suppose a quiz has 10 multiple-choice questions, with five possible answers for each. A student who is completely unprepared randomly guesses the answer for each question. Let Y denote the number of correct responses. The probability of a correct response is 0.20 for a given question, so n = 10 and π = 0.20. The probability of y = 0 correct responses, and hence n − y = 10 incorrect ones, equals

$$P(0) = \frac{10!}{0!\,10!}\,(0.20)^{0}(0.80)^{10} = (0.80)^{10} = 0.107.$$

The probability of 1 correct response equals

$$P(1) = \frac{10!}{1!\,9!}\,(0.20)^{1}(0.80)^{9} = 10(0.20)(0.80)^{9} = 0.268.$$
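The two hand calculations above can be checked numerically. The following is a minimal Python sketch of formula (1.1). The function name `binomial_pmf` and its argument names are illustrative choices, not notation from the text.

```python
from math import comb


def binomial_pmf(y: int, n: int, pi: float) -> float:
    """P(Y = y) for n independent, identical trials with success probability pi (formula 1.1)."""
    # comb(n, y) is the binomial coefficient n! / (y! (n - y)!)
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


# Quiz example: n = 10 questions, pi = 0.20 chance of guessing each one correctly
p0 = binomial_pmf(0, 10, 0.20)
p1 = binomial_pmf(1, 10, 0.20)
print(round(p0, 3), round(p1, 3))  # 0.107 0.268
```

Summing `binomial_pmf(y, 10, 0.20)` over y = 0, ..., 10 should give 1 up to floating-point error, which is a quick sanity check on the formula.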

Table 1.1 shows the entire distribution. For contrast, it also shows the distributions when π = 0.50 and when π = 0.80.
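Table 1.1 itself is not reproduced in this excerpt. As a rough sketch, the three distributions it is described as containing can be computed directly from formula (1.1). The helper `binomial_pmf`, the variable names, and the printed layout are assumptions made for illustration and need not match the book's table.

```python
from math import comb


def binomial_pmf(y, n, pi):
    """Binomial probability of y successes in n trials, per formula (1.1)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


n = 10
params = (0.20, 0.50, 0.80)

# One list of probabilities P(0), ..., P(n) for each value of pi
table = {pi: [binomial_pmf(y, n, pi) for y in range(n + 1)] for pi in params}

print("y    " + "  ".join(f"pi={pi:.2f}" for pi in params))
for y in range(n + 1):
    print(f"{y:<4d} " + "  ".join(f"{table[pi][y]:7.3f}" for pi in params))
```

In the output, the π = 0.80 column is the π = 0.20 column read in reverse order, since swapping the "success" and "failure" labels turns y successes into n − y successes.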

The binomial distribution for n trials with parameter π has mean and standard deviation

$$E(Y) = \mu = n\pi, \qquad \sigma = \sqrt{n\pi(1-\pi)}$$

The binomial distribution in Table 1.1 with n = 10 and π = 0.20 has μ = 10(0.20) = 2.0 and σ = √[10(0.20)(0.80)] = 1.26.
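These closed-form values can be compared with the mean and standard deviation computed directly from the probabilities in (1.1). The sketch below does this for the quiz example. The names `binomial_pmf`, `probs`, `mean`, and `sd` are illustrative only.

```python
from math import comb, sqrt


def binomial_pmf(y, n, pi):
    """Binomial probability of y successes in n trials, per formula (1.1)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


n, pi = 10, 0.20
probs = [binomial_pmf(y, n, pi) for y in range(n + 1)]

# Mean and variance from the definition of expectation over y = 0, ..., n
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
sd = sqrt(variance)

# Closed-form values from the text: n*pi and sqrt(n*pi*(1 - pi))
print(mean, n * pi)
print(sd, sqrt(n * pi * (1 - pi)))
```

Each printed pair should agree up to floating-point error, matching μ = 2.0 and σ ≈ 1.26.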

The binomial distribution is always symmetric when π = 0.50. For fixed n, it becomes more skewed as π moves toward 0 or 1. For fixed π, it becomes more
