
Applied Statistics Using SPSS, STATISTICA, MATLAB and R


5.1.3 The Chi-Square Goodness of Fit Test


The previous binomial test applied to a dichotomised population. When there are more than two categories, one often wishes to assess whether the observed frequencies of occurrence in each category are in accordance with what should be expected. Let us start with the random variable Z of formula 5.4 and square it:

$$ Z^2 = \frac{(P - p)^2}{pq/n} = n(P - p)^2 \left( \frac{1}{p} + \frac{1}{q} \right) = \frac{(X_1 - np)^2}{np} + \frac{(X_2 - nq)^2}{nq}, \tag{5.5} $$

where $X_1$ and $X_2$ are the random variables associated with the number of “successes” and “failures” in the n-sized sample, respectively. In the above derivation, note that $1/(pq) = 1/p + 1/q$ because $p + q = 1$, and that, denoting $Q = 1 - P$, we have $(nP - np)^2 = (nQ - nq)^2$. Formula 5.5 conveniently expresses the fitting of $X_1 = nP$ and $X_2 = nQ$ to the theoretical values $np$ and $nq$ in terms of squared deviations. The squared deviation is a popular distance measure, given its many useful properties, and will be used extensively in Chapter 7.
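Formula 5.5 can be checked numerically. The short Python sketch below is not from the book: the function names `z_squared` and `two_category_sum` and the sample (37 successes in n = 60 trials, with p = 0.5 under the model) are hypothetical. Both sides should equal 98/30 ≈ 3.267.

```python
import math

def z_squared(x1: int, n: int, p: float) -> float:
    """Square of Z = (P - p) / sqrt(pq/n), the left-hand side of 5.5."""
    q = 1.0 - p
    P = x1 / n  # observed proportion of "successes"
    return (P - p) ** 2 / (p * q / n)

def two_category_sum(x1: int, n: int, p: float) -> float:
    """Right-hand side of 5.5: squared deviations of X1 and X2 from np and nq."""
    q = 1.0 - p
    x2 = n - x1  # number of "failures"
    return (x1 - n * p) ** 2 / (n * p) + (x2 - n * q) ** 2 / (n * q)

# Hypothetical sample: 37 successes in n = 60 trials, p = 0.5 under the model.
lhs = z_squared(37, 60, 0.5)
rhs = two_category_sum(37, 60, 0.5)
print(lhs, rhs)
assert math.isclose(lhs, rhs)  # both sides of 5.5 agree
```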

Let us now consider k categories of events, each one represented by a random variable $X_i$, and, furthermore, let us denote by $p_i$ the probability of occurrence of each category. Note that the joint distribution of the $X_i$ is a multinomial distribution, described in B.1.6. The result 5.5 generalises to this multinomial distribution as follows (see property 5 of B.2.7):

$$ \chi^{*2} = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i} \sim \chi^2_{k-1}, \tag{5.6} $$

where the number of degrees of freedom, df = k − 1, is imposed by the restriction:

$$ \sum_{i=1}^{k} X_i = n. \tag{5.7} $$
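A hypothetical illustration of restriction 5.7 in Python: because the counts add up to n and the $p_i$ add up to 1, the deviations $X_i - np_i$ sum to zero, so only k − 1 of them can vary freely. The function name and the counts below are made up for illustration.

```python
def deviations(counts, probs):
    """Deviations X_i - n*p_i for observed category counts (hypothetical helper)."""
    n = sum(counts)  # restriction 5.7: the counts add up to n
    return [x - n * p for x, p in zip(counts, probs)]

# Hypothetical counts for k = 4 categories with equal model probabilities.
counts = [18, 25, 31, 26]
probs = [0.25, 0.25, 0.25, 0.25]
d = deviations(counts, probs)

# The k deviations sum to zero, so the last one is determined by the other k - 1.
print(d, sum(d))
assert abs(sum(d)) < 1e-9
assert abs(d[-1] + sum(d[:-1])) < 1e-9
```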

As a matter of fact, the chi-square law is only an approximation for the sampling distribution of $\chi^{*2}$, given the dependency expressed by 5.7.
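The quality of the chi-square approximation can be explored by simulation. The sketch below, a hypothetical example rather than the book's own code, draws multinomial samples with Python's standard `random` module, computes statistic 5.6 for each sample, and compares the results with the $\chi^2_3$ law (k = 4), whose mean is 3 and whose 95th percentile is about 7.815. The probabilities, sample size, number of replications and seed are arbitrary assumptions.

```python
import random

def chi_square_stat(counts, probs):
    """Statistic 5.6: sum over categories of (X_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))

def simulate(probs, n, reps, seed=0):
    """Draw `reps` multinomial samples of size n; return statistic 5.6 for each."""
    rng = random.Random(seed)
    k = len(probs)
    stats = []
    for _ in range(reps):
        draws = rng.choices(range(k), weights=probs, k=n)  # n category labels
        counts = [draws.count(i) for i in range(k)]        # multinomial counts X_i
        stats.append(chi_square_stat(counts, probs))
    return stats

probs = [0.1, 0.2, 0.3, 0.4]  # hypothetical category probabilities, k = 4
stats = simulate(probs, n=100, reps=2000)
mean = sum(stats) / len(stats)
# Fraction of simulated statistics above the chi2(3) 95th percentile (about 7.815).
frac = sum(s > 7.815 for s in stats) / len(stats)
print(f"mean = {mean:.2f} (chi2_3 mean is 3), tail fraction = {frac:.3f} (nominal 0.05)")
```

With every $np_i$ at least 10, the simulated mean and tail fraction should land close to their nominal values; shrinking n makes the discrepancies grow.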

In order to test the goodness of fit of the observed counts $O_i$ to the expected counts $E_i$, that is, to test whether or not the following null hypothesis is rejected:

H0: The population has absolute frequencies $E_i$ for each of the categories $i = 1, \ldots, k$,

we then use the test statistic:

$$ \chi^{*2} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \tag{5.8} $$
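A minimal Python sketch of the test based on 5.8, assuming df = k − 1 as in 5.6. It is not the book's code (the book works with SPSS, STATISTICA, MATLAB and R); the helper names `gof_statistic` and `chi2_sf` and the die-roll data are hypothetical. The p-value uses a plain series for the regularized incomplete gamma function, which is adequate for moderate values of the statistic but loses relative accuracy for very small p-values; in practice a library routine such as SciPy's `scipy.stats.chisquare` would normally be used instead.

```python
import math

def gof_statistic(observed, expected):
    """Statistic 5.8: sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df, terms=500):
    """Upper-tail probability P(chi2_df > x), computed as 1 - P(df/2, x/2),
    where P is the lower regularized incomplete gamma function (series form)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:  # remaining terms are negligible
            break
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

# Hypothetical data: 60 rolls of a die; H0 says each face has E_i = 10.
observed = [8, 12, 9, 11, 6, 14]
expected = [10.0] * 6
stat = gof_statistic(observed, expected)  # (4+4+1+1+16+16)/10 = 4.2
df = len(observed) - 1                    # k - 1 = 5
p_value = chi2_sf(stat, df)
print(f"chi2* = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

H0 is rejected at level α when the p-value falls below α; for these made-up counts the statistic is small relative to its 5 degrees of freedom.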
