01.03.2013 Views

Applied Statistics Using SPSS, STATISTICA, MATLAB and R


2.3 Summarising the Data

with the highest frequency of occurrence (37.9%). In choosing this modal category, we expect to be in error 62.1% of the time. On the other hand, if we know the sex (i.e., we know the full table), we would choose as prediction outcome the “agree” category if it is a male (expecting then 73.5 – 28 = 45.5% of errors), and the “fully agree” category if it is a female (expecting then 26.5 – 11.4 = 15.1% of errors).

Let us denote:

i. Pec ≡ Percentage of errors using only the columns = 100 – percentage of modal column category.

ii. Pecr ≡ Percentage of errors using also the rows = sum along the rows of (row percentage – percentage of modal column category in each row), all percentages being relative to the table total. In the case above, Pecr = 45.5 + 15.1 = 60.6%.

The λ measure (Goodman and Kruskal lambda) of proportional reduction of error, when predicting the columns from the rows, is defined as:

$$\lambda_{cr} = \frac{Pe_{c} - Pe_{cr}}{Pe_{c}}. \tag{2.27}$$

Similarly, for the prediction of the rows from the columns, we have:

$$\lambda_{rc} = \frac{Pe_{r} - Pe_{rc}}{Pe_{r}}. \tag{2.28}$$

The coefficient of mutual association (also called symmetric lambda) is a weighted average of both lambdas, defined as:

$$\lambda = \frac{\text{average reduction in errors}}{\text{average number of errors}} = \frac{(Pe_{c} - Pe_{cr}) + (Pe_{r} - Pe_{rc})}{Pe_{c} + Pe_{r}}. \tag{2.29}$$
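The three lambda formulas can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book: the function name `lambda_statistics` and the small table of counts are hypothetical, and the table is not Table 2.4. The error percentages are computed relative to the table total, as in the worked percentages above.

```python
def lambda_statistics(table):
    """Return (lambda_cr, lambda_rc, lambda_sym) for a table of counts.

    `table` is a list of rows; each row holds the counts per column category.
    Hypothetical helper: it assumes neither Pe_c nor Pe_r is zero
    (i.e. the cases are spread over at least two rows and two columns).
    """
    n = sum(sum(row) for row in table)
    columns = list(zip(*table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in columns]

    # Errors (percent of total) when predicting the column category:
    # without the rows, and then using the modal column within each row.
    pe_c = 100 * (n - max(col_totals)) / n
    pe_cr = 100 * sum(sum(row) - max(row) for row in table) / n

    # Errors (percent of total) when predicting the row category:
    # without the columns, and then using the modal row within each column.
    pe_r = 100 * (n - max(row_totals)) / n
    pe_rc = 100 * sum(sum(col) - max(col) for col in columns) / n

    lam_cr = (pe_c - pe_cr) / pe_c
    lam_rc = (pe_r - pe_rc) / pe_r
    lam_sym = ((pe_c - pe_cr) + (pe_r - pe_rc)) / (pe_c + pe_r)
    return lam_cr, lam_rc, lam_sym


# Made-up 2 x 3 table of counts, for illustration only.
example_table = [[30, 10, 10],
                 [5, 25, 20]]
lam_cr, lam_rc, lam_sym = lambda_statistics(example_table)
print(round(lam_cr, 3), round(lam_rc, 3), round(lam_sym, 3))  # 0.308 0.5 0.391
```

For this made-up table Pe_c = 65, Pe_cr = 45, Pe_r = 50 and Pe_rc = 25, so the symmetric lambda (45/115) lies between the two directional values (20/65 and 25/50), as expected for a weighted average.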

The lambda measure always ranges between 0 and 1, with 0 meaning that the independent variable is of no help in predicting the dependent variable and 1 meaning that the independent variable perfectly specifies the categories of the dependent variable.

Example 2.10

Q: Compute the lambda statistics for Table 2.4.

A: Using formula 2.27 we find λcr = 0.024, suggesting that sex contributes little to predicting the outcome of Q4. We also find λrc = 0 and λ = 0.017. The significance of the lambda statistic will be discussed in Chapter 5.
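The figures in this example can be checked from the percentages quoted earlier in the section, assuming (as the text suggests, although Table 2.4 itself is not reproduced here) that those percentages come from Table 2.4. The variable names in this short Python sketch are illustrative only.

```python
# Percentages quoted in the text, all relative to the table total.
modal_column_pct = 37.9                    # most frequent Q4 category overall
male_pct, male_modal_pct = 73.5, 28.0      # males, and "agree" among males
female_pct, female_modal_pct = 26.5, 11.4  # females, and "fully agree" among females

# Errors when predicting the column category without and with the rows.
pe_c = 100 - modal_column_pct
pe_cr = (male_pct - male_modal_pct) + (female_pct - female_modal_pct)
lambda_cr = (pe_c - pe_cr) / pe_c

# Errors when predicting the row (sex) without the columns: the modal row
# is the larger of the two sex percentages.
pe_r = 100 - max(male_pct, female_pct)
# The example reports lambda_rc = 0, which means Pe_rc equals Pe_r.
pe_rc = pe_r
lambda_sym = ((pe_c - pe_cr) + (pe_r - pe_rc)) / (pe_c + pe_r)

print(round(lambda_cr, 3), round(lambda_sym, 3))  # 0.024 0.017
```

Here Pe_c = 62.1 and Pe_cr = 60.6, so knowing the sex removes only 1.5 percentage points of error, which is why λcr is so small.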

2.3.6.3 The Kappa Statistic

The kappa statistic is used to measure the degree of agreement for categorical variables. Consider the cross table shown in Figure 2.19 where the r rows are

