
From Algorithms to Z-Scores - matloff - University of California, Davis


CHAPTER 17. RELATIONS AMONG VARIABLES: ADVANCED

The point is that (17.40) looks like our no-interaction ANOVA models, e.g. (??). On the other hand, if we assume instead that Education is independent of IDE and Language but that IDE and Language are not independent of each other, our model would be

$$
\begin{aligned}
\log(p_{ijk}) &= \log\left[ P\left(X^{(1)} = i \text{ and } X^{(2)} = j\right) \cdot P\left(X^{(3)} = k\right) \right] && (17.43) \\
&= a_i + b_j + d_{ij} + c_k && (17.44)
\end{aligned}
$$

Here we have written $\log P\left(X^{(1)} = i \text{ and } X^{(2)} = j\right)$ as a sum of “main effects” $a_i$ and $b_j$, and “interaction effects,” $d_{ij}$, analogous to ANOVA.
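As a minimal numeric sketch of the decomposition in (17.44), the Python fragment below builds cell probabilities from a joint distribution for (IDE, Language) and a marginal distribution for Education, then checks that the log probability splits into main effects plus an interaction. All numbers are hypothetical, and the particular choice of $a_i$, $b_j$, $d_{ij}$, $c_k$ is only one of many valid parametrizations, chosen here so that $d_{ij} = 0$ exactly when IDE and Language are independent.

```python
import math

# Hypothetical joint distribution of (IDE, Language): q[i][j] = P(X1 = i and X2 = j)
q = [[0.30, 0.10],
     [0.15, 0.45]]
# Hypothetical marginal distribution of Education: r[k] = P(X3 = k)
r = [0.2, 0.5, 0.3]
I, J, K = 2, 2, 3

# Marginal distributions of X1 and X2 implied by q
p1 = [sum(q[i]) for i in range(I)]
p2 = [sum(q[i][j] for i in range(I)) for j in range(J)]

# One possible (non-unique) choice of effects, assumed for illustration
a = [math.log(x) for x in p1]                      # main effect of IDE
b = [math.log(x) for x in p2]                      # main effect of Language
c = [math.log(x) for x in r]                       # main effect of Education
d = [[math.log(q[i][j] / (p1[i] * p2[j]))          # IDE-Language interaction
      for j in range(J)] for i in range(I)]

def log_p(i, j, k):
    """Right-hand side of (17.44)."""
    return a[i] + b[j] + d[i][j] + c[k]

cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
# Largest discrepancy between (17.44) and log[P(X1=i, X2=j) * P(X3=k)]
max_err = max(abs(log_p(i, j, k) - math.log(q[i][j] * r[k])) for i, j, k in cells)
# The implied cell probabilities should sum to 1
total = sum(math.exp(log_p(i, j, k)) for i, j, k in cells)
print(max_err, total, d)
```

The interaction terms come out nonzero here because the hypothetical `q` does not factor into its marginals.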

Another possible model would have IDE and Language conditionally independent, given Education, meaning that at any level of education, a programmer’s preference to use an IDE or not, and his choice of programming language, are not related. We’d write the model this way:

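$$
\begin{aligned}
\log(p_{ijk}) &= \log\left[ P\left(X^{(1)} = i \mid X^{(3)} = k\right) \cdot P\left(X^{(2)} = j \mid X^{(3)} = k\right) \cdot P\left(X^{(3)} = k\right) \right] && (17.45) \\
&= a_i + b_j + f_{ik} + h_{jk} + c_k && (17.46)
\end{aligned}
$$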

Note carefully that the type of independence in (17.46) has a quite different interpretation than that in (17.44).
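The difference can be seen numerically. The sketch below, with hypothetical probabilities and only two education levels for brevity, builds $p_{ijk}$ so that IDE and Language are conditionally independent given Education, then checks that they factor within each education level but not once Education is summed out.

```python
# Hypothetical inputs (not from the text); Education has 2 levels here.
r = [0.5, 0.5]                    # r[k]    = P(X3 = k)
u = [[0.8, 0.2], [0.3, 0.7]]      # u[k][i] = P(X1 = i | X3 = k)
v = [[0.6, 0.4], [0.1, 0.9]]      # v[k][j] = P(X2 = j | X3 = k)
I, J, K = 2, 2, 2

# Cell probabilities under conditional independence given Education
p = [[[r[k] * u[k][i] * v[k][j] for k in range(K)]
      for j in range(J)] for i in range(I)]

def marg(i=None, j=None, k=None):
    """Sum p over every index left as None."""
    return sum(p[x][y][z]
               for x in range(I) for y in range(J) for z in range(K)
               if (i is None or x == i)
               and (j is None or y == j)
               and (k is None or z == k))

# Within each education level k, P(i, j | k) should equal P(i | k) * P(j | k)
cond_gap = max(
    abs(marg(i, j, k) / marg(k=k)
        - (marg(i=i, k=k) / marg(k=k)) * (marg(j=j, k=k) / marg(k=k)))
    for i in range(I) for j in range(J) for k in range(K))

# Marginally, P(X1=0, X2=0) need not equal P(X1=0) * P(X2=0)
marg_gap = abs(marg(i=0, j=0) - marg(i=0) * marg(j=0))
print(cond_gap, marg_gap)
```

With these numbers the conditional gap is zero up to rounding while the marginal gap is not, so IDE and Language look related overall even though they are unrelated at each education level.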

The full model, with no independence assumptions at all, would have three two-way interaction terms, as well as a three-way interaction term.
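Written out in the notation of (17.44) and (17.46), the full model would take roughly the following form. The symbol $g_{ijk}$ for the three-way interaction does not appear in the text and is introduced here only for illustration.

```latex
\log(p_{ijk}) = a_i + b_j + c_k + d_{ij} + f_{ik} + h_{jk} + g_{ijk}
```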

17.4.4.4 Parameter Estimation

Remember, whenever we have parametric models, the statistician’s “Swiss army knife” is maximum likelihood estimation. That is what is most often used in the case of log-linear models.

How, then, do we compute the likelihood of our data, the $N_{ijk}$? It’s actually quite straightforward, because the $N_{ijk}$ have a multinomial distribution. Then

$$
L = \frac{n!}{\prod_{i,j,k} N_{ijk}!} \prod_{i,j,k} p_{ijk}^{N_{ijk}} \qquad (17.47)
$$
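A minimal sketch of evaluating (17.47) on the log scale in Python follows. The function name and the flat-list layout of the cells are assumptions made for illustration; `math.lgamma(x + 1)` is used for $\log x!$ so that large counts do not overflow.

```python
import math

def multinomial_log_likelihood(counts, probs):
    """Log of (17.47). `counts` holds the N_ijk and `probs` the p_ijk,
    both as flat lists in the same cell order."""
    n = sum(counts)
    # log of n! / prod(N_ijk!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # sum of N_ijk * log(p_ijk); empty cells contribute nothing
    log_kernel = sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
    return log_coef + log_kernel

# Tiny check: two cells, one observation each, equal probabilities.
# L = 2!/(1! 1!) * 0.5 * 0.5 = 0.5
ll = multinomial_log_likelihood([1, 1], [0.5, 0.5])
print(math.exp(ll))
```

Working with $\log L$ rather than $L$ does not change where the maximum is, and the leading factorial term does not depend on the parameters, so it can be dropped during maximization.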

We then write the $p_{ijk}$ in terms of our model parameters. Take for example (17.44), where we write

$$
p_{ijk} = e^{a_i + b_j + d_{ij} + c_k} \qquad (17.48)
$$
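For the particular model (17.44), the maximization happens to have a closed form, which the sketch below illustrates: because the model says only that (IDE, Language) jointly are independent of Education, the maximum likelihood estimate of each cell probability is the product of the observed (IDE, Language) proportion and the observed Education proportion. The counts are hypothetical, and this shortcut is specific to this model; other log-linear models generally have to be fit numerically.

```python
# Hypothetical counts N[i][j][k]: i = IDE, j = Language, k = Education
N = [[[10, 20, 5], [4, 6, 5]],
     [[8, 12, 10], [6, 9, 5]]]
I, J, K = 2, 2, 3

n = sum(N[i][j][k] for i in range(I) for j in range(J) for k in range(K))

# Two-way margin for (IDE, Language) and one-way margin for Education
N_ij = [[sum(N[i][j]) for j in range(J)] for i in range(I)]
N_k = [sum(N[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]

# MLE of p_ijk under (17.44): product of the two observed proportions
p_hat = [[[(N_ij[i][j] / n) * (N_k[k] / n) for k in range(K)]
          for j in range(J)] for i in range(I)]

print(p_hat[0][0][0])
```

One possible use of `p_hat` is to compare the fitted counts `n * p_hat[i][j][k]` with the observed `N[i][j][k]` to judge how well the model describes the data.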
