14.03.2014 Views

Modeling and Multivariate Methods - SAS


Appendix A Statistical Details

Key Statistical Concepts

Pressure Cylinders

What if your response is categorical instead of continuous? For example, suppose that the response is the country of origin for a sample of cars. For your sample, there are probabilities for the three response levels, American, European, and Japanese. You can set these probabilities for country of origin to some estimate and then evaluate the uncertainty in your data. This uncertainty is found by summing the negative logs of the probabilities of the responses given by the data. It is written

$$H = \sum_{i} h(y_i) = \sum_{i} -\log p(y_i)$$
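As a minimal sketch of this sum, the Python snippet below adds up the negative log probabilities for a small sample. The function name `total_uncertainty`, the four-car sample, and the equal probabilities of 1/3 are assumptions chosen for illustration; they do not come from the text or from any data table.

```python
import math

def total_uncertainty(responses, probs):
    """Return H: the sum of -log p(y_i) over the observed responses."""
    return sum(-math.log(probs[y]) for y in responses)

# Hypothetical sample of four cars and a hypothetical set of
# probability estimates (equal weight on each response level).
responses = ["American", "Japanese", "American", "European"]
probs = {"American": 1 / 3, "European": 1 / 3, "Japanese": 1 / 3}

H = total_uncertainty(responses, probs)
print(f"Total uncertainty H = {H:.4f}")
```

With equal probabilities, each observation contributes log 3, so H here is four times log 3. Probability estimates that place more weight on the levels that actually occur give a smaller H.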

The idea of springs illustrates how a mean is fit to continuous data. When the response is categorical, statistical methods estimate the response probabilities directly and choose the estimates that minimize the total uncertainty of the data. The probability estimates must be nonnegative and sum to 1. You can picture the response probabilities as the composition along a scale whose total length is 1. For each response observation, load into its response area a gas pressure cylinder, for example, a tire pump. Let the partitions between the response levels vary until an equilibrium of lowest potential energy is reached. The sizes of the partitions that result then estimate the response probabilities.

Figure A.7 shows what the situation looks like for a single category such as the medium size cars (see the mosaic column from Carpoll.jmp labeled medium in Figure A.8). Suppose there are thirteen responses (cars). The first level (American) has six responses, the next has two, and the last has five responses. The response probabilities become 6/13, 2/13, and 5/13, respectively, as the pressure against the response partitions balances out to minimize the total energy.
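The thirteen-car example can be checked numerically. The sketch below, with the assumed helper name `energy`, compares the total negative log probability at the observed proportions with two arbitrary alternative partitions (an even split and a hand-picked one, both chosen only for comparison). It is an illustration of the minimization idea, not a description of how the software performs the fit.

```python
import math

# Counts for the medium size cars, as given in the text.
counts = {"American": 6, "European": 2, "Japanese": 5}
n = sum(counts.values())

def energy(probs):
    """Total uncertainty: sum over levels of -count * log(probability)."""
    return sum(-c * math.log(probs[level]) for level, c in counts.items())

# Partition sizes at equilibrium: the observed proportions 6/13, 2/13, 5/13.
proportions = {level: c / n for level, c in counts.items()}

# Two alternative partitions, picked arbitrarily for comparison.
even_split = {level: 1 / 3 for level in counts}
hand_picked = {"American": 0.5, "European": 0.1, "Japanese": 0.4}

for name, probs in [("proportions", proportions),
                    ("even split", even_split),
                    ("hand picked", hand_picked)]:
    print(f"{name:12s} energy = {energy(probs):.4f}")
```

The observed proportions give the smallest energy of the three, which is consistent with the partitions settling at 6/13, 2/13, and 5/13.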

Figure A.7 Effect of Pressure Cylinders in Partitions

As with springs for continuous data, you can divide your sample by some factor and fit separate sets of partitions. Then test that the response rates are the same across the groups by measuring how much additional energy you need to push the partitions to be equal. Imagine the pressure cylinders for car origin probabilities grouped by the size of the car. The energy required to force the partitions in each group to align horizontally tests whether the variables have the same probabilities. Figure A.8 shows these partitions.
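The "additional energy" can be sketched as the difference between the total uncertainty when every group is forced to share one set of probabilities and the total uncertainty when each group keeps its own. In the Python sketch below, the function names `energy` and `extra_energy` are assumptions, and only the medium row (6, 2, 5) comes from the text; the other two rows are made-up counts for illustration and are not values from Carpoll.jmp. Twice this difference is the usual likelihood-ratio chi-square statistic.

```python
import math

def energy(counts, probs):
    """Sum of -count * log(probability); empty cells contribute nothing."""
    return sum(-c * math.log(p) for c, p in zip(counts, probs) if c > 0)

def extra_energy(table):
    """Energy needed to force every group onto the pooled probabilities.

    table: one row of response counts per group.
    """
    column_totals = [sum(col) for col in zip(*table)]
    n = sum(column_totals)
    pooled = [t / n for t in column_totals]
    # Each group fit with its own partitions (its own proportions).
    separate = sum(energy(row, [c / sum(row) for c in row]) for row in table)
    # Each group forced to use the same, pooled partitions.
    constrained = sum(energy(row, pooled) for row in table)
    return constrained - separate

# Rows: car size groups; columns: American, European, Japanese.
table = [
    [4, 3, 8],   # hypothetical counts, NOT from Carpoll.jmp
    [6, 2, 5],   # medium size cars, as given in the text
    [9, 1, 2],   # hypothetical counts, NOT from Carpoll.jmp
]

delta = extra_energy(table)
print(f"Additional energy = {delta:.4f}, likelihood-ratio chi-square = {2 * delta:.4f}")
```

When the groups already have identical proportions, the partitions align without any push and the additional energy is zero; the larger the value, the stronger the evidence that the response rates differ across groups.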
