03.03.2015 Views

Selecting the Right Objective Measure for Association Analysis*

Selecting the Right Objective Measure for Association Analysis*

Selecting the Right Objective Measure for Association Analysis*

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

6.5 General Solution of Standardization Procedure<br />

In <strong>the</strong>orem 1, we showed that <strong>the</strong> IPF procedure can be <strong>for</strong>mulated in terms of a<br />

matrix multiplication operation. Fur<strong>the</strong>rmore, <strong>the</strong> left and right multiplication<br />

matrices are equivalent to scaling <strong>the</strong> row and column elements of <strong>the</strong> original<br />

matrix by some constant factors k 1 , k 2 , k 3 and k 4 . Note that one of <strong>the</strong>se factors<br />

is actually redundant; <strong>the</strong>orem 1 can be stated in terms of three parameters, k 1,<br />

′<br />

k 2 ′ and k 3, ′ i.e.,<br />

[ ] [ ] [ ]<br />

k1 0 a b k3 0<br />

=<br />

0 k 2 c d 0 k 4<br />

[ ] [ ] [ ]<br />

k<br />

′<br />

1 0 a b k<br />

′<br />

3 0<br />

0 k 2<br />

′ c d 0 1<br />

Suppose M = [a b; c d] is <strong>the</strong> original contingency table and M s = [x y; y x]<br />

is <strong>the</strong> standardized table. We can show that any generalized standardization procedure<br />

can be expressed in terms of three basic operations: row scaling, column<br />

scaling, and addition of null values 4 .<br />

[ ] [ ] [<br />

k1 0 a b k3 0<br />

0 k 2 c d 0 1<br />

]<br />

[ ]<br />

0 0<br />

+ =<br />

0 k 4<br />

This matrix equation can be easily solved to obtain<br />

k 1 = y b ,<br />

k 2 = y2 a<br />

xbc ,<br />

k 3 = xb<br />

ay ,<br />

[ ]<br />

x y<br />

y x<br />

(<br />

k 4 = x 1 −<br />

ad<br />

)<br />

bc<br />

x 2<br />

y 2<br />

For IPF, since ad<br />

bc = x2 /y 2 , <strong>the</strong>re<strong>for</strong>e k 4 = 0, and <strong>the</strong> entire standardization<br />

procedure can be expressed in terms of row and column scaling operations.<br />

7 <strong>Measure</strong> Selection based on Rankings by Experts<br />

Although <strong>the</strong> preceding sections describe two scenarios in which many of <strong>the</strong><br />

measures become consistent with each o<strong>the</strong>r, such scenarios may not hold <strong>for</strong> all<br />

application domains. For example, support-based pruning may not be useful <strong>for</strong><br />

domains containing nominal variables, while in o<strong>the</strong>r cases, one may not know<br />

<strong>the</strong> exact standardization scheme to follow. For such applications, an alternative<br />

approach is needed to find <strong>the</strong> best measure.<br />

In this section, we describe a subjective approach <strong>for</strong> finding <strong>the</strong> right measure<br />

based on <strong>the</strong> relative rankings provided by domain experts. Ideally, we<br />

want <strong>the</strong> experts to rank all <strong>the</strong> contingency tables derived from <strong>the</strong> data. These<br />

rankings can help us identify <strong>the</strong> measure that is most consistent with <strong>the</strong> expectation<br />

of <strong>the</strong> experts. For example, we can compare <strong>the</strong> correlation between<br />

<strong>the</strong> rankings produced by <strong>the</strong> existing measures against <strong>the</strong> rankings provided<br />

by <strong>the</strong> experts and select <strong>the</strong> measure that produces <strong>the</strong> highest correlation.<br />

4 Note that although <strong>the</strong> standardized table preserves <strong>the</strong> invariant measure, <strong>the</strong>se<br />

intermediate steps of row or column scaling and addition of null values may not<br />

preserve <strong>the</strong> measure.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!