
A final dimension is the particular definition of rule PERFORMANCE used to guide the search in LEARN-ONE-RULE. Various evaluation functions have been used. Some common evaluation functions include the following (a brief code sketch of all three appears after the list):
• Relative frequency. Let n denote the number of examples the rule matches and let n_c denote the number of these that it classifies correctly. The relative frequency estimate of rule performance is

\[ \frac{n_c}{n} \]

Relative frequency is used to evaluate rules in the AQ program.

• m-estimate of accuracy. This accuracy estimate is biased toward the default accuracy expected of the rule. It is often preferred when data is scarce and the rule must be evaluated based on few examples. As above, let n and n_c denote the number of examples matched and correctly predicted by the rule. Let p be the prior probability that a randomly drawn example from the entire data set will have the classification assigned by the rule (e.g., if 12 out of 100 examples have the value predicted by the rule, then p = .12). Finally, let m be the weight, or equivalent number of examples, for weighting this prior p. The m-estimate of rule accuracy is

\[ \frac{n_c + m p}{n + m} \]

Note that if m is set to zero, the m-estimate becomes the above relative frequency estimate. As m is increased, a larger number of examples is needed to override the prior assumed accuracy p. The m-estimate measure is advocated by Cestnik and Bratko (1991) and has been used in some versions of the CN2 algorithm. It is also used in the naive Bayes classifier discussed in Section 6.9.1.

• Entropy. This is the measure used by the PERFORMANCE subroutine in the algorithm of Table 10.2. Let S be the set of examples that match the rule preconditions. Entropy measures the uniformity of the target function values for this set of examples. We take the negative of the entropy, so that better rules will have higher scores.

\[ -\mathrm{Entropy}(S) = \sum_{i=1}^{c} p_i \log_2 p_i \]

where c is the number of distinct values the target function may take on, and where p_i is the proportion of examples from S for which the target function takes on the ith value. This entropy measure, combined with a test for statistical significance, is used in the CN2 algorithm of Clark and Niblett (1989). It is also the basis for the information gain measure used by many decision tree learning algorithms.
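To make the three measures concrete, here is a minimal Python sketch (not part of the original text; the function names and the toy counts are illustrative assumptions) that scores a candidate rule with each evaluation function:

import math

def relative_frequency(n_c, n):
    # Fraction of matched examples that the rule classifies correctly: n_c / n.
    return n_c / n

def m_estimate(n_c, n, p, m):
    # m-estimate of accuracy: (n_c + m*p) / (n + m); with m = 0 this reduces
    # to the relative frequency estimate above.
    return (n_c + m * p) / (n + m)

def negative_entropy(class_counts):
    # Negative entropy of the target values over the examples S matched by the
    # rule; class_counts[i] is the number of matched examples with the ith value.
    total = sum(class_counts)
    return sum((count / total) * math.log2(count / total)
               for count in class_counts if count > 0)

# Toy example (illustrative numbers only): a rule matches n = 20 examples,
# n_c = 15 of them correctly; the predicted class covers 12% of the data set.
print(relative_frequency(15, 20))            # 0.75
print(m_estimate(15, 20, p=0.12, m=10))      # (15 + 1.2) / 30 = 0.54
print(negative_entropy([15, 5]))             # roughly -0.81

With m = 0 the m-estimate reproduces the relative frequency score, and the negative-entropy score approaches 0 as the set of matched examples becomes purer, so under all three measures better rules receive higher values.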
