greatly reduced. In practical implementations, we can use an ad hoc test to guard against splitting on such a useless attribute.

Unfortunately, in some situations the gain ratio modification overcompensates and can lead to preferring an attribute just because its intrinsic information is much lower than that for the other attributes. A standard fix is to choose the attribute that maximizes the gain ratio, provided that the information gain for that attribute is at least as great as the average information gain for all the attributes examined.
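To make this heuristic concrete, here is a minimal Python sketch (not taken from the book; the function names and data layout are my own assumptions). It treats each instance as a tuple of nominal attribute values with a parallel list of class labels, computes the information gain of each attribute, and then maximizes the gain ratio only over attributes whose gain is at least average.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def info_gain(rows, labels, attr):
    """Information gain from splitting the data set on attribute index `attr`."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def choose_attribute(rows, labels, attrs):
    """Gain-ratio selection with the standard fix: restrict attention to
    attributes whose gain is at least the average gain, then pick the one
    with the highest gain ratio (gain divided by intrinsic information)."""
    gains = {a: info_gain(rows, labels, a) for a in attrs}
    average = sum(gains.values()) / len(gains)
    candidates = [a for a in attrs if gains[a] >= average]

    def gain_ratio(a):
        intrinsic = entropy([row[a] for row in rows])  # the "split information"
        return gains[a] / intrinsic if intrinsic > 0 else 0.0

    return max(candidates, key=gain_ratio)

# Tiny illustration: attribute 1 separates the classes perfectly, so it has
# both above-average gain and the highest gain ratio.
rows = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"), ("rainy", "normal")]
labels = ["no", "yes", "no", "yes"]
print(choose_attribute(rows, labels, attrs=[0, 1]))  # -> 1
```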
Discussion

The divide-and-conquer approach to decision tree induction, sometimes called top-down induction of decision trees, was developed and refined over many years by J. Ross Quinlan of the University of Sydney, Australia. Although others have worked on similar methods, Quinlan's research has always been at the very forefront of decision tree induction. The method that has been described using the information gain criterion is essentially the same as one known as ID3. The use of the gain ratio was one of many improvements that were made to ID3 over several years; Quinlan described it as robust under a wide variety of circumstances. Although a robust and practical solution, it sacrifices some of the elegance and clean theoretical motivation of the information gain criterion.

A series of improvements to ID3 culminated in a practical and influential system for decision tree induction called C4.5. These improvements include methods for dealing with numeric attributes, missing values, and noisy data, and for generating rules from trees; they are described in Section 6.1.

4.4 Covering algorithms: Constructing rules

As we have seen, decision tree algorithms are based on a divide-and-conquer approach to the classification problem. They work from the top down, seeking at each stage an attribute to split on that best separates the classes, then recursively processing the subproblems that result from the split. This strategy generates a decision tree, which can if necessary be converted into a set of classification rules, although if it is to produce effective rules, the conversion is not trivial.

An alternative approach is to take each class in turn and seek a way of covering all instances in it, at the same time excluding instances not in the class. This is called a covering approach because at each stage you identify a rule that "covers" some of the instances. By its very nature, this covering approach leads to a set of rules rather than to a decision tree.
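As a concrete illustration, here is a minimal, hypothetical Python sketch of this covering loop (my own construction, not the book's algorithm, which is developed later in this section). It assumes nominal attributes and a consistent data set, and grows each rule by greedily adding the attribute-value test that is most accurate on the instances the rule still covers.

```python
def rule_accuracy(instances, test, target):
    """Fraction of the instances satisfying `test` that belong to `target`."""
    attr, value = test
    satisfied = [label for row, label in instances if row[attr] == value]
    return (sum(1 for lab in satisfied if lab == target) / len(satisfied)
            if satisfied else 0.0)

def rules_for_class(rows, labels, target):
    """Greedy covering: keep building rules for class `target` until every
    instance of that class is covered. Assumes nominal attribute values and
    a consistent data set (no identical instances with different classes)."""
    rules = []
    remaining = list(zip(rows, labels))
    while any(label == target for _, label in remaining):
        rule = []                 # conjunction of (attribute, value) tests
        covered = remaining[:]    # instances the growing rule still covers
        # Specialize until the rule covers only instances of the target class
        while any(label != target for _, label in covered):
            best = max(
                ((a, row[a]) for row, _ in covered for a in range(len(row))
                 if (a, row[a]) not in rule),
                key=lambda test: rule_accuracy(covered, test, target),
            )
            rule.append(best)
            covered = [(r, l) for r, l in covered if r[best[0]] == best[1]]
        rules.append(rule)
        # Remove what this rule covers; uncovered target instances get new rules
        remaining = [(r, l) for r, l in remaining
                     if not all(r[a] == v for a, v in rule)]
    return rules

# Tiny illustration: one rule, testing attribute 1 = "normal", covers class "yes".
rows = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"), ("rainy", "normal")]
labels = ["no", "yes", "no", "yes"]
print(rules_for_class(rows, labels, "yes"))  # -> [[(1, 'normal')]]
```

Each rule is a conjunction of attribute-value tests; instances covered by a finished rule are removed, and the loop repeats until every instance of the class is covered.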
The covering method can readily be visualized in a two-dimensional space of instances, as shown in Figure 4.6(a). We first make a rule covering the a's. For
