
Data Mining: Practical Machine Learning Tools and ... - LIDeCC

4.1 INFERRING RUDIMENTARY RULES

For each attribute,
    For each value of that attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value.
    Calculate the error rate of the rules.
Choose the rules with the smallest error rate.

Figure 4.1 Pseudocode for 1R.

Table 4.1 Evaluating the attributes in the weather data.

Attribute       Rules               Errors   Total errors
1 outlook       sunny → no          2/5      4/14
                overcast → yes      0/4
                rainy → yes         2/5
2 temperature   hot → no*           2/4      5/14
                mild → yes          2/6
                cool → yes          1/4
3 humidity      high → no           3/7      4/14
                normal → yes        1/7
4 windy         false → yes         2/8      5/14
                true → no*          3/6

* A random choice was made between two equally likely outcomes.

To see the 1R method at work, consider the weather data of Table 1.2 (we will encounter it many times again when looking at how learning algorithms work). To classify on the final column, play, 1R considers four sets of rules, one for each attribute. These rules are shown in Table 4.1. An asterisk indicates that a random choice has been made between two equally likely outcomes. The number of errors is given for each rule, along with the total number of errors for the rule set as a whole. 1R chooses the attribute that produces rules with the smallest number of errors: that is, the first and third rule sets. Arbitrarily breaking the tie between these two rule sets gives:

outlook: sunny → no
         overcast → yes
         rainy → yes
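The 1R procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not library code: the `weather` list hand-codes the standard 14-instance weather data of Table 1.2, and `one_r` is a hypothetical helper name. As in Table 4.1, ties between equally frequent classes are broken arbitrarily (here, by insertion order in the counter).

```python
from collections import Counter

# The weather data of Table 1.2, hand-coded for this sketch:
# (outlook, temperature, humidity, windy, play)
weather = [
    ("sunny",    "hot",  "high",   False, "no"),
    ("sunny",    "hot",  "high",   True,  "no"),
    ("overcast", "hot",  "high",   False, "yes"),
    ("rainy",    "mild", "high",   False, "yes"),
    ("rainy",    "cool", "normal", False, "yes"),
    ("rainy",    "cool", "normal", True,  "no"),
    ("overcast", "cool", "normal", True,  "yes"),
    ("sunny",    "mild", "high",   False, "no"),
    ("sunny",    "cool", "normal", False, "yes"),
    ("rainy",    "mild", "normal", False, "yes"),
    ("sunny",    "mild", "normal", True,  "yes"),
    ("overcast", "mild", "high",   True,  "yes"),
    ("overcast", "hot",  "normal", False, "yes"),
    ("rainy",    "mild", "high",   True,  "no"),
]

def one_r(rows, n_attrs):
    """Return (attribute index, error count, rules) for the best 1R rule set."""
    best = None
    for a in range(n_attrs):
        # Count how often each class appears for each value of attribute a.
        counts = {}
        for row in rows:
            counts.setdefault(row[a], Counter())[row[-1]] += 1
        # Assign each value its most frequent class; the remaining
        # instances for that value are counted as errors.
        rules, errors = {}, 0
        for value, class_counts in counts.items():
            cls, freq = class_counts.most_common(1)[0]  # ties broken arbitrarily
            rules[value] = cls
            errors += sum(class_counts.values()) - freq
        # Keep the attribute whose rule set makes the fewest errors.
        if best is None or errors < best[1]:
            best = (a, errors, rules)
    return best

attr, errors, rules = one_r(weather, 4)
print(attr, errors, rules)
# attribute 0 (outlook) wins with 4 errors out of 14:
# {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'}
```

Running this reproduces Table 4.1's totals (4/14 for outlook and humidity, 5/14 for temperature and windy) and, because attributes are scanned in order and ties kept with the earlier attribute, selects the outlook rule set, matching the text.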
