13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC



CHAPTER 1 | WHAT’S IT ALL ABOUT?

may be associated with the rules themselves to indicate that some are more important, or more reliable, than others.

You might be wondering whether there is a smaller rule set that performs as well. If so, would you be better off using the smaller rule set and, if so, why? These are exactly the kinds of questions that will occupy us in this book. Because the examples form a complete set for the problem space, the rules do no more than summarize all the information that is given, expressing it in a different and more concise way. Even though it involves no generalization, this is often a very useful thing to do! People frequently use machine learning techniques to gain insight into the structure of their data rather than to make predictions for new cases. In fact, a prominent and successful line of research in machine learning began as an attempt to compress a huge database of possible chess endgames and their outcomes into a data structure of reasonable size. The data structure chosen for this enterprise was not a set of rules but a decision tree.

Figure 1.2 shows a structural description for the contact lens data in the form of a decision tree, which for many purposes is a more concise and perspicuous representation of the rules and has the advantage that it can be visualized more easily. (However, this decision tree, in contrast to the rule set given in Figure 1.1, classifies two examples incorrectly.)
The tree calls first for a test on tear production rate, and the first two branches correspond to the two possible outcomes. If tear production rate is reduced (the left branch), the outcome is none. If it is normal (the right branch), a second test is made, this time on astigmatism. Eventually, whatever the outcome of the tests, a leaf of the tree is reached

[Figure 1.2 Decision tree for the contact lens data. Test nodes: tear production rate (reduced, normal), astigmatism (no, yes), spectacle prescription (myope, hypermetrope). Leaves: none, soft, hard.]
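The tree just described can be read as a chain of nested tests. The following Python sketch is illustrative only. The first two tests follow the text directly. The branches below astigmatism (no gives soft; yes leads to a test on spectacle prescription, with myope giving hard and hypermetrope giving none) are an assumption read off the node and leaf labels of Figure 1.2. The function name `recommend_lens` and the string encodings of the attribute values are hypothetical.

```python
def recommend_lens(tear_production_rate, astigmatism, spectacle_prescription):
    """Walk the Figure 1.2 decision tree from the root to a leaf.

    Returns one of "none", "soft", or "hard".
    """
    # Root test: tear production rate.
    if tear_production_rate == "reduced":
        return "none"            # left branch: the outcome is none
    if tear_production_rate != "normal":
        raise ValueError("tear_production_rate must be 'reduced' or 'normal'")

    # Right branch: a second test is made on astigmatism.
    if astigmatism == "no":
        return "soft"            # assumed from the figure labels
    if astigmatism != "yes":
        raise ValueError("astigmatism must be 'no' or 'yes'")

    # Third test, reached only when astigmatism is "yes" (assumed layout).
    if spectacle_prescription == "myope":
        return "hard"
    if spectacle_prescription == "hypermetrope":
        return "none"
    raise ValueError("spectacle_prescription must be 'myope' or 'hypermetrope'")


if __name__ == "__main__":
    print(recommend_lens("reduced", "yes", "myope"))
    print(recommend_lens("normal", "no", "hypermetrope"))
    print(recommend_lens("normal", "yes", "myope"))
```

Each call follows exactly one path from the root to a leaf, so at most three attributes are inspected for any example. Because the text notes that this tree misclassifies two examples, the function should not be expected to reproduce every row of the contact lens data.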
