CHAPTER 6 | IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES

…to rule induction and continued with association rules, linear models, the nearest-neighbor method of instance-based learning, and clustering. The present chapter develops all these topics except association rules, which have already been covered in adequate detail.

We begin with decision tree induction and work up to a full description of the C4.5 system, a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date. Next we describe decision rule induction. Despite the simplicity of the idea, inducing decision rules that perform comparably with state-of-the-art decision trees turns out to be quite difficult in practice. Most high-performance rule inducers find an initial rule set and then refine it using a rather complex optimization stage that discards or adjusts individual rules to make them work better together. We describe the ideas that underlie rule learning in the presence of noise, and then go on to cover a scheme that operates by forming partial decision trees, an approach that has been demonstrated to perform as well as other state-of-the-art rule learners yet avoids their complex and ad hoc heuristics. Following this, we take a brief look at how to generate rules with exceptions, which were described in Section 3.5.

There has been a resurgence of interest in linear models with the introduction of support vector machines, a blend of linear modeling and instance-based learning. Support vector machines select a small number of critical boundary instances called support vectors from each class and build a linear discriminant function that separates them as widely as possible. These systems transcend the limitations of linear boundaries by making it practical to include extra nonlinear terms in the function, making it possible to form quadratic, cubic, and higher-order decision boundaries. The same techniques can be applied to the perceptron described in Section 4.6 to implement complex decision boundaries. An older technique for extending the perceptron is to connect units together into multilayer “neural networks.” All these ideas are described in Section 6.3.

The next section of the chapter describes instance-based learners, developing the simple nearest-neighbor method introduced in Section 4.7 and showing some more powerful alternatives that perform explicit generalization. Following that, we extend linear regression for numeric prediction to a more sophisticated procedure that comes up with the tree representation introduced in Section 3.7 and go on to describe locally weighted regression, an instance-based strategy for numeric prediction. Next we return to clustering and review some methods that are more sophisticated than simple k-means, methods that produce hierarchical clusters and probabilistic clusters. Finally, we look at Bayesian networks, a potentially very powerful way of extending the Naïve Bayes method to make it less “naïve” by dealing with datasets that have internal dependencies.
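To make the "extra nonlinear terms" idea concrete before Section 6.3, here is a minimal sketch, not taken from the book or from Weka, of a perceptron that learns a quadratic decision boundary simply by appending squared and cross-product terms to each input vector. The class name, the feature expansion, and the synthetic circular dataset are all invented for illustration.

// Sketch only: a perceptron given quadratic input features so that a linear
// boundary in the expanded space corresponds to a quadratic boundary in the
// original space. Data and class name are hypothetical.
public class QuadraticPerceptron {

    // Expand (x1, x2) into a bias term, the original attributes, and the
    // quadratic terms x1^2, x2^2, and x1*x2.
    static double[] expand(double x1, double x2) {
        return new double[] {1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2};
    }

    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        int n = 200;
        double[][] inputs = new double[n][];
        int[] labels = new int[n];                 // +1 inside the unit circle, -1 outside
        for (int i = 0; i < n; i++) {
            double x1 = rnd.nextDouble() * 4 - 2;
            double x2 = rnd.nextDouble() * 4 - 2;
            inputs[i] = expand(x1, x2);
            labels[i] = (x1 * x1 + x2 * x2 < 1.0) ? 1 : -1;
        }

        double[] w = new double[6];                // weights in the expanded space
        for (int epoch = 0; epoch < 100; epoch++) {
            int mistakes = 0;
            for (int i = 0; i < n; i++) {
                double activation = 0;
                for (int j = 0; j < w.length; j++) activation += w[j] * inputs[i][j];
                int predicted = activation >= 0 ? 1 : -1;
                if (predicted != labels[i]) {      // standard perceptron update rule
                    for (int j = 0; j < w.length; j++) w[j] += labels[i] * inputs[i][j];
                    mistakes++;
                }
            }
            if (mistakes == 0) break;              // separable in the expanded space
        }
        System.out.println("weights: " + java.util.Arrays.toString(w));
    }
}

A support vector machine exploits the same expansion, but does it implicitly through a kernel function and, rather than accepting any separating hyperplane, chooses the one that separates the support vectors of the two classes as widely as possible, as the chapter describes in Section 6.3.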
