CHAPTER 7 | TRANSFORMATIONS: ENGINEERING THE INPUT AND OUTPUT

learning techniques do this by learning an ensemble of models and using them in combination: prominent among these are schemes called bagging, boosting, and stacking. They can all, more often than not, increase predictive performance over a single model. And they are general techniques that can be applied to numeric prediction problems and to classification tasks.

Bagging, boosting, and stacking have only been developed over the past decade, and their performance is often astonishingly good. Machine learning researchers have struggled to understand why. And during that struggle, new methods have emerged that are sometimes even better. For example, whereas human committees rarely benefit from noisy distractions, shaking up bagging by adding random variants of classifiers can improve performance. Closer analysis revealed that boosting (perhaps the most powerful of the three methods) is closely related to the established statistical technique of additive models, and this realization has led to improved procedures.

These combined models share the disadvantage of being difficult to analyze: they can comprise dozens or even hundreds of individual models, and although they perform well it is not easy to understand in intuitive terms what factors are contributing to the improved decisions. In the last few years methods have been developed that combine the performance benefits of committees with comprehensible models. Some produce standard decision tree models; others introduce new variants of trees that provide optional paths.

We close by introducing a further technique of combining models using error-correcting output codes. This is more specialized than the other three techniques: it applies only to classification problems, and even then only to ones that have more than three classes.

Bagging

Combining the decisions of different models means amalgamating the various outputs into a single prediction. The simplest way to do this in the case of classification is to take a vote (perhaps a weighted vote); in the case of numeric prediction, it is to calculate the average (perhaps a weighted average). Bagging and boosting both adopt this approach, but they derive the individual models in different ways. In bagging, the models receive equal weight, whereas in boosting, weighting is used to give more influence to the more successful ones, just as an executive might place different values on the advice of different experts depending on how experienced they are.

To introduce bagging, suppose that several training datasets of the same size are chosen at random from the problem domain. Imagine using a particular machine learning technique to build a decision tree for each dataset. You might expect these trees to be practically identical and to make the same prediction for each new test instance. Surprisingly, this assumption is usually quite wrong,
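To make the scheme concrete, here is a minimal sketch of bagging in Python. It is not the book's implementation (the book works with the Weka workbench); the use of scikit-learn's DecisionTreeClassifier, the iris dataset, and the helper names bagged_trees and vote are illustrative assumptions. In practice bagging approximates the idealized "several independent training sets" by resampling the one available dataset with replacement, and the sketch does exactly that, combining the resulting classifications by an unweighted plurality vote.

    # A minimal bagging sketch (illustrative; not the book's Weka implementation).
    # Assumes scikit-learn is installed; dataset and helper names are hypothetical.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    def bagged_trees(X, y, n_models=10, seed=0):
        """Build one decision tree per training set of the same size,
        each drawn at random (with replacement) from the original data."""
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), size=len(X))   # resampled training set
            models.append(DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx]))
        return models

    def vote(models, X):
        """Combine the classifications by an unweighted plurality vote."""
        preds = np.array([m.predict(X) for m in models])   # rows: models, columns: instances
        return np.array([np.bincount(column).argmax() for column in preds.T])

    X, y = load_iris(return_X_y=True)
    committee = bagged_trees(X, y)
    print(vote(committee, X[:5]))   # the committee's class for the first five instances

For numeric prediction the vote would simply be replaced by the mean of the individual predictions; a weighted vote or weighted average, as used in boosting, fits the same skeleton.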
