13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC


10.2 EXPLORING THE EXPLORER

Figure 10.14 Configuring a metalearner for boosting decision stumps.

Classify panel and choose the classifier AdaBoostM1 from the meta section of the hierarchical menu. When you configure this classifier by clicking it, the object editor shown in Figure 10.14 appears. This has its own classifier field, which we set to DecisionStump (as shown). This method could itself be configured by clicking it, except that DecisionStump happens to have no editable properties. Click OK to return to the main Classify panel, then click Start to try out boosting decision stumps for up to 10 iterations. It turns out that this mislabels only 7 of the 150 instances in the Iris data, which is good performance considering the rudimentary nature of decision stumps and the rather small number of boosting iterations.

Clustering and association rules

Use the Cluster and Associate panels to invoke clustering algorithms (Section 6.6) and methods for finding association rules (Section 4.5). When clustering, Weka shows the number of clusters and how many instances each cluster contains. For some algorithms the number of clusters can be specified by setting a parameter in the object editor. For probabilistic clustering methods, Weka measures the log-likelihood of the clusters on the training data: the larger this quantity, the better the model fits the data. Increasing the number of clusters normally increases the likelihood, but may overfit.

The controls on the Cluster panel are similar to those for Classify. You can specify some of the same evaluation methods: use training set, supplied test set, and percentage split (the last two are used with the log-likelihood). A further
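The boosting run described above wraps DecisionStump inside AdaBoostM1. The Python sketch below is a from-scratch illustration of that idea, not Weka's code. It follows the AdaBoost.M1 scheme: after each round, the weights of correctly classified instances are multiplied by e/(1 - e), where e is the stump's weighted error, and each stump later casts a vote of log((1 - e)/e) for the class it predicts. The function names, the midpoint thresholds, the tie-breaking, and the handling of a zero-error stump are assumptions made for this sketch, and it omits details such as missing-value handling. The nine-point dataset is hypothetical.

```python
import math

def stump_predict(stump, x):
    """Apply a one-level tree given as (feature, threshold, left_label, right_label)."""
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label

def fit_stump(X, y, w, labels):
    """Return the split with the smallest weighted training error, and that error."""
    best, best_err = None, math.inf
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Midpoints between consecutive distinct values; for a constant
        # attribute, fall back to one split that sends everything left.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for t in thresholds:
            left = {c: 0.0 for c in labels}
            right = {c: 0.0 for c in labels}
            for row, label, wi in zip(X, y, w):
                (left if row[f] <= t else right)[label] += wi
            # Each branch predicts its weighted-majority class; max() keeps
            # the first label in sorted order when weights tie.
            stump = (f, t, max(labels, key=left.get), max(labels, key=right.get))
            err = sum(wi for row, label, wi in zip(X, y, w)
                      if stump_predict(stump, row) != label)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err

def reweight(w, correct, err):
    """Shrink weights of correctly classified instances by err / (1 - err) and
    renormalize; afterwards the last stump has weighted error 0.5."""
    factor = err / (1 - err)
    new = [wi * factor if ok else wi for wi, ok in zip(w, correct)]
    total = sum(new)
    return [wi / total for wi in new]

def adaboost_m1(X, y, rounds=10):
    labels = sorted(set(y))
    w = [1.0 / len(X)] * len(X)
    ensemble = []  # list of (stump, vote weight)
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w, labels)
        if err >= 0.5:
            break  # no better than chance on the weighted data: stop boosting
        perfect = err == 0.0
        err = max(err, 1e-10)  # avoid log(0) for a perfect stump
        ensemble.append((stump, math.log((1 - err) / err)))
        if perfect:
            break
        correct = [stump_predict(stump, row) == label for row, label in zip(X, y)]
        w = reweight(w, correct, err)
    return labels, ensemble

def predict(model, x):
    """Weighted vote: each stump adds its vote weight to the class it predicts."""
    labels, ensemble = model
    votes = {c: 0.0 for c in labels}
    for stump, alpha in ensemble:
        votes[stump_predict(stump, x)] += alpha
    return max(labels, key=votes.get)

# Hypothetical toy data: class "a" at both ends, "b" in the middle,
# so no single threshold on x separates the two classes.
X = [[float(v)] for v in range(9)]
y = ["a", "a", "a", "b", "b", "b", "a", "a", "a"]
for rounds in (1, 3):
    model = adaboost_m1(X, y, rounds)
    n_correct = sum(predict(model, row) == label for row, label in zip(X, y))
    print(rounds, "round(s):", n_correct, "of", len(y), "correct")
```

On this toy set, one round (a single stump) gets 6 of the 9 instances right, while three rounds get all 9 right: each round shifts weight onto the instances the earlier stumps missed, and the weighted vote combines three different splits into a boundary that no single stump can express.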

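To make the log-likelihood figure concrete, the sketch below assumes the probabilistic clusterer models the data as a one-dimensional mixture of Gaussians, the kind of model EM-style clusterers fit to numeric attributes. The training-data log-likelihood is then the sum over instances x of log(sum over clusters k of p_k * N(x; mu_k, sigma_k^2)), where p_k is the cluster's prior probability. The mixture parameters are set by hand rather than fitted, the helper names and six-point dataset are hypothetical, and the values are not Weka output (this sketch reports a total over instances, not any particular normalization Weka may use).

```python
import math
from statistics import mean, pstdev

def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_log_likelihood(data, priors, means, sigmas):
    """Sum over instances of log(sum_k prior_k * N(x; mean_k, sigma_k^2))."""
    return sum(
        math.log(sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, means, sigmas)))
        for x in data
    )

# Hypothetical data with two visible groups.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# One cluster: the maximum-likelihood single Gaussian (mean, population std).
ll_one = mixture_log_likelihood(data, [1.0], [mean(data)], [pstdev(data)])

# Two clusters placed by hand on the two groups, each with the group's std.
ll_two = mixture_log_likelihood(data, [0.5, 0.5], [2.0, 11.0],
                                [pstdev([1.0, 2.0, 3.0])] * 2)

# Six clusters, one narrow Gaussian per instance: a deliberately overfitted model.
ll_six = mixture_log_likelihood(data, [1 / 6] * 6, data, [0.1] * 6)

print(ll_one, ll_two, ll_six)
```

The three values increase from one to two to six clusters. The six-cluster model scores best on the training data only because it memorizes each instance; shrinking its standard deviation further would push the training log-likelihood up without limit, which is why this quantity alone cannot choose the number of clusters.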