Data Mining: Practical Machine Learning Tools and ... - LIDeCC
There is a supervised version of the NominalToBinary filter that transforms all multivalued nominal attributes to binary ones. In this version, the transformation depends on whether the class is nominal or numeric. If nominal, the same method as before is used: an attribute with k values is transformed into k binary attributes. If the class is numeric, however, the method described in Section 6.5 (page 246) is applied. In either case the class itself is not transformed.

ClassOrder changes the ordering of the class values. The user determines whether the new ordering is random or in ascending or descending order of class frequency. This filter must not be used with the FilteredClassifier metalearning scheme! AttributeSelection can be used for automatic attribute selection and provides the same functionality as the Explorer's Select attributes panel (described later).

Supervised instance filters

There are three supervised instance filters. Resample is like the eponymous unsupervised instance filter except that it maintains the class distribution in the subsample. Alternatively, it can be configured to bias the class distribution towards a uniform one. SpreadSubsample also produces a random subsample, but the frequency difference between the rarest and the most common class can be controlled; for example, you can specify at most a 2:1 difference in class frequencies. Like the unsupervised instance filter RemoveFolds, StratifiedRemoveFolds outputs a specified cross-validation fold for the dataset, except that this time the fold is stratified.

10.4 Learning algorithms

On the Classify panel, when you select a learning algorithm using the Choose button the command-line version of the classifier appears in the line beside the button, including the parameters specified with minus signs. To change them, click that line to get an appropriate object editor.
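The k-value-to-k-binary transformation that NominalToBinary applies when the class is nominal can be sketched in a few lines of Python. This is an illustration of the idea only, not Weka's implementation; the attribute name and values are invented.

```python
# Sketch of the nominal-to-binary transformation: an attribute with
# k values becomes k binary attributes, one per possible value.
# Illustration only, not Weka's code; the "colour" attribute is invented.
def nominal_to_binary(values, domain):
    """One output column per value in the attribute's domain."""
    return [[1 if v == d else 0 for d in domain] for v in values]

colours = ["red", "green", "red", "blue"]
print(nominal_to_binary(colours, domain=["red", "green", "blue"]))
# [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Each row of the result contains a single 1 marking the original value, which is exactly the k-binary encoding described above.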
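The frequency cap that SpreadSubsample enforces can be illustrated with a small Python sketch. The list-of-labels representation and the simple "cap each class at spread times the rarest class's count" rule are assumptions made for illustration; this is not Weka's actual code.

```python
import random
from collections import Counter

def spread_subsample(labels, max_spread, seed=0):
    """Pick a random subset of instance indices so that the most common
    class occurs at most max_spread times as often as the rarest class.
    A sketch of the idea behind SpreadSubsample, not Weka's code."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    rarest = min(len(idx) for idx in by_class.values())
    cap = int(rarest * max_spread)          # e.g. 2:1 -> at most 2x rarest
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, min(len(idx), cap)))
    return sorted(keep)

labels = ["yes"] * 90 + ["no"] * 10
kept = [labels[i] for i in spread_subsample(labels, max_spread=2)]
print(Counter(kept))  # at most a 2:1 ratio: 20 "yes", 10 "no"
```

With a 90:10 dataset and a spread of 2, the majority class is cut down to 20 instances while all 10 minority instances survive.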
Table 10.5 lists Weka's classifiers. They are divided into Bayesian classifiers, trees, rules, functions, lazy classifiers, and a final miscellaneous category. We describe them briefly here, along with their parameters. To learn more, choose one in the Weka Explorer interface and examine its object editor. A further kind of classifier, the Metalearner, is described in the next section.

Bayesian classifiers

NaiveBayes implements the probabilistic Naïve Bayes classifier (Section 4.2). NaiveBayesSimple uses the normal distribution to model numeric attributes. NaiveBayes can use kernel density estimators, which improves performance if the normality assumption is grossly incorrect; it can also handle numeric
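The difference between modelling a numeric attribute with a single normal distribution (as NaiveBayesSimple does) and with a kernel density estimator can be sketched as follows. This is a Python illustration of the two estimators, not Weka's code, and the fixed bandwidth is an illustrative stand-in.

```python
import math

def gaussian_likelihood(x, sample):
    """Single normal distribution fitted to the attribute
    (the NaiveBayesSimple approach)."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((v - mu) ** 2 for v in sample) / (n - 1)
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def kernel_likelihood(x, sample, bandwidth=1.0):
    """Kernel density estimate: one small Gaussian per training value,
    averaged. The bandwidth here is an illustrative assumption."""
    k = lambda u: math.exp(-(u * u) / 2) / math.sqrt(2 * math.pi)
    return sum(k((x - v) / bandwidth) for v in sample) / (len(sample) * bandwidth)

# A clearly bimodal attribute: the single Gaussian smears its density
# across the empty middle, while the kernel estimate does not.
sample = [0.0, 0.1, -0.1, 9.9, 10.0, 10.1]
```

At x = 5 (the empty middle of this bimodal sample) the single Gaussian assigns its highest density, whereas the kernel estimate assigns almost none; near the actual data at x = 0 the kernel estimate is much higher. That is why the kernel option helps when the normality assumption is grossly incorrect.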
