Data Mining: Practical Machine Learning Tools and Techniques
10.5 METALEARNING ALGORITHMS

Combining classifiers

Vote provides a baseline method for combining classifiers by averaging their probability estimates (classification) or numeric predictions (regression). MultiScheme selects the best classifier from a set of candidates using cross-validation of percentage accuracy (classification) or mean-squared error (regression). The number of folds is a parameter. Performance on training data can be used instead.

Stacking combines classifiers using stacking (Section 7.5, page 332) for both classification and regression problems. You specify the base classifiers, the metalearner, and the number of cross-validation folds. StackingC implements a more efficient variant for which the metalearner must be a numeric prediction scheme (Seewald 2002). In Grading, the inputs to the metalearner are base-level predictions that have been marked (i.e., "graded") as correct or incorrect. For each base classifier, a metalearner is learned that predicts when the base classifier will err. Just as stacking may be viewed as a generalization of voting, grading generalizes selection by cross-validation (Seewald and Fürnkranz 2001).

Cost-sensitive learning

There are two metalearners for cost-sensitive learning (Section 5.7). The cost matrix can be supplied as a parameter or loaded from a file in the directory set by the onDemandDirectory property, named by the relation name and with the extension cost. CostSensitiveClassifier either reweights training instances according to the total cost assigned to each class (cost-sensitive learning, page 165) or predicts the class with the least expected misclassification cost rather than the most likely one (cost-sensitive classification, page 164). MetaCost generates a single cost-sensitive classifier from the base learner (Section 7.5, pages 319–320). This implementation uses all bagging iterations when reclassifying training data (Domingos 1999 reports a marginal improvement when using only those iterations containing each training instance to reclassify it). You can specify each bag's size and the number of bagging iterations.

Optimizing performance

Three metalearners use the wrapper technique to optimize the base classifier's performance. AttributeSelectedClassifier selects attributes, reducing the data's dimensionality before passing it to the classifier (Section 7.1, page 290). You can choose the attribute evaluator and search method using the Select attributes panel described in Section 10.2. CVParameterSelection optimizes performance by using cross-validation to select parameters. For each parameter you give a string containing its lower and upper bounds and the desired number of increments. For example, to vary parameter -P from 1 to 10 in increments of 1, use P 1 10 11. The number of cross-validation folds can be specified.
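To make the combination schemes concrete, here is a minimal sketch using Weka's Java API. The dataset file iris.arff is a placeholder for any ARFF file with a nominal class, and accessor names such as setClassifiers() and setNumFolds() are assumed from the standard Weka 3 distribution.

import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.MultiScheme;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CombinerSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("iris.arff"); // placeholder dataset
    data.setClassIndex(data.numAttributes() - 1);

    Classifier[] candidates = { new J48(), new NaiveBayes(), new IBk() };

    // Vote: averages the candidates' probability estimates.
    Vote vote = new Vote();
    vote.setClassifiers(candidates);

    // MultiScheme: selects the single best candidate by cross-validation;
    // a fold count of 0 would use performance on the training data instead.
    MultiScheme multi = new MultiScheme();
    multi.setClassifiers(candidates);
    multi.setNumFolds(10);

    for (Classifier scheme : new Classifier[] { vote, multi }) {
      Evaluation eval = new Evaluation(data);
      eval.crossValidateModel(scheme, data, 10, new Random(1));
      System.out.printf("%s: %.2f%% correct%n",
          scheme.getClass().getSimpleName(), eval.pctCorrect());
    }
  }
}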
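A corresponding sketch for Stacking, under the same assumptions: you supply the base classifiers, the metalearner, and the number of cross-validation folds. For StackingC the metalearner would instead have to be a numeric prediction scheme such as LinearRegression.

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.Logistic;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Stacking;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StackingSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("iris.arff"); // placeholder dataset
    data.setClassIndex(data.numAttributes() - 1);

    Stacking stack = new Stacking();
    stack.setClassifiers(
        new Classifier[] { new J48(), new NaiveBayes(), new IBk() });
    stack.setMetaClassifier(new Logistic()); // the level-1 metalearner
    stack.setNumFolds(10); // folds used to generate the level-1 training data

    stack.buildClassifier(data);
    System.out.println(stack);
  }
}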
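The cost-sensitive metalearners are configured the same way. In the following sketch the two-class dataset credit.arff and the 5:1 cost ratio are purely illustrative; CostMatrix.setCell() and setMinimizeExpectedCost() are assumed from recent Weka versions, with rows of the matrix indexing the actual class and columns the predicted class.

import weka.classifiers.CostMatrix;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.classifiers.meta.MetaCost;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CostSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("credit.arff"); // placeholder two-class dataset
    data.setClassIndex(data.numAttributes() - 1);

    // Misclassifying a true class-1 instance as class 0 costs 5;
    // the opposite mistake costs 1.
    CostMatrix costs = new CostMatrix(2);
    costs.setCell(0, 1, 1.0);
    costs.setCell(1, 0, 5.0);

    // Either reweight training instances by class cost (the default) or,
    // with minimizeExpectedCost set, predict the least-cost class.
    CostSensitiveClassifier csc = new CostSensitiveClassifier();
    csc.setClassifier(new J48());
    csc.setCostMatrix(costs);
    csc.setMinimizeExpectedCost(true);
    csc.buildClassifier(data);

    // MetaCost: builds a single cost-sensitive classifier via bagging;
    // bag size and the number of iterations are the parameters noted above.
    MetaCost mc = new MetaCost();
    mc.setClassifier(new J48());
    mc.setCostMatrix(costs);
    mc.setNumIterations(10);
    mc.setBagSizePercent(100);
    mc.buildClassifier(data);
  }
}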
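Finally, a sketch of two of the wrapper-style metalearners. CfsSubsetEval with BestFirst search mirrors the defaults of the Select attributes panel, and the string passed to addCVParameter() follows the "P 1 10 11" format described above, applied here purely as an illustration to J48's -M parameter.

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.meta.CVParameterSelection;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WrapperSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("iris.arff"); // placeholder dataset
    data.setClassIndex(data.numAttributes() - 1);

    // Attribute selection reduces dimensionality before the data
    // reaches the base classifier.
    AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
    asc.setClassifier(new J48());
    asc.setEvaluator(new CfsSubsetEval());
    asc.setSearch(new BestFirst());
    asc.buildClassifier(data);

    // Cross-validated parameter selection: vary J48's -M parameter
    // (minimum instances per leaf) from 1 to 10 in increments of 1.
    CVParameterSelection cvps = new CVParameterSelection();
    cvps.setClassifier(new J48());
    cvps.addCVParameter("M 1 10 11");
    cvps.setNumFolds(10);
    cvps.buildClassifier(data);
    System.out.println(cvps); // reports the parameter setting chosen
  }
}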
