13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC



it into a given number of cross-validation folds and reduce it to just one of them. If a random number seed is provided, the dataset will be shuffled before the subset is extracted. RemovePercentage removes a given percentage of instances, and RemoveRange removes a certain range of instance numbers. To remove all instances that have certain values for nominal attributes, or numeric values above or below a certain threshold, use RemoveWithValues. By default, all instances are deleted that exhibit one of a given set of nominal attribute values (if the specified attribute is nominal) or a numeric value below a given threshold (if it is numeric). However, the matching criterion can be inverted.

You can remove outliers by applying a classification method to the dataset (specifying it just as the clustering method was specified previously for AddCluster) and use RemoveMisclassified to delete the instances that it misclassifies.

Sparse instances

The NonSparseToSparse and SparseToNonSparse filters convert between the regular representation of a dataset and its sparse representation (see Section 2.4).

Supervised filters

Supervised filters are available from the Explorer’s Preprocess panel, just as unsupervised ones are. You need to be careful with them because, despite appearances, they are not really preprocessing operations. We noted this previously with regard to discretization (the splits applied to the test data must not use the test data’s class values, because these are supposed to be unknown), and it is true for supervised filters in general.

Because of popular demand, Weka allows you to invoke supervised filters as a preprocessing operation, just like unsupervised filters. However, if you intend to use them for classification, you should adopt a different methodology.
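The point about class values can be made concrete with a minimal sketch in plain Python. It is not Weka code, and the single-cut-point rule is a deliberately simple stand-in for a supervised discretizer, not the method Weka uses. The function names and the toy data are hypothetical. The cut point is learned from the training labels only and then applied to the test values without ever consulting test labels.

```python
from typing import List, Sequence


def learn_cut_point(values: Sequence[float], labels: Sequence[str]) -> float:
    """Pick the midpoint split that misclassifies the fewest training instances.

    Stand-in for a supervised discretizer: it needs class labels,
    so it should only ever be called on training data.
    """
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    best_cut, best_errors = pairs[0][0], len(pairs) + 1
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v < cut]
        right = [lab for v, lab in pairs if v >= cut]
        # Errors made if each side predicts its own majority class.
        errors = sum(
            len(side) - max(side.count(c) for c in classes)
            for side in (left, right)
        )
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


def apply_cut_point(values: Sequence[float], cut: float) -> List[str]:
    """Discretize values with an already-learned cut point; no labels needed."""
    return ["low" if v < cut else "high" for v in values]


# Hypothetical toy data: one numeric attribute and a two-valued class.
train_values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
train_labels = ["no", "no", "no", "yes", "yes", "yes"]
test_values = [2.5, 6.0]  # test labels are deliberately absent

cut = learn_cut_point(train_values, train_labels)
test_bins = apply_cut_point(test_values, cut)
```

Running the supervised step on the full dataset before splitting it would let the test labels influence the cut point, which is the situation the text warns against.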
A metalearner is provided that invokes a filter in a way that wraps the learning algorithm into the filtering mechanism. This filters the test data using the filter that has been created from the training data. It is also useful for some unsupervised filters. For example, in StringToWordVector the dictionary will be created from the training data alone: words that are novel in the test data will be discarded.

To use a supervised filter in this way, invoke the FilteredClassifier metalearning scheme from the meta section of the menu displayed by the Classify panel’s Choose button. Figure 10.17(a) shows the object editor for this metalearning scheme. With it you choose a classifier and a filter. Figure 10.17(b) shows the menu of filters.

Supervised filters, like unsupervised ones, are divided into attribute and instance filters, listed in Table 10.3 and Table 10.4.
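The wrapping idea behind FilteredClassifier can be illustrated with a small, self-contained Python sketch. This is not Weka's implementation or API: the class names `WordVectorFilter`, `WordOverlapClassifier`, and `FilteredClassifierSketch`, the toy learner, and the example documents are all assumptions made for illustration. The sketch shows only the behavior described above: the filter's dictionary is built from the training data, and words that first appear in the test data are discarded.

```python
from collections import Counter
from typing import Dict, List, Sequence


class WordVectorFilter:
    """Toy bag-of-words filter; its dictionary comes from the data it is fitted on."""

    def fit(self, docs: Sequence[str]) -> "WordVectorFilter":
        self.vocabulary = sorted({w for d in docs for w in d.lower().split()})
        return self

    def transform(self, docs: Sequence[str]) -> List[List[int]]:
        vectors = []
        for d in docs:
            counts = Counter(d.lower().split())
            # Words outside the fitted dictionary are silently dropped.
            vectors.append([counts[w] for w in self.vocabulary])
        return vectors


class WordOverlapClassifier:
    """Toy learner: scores each class by overlap with its summed training vector."""

    def fit(self, vectors: Sequence[List[int]], labels: Sequence[str]):
        self.profiles: Dict[str, List[int]] = {}
        for vec, lab in zip(vectors, labels):
            profile = self.profiles.setdefault(lab, [0] * len(vec))
            for i, x in enumerate(vec):
                profile[i] += x
        return self

    def predict(self, vectors: Sequence[List[int]]) -> List[str]:
        def score(vec: List[int], profile: List[int]) -> int:
            return sum(x * p for x, p in zip(vec, profile))

        return [
            max(sorted(self.profiles), key=lambda lab: score(vec, self.profiles[lab]))
            for vec in vectors
        ]


class FilteredClassifierSketch:
    """Wraps a filter and a learner so the filter is fitted on training data only."""

    def __init__(self, filter_, classifier):
        self.filter = filter_
        self.classifier = classifier

    def fit(self, docs: Sequence[str], labels: Sequence[str]):
        vectors = self.filter.fit(docs).transform(docs)
        self.classifier.fit(vectors, labels)
        return self

    def predict(self, docs: Sequence[str]) -> List[str]:
        # Test data passes through the filter created during training.
        return self.classifier.predict(self.filter.transform(docs))


# Hypothetical toy data.
train_docs = ["cheap pills now", "meeting agenda attached",
              "cheap offer now", "project meeting notes"]
train_labels = ["spam", "ham", "spam", "ham"]
test_docs = ["cheap zebra pills", "agenda for meeting"]

model = FilteredClassifierSketch(WordVectorFilter(), WordOverlapClassifier())
model.fit(train_docs, train_labels)
predictions = model.predict(test_docs)
```

Because the filter lives inside the wrapper, evaluating the wrapper with cross-validation refits the dictionary on each training fold, so no information from the held-out fold reaches the filter.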
