
Data Mining: Practical Machine Learning Tools and ... - LIDeCC

392 CHAPTER 10 | THE EXPLORER

method, classes to clusters evaluation, compares how well the chosen clusters match a preassigned class in the data. You select an attribute (which must be nominal) that represents the “true” class. Having clustered the data, Weka determines the majority class in each cluster and prints a confusion matrix showing how many errors there would be if the clusters were used instead of the true class. If your dataset has a class attribute, you can ignore it during clustering by selecting it from a pull-down list of attributes, and see how well the clusters correspond to actual class values. Finally, you can choose whether or not to store the clusters for visualization. The only reason not to do so is to conserve space. As with classifiers, you visualize the results by right-clicking on the result list, which allows you to view two-dimensional scatter plots like the one in Figure 10.6(b). If you have chosen classes to clusters evaluation, the class assignment errors are shown. For the Cobweb clustering scheme, you can also visualize the tree.

The Associate panel is simpler than Classify or Cluster. Weka contains only three algorithms for determining association rules and no methods for evaluating such rules. Figure 10.15 shows the output from the Apriori program for association rules (described in Section 4.5) on the nominal version of the weather data. Despite the simplicity of the data, several rules are found.
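Before looking at the rule output in detail, the classes to clusters evaluation described above can be illustrated with a small sketch. It follows only the description in the text (label each cluster with its majority class, tabulate clusters against classes, count the mismatches) and is not Weka's own code. The function name `classes_to_clusters` and the seven example instances are hypothetical, chosen purely for illustration.

```python
from collections import Counter, defaultdict


def classes_to_clusters(cluster_ids, true_classes):
    """Tabulate clusters against classes and count majority-class errors."""
    # table[cluster][label] = number of instances with that combination
    table = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_classes):
        table[cluster][label] += 1

    # Label each cluster with its most frequent true class.
    majority = {c: counts.most_common(1)[0][0] for c, counts in table.items()}

    # An instance counts as an error if its class differs from the
    # majority class of the cluster it was assigned to.
    errors = sum(
        n
        for c, counts in table.items()
        for label, n in counts.items()
        if label != majority[c]
    )
    return table, majority, errors


# Hypothetical cluster assignments and class values for seven instances.
clusters = [0, 0, 0, 1, 1, 1, 1]
classes = ["yes", "yes", "no", "no", "no", "yes", "no"]

table, majority, errors = classes_to_clusters(clusters, classes)
for c in sorted(table):
    print(c, dict(table[c]), "majority:", majority[c])
print("errors:", errors)
```

In this made-up example, cluster 0 is labeled yes and cluster 1 is labeled no, so two of the seven instances would be misclassified if the clusters were used in place of the true class.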
The number before the arrow is the number of instances for which the antecedent is true; that after the arrow is the number of instances in which the consequent is true also; and the confidence (in parentheses) is the ratio of the second number to the first. Ten rules are found by default: you can ask for more by using the object editor to change numRules.

Attribute selection

The Select attributes panel gives access to several methods for attribute selection. As explained in Section 7.1, this involves an attribute evaluator and a search

 1. outlook=overcast 4 ==> play=yes 4 conf:(1)
 2. temperature=cool 4 ==> humidity=normal 4 conf:(1)
 3. humidity=normal windy=FALSE 4 ==> play=yes 4 conf:(1)
 4. outlook=sunny play=no 3 ==> humidity=high 3 conf:(1)
 5. outlook=sunny humidity=high 3 ==> play=no 3 conf:(1)
 6. outlook=rainy play=yes 3 ==> windy=FALSE 3 conf:(1)
 7. outlook=rainy windy=FALSE 3 ==> play=yes 3 conf:(1)
 8. temperature=cool play=yes 3 ==> humidity=normal 3 conf:(1)
 9. outlook=sunny temperature=hot 2 ==> humidity=high 2 conf:(1)
10. temperature=hot play=no 2 ==> outlook=sunny 2 conf:(1)

Figure 10.15 Output from the Apriori program for association rules.
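The two counts and the confidence printed for each rule in Figure 10.15 can be recomputed by hand. The sketch below does this in plain Python rather than through Weka. It assumes the 14 instances of the nominal weather data as they are commonly distributed with Weka; the rows are typed in here as an assumption and should be checked against your own copy of the file. The helper name `rule_stats` is hypothetical.

```python
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")

# Assumed contents of the nominal weather data (14 instances).
RAW = """
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""

ROWS = [dict(zip(ATTRIBUTES, line.split(","))) for line in RAW.split()]


def rule_stats(rows, antecedent, consequent):
    """Return (antecedent count, antecedent-and-consequent count, confidence)."""

    def holds(row, conditions):
        return all(row[attr] == value for attr, value in conditions.items())

    # Number before the arrow: instances where the antecedent is true.
    n_antecedent = sum(holds(r, antecedent) for r in rows)
    # Number after the arrow: instances where the consequent is true as well.
    n_both = sum(holds(r, antecedent) and holds(r, consequent) for r in rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return n_antecedent, n_both, confidence


# Rule 1 in Figure 10.15: outlook=overcast ==> play=yes
print(rule_stats(ROWS, {"outlook": "overcast"}, {"play": "yes"}))

# A rule that does not appear in the figure: outlook=sunny ==> play=no
print(rule_stats(ROWS, {"outlook": "sunny"}, {"play": "no"}))
```

Under the assumed data, rule 1 yields counts of 4 and 4 and a confidence of 1, matching the figure. The second rule holds for only 3 of the 5 sunny instances, so its confidence is 0.6.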
