13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC



432 CHAPTER 11 | THE KNOWLEDGE FLOW INTERFACE

all targets are suitable; applicable ones are highlighted. Items on the connections menu are disabled (grayed out) until the component receives other connections that render them applicable.

There are two kinds of connection from data sources: dataset connections and instance connections. The former are for batch operations such as classifiers like J48; the latter are for stream operations such as NaiveBayesUpdateable. A data source component cannot provide both types of connection: once one is selected, the other is disabled. When a dataset connection is made to a batch classifier, the classifier needs to know whether it is intended to serve as a training set or a test set. To do this, you first make the data source into a test or training set using the TestSetMaker or TrainingSetMaker components from the Evaluation panel. On the other hand, an instance connection to an incremental classifier is made directly: there is no distinction between training and testing because the instances that flow update the classifier incrementally. In this case a prediction is made for each incoming instance and incorporated into the test results; then the classifier is trained on that instance. If you make an instance connection to a batch classifier it will be used as a test instance because training cannot possibly be incremental whereas testing always can be. Conversely, it is quite possible to test an incremental classifier in batch mode using a dataset connection.

Connections from a filter component are enabled when it receives input from a data source, whereupon follow-on dataset or instance connections can be made. Instance connections cannot be made to supervised filters or to unsu-

Figure 11.3 Operations on the Knowledge Flow components. Labels in the figure: data source, filter, evaluation, crossValidationFoldMaker, classifier, data sink, ClassifierPerformanceEvaluator, visualization.
