13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC



Chapter 5
Credibility: Evaluating What’s Been Learned

Evaluation is the key to making real progress in data mining. There are lots of ways of inferring structure from data: we have encountered many already and will see further refinements, and new methods, in the next chapter. But to determine which ones to use on a particular problem we need systematic ways to evaluate how different methods work and to compare one with another. Evaluation is not as simple as it might appear at first sight.

What’s the problem? We have the training set; surely we can just look at how well different methods do on that. Well, no: as we will see very shortly, performance on the training set is definitely not a good indicator of performance on an independent test set. We need ways of predicting performance bounds in practice, based on experiments with whatever data can be obtained.

When a vast supply of data is available, this is no problem: just make a model based on a large training set, and try it out on another large test set. But although data mining sometimes involves “big data” (particularly in marketing, sales, and customer support applications), it is often the case that data, quality data, is scarce. The oil slicks mentioned in Chapter 1 (pages 23–24) had to be detected
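
The claim that training-set performance is a poor guide to test-set performance can be illustrated with a small experiment. The following is a minimal sketch, not taken from the chapter: it assumes synthetic one-dimensional data with 20% of the class labels flipped at random, and it uses a one-nearest-neighbor classifier purely because that method memorizes its training data. All function names and parameter values are hypothetical choices made for this illustration.

```python
import random


def make_data(n, noise, rng):
    """Generate n instances (x, label); the true class is x > 0.5,
    and each label is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # simulate label noise
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Return the label of the training instance closest to x."""
    nearest = min(train, key=lambda inst: abs(inst[0] - x))
    return nearest[1]


def error_rate(train, test):
    """Fraction of test instances misclassified by 1-NN built on train."""
    wrong = sum(1 for x, y in test if predict_1nn(train, x) != y)
    return wrong / len(test)


rng = random.Random(0)            # fixed seed so the run is repeatable
train = make_data(200, 0.2, rng)  # data used to build the classifier
test = make_data(200, 0.2, rng)   # independent data from the same source

train_error = error_rate(train, train)  # error measured on the training set
test_error = error_rate(train, test)    # error measured on the test set

print(f"error on training set: {train_error:.3f}")
print(f"error on test set:     {test_error:.3f}")
```

Because every training instance is its own nearest neighbor, the error measured on the training set is zero, while the error on the independent test set is substantially higher. The training-set figure says nothing useful about how the classifier behaves on new data.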
