Data Mining: Practical Machine Learning Tools and ... - LIDeCC

158 CHAPTER 5 | CREDIBILITY: EVALUATING WHAT'S BEEN LEARNED

incorrect. In many situations, this is the most appropriate perspective. If the learning scheme, when it is actually applied, results in either a correct or an incorrect prediction, success is the right measure to use. This is sometimes called a 0-1 loss function: the "loss" is either zero if the prediction is correct or one if it is not. The use of loss is conventional, although a more optimistic terminology might couch the outcome in terms of profit instead.

Other situations are softer edged. Most learning methods can associate a probability with each prediction (as the Naïve Bayes method does). It might be more natural to take this probability into account when judging correctness. For example, a correct outcome predicted with a probability of 99% should perhaps weigh more heavily than one predicted with a probability of 51%, and, in a two-class situation, perhaps the latter is not all that much better than an incorrect outcome predicted with probability 51%. Whether it is appropriate to take prediction probabilities into account depends on the application. If the ultimate application really is just a prediction of the outcome, and no prizes are awarded for a realistic assessment of the likelihood of the prediction, it does not seem appropriate to use probabilities. If the prediction is subject to further processing, however (perhaps involving assessment by a person, or a cost analysis, or maybe even serving as input to a second-level learning process), then it may well be appropriate to take prediction probabilities into account.

Quadratic loss function

Suppose that for a single instance there are k possible outcomes, or classes, and for a given instance the learning scheme comes up with a probability vector p_1, p_2, ..., p_k for the classes (where these probabilities sum to 1). The actual outcome for that instance will be one of the possible classes. However, it is convenient to express it as a vector a_1, a_2, ..., a_k whose ith component, where i is the actual class, is 1 and all other components are 0. We can express the penalty associated with this situation as a loss function that depends on both the p vector and the a vector.

One criterion that is frequently used to evaluate probabilistic prediction is the quadratic loss function:

Σ_j (p_j − a_j)²

Note that this is for a single instance: the summation is over possible outputs, not over different instances. Just one of the a's will be 1 and the rest will be 0, so the sum contains contributions of p_j² for the incorrect predictions and (1 − p_i)² for the correct one. Consequently, it can be written

1 − 2p_i + Σ_j p_j²
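The two loss functions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book; the probability vector and class index below are made-up example values.

```python
# 0-1 loss and quadratic loss for a single instance, given a class
# probability vector p (summing to 1) and the actual class index.

def zero_one_loss(p, actual):
    """0 if the most probable class is the actual class, else 1."""
    predicted = max(range(len(p)), key=lambda j: p[j])
    return 0 if predicted == actual else 1

def quadratic_loss(p, actual):
    """Sum over classes j of (p_j - a_j)^2, where a is the 0/1 indicator
    vector for the actual class."""
    a = [1 if j == actual else 0 for j in range(len(p))]
    return sum((pj - aj) ** 2 for pj, aj in zip(p, a))

p = [0.7, 0.2, 0.1]   # hypothetical probabilities over k = 3 classes
actual = 0            # suppose class 0 is the true class

print(zero_one_loss(p, actual))   # 0: the prediction is correct
# (0.7 - 1)^2 + 0.2^2 + 0.1^2 = 0.14, up to floating point
print(quadratic_loss(p, actual))

# The rewritten form from the text, 1 - 2*p_i + sum_j p_j^2, agrees:
i = actual
alt = 1 - 2 * p[i] + sum(pj ** 2 for pj in p)
print(abs(alt - quadratic_loss(p, actual)) < 1e-12)  # True
```

Note that the quadratic loss uses the whole probability vector, so a confident correct prediction (p_i near 1) is penalized less than a hesitant one, which is exactly the softer-edged behaviour the 0-1 loss cannot express.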
