12.07.2015 Views

1 Introduction

1.5. Decision Theory

Figure 1.25: An example of a loss matrix with elements $L_{kj}$ for the cancer treatment problem. The rows correspond to the true class, whereas the columns correspond to the assignment of class made by our decision criterion.

              cancer   normal
    cancer       0      1000
    normal       1         0

1.5.2 Minimizing the expected loss

For many applications, our objective will be more complex than simply minimizing the number of misclassifications. Let us consider again the medical diagnosis problem. We note that, if a patient who does not have cancer is incorrectly diagnosed as having cancer, the consequences may be some patient distress plus the need for further investigations. Conversely, if a patient with cancer is diagnosed as healthy, the result may be premature death due to lack of treatment. Thus the consequences of these two types of mistake can be dramatically different. It would clearly be better to make fewer mistakes of the second kind, even if this was at the expense of making more mistakes of the first kind.

We can formalize such issues through the introduction of a loss function, also called a cost function, which is a single, overall measure of loss incurred in taking any of the available decisions or actions. Our goal is then to minimize the total loss incurred. Note that some authors consider instead a utility function, whose value they aim to maximize. These are equivalent concepts if we take the utility to be simply the negative of the loss, and throughout this text we shall use the loss function convention. Suppose that, for a new value of $\mathbf{x}$, the true class is $\mathcal{C}_k$ and that we assign $\mathbf{x}$ to class $\mathcal{C}_j$ (where $j$ may or may not be equal to $k$). In so doing, we incur some level of loss that we denote by $L_{kj}$, which we can view as the $k, j$ element of a loss matrix. For instance, in our cancer example, we might have a loss matrix of the form shown in Figure 1.25.
This particular loss matrix says that there is no loss incurred if the correct decision is made, there is a loss of 1 if a healthy patient is diagnosed as having cancer, whereas there is a loss of 1000 if a patient having cancer is diagnosed as healthy.

The optimal solution is the one which minimizes the loss function. However, the loss function depends on the true class, which is unknown. For a given input vector $\mathbf{x}$, our uncertainty in the true class is expressed through the joint probability distribution $p(\mathbf{x}, \mathcal{C}_k)$ and so we seek instead to minimize the average loss, where the average is computed with respect to this distribution, which is given by

$$
\mathbb{E}[L] = \sum_k \sum_j \int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, \mathcal{C}_k)\, \mathrm{d}\mathbf{x}. \tag{1.80}
$$

Each $\mathbf{x}$ can be assigned independently to one of the decision regions $\mathcal{R}_j$. Our goal is to choose the regions $\mathcal{R}_j$ in order to minimize the expected loss (1.80), which implies that for each $\mathbf{x}$ we should minimize $\sum_k L_{kj}\, p(\mathbf{x}, \mathcal{C}_k)$. As before, we can use the product rule $p(\mathbf{x}, \mathcal{C}_k) = p(\mathcal{C}_k \mid \mathbf{x})\, p(\mathbf{x})$ to eliminate the common factor of $p(\mathbf{x})$. Thus the decision rule that minimizes the expected loss is the one that assigns each $\mathbf{x}$ to the class $j$ for which $\sum_k L_{kj}\, p(\mathcal{C}_k \mid \mathbf{x})$ is smallest.
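
The per-point minimization described above can be illustrated with a small Python sketch. It uses the loss matrix of Figure 1.25 and, for one input, computes the posterior-weighted loss of each possible decision, then picks the smallest. The names `CLASSES`, `LOSS`, `expected_losses` and `decide` are illustrative choices, not taken from the text, and the posterior values are hypothetical numbers chosen only to show the effect of the asymmetric losses.

```python
# Loss matrix from Figure 1.25: LOSS[k][j] is the loss incurred when the
# true class is k and we assign class j. Index 0 = cancer, 1 = normal.
CLASSES = ("cancer", "normal")
LOSS = [
    [0, 1000],  # true class: cancer
    [1, 0],     # true class: normal
]


def expected_losses(loss, posterior):
    """Return sum_k loss[k][j] * posterior[k] for every candidate decision j."""
    n_decisions = len(loss[0])
    return [
        sum(loss[k][j] * posterior[k] for k in range(len(posterior)))
        for j in range(n_decisions)
    ]


def decide(loss, posterior):
    """Return the index j of the decision with the smallest expected loss."""
    risks = expected_losses(loss, posterior)
    return min(range(len(risks)), key=risks.__getitem__)


# Hypothetical posterior for one patient: p(cancer | x) = 0.01 (assumed value).
posterior = [0.01, 0.99]
risks = expected_losses(LOSS, posterior)
choice = CLASSES[decide(LOSS, posterior)]
print(risks, choice)
```

With this loss matrix, assigning "cancer" costs $1 - p$ in expectation and assigning "normal" costs $1000\,p$, where $p = p(\text{cancer} \mid \mathbf{x})$, so the sketch selects "cancer" whenever $p > 1/1001$. Even a 1% posterior probability of cancer is therefore enough to trigger the cancer decision.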
