Combining Pattern Classifiers

FUNDAMENTALS OF PATTERN RECOGNITION

By design, the classification regions of D* correspond to the true highest posterior probabilities. The bullet on the x-axis in Figure 1.12 splits R into R_1 (to the left) and R_2 (to the right). According to Eq. (1.54), the Bayes error is the area under P(ω_2)p(x|ω_2) in R_1 plus the area under P(ω_1)p(x|ω_1) in R_2. The total area corresponding to the Bayes error is marked in light gray. If the boundary is shifted to the left or right, additional error is incurred. We can think of the shifted boundary as the result of a classifier D, which is an imperfect approximation of D*. The shifted boundary, depicted by an open circle, is called in this example the "real" boundary, and the region to its left is therefore R_1 extended to the right. The error calculated through Eq. (1.54) is the area under P(ω_2)p(x|ω_2) in the whole of this extended region, and the extra error incurred is measured by the area shaded in dark gray. Therefore, using the true posterior probabilities, or an equivalent set of discriminant functions, guarantees the smallest possible error rate, called the Bayes error.

Since the true probabilities are never available in practice, it is impossible to calculate the exact Bayes error or to design the perfect Bayes classifier. Even if the probabilities were given, it would be difficult to find the classification regions in R^n and calculate the integrals. Therefore, we rely on estimates of the error, as discussed in Section 1.3.
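The argument above can be checked numerically in one dimension, where the integrals are easy to approximate. The following is a minimal sketch for a two-class problem with Gaussian class-conditional densities; the priors and distribution parameters are illustrative assumptions, not values from the text:

```python
import numpy as np

# Illustrative 1-D two-class problem; priors and Gaussian parameters
# are assumptions for this sketch, not taken from the text.
P1, P2 = 0.5, 0.5                 # priors P(w1), P(w2)
mu1, mu2, sigma = 0.0, 2.0, 1.0   # class-conditional densities p(x|w_i)

def gauss(x, mu, s):
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

x = np.linspace(-8.0, 10.0, 200001)
dx = x[1] - x[0]
g1 = P1 * gauss(x, mu1, sigma)    # P(w1) p(x|w1)
g2 = P2 * gauss(x, mu2, sigma)    # P(w2) p(x|w2)

# The Bayes classifier picks the larger of g1 and g2 at every x, so its
# error is the integral of the smaller of the two curves.
bayes_error = np.minimum(g1, g2).sum() * dx

# A boundary shifted away from the optimum (here x = 1, by symmetry)
# can only add error -- the dark gray area in Figure 1.12.
t = 1.5
shifted_error = (g2[x < t].sum() + g1[x >= t].sum()) * dx
assert shifted_error >= bayes_error
```

With these parameters the optimal boundary lies at x = 1 and the Bayes error is about 0.159; any shifted threshold t produces a strictly larger error, mirroring the light gray versus dark gray areas in the figure.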

1.5.6 Multinomial Selection Procedure for Comparing Classifiers

Alsing et al. [26] propose a different view of classification performance. The classifiers are compared on a labeled data set, relative to each other, in order to identify which classifier has most often been closest to the true class label. We assume that each classifier outputs a set of c posterior probabilities, one for each class, estimating the chance of that class being the true label for the input vector x. Since we use labeled data, the posterior probabilities guessed for the correct label of x are compared, and the classifier with the largest probability is nominated as the winner for this x.

Suppose we have classifiers D_1, ..., D_L to be compared on a data set Z of size N. The multinomial selection procedure consists of the following steps.

1. For i = 1, ..., c,

(a) Use only the N_i data points whose true label is ω_i. Initialize an N_i × L performance array T.

(b) For every point z_j such that l(z_j) = ω_i, find the estimates of the posterior probability P(ω_i|z_j) guessed by each classifier. Identify the largest posterior probability, store a value of 1 for the winning classifier D_q by setting T(j, q) = 1, and store values of 0 for the remaining L − 1 classifiers: T(j, k) = 0, k = 1, ..., L, k ≠ q.

(c) Calculate an estimate of the probability of each classifier being the winner for class ω_i, assuming that the number of wins follows a binomial distribution. The estimate of this probability will be the total number of 1s stored for that classifier divided by N_i.
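Steps (a)-(c) above can be sketched compactly, assuming the posterior estimates of the L classifiers are available as arrays; the function and variable names here are hypothetical:

```python
import numpy as np

def multinomial_selection(posteriors, labels, n_classes):
    """Estimate, per class, the probability of each classifier being the
    winner, i.e., giving the largest posterior for the true label.

    posteriors : list of L arrays, each of shape (N, n_classes),
                 posterior estimates from each classifier
    labels     : array of shape (N,) with true class indices
    Returns an array of shape (n_classes, L) of win proportions.
    """
    L = len(posteriors)
    win_prob = np.zeros((n_classes, L))
    for i in range(n_classes):
        idx = np.where(labels == i)[0]            # the N_i points with true label i
        if len(idx) == 0:
            continue
        # Posterior for the true class i from each classifier: shape (N_i, L)
        p_true = np.stack([p[idx, i] for p in posteriors], axis=1)
        winners = np.argmax(p_true, axis=1)       # index q of the winning D_q
        T = np.zeros((len(idx), L))               # the N_i x L performance array
        T[np.arange(len(idx)), winners] = 1       # T(j, q) = 1, rest stay 0
        win_prob[i] = T.sum(axis=0) / len(idx)    # binomial proportion estimate
    return win_prob
```

For example, with two classifiers and two classes, a classifier that always assigns the highest posterior to the true label of class ω_1 receives a win proportion of 1 for that class, regardless of how it fares on ω_2.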
