Combining Pattern Classifiers


8 FUNDAMENTALS OF PATTERN RECOGNITION

maximum membership rule (1.3), $x$ will be labeled to the same class by any of the equivalent sets of discriminant functions.

If the classes in $Z$ can be separated completely from each other by a hyperplane (a point in $\mathbb{R}$, a line in $\mathbb{R}^2$, a plane in $\mathbb{R}^3$), they are called linearly separable. The two classes in Figure 1.5 are not linearly separable because of the dot at (5, 6.6), which is on the wrong side of the discriminant function.
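Linear separability can be checked constructively: the perceptron learning rule is guaranteed to converge to a separating hyperplane whenever one exists. The following is a minimal sketch, not part of the text; the sample points are made up for illustration and are unrelated to Figure 1.5.

```python
import numpy as np

def perceptron_separable(X, y, max_epochs=1000):
    """Try to find a hyperplane w.x + b = 0 separating the two classes
    (labels in {-1, +1}) with the perceptron rule; convergence within
    the epoch budget implies the classes are linearly separable."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # xi is misclassified
                w += yi * xi                   # update toward correct side
                b += yi
                errors += 1
        if errors == 0:
            return True, w, b                  # all points correctly classified
    return False, w, b                         # budget exhausted; possibly not separable

# Two well-separated point clouds (hypothetical data)
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1, -1, 1, 1])
ok, w, b = perceptron_separable(X, y)
```

For data that is not linearly separable (such as the two classes in Figure 1.5), no update sequence reaches zero errors and the epoch budget is exhausted.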

1.3 CLASSIFICATION ERROR AND CLASSIFICATION ACCURACY

It is important to know how well our classifier performs. The performance of a classifier is a compound characteristic, whose most important component is the classification accuracy. If we were able to try the classifier on all possible input objects, we would know exactly how accurate it is. Unfortunately, this is hardly a possible scenario, so an estimate of the accuracy has to be used instead.

1.3.1 Calculation of the Error

Assume that a labeled data set $Z_{ts}$ of size $N_{ts} \times n$ is available for testing the accuracy of our classifier, $D$. The most natural way to calculate an estimate of the error is to run $D$ on all the objects in $Z_{ts}$ and find the proportion of misclassified objects

$$\mathrm{Error}(D) = \frac{N_{error}}{N_{ts}} \tag{1.7}$$

where $N_{error}$ is the number of misclassifications committed by $D$. This is called the counting estimator of the error rate because it is based on the count of misclassifications. Let $s_j \in \Omega$ be the class label assigned by $D$ to object $z_j$. The counting estimator can be rewritten as

$$\mathrm{Error}(D) = \frac{1}{N_{ts}} \sum_{j=1}^{N_{ts}} \left[ 1 - I\bigl(l(z_j), s_j\bigr) \right], \quad z_j \in Z_{ts} \tag{1.8}$$

where $I(a, b)$ is an indicator function taking value 1 if $a = b$ and 0 if $a \neq b$. $\mathrm{Error}(D)$ is also called the apparent error rate. Dual to this characteristic is the apparent classification accuracy, which is calculated by $1 - \mathrm{Error}(D)$.
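The counting estimator (1.8) is straightforward to compute: compare the true labels $l(z_j)$ with the assigned labels $s_j$ and take the fraction of mismatches. A minimal sketch with hypothetical label vectors:

```python
import numpy as np

def apparent_error_rate(true_labels, assigned_labels):
    """Counting estimator (1.8): the fraction of testing objects for
    which the assigned label s_j differs from the true label l(z_j)."""
    true_labels = np.asarray(true_labels)
    assigned_labels = np.asarray(assigned_labels)
    n_ts = len(true_labels)
    n_error = np.sum(true_labels != assigned_labels)  # count of misclassifications
    return n_error / n_ts

# Hypothetical testing set of 8 objects, 2 of them misclassified
l = [0, 0, 1, 1, 1, 2, 2, 2]   # true labels l(z_j)
s = [0, 1, 1, 1, 2, 2, 2, 2]   # labels assigned by D
err = apparent_error_rate(l, s)   # 2/8 = 0.25
acc = 1 - err                     # apparent classification accuracy
```

Counting mismatches directly is equivalent to summing $1 - I(l(z_j), s_j)$ over the testing set, since each term contributes 1 exactly when the labels disagree.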

To look at the error from a probabilistic point of view, we can adopt the following model. The classifier commits an error with probability $P_D$ on any object $x \in \mathbb{R}^n$ (a wrong but useful assumption). Then the number of errors has a binomial distribution with parameters $(P_D, N_{ts})$. An estimate of $P_D$ is

$$\hat{P}_D = \frac{N_{error}}{N_{ts}} \tag{1.9}$$
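A small sketch of the estimate (1.9). The standard error reported alongside it, $\sqrt{\hat{P}_D(1 - \hat{P}_D)/N_{ts}}$, is the usual consequence of the binomial model stated above; it is an addition for illustration, not part of the quoted text.

```python
import math

def error_estimate(n_error, n_ts):
    """Under the binomial model, N_error ~ Binomial(N_ts, P_D).
    Returns the point estimate P_hat (1.9) together with its standard
    error sqrt(P_hat * (1 - P_hat) / N_ts) -- the standard binomial
    result, added here for illustration."""
    p_hat = n_error / n_ts
    se = math.sqrt(p_hat * (1 - p_hat) / n_ts)
    return p_hat, se

p_hat, se = error_estimate(15, 100)   # 15 errors on 100 testing objects
```

The standard error shrinks as $1/\sqrt{N_{ts}}$, which is why small testing sets give unreliable accuracy estimates.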
