Combining Pattern Classifiers


Equation (1.31) gives the probability mass function of the class label variable $\omega$ for the observed x. The decision for that particular x should be made with respect to the posterior probability. Choosing the class with the highest posterior probability will lead to the smallest possible mistake when classifying x.
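To make the decision rule concrete, here is a minimal sketch in Python (the priors and likelihood values are made-up numbers, not from the book) that picks the class with the highest posterior probability for a single observed x:

```python
import numpy as np

# Hypothetical example: three classes with priors P(w_i) and
# class-conditional densities p(x|w_i) evaluated at the observed x.
priors = np.array([0.5, 0.3, 0.2])          # P(w_1), P(w_2), P(w_3)
likelihoods = np.array([0.10, 0.40, 0.25])  # p(x|w_i) at the observed x

# Unnormalized posteriors; dividing by p(x) does not change the argmax.
joint = priors * likelihoods
posteriors = joint / joint.sum()            # P(w_i|x)

best = int(np.argmax(posteriors))           # class with the highest posterior
print(posteriors, "-> choose class", best + 1)
```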

The probability model described above is valid for the discrete case as well. Let x be a discrete variable with possible values in $V = \{v_1, \ldots, v_s\}$. The only difference from the continuous-valued case is that instead of class-conditional pdfs we use class-conditional probability mass functions (pmf), $P(x \mid \omega_i)$, giving the probability that a particular value from $V$ occurs if we draw at random an object from class $\omega_i$. For all pmfs,

$$0 \le P(x \mid \omega_i) \le 1, \quad \forall x \in V, \qquad \text{and} \qquad \sum_{j=1}^{s} P(v_j \mid \omega_i) = 1 \qquad (1.32)$$
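As a small illustration (a sketch with made-up data, not from the book), a class-conditional pmf can be estimated from discrete samples by relative frequencies, and the estimate satisfies Equation (1.32) by construction:

```python
import numpy as np

# Hypothetical samples of the discrete feature x for objects from class w_i;
# x takes values in V = {0, 1, 2, 3}, i.e., s = 4 possible values.
samples_class_i = np.array([0, 1, 1, 2, 1, 3, 0, 1, 2, 1])
s = 4

# Relative frequencies as an estimate of P(v_j | w_i).
counts = np.bincount(samples_class_i, minlength=s)
pmf = counts / counts.sum()

print(pmf)                         # every entry lies in [0, 1]
print(np.isclose(pmf.sum(), 1.0))  # Equation (1.32): the entries sum to 1
```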

1.5.2 Normal Distribution

An important example of a class-conditional pdf is the normal distribution, denoted $p(x \mid \omega_i) \sim N(\mu_i, \Sigma_i)$, where $\mu_i \in \mathbb{R}^n$ and $\Sigma_i$ are the parameters of the distribution. $\mu_i$ is the mean of class $\omega_i$, and $\Sigma_i$ is an $n \times n$ covariance matrix. The class-conditional pdf is calculated as

$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{n/2} \, |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) \right] \qquad (1.33)$$
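Equation (1.33) can be evaluated directly with numpy; the sketch below is an illustration only, and the mean vector and covariance matrix are assumed example values:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Multivariate normal density, Equation (1.33)."""
    n = mu.shape[0]
    diff = x - mu
    inv_sigma = np.linalg.inv(sigma)          # Sigma_i^{-1}
    det_sigma = np.linalg.det(sigma)          # |Sigma_i|
    norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(det_sigma))
    return norm_const * np.exp(-0.5 * diff @ inv_sigma @ diff)

# Assumed example parameters for a two-dimensional class w_i.
mu_i = np.array([1.0, 2.0])
sigma_i = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
print(normal_pdf(np.array([1.5, 1.5]), mu_i, sigma_i))
```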

where $|\Sigma_i|$ is the determinant of $\Sigma_i$. For the one-dimensional case, x and $\mu_i$ are scalars, and $\Sigma_i$ reduces to the variance of x for class $\omega_i$, denoted $\sigma_i^2$. Equation (1.33) simplifies to

$$p(x \mid \omega_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu_i}{\sigma_i} \right)^{2} \right] \qquad (1.34)$$
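As a quick numerical check (a sketch, not part of the book), the general formula (1.33) with $n = 1$ and $\Sigma_i = [\sigma_i^2]$ gives the same value as Equation (1.34):

```python
import numpy as np

x, mu, sigma = 0.7, 1.0, 0.5   # assumed scalar values for illustration

# Equation (1.34): one-dimensional normal density.
p_1d = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Equation (1.33) with n = 1, where Sigma_i is the 1x1 matrix [sigma^2].
p_nd = (np.exp(-0.5 * (x - mu) ** 2 / sigma ** 2)
        / ((2 * np.pi) ** 0.5 * np.sqrt(sigma ** 2)))

print(np.isclose(p_1d, p_nd))  # True: the two expressions coincide
```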

The normal distribution (also called the Gaussian distribution) is the most natural assumption reflecting the following situation: there is an "ideal prototype" of class $\omega_i$ (a point in $\mathbb{R}^n$) and all class members are distorted versions of it. Small distortions are more likely to occur than large distortions, causing more objects to be located in the close vicinity of the ideal prototype than far away from it. The prototype is represented by the population mean $\mu_i$, and the scatter of the points around it is associated with the covariance matrix $\Sigma_i$.

Example: Data Cloud Shapes and the Corresponding Covariance Matrices. Figure 1.8 shows four two-dimensional data sets generated from normal distributions with different covariance matrices, as displayed underneath the respective scatterplot.
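The figure itself is not reproduced here, but data clouds of this kind can be generated as in the following sketch (the covariance matrices below are illustrative choices, not necessarily those of Figure 1.8):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])

# Illustrative covariance matrices: spherical, elongated along the first
# axis, and two with positive/negative correlation between the coordinates.
covariances = [np.array([[1.0, 0.0], [0.0, 1.0]]),
               np.array([[4.0, 0.0], [0.0, 1.0]]),
               np.array([[2.0, 1.5], [1.5, 2.0]]),
               np.array([[2.0, -1.5], [-1.5, 2.0]])]

for sigma in covariances:
    data = rng.multivariate_normal(mu, sigma, size=500)  # one data cloud
    print(np.cov(data.T))  # the sample covariance approximates sigma
```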
