
Chapter 4. SemiBoost and Visual Similarity Learning

Figure 4.4: The similarity between two samples x and x′ is approximated by the difference of the responses from an a priori given classifier F^P(·).

Discussion. If F(x, x′) is a large-margin function, F(x, x′) will be zero if the two samples are identical and will grow the more dissimilar x and x′ are. The same holds for taking F(x) − F(x′), which is also zero for identical samples and large for dissimilar ones. Hence, approximating the similarity using a non-pairwise classifier corresponds to indirectly measuring the distance to the decision boundary. The principle is visualized in Figure 4.4.

According to this discussion, we define the distance measure

d(x, x′) = |F^P(x) − F^P(x′)|,   (4.18)

i.e., the absolute difference of the classifier responses, which serve as proxies for the distance to the decision boundary. In other words, samples are similar if they have a similar classifier response. The distance is converted to a similarity using Equation (4.14), as described in the previous subsection. Now we are able to proceed with training in the proposed SemiBoost manner.
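The following Python sketch illustrates the prior-based distance of Equation (4.18) and its conversion to a similarity. It is a minimal sketch: the function names, the linear stand-in for F^P, and the Gaussian kernel used in place of Equation (4.14) (which is not reproduced in this excerpt) are assumptions for illustration only.

```python
import numpy as np

def prior_distance(f_prior, x, x_prime):
    """Distance of Eq. (4.18): absolute difference of prior-classifier responses."""
    return abs(f_prior(x) - f_prior(x_prime))

def prior_similarity(f_prior, x, x_prime, sigma=1.0):
    """Convert the distance to a similarity.

    A Gaussian kernel exp(-d^2 / sigma^2) is assumed here purely as a
    placeholder for Equation (4.14), which is defined in the previous subsection.
    """
    d = prior_distance(f_prior, x, x_prime)
    return float(np.exp(-(d ** 2) / (sigma ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=5)                    # stand-in for a trained prior classifier
    f_prior = lambda x: float(np.dot(w, x))   # real-valued confidence F^P(x)

    x, x_prime = rng.normal(size=5), rng.normal(size=5)
    print(prior_distance(f_prior, x, x_prime))
    print(prior_similarity(f_prior, x, x_prime, sigma=2.0))
```

Samples whose responses F^P(x) lie close together along the prior decision function thus receive a high similarity, and samples on opposite sides of the boundary a low one.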

4.2.4 Classifier Combination<br />

If we train a SemiBoost classifier H(x) using the prior classifier F^P(x) as similarity measure, it makes sense to use this prior knowledge for the final classification process as well, i.e., to combine the two classifiers. This is closely related to the approach proposed by Schapire et al. [Schapire et al., 2002]. Similarly, we use the prior knowledge as the 0th weak classifier f_0(x) = σ^{-1}(P^P(y = 1|x)), where P^P(y = 1|x) is the a priori probability of the sample belonging to the positive class and σ^{-1}(·) is the inverse function of our logistic model (see [Friedman et al., 2000]). Since we use boosting to train the prior classifier, we end up with f_0(x) = F^P(x), which is included in the combined classifier F^C(x) = α_0 F^P(x) + F(x). Note that for ease of notation, in the following we write F^P(x) = α_0 F^P(x), i.e., the weight α_0 is absorbed into the prior classifier.
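A minimal Python sketch of this combination is given below. The function names and the stand-in classifiers are hypothetical, and α_0 is treated as a free weighting parameter, since its choice is not fixed in this excerpt.

```python
def combined_classifier(f_prior, f_semiboost, alpha_0=1.0):
    """Combined decision function F^C(x) = alpha_0 * F^P(x) + F(x)."""
    def f_combined(x):
        # The prior acts as the 0-th weak learner f_0(x) = F^P(x), weighted by alpha_0;
        # f_semiboost is the SemiBoost classifier F(x) trained with the prior-based similarity.
        return alpha_0 * f_prior(x) + f_semiboost(x)
    return f_combined

if __name__ == "__main__":
    # Stand-in confidence functions; any real-valued classifiers would do.
    f_prior = lambda x: 0.8 * x[0] - 0.2 * x[1]
    f_semiboost = lambda x: 0.3 * x[0] + 0.5 * x[1]

    f_c = combined_classifier(f_prior, f_semiboost, alpha_0=0.5)
    x = [1.0, -2.0]
    y_hat = 1 if f_c(x) >= 0 else -1   # final label H(x) = sign(F^C(x))
    print(f_c(x), y_hat)
```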
