
4.2 SemiBoost

Similar to standard boosting, we can look at the expected value of the loss function [Friedman et al., 2000]; compared to Equation (2.10), we get for the combined classifier

P(y = 1 \mid x) = \frac{e^{F^P(x) + F(x)}}{e^{F^P(x) + F(x)} + e^{-F^P(x) - F(x)}} .   (4.19)

If we are only interested in the decision, a sample is classified as positive if P(y = 1 | x) ≥ 0.5, and after some mathematical rewriting we get

\hat{y} = \mathrm{sign}\big( \sinh(F^P(x) + F(x)) \big) = \mathrm{sign}\big( F^P(x) + F(x) \big) .   (4.20)
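To make Equations (4.19) and (4.20) concrete, the following is a minimal numerical sketch (not part of the thesis); the function names posterior_positive and decision are illustrative, and F_P, F stand for the scores F^P(x) and F(x) of a single sample.

```python
import numpy as np

def posterior_positive(F_P, F):
    """P(y = 1 | x) as in Eq. (4.19): a logistic function of the combined score.

    F_P -- score of the prior classifier F^P(x)
    F   -- score of the newly trained classifier F(x)
    """
    s = F_P + F
    return np.exp(s) / (np.exp(s) + np.exp(-s))  # equals sigmoid(2 * s)

def decision(F_P, F):
    """Decision rule of Eq. (4.20): positive iff P(y = 1 | x) >= 0.5."""
    return np.sign(F_P + F)

# A confident prior with a weakly disagreeing F(x): the prior still wins.
print(posterior_positive(1.5, -0.4))  # ~0.90
print(decision(1.5, -0.4))            # 1.0
```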

Discussion The interpretation is as follows. A label switch can happen, i.e., F(x) can overrule F^P(x), if the combined term has a different label than the prior F^P(x). As can easily be seen, this is the case if |F(x)| > |F^P(x)|. Therefore, the more confident the prior is, the harder it is for the label to change. We do not make any statement about whether this is a correct or an incorrect label switch. Note that the prior classifier can be wrong overall, but it has to provide an "honest" decision: if it is highly confident, the decision must be correct. There are also relations to the co-training assumptions [Balcan et al., 2004], i.e., a classifier should never be "confident but wrong". By rewriting Equation (4.20) as

\hat{y} = \mathrm{sign}\big( \sinh(F^P(x) + F(x)) \big) = \mathrm{sign}\big( \cosh(F(x)) \sinh(F^P(x)) + \cosh(F^P(x)) \sinh(F(x)) \big)   (4.21)

one sees that the two classifiers weight each other using hyperbolic functions. The factor obtained by cosh(·) ≥ 1 weights the decision of the corresponding classifier passed through the asymmetric sinh(·) function. With an additional scaling factor, more emphasis can be put either on the prior or on the newly trained classifier; however, this is not explored in this thesis.
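As a small sanity check (again a sketch with made-up scores, not from the thesis), the hyperbolic decomposition of Equation (4.21) and the label-switch condition |F(x)| > |F^P(x)| can be verified numerically:

```python
import numpy as np

# Illustrative scores: a moderately confident prior, a stronger disagreeing F(x).
F_P, F = 0.8, -1.2

# Eq. (4.21): sinh(F^P + F) = cosh(F) sinh(F^P) + cosh(F^P) sinh(F)
lhs = np.sinh(F_P + F)
rhs = np.cosh(F) * np.sinh(F_P) + np.cosh(F_P) * np.sinh(F)
assert np.isclose(lhs, rhs)

# Label switch: the combined sign differs from the prior's sign exactly when
# F(x) opposes the prior and |F(x)| > |F^P(x)|.
print(np.sign(F_P), np.sign(F_P + F))  # 1.0 -1.0 -> the prior is overruled
```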

Summarizing, after training F(x) the expected target of an example is obtained by a combined decision. The combined classifier can now be interpreted as improving F^P(x) using labeled and unlabeled samples. We train F(x) with SemiBoost using labeled and unlabeled data. Since F^P(x) is used to calculate the similarity (via Equation (4.18) and Equation (4.14)), the two classifiers are tightly coupled through the training process, and Equation (4.21) is not just a simple sum rule. If we use a complex classifier, i.e., one consisting of many weak classifiers, and have a lot of training data, F(x) will "absorb" the entire knowledge of F^P(x); therefore, the usual setting is a rather small F(x) that only corrects F^P(x).
