Chapter 8. Multiple Instance Learning with Random Forests

to be trained and simultaneously a suitable set of labels $y_i^j$ has to be found. Because the labels $y_i^j$ take integer values, this problem is a type of integer program and is usually difficult to solve. In order to solve this non-convex optimization problem without losing too much of the training speed of random forests, we need a fast optimization procedure. As we have seen in the previous chapter for SSL, deterministic annealing (DA) is an ideal candidate for such a task, and we will hence also use it here to solve our MIL-constrained learning task.
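To get a sense of the combinatorial size of this integer program, assume (for illustration) the standard MIL constraint that every positive bag must contain at least one positive instance while all instances in negative bags are negative. A positive bag with $n_i$ instances then admits $2^{n_i} - 1$ valid labelings, so the number of joint labelings over all positive bags is

\[
\prod_{i \,:\, \text{bag } i \text{ positive}} \big(2^{n_i} - 1\big),
\]

which grows exponentially with the bag sizes and makes exhaustive search infeasible.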

8.3.1 Optimization

In order to optimize our MIL objective function (Eq. (8.1)), we propose the following iterative strategy: In the first iteration, we train a naïve RF that ignores the MIL constraint and assigns each instance the label of its bag. After this first iteration, we treat the instance labels in positive bags as random binary variables. These random variables are defined over a space of probability distributions $\mathcal{P}$. We then search for a distribution $\hat{p} \in \mathcal{P}$ for each bag that solves our optimization problem in Eq. (8.1). Based on $\hat{p}$, each tree randomly selects the instance labels it uses for training. Hence, by optimizing $\hat{p}$ we try to identify the real but hidden labels of the instances.
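The following sketch illustrates this per-tree label sampling step. The helper name train_forest_from_p_hat, the use of scikit-learn decision trees as base learners, and the encoding of $\hat{p}$ as a per-instance probability of the positive class are illustrative assumptions, not the thesis implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest_from_p_hat(X, bag_ids, bag_labels, p_hat, n_trees=50, seed=0):
    """Train one forest in which every tree samples its own instance labels.

    p_hat : (n_inst,) array, current probability that instance i is positive.
    Instances in negative bags always keep label 0, since the label
    ambiguity only concerns instances inside positive bags.
    """
    rng = np.random.default_rng(seed)
    in_pos_bag = np.array([bag_labels[b] == 1 for b in bag_ids])
    forest = []
    for t in range(n_trees):
        y_t = np.zeros(len(X), dtype=int)
        # Each tree draws its own binary labels from p_hat (Bernoulli sampling).
        y_t[in_pos_bag] = rng.random(in_pos_bag.sum()) < p_hat[in_pos_bag]
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed + t)
        forest.append(tree.fit(X, y_t))
    return forest
```

Setting p_hat to 1 for all instances of positive bags reproduces the naïve first iteration, in which each instance simply inherits its bag label.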

We reformulate the objective function given in Eq. (8.1) so that it is suitable for DA optimization:

\[
L_{\mathrm{DA}}(F, \hat{p}) \;=\; \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{K} \hat{p}(k \mid x_i^j)\, \ell\big(F_k(x_i^j)\big) \;+\; T \sum_{i=1}^{n} H(\hat{p}_i), \qquad (8.3)
\]

where $T$ is the temperature parameter and

\[
H(\hat{p}_i) \;=\; -\sum_{j=1}^{n_i} \sum_{k=1}^{K} \hat{p}(k \mid x_i^j) \log \hat{p}(k \mid x_i^j) \qquad (8.4)
\]

is the entropy of the predicted label distribution inside a bag. The parameter $T$ controls the trade-off between the original objective function and the entropy term. If $T$ is high, the entropy dominates the loss function and the problem becomes easier to solve due to its convexity. If $T = 0$, the original loss of Eq. (8.1) dominates. Hence, DA first solves the easy, entropy-dominated problem and then, by continuously decreasing $T$ from high values to zero, gradually solves the original optimization problem, i.e., it finds the real but hidden instance labels $y$ while simultaneously training an instance-level classifier.
