PhD Thesis Semi-Supervised Ensemble Methods for Computer Vision
8.3. MILForests
label. This makes MILForests different from most previous MIL algorithms, which only yield binary classifiers and have to handle a multi-class problem by a sequence of binary ones.

One obvious way to design RFs capable of solving MIL tasks is to adopt MIL versions of single decision trees [Blockeel et al., 2005]. However, strategies developed for common decision trees are hard to apply to RFs due to the random split nature of their trees. For example, improper regularization of the trees of a RF at the node level can decrease the diversity $\bar{\rho}$ among the trees and thus increase the overall generalization error (see Eq. (2.16)). Additionally, the method proposed in [Blockeel et al., 2005] is based on simple heuristics and needs a complicated inter-node communication channel. Thus, in order to perform multiple instance learning with random forests, one has to find an optimization strategy that preserves the diversity among the trees. In fact, this is a similar condition as for SSL with random forests. Hence, following this condition and the arguments stated in the previous Section 8.2, it makes sense to use a similar optimization strategy as for our semi-supervised random forests introduced in Chapter 6.
Therefore, we formulate multiple instance learning as an optimization procedure in which the labels of the instances become the optimization variables. The algorithm tries to uncover the true labels of the instances in an iterative manner. Given such labels, one can train a supervised classifier, which can then be used to classify both instances and bags.
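To make the alternating nature of this procedure concrete, the following is a minimal sketch of such a label-optimization loop. The class-prototype classifier, the function names, and the fixed iteration count are illustrative assumptions, not the actual MILForests training procedure (which trains a random forest):

```python
import numpy as np

class ProtoClassifier:
    """Tiny class-prototype stand-in for the forest (softmax over negative
    squared distances). Used only to illustrate the alternating scheme."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        d = ((X[:, None, :] - self.mu_[None]) ** 2).sum(-1)
        e = np.exp(-d)
        return e / e.sum(axis=1, keepdims=True)

def mil_label_optimization(X, bag_ids, bag_labels, n_iters=5):
    y = bag_labels[bag_ids].copy()          # init: instances inherit their bag label
    for _ in range(n_iters):
        clf = ProtoClassifier().fit(X, y)
        probs = clf.predict_proba(X)
        y = probs.argmax(axis=1)            # re-estimate the hidden instance labels
        for i, yi in enumerate(bag_labels): # repair bags that lost their label:
            idx = np.flatnonzero(bag_ids == i)
            if not np.any(y[idx] == yi):    # at least one instance per bag must
                y[idx[probs[idx, yi].argmax()]] = yi  # carry the bag label
    return clf, y

# toy data: bag 0 is negative; bag 1 is positive but contains two
# negative-looking instances and one clearly positive one (x = 5.0)
X = np.array([[0.0], [0.2], [0.1], [0.1], [5.0], [0.3]])
bag_ids = np.array([0, 0, 0, 1, 1, 1])
bag_labels = np.array([0, 1])
_, y = mil_label_optimization(X, bag_ids, bag_labels)
```

On this toy data the two ambiguous instances of the positive bag flip to the negative class, while the clearly positive instance keeps the bag label, which is the intended behavior of the iterative label uncovering.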
Let $B_i$, $i = 1, \ldots, n$ denote the $i$-th bag in the training set with label $y_i$. Each bag consists of $n_i$ instances $\{x_i^1, \ldots, x_i^{n_i}\}$. We write the objective function to optimize as

$$(\{y_i^j\}^*, F^*) = \arg\min_{\{y_i^j\},\, F(\cdot)} \sum_{i=1}^{n} \sum_{j=1}^{n_i} l\big(F_{y_i^j}(x_i^j)\big) \qquad (8.1)$$

$$\text{s.t.} \quad \forall i: \; \sum_{j=1}^{n_i} I\Big(y_i = \arg\max_{k \in \mathcal{Y}} F_k(x_i^j)\Big) \geq 1.$$
The objective in this optimization procedure is to minimize a loss function $l(\cdot)$ that is defined over the entire set of instances, subject to the condition that at least one instance in each bag has to be from the target class. Note that $I(\cdot)$ is an indicator function and $F_k(x)$ is the confidence of the classifier for the $k$-th class, i.e., $F_k(x) = p(k|x) - \frac{1}{K}$.
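As a small illustration (all names are hypothetical), the constraint of Eq. (8.1) can be checked for a set of instance posteriors as follows; note that subtracting the constant $1/K$ leaves the $\arg\max$ unchanged:

```python
import numpy as np

def bag_constraint_satisfied(probs, bag_ids, bag_labels):
    """Evaluate the constraint of Eq. (8.1): each bag must contain at least
    one instance whose predicted class equals the bag label.

    probs:      (n_instances, K) posteriors p(k|x) per instance
    bag_ids:    (n_instances,) bag index of each instance
    bag_labels: (n_bags,) bag labels y_i
    """
    K = probs.shape[1]
    F = probs - 1.0 / K                 # F_k(x) = p(k|x) - 1/K; shifting by a
    pred = F.argmax(axis=1)             # constant does not change the arg max
    return np.array([np.any(pred[bag_ids == i] == y)   # sum of indicators >= 1
                     for i, y in enumerate(bag_labels)])

# two bags with K = 2: bag 0 contains a positively classified instance,
# bag 1 does not, so only bag 0 satisfies the constraint
probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.7, 0.3]])
bag_ids = np.array([0, 0, 1, 1])
bag_labels = np.array([1, 1])
ok = bag_constraint_satisfied(probs, bag_ids, bag_labels)
print(ok)  # [ True False]
```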
Often, the loss function depends on the classification margin of an instance. In the case of random forests, the margin can be written as [Breiman, 2001]

$$m(x, y) = p(y|x) - \max_{\substack{k \in \mathcal{Y} \\ k \neq y}} p(k|x) = F_y(x) - \max_{\substack{k \in \mathcal{Y} \\ k \neq y}} F_k(x). \qquad (8.2)$$
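A minimal sketch of computing the margin of Eq. (8.2) from the instance posteriors (the function name and inputs are assumptions):

```python
import numpy as np

def rf_margin(probs, y):
    """Margin of Eq. (8.2): m(x, y) = p(y|x) - max_{k != y} p(k|x).
    `probs` holds the forest's posteriors p(k|x) for one instance x."""
    probs = np.asarray(probs, dtype=float)
    others = probs.copy()
    others[y] = -np.inf                 # exclude the true class from the max
    return probs[y] - others.max()

m_correct = rf_margin([0.6, 0.3, 0.1], y=0)   # positive: x classified correctly
m_wrong = rf_margin([0.2, 0.5, 0.3], y=0)     # negative: x is misclassified
```

Since every $F_k(x)$ differs from $p(k|x)$ only by the constant $\frac{1}{K}$, computing the margin from the posteriors or from the confidences $F_k$ gives the same value.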
Note that for a correct classification $m(x, y) > 0$ should hold. Overall, it can easily be seen that Eq. (8.1) is a non-convex optimization problem because a random forest has