
8.3 MILForests

label. This makes MILForests different from most previous MIL algorithms, which only yield binary classifiers and require handling a multi-class problem by a sequence of binary ones.

One obvious way to design RFs capable of solving MIL tasks is to adopt MIL versions for single decision trees [Blockeel et al., 2005]. However, strategies developed for common decision trees are hard to apply to RFs due to the random split nature of their trees. For example, improper regularization of the trees of a RF on the node level can decrease the diversity $\bar{\rho}$ among the trees and thus increase the overall generalization error (see Eq. (2.16)). Additionally, the method proposed in [Blockeel et al., 2005] is based on simple heuristics and needs a complicated inter-node communication channel. Thus, in order to perform multiple instance learning with random forests, one has to find an optimization strategy that preserves the diversity among the trees. In fact, this is a similar condition as for SSL with random forests. Hence, following this condition and the arguments stated in the previous Section 8.2, it makes sense to use a similar optimization strategy as for our semi-supervised random forests introduced in Chapter 6.

Therefore, we formulate multiple instance learning as an optimization procedure where the labels of the instances become the optimization variables. The algorithm tries to uncover the true labels of the instances in an iterative manner. Given such labels, one can train a supervised classifier which then can be used to classify both instances and bags.

Let $B_i$, $i = 1, \ldots, n$ denote the $i$-th bag in the training set with label $y_i$. Each bag consists of $n_i$ instances $\{x_i^1, \ldots, x_i^{n_i}\}$. We write the objective function to optimize as:

$$
(\{y_i^j\}^*, F^*) = \operatorname*{arg\,min}_{\{y_i^j\},\, F(\cdot)} \; \sum_{i=1}^{n} \sum_{j=1}^{n_i} l\big(F_{y_i^j}(x_i^j)\big) \qquad (8.1)
$$
$$
\text{s.t.} \quad \forall i: \; \sum_{j=1}^{n_i} I\Big(y_i = \operatorname*{arg\,max}_{k \in \mathcal{Y}} F_k(x_i^j)\Big) \geq 1.
$$

The objective of this optimization procedure is to minimize a loss function $l(\cdot)$ defined over the entire set of instances, subject to the constraint that at least one instance in each bag has to be from the target class. Note that $I(\cdot)$ is an indicator function and $F_k(x)$ is the confidence of the classifier for the $k$-th class, i.e., $F_k(x) = p(k|x) - \frac{1}{K}$.
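Before turning to the margin-based loss, it may help to see the alternating structure that Eq. (8.1) suggests in code. The following is a minimal sketch, not the exact MILForest algorithm: it retrains a forest on the current instance labels, re-labels every instance by its most confident class, and then repairs any bag that violates the constraint by forcing its most confident instance back to the bag label. The scikit-learn RandomForestClassifier and all helper names here are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mil_label_optimization(bags, bag_labels, n_classes, n_iters=10):
    """Sketch of the iterative instance-label optimization behind Eq. (8.1).

    bags       : list of arrays, bags[i] has shape (n_i, d) -- instances of bag i
    bag_labels : array of length n with the bag labels y_i
    """
    X = np.vstack(bags)
    bag_idx = np.concatenate([np.full(len(b), i) for i, b in enumerate(bags)])
    # Initialize every instance with the label of its bag.
    y = np.concatenate([np.full(len(b), bag_labels[i]) for i, b in enumerate(bags)])

    for _ in range(n_iters):
        # Supervised step: fit a forest to the current instance labels.
        forest = RandomForestClassifier(n_estimators=100).fit(X, y)
        proba = forest.predict_proba(X)            # p(k | x)
        conf = proba - 1.0 / n_classes             # F_k(x) = p(k|x) - 1/K

        # Label step: assign each instance its most confident class ...
        y = forest.classes_[conf.argmax(axis=1)]

        # ... and enforce the constraint of Eq. (8.1): at least one instance
        # per bag must carry the bag label (here: the most confident one).
        for i, y_i in enumerate(bag_labels):
            members = np.where(bag_idx == i)[0]
            if not np.any(y[members] == y_i):
                col = int(np.where(forest.classes_ == y_i)[0][0])
                y[members[conf[members, col].argmax()]] = y_i
    return forest, y
```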

Often, the loss function depends on the classification margin of an instance. In the case of Random Forests, the margin can be written as [Breiman, 2001]

$$
m(x, y) = p(y|x) - \max_{k \in \mathcal{Y},\, k \neq y} p(k|x) = F_y(x) - \max_{k \in \mathcal{Y},\, k \neq y} F_k(x). \qquad (8.2)
$$
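As a quick numeric check of Eq. (8.2): if a three-class forest outputs $p(\cdot|x) = (0.6, 0.3, 0.1)$ and the true class is the first one, the margin is $m(x, y) = 0.6 - 0.3 = 0.3$. A tiny helper (names assumed for illustration) computes this directly:

```python
import numpy as np

def rf_margin(proba, y):
    """Multi-class margin of Eq. (8.2): m(x, y) = p(y|x) - max_{k != y} p(k|x)."""
    competitors = np.delete(proba, y)        # drop the true class
    return proba[y] - competitors.max()

print(rf_margin(np.array([0.6, 0.3, 0.1]), 0))   # 0.3
```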

Note that for a correct classification $m(x, y) > 0$ should hold. Overall, it can easily be seen that Eq. (8.1) is a non-convex optimization problem because a random forest has
