
PhD Thesis Semi-Supervised Ensemble Methods for Computer Vision


Chapter 8

Multiple Instance Learning with Random Forests

Semi-supervised learning algorithms have to learn from ambiguously labeled samples because the true labels of unlabeled samples are unknown. In machine learning, there exists a second learning paradigm, multiple-instance learning (MIL) [Keeler et al., 1990, Dietterich et al., 1997], which is very similar to SSL and also has to resolve ambiguities during the learning process. In particular, in multiple-instance learning, training samples are provided in the form of bags, where each bag consists of several instances. Labels are provided only for the bags, not for the instances. The labels of instances inside positive bags are unknown, but it is guaranteed that at least one instance has a positive label. In contrast, all instances inside negative bags can be considered negative. (See also Figure 8.1 for an illustration of the principle.)
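The bag-labeling rule described above can be sketched in a few lines of Python. The function name and the +1/-1 label encoding are illustrative choices, not from the thesis:

```python
def bag_label(instance_labels):
    """Return the label of a bag under the standard MIL assumption.

    A bag is positive (+1) iff at least one of its instances is
    positive; otherwise it is negative (-1). Illustrative helper,
    not part of the thesis.
    """
    return +1 if any(y == +1 for y in instance_labels) else -1

# Negative bag: every instance is guaranteed to be negative.
print(bag_label([-1, -1, -1]))  # -1
# Positive bag: at least one instance is positive, but which one
# is positive remains unknown to the learner.
print(bag_label([-1, +1, -1]))  # +1
```

The asymmetry is the crux of MIL: negative bags pin down every instance label, while positive bags only constrain the bag as a whole.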

In this chapter, we present a multiple-instance learning algorithm based on random forests, which we therefore call MILForests. MILForests bring the advantages of random forests, i.e., speed, multi-class capability, multi-processing, and noise resistance, to multiple-instance learning, where usually different methods have been applied. In turn, extending random forests to allow for multiple-instance learning lets vision tasks where RFs are typically applied benefit from the flexibility of MIL. MILForests are very similar to conventional random forests. However, since the training data is provided in the form of bags, the real class labels of instances inside bags are unknown during learning. In the following, we show that multiple-instance learning is a special case of SSL, because all instances inside negative bags can be considered labeled samples and all instances inside positive bags unlabeled ones, respectively. Based on this insight, we can use a similar optimization approach as for the semi-supervised RFs, i.e., deterministic annealing, to find the hidden class labels of instances in positive bags.
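The reduction from MIL to SSL stated above can be made concrete with a small sketch. All names here are illustrative: each bag is a pair of (instances, bag label), and the function splits the instances into the labeled and unlabeled sets that a semi-supervised learner such as the thesis's semi-supervised RFs would consume:

```python
def mil_to_ssl(bags):
    """Recast a MIL training set as an SSL training set (sketch).

    bags: list of (instances, bag_label) pairs, bag_label in {+1, -1}.
    Instances from negative bags become labeled negatives, since every
    instance in a negative bag is known to be negative. Instances from
    positive bags become unlabeled, since their true labels are hidden;
    the bag only guarantees that at least one of them is positive.
    """
    labeled, unlabeled = [], []
    for instances, y_bag in bags:
        if y_bag == -1:
            labeled.extend((x, -1) for x in instances)
        else:
            unlabeled.extend(instances)
    return labeled, unlabeled

bags = [(["a", "b"], -1), (["c", "d"], +1)]
labeled, unlabeled = mil_to_ssl(bags)
print(labeled)    # [('a', -1), ('b', -1)]
print(unlabeled)  # ['c', 'd']
```

Note that the reduction is not lossless: plain SSL ignores the constraint that each positive bag contains at least one positive instance, which is why the deterministic-annealing optimization must additionally enforce that constraint when assigning the hidden labels.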

