PhD Thesis: Semi-Supervised Ensemble Methods for Computer Vision
Chapter 4. SemiBoost and Visual Similarity Learning

Improving a Detector

In contrast to the previous experiment, where we used a pair-wise classifier, in this experiment we want to show that our method can improve any given classifier using related unlabeled samples, as proposed in Section 4.2.4. Therefore, for each of the following two experiments, we first train a Viola/Jones detector [Viola and Jones, 2001] on labeled data. We denote this classifier as F_P(x). The response of the last cascade layer is used as our prior classifier. The final detection results are obtained by non-maxima suppression as a post-processing step.
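A minimal sketch of such a prior detector follows, assuming OpenCV's pretrained frontal-face cascade as a stand-in for the detector trained on X_L (the cascade file, the levelWeights-as-confidence choice, and all thresholds are assumptions, not the thesis's implementation):

```python
# Sketch of the prior detector F_P(x): a Viola-Jones cascade whose
# detections are post-processed by non-maxima suppression (NMS).
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maxima suppression over (x, y, w, h) boxes."""
    order = np.argsort(scores)[::-1]          # most confident first
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection-over-union between box i and the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 0] + boxes[i, 2], boxes[rest, 0] + boxes[rest, 2])
        y2 = np.minimum(boxes[i, 1] + boxes[i, 3], boxes[rest, 1] + boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        union = boxes[i, 2] * boxes[i, 3] + boxes[rest, 2] * boxes[rest, 3] - inter
        order = rest[inter / union < iou_thresh]
    return keep

def detect(image_gray):
    """Run the cascade over all scales; return NMS-filtered boxes and scores."""
    # detectMultiScale3 also returns per-detection confidences (levelWeights),
    # which we reuse here as the prior response F_P(x).
    boxes, _, weights = cascade.detectMultiScale3(
        image_gray, scaleFactor=1.1, minNeighbors=3, outputRejectLevels=True)
    if len(boxes) == 0:
        return np.empty((0, 4), int), np.empty(0)
    boxes, weights = np.asarray(boxes), np.asarray(weights).ravel()
    keep = nms(boxes, weights)
    return boxes[keep], weights[keep]
```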

Figure 4.10(a) depicts the results of applying the prior classifier trained on the frequently used MIT+CMU face database, where state-of-the-art results are achieved. Now, we want to improve this classifier using related unlabeled data; i.e., the unlabeled data set should contain a significant amount of faces. Since collecting such data can also be a tedious task, we propose a simple approach to data-mine related unlabeled samples using web image search engines. The key idea is to feed the search engine with queries that are likely to return images containing our target objects. For instance, if we want to train a car detector, we could use "highway", "road", "traffic", etc. as queries and denote the obtained images as our first unlabeled data set X_U. Of course, the images obtained this way might still be very noisy. Hence, in a next step we refine X_U by applying F_P(x) in a sliding-window manner over different scales in order to bootstrap "interesting" unlabeled samples. We crop all objects detected by F_P(x) and copy them to a new unlabeled set X_U*. Having obtained X_U* and some labeled data X_L, we can now apply SemiBoost in order to improve F_P(x). We depict the approach in Algorithm 4.2, followed by a minimal code sketch.

Algorithm 4.2 Simple data mining for informative unlabeled data
Require: Labeled training data (x, y) ∈ X_L
1: Train a cascaded detector F_P(x) on X_L using [Viola and Jones, 2001]
2: Use a web image search engine to collect large amounts of possibly useful images X_U; pass query phrases that are likely related to the target object
3: Apply F_P(x) in a sliding-window manner on X_U and copy all detections to X_U*
4: Train a SemiBoost classifier F(x) on X_L and X_U* using F_P(x) as prior
5: Output the final classifier F(x)
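The mining steps 2-3 can be sketched as follows, reusing the detect() helper from above and assuming the query images have already been downloaded into a local directory (query_images/ is a hypothetical path); the semiboost_train call for step 4 is likewise a hypothetical placeholder, since the thesis's SemiBoost implementation is not a public library:

```python
# Sketch of Algorithm 4.2, steps 2-3: bootstrap "interesting" unlabeled
# samples X_U* from web images X_U by running the prior detector F_P(x).
import glob
import cv2

def mine_unlabeled(image_dir, patch_size=(24, 24)):
    """Crop every detection of F_P(x) into the refined set X_U*."""
    patches, confs = [], []
    for path in glob.glob(image_dir + "/*.jpg"):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:
            continue
        boxes, weights = detect(gray)     # multi-scale detector + NMS
        for (x, y, w, h), s in zip(boxes, weights):
            patches.append(cv2.resize(gray[y:y + h, x:x + w], patch_size))
            confs.append(s)
    return patches, confs

# Step 4 (placeholder): train SemiBoost on X_L and X_U* with F_P(x) as prior.
# `semiboost_train` is hypothetical, standing in for the thesis's method.
# X_U_star, _ = mine_unlabeled("query_images")
# F = semiboost_train(X_L, y_L, X_U_star, prior=detect)
```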

As a proof of concept, we applied F_P(x) to 300 random images downloaded from Google Image search with the keyword "team". The returned detections (>4000) were used as additional unlabeled data. The 50 most confident detections were used as positive labeled data and the 50 least confident detections as negative ones for training the SemiBoost classifier with only 30 weak classifiers. The proposed combination strategy (Equation (4.20)) improved the results (higher detection rate and lower false positive rate).
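This confidence-based pseudo-labeling can be sketched as below; the confidences are assumed to be the levelWeights returned by the cascade (the thesis does not state which score it ranks by), and the counts mirror the 50/50 split from the experiment:

```python
# Sketch of the pseudo-labeling step: rank all mined detections by the
# prior's confidence and take the extremes as labeled data for SemiBoost.
import numpy as np

def split_by_confidence(patches, confidences, k=50):
    """Top-k detections become positives, bottom-k negatives; the rest
    remain unlabeled (assumes more than 2k detections, as in the >4000
    detections of the experiment)."""
    order = np.argsort(confidences)[::-1]      # most confident first
    pos = [patches[i] for i in order[:k]]      # 50 most confident  -> +1
    neg = [patches[i] for i in order[-k:]]     # 50 least confident -> -1
    unlabeled = [patches[i] for i in order[k:-k]]
    return pos, neg, unlabeled

# Usage: patches, confs = mine_unlabeled("query_images")
#        pos, neg, X_U_star = split_by_confidence(patches, confs)
```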
