PhD Thesis: Semi-Supervised Ensemble Methods for Computer Vision

4.3 Experiments on Visual Classification and Detection

(a) prior, (b) trained, (c) combined

Figure 4.10: Detection results of a face detector (a) that serves as a prior for building a SemiBoost classifier from additional unlabeled data. This classifier alone (b) is not powerful enough to deliver good results, but the combination (c) improves the results substantially.

as well as a better alignment of the detections) as shown in Figure 4.11. Note that the trained classifier alone consists of only 30 weak classifiers, which yields poor results when applied to the image (Figure 4.10). Of course, when using more weak classifiers it would learn the prior information as well. Additionally, Figure 4.11 shows representative examples obtained by our approach. Note that the most closely related approach, i.e., [Levin et al., 2003], fails on this task when only using single views; we hence omit the detailed comparison.
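The combination in Figure 4.10(c) can be read as adding the prior detector's confidence to the response of the small boosted classifier, so that the scene-specific classifier only needs to refine the prior's decision. The Python sketch below illustrates this reading; it is a minimal sketch under that assumption, not the thesis implementation, and all names (e.g., prior_confidence, weak_classifiers) are hypothetical.

```python
# Minimal sketch (assumption, not the thesis code): combine a generic prior
# detector's confidence with a small boosted classifier trained with
# unlabeled data, as suggested by Figure 4.10(c).
import numpy as np

def combined_confidence(patch, prior_confidence, weak_classifiers, weak_weights):
    """Score a candidate patch with both the prior and the trained classifier.

    prior_confidence : callable returning the prior detector's score, e.g. in [-1, 1]
    weak_classifiers : list of callables h_m(patch) returning -1 or +1
    weak_weights     : list of boosting weights alpha_m
    """
    # Response of the (small) boosted classifier, e.g. only 30 weak learners.
    f_trained = sum(a * h(patch) for a, h in zip(weak_weights, weak_classifiers))
    # Adding the prior's confidence lets the weak boosted classifier merely
    # correct the prior instead of re-learning it.
    return prior_confidence(patch) + f_trained

def detect(patches, prior_confidence, weak_classifiers, weak_weights, threshold=0.0):
    # Return a boolean detection decision per candidate patch.
    scores = np.array([combined_confidence(p, prior_confidence,
                                           weak_classifiers, weak_weights)
                       for p in patches])
    return scores > threshold
```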

Scene Adaptation

In the next experiment, we want to test our approach on a task that occurs often in practice: scene adaptation, or visual transfer learning. In particular, in multi-camera surveillance scenarios, instead of collecting huge amounts of labeled data in order to train a general object detector, one can collect data for specific camera views or scenarios and train a classifier only on these. This has the advantage that the model complexity can be reduced, which results in faster detectors, and that the accuracy can usually be increased because the classifier only has to discriminate the target object from one specific background. However, as collecting and labeling data for each camera is cost- and time-intensive (e.g., consider networks with tens or hundreds of surveillance cameras), one is interested in reusing as much information as possible between the cameras for training. In the following, we demonstrate that our approach is also useful for such a problem.

For this purpose, we trained a car detector for a specific scene (i.e., “scene 1”, illustrated in Figure 4.12(a)) using 1000 labeled samples. When applying this classifier to a different scene with a similar viewpoint, it performs, as expected, significantly worse. Hence, in order to adapt the detector to the different scene (i.e., “scene 2”, Figure 4.12(b)), we apply a simple motion detector to get potential positive samples of the scene given in Figure 4.12(b). After collecting 1000 of them and additionally cropping 1000 random
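As a rough illustration of this harvesting step, the sketch below uses standard OpenCV background subtraction as the “simple motion detector” to crop potential positive patches from the new scene. The thesis does not specify the detector, thresholds, or patch size, so all of those choices here are assumptions rather than the original implementation.

```python
# Hedged sketch (assumptions throughout): harvest potential positive samples
# from an unlabeled scene with a simple background-subtraction motion detector.
# Random crops from the same scene would serve as the additional negatives.
import cv2

def harvest_candidates(video_path, patch_size=(64, 64), min_area=500, max_samples=1000):
    cap = cv2.VideoCapture(video_path)
    bg_sub = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)
    samples = []
    while len(samples) < max_samples:
        ok, frame = cap.read()
        if not ok:
            break
        # Foreground mask from the motion detector, cleaned with a small opening.
        mask = bg_sub.apply(frame)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                                cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)))
        # OpenCV 4.x signature: findContours returns (contours, hierarchy).
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) < min_area:
                continue
            x, y, w, h = cv2.boundingRect(c)
            # Crop the moving region and rescale it to the detector's patch size.
            patch = cv2.resize(frame[y:y + h, x:x + w], patch_size)
            samples.append(patch)
    cap.release()
    return samples  # potential (unlabeled) positives for the new scene
```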
