
KL-Exponential [Saffari et al., 2008]:
    w_x^U ← |y_p(x) cosh(F(x)) − sinh(F(x))| e^{−y_p(x)F(x)}
    ŷ_x ← sign(y_p(x) cosh(F(x)) − sinh(F(x)))

KL (logit):
    w_x^U ← |y_p(x) − tanh(F(x))|
    ŷ_x ← sign(y_p(x) − tanh(F(x)))

Table 5.3: Comparison of the update rules for the unlabeled-sample weight w_x^U and the pseudo-label ŷ_x under different loss functions, i.e., the exponential and the logit loss.
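To make the table concrete, the following sketch (a direct transcription of the two columns; the function names and the NumPy vectorization are ours) computes the unlabeled-sample weight w_x^U and the pseudo-label ŷ_x for both losses, given the prior response y_p(x) and the current ensemble response F(x):

```python
import numpy as np

def kl_exponential_update(y_p, F):
    """Update rule for the KL-exponential loss (left column of Table 5.3).
    y_p: prior response y_p(x); F: ensemble response F(x); scalars or arrays."""
    m = y_p * np.cosh(F) - np.sinh(F)   # shared margin-like term
    w = np.abs(m) * np.exp(-y_p * F)    # unlabeled weight w_x^U
    return w, np.sign(m)                # (w_x^U, pseudo-label)

def kl_logit_update(y_p, F):
    """Update rule for the KL (logit) loss (right column of Table 5.3)."""
    m = y_p - np.tanh(F)
    return np.abs(m), np.sign(m)        # (w_x^U, pseudo-label)
```

Note that in each rule the weight and the pseudo-label share the same inner term, so both can be computed in a single pass.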

5.4 Machine Learning Experiments

In this set of experiments, we evaluated on-line SemiBoost and on-line SERBoost on standard semi-supervised benchmark data sets taken from [Chapelle et al., 2006]¹. The data consists of both artificial and real-life data sets. Furthermore, on some data sets the cluster assumption holds, while on others the manifold assumption holds. A summary of the data sets is presented in Table 5.4. We compared our methods to supervised off-line variants of nearest neighbor (1-NN), SVM, and AdaBoost. We used LaRank SVM [Bordes et al., 2007], as well as on-line GradientBoost with logistic loss (OLogitBoost) [Leistner et al., 2009a] and standard on-line boosting (OAdaBoost) [Grabner and Bischof, 2006], both with LaRank as weak learners. For the semi-supervised comparison we took the off-line versions of SERBoost [Saffari et al., 2008], ManifoldBoost [Loeff et al., 2008], and TSVM [Joachims, 1999]. For both on-line and off-line SERBoost we used k-means clustering as the prior. For SemiBoost we used the Euclidean distance to calculate S_ij, with σ set using 5-fold cross-validation. For all boosting methods we used 50 weak learners. For gradient-based methods we set the shrinkage factor to ν = 0.1. The importance λ of the unlabeled data was set to 0.1. We present the results in Table 5.5.
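For illustration, here is a minimal sketch of how the pairwise similarities S_ij could be computed from the Euclidean distances. We assume the common RBF form S_ij = exp(−‖x_i − x_j‖² / σ²); the exact normalization is our assumption, since the text only specifies the Euclidean distance, and σ itself would be selected by 5-fold cross-validation as described above.

```python
import numpy as np

def similarity_matrix(X, sigma):
    """Pairwise similarities S_ij, assuming the RBF form
    S_ij = exp(-||x_i - x_j||^2 / sigma^2) over Euclidean distances.
    X: (n, d) array of samples; sigma: bandwidth (chosen here by
    5-fold cross-validation, as in the experiments)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip small negative round-off
    return np.exp(-sq_dists / sigma ** 2)

# Example usage: S = similarity_matrix(X, sigma=1.0)
```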

As can be seen, both on-line SemiBoost and on-line SERBoost are competitive SSL methods, and both are able to match the results of their off-line counterparts. As expected, SemiBoost has slight advantages on manifold-like data sets, while SERBoost performs

very well on cluster-based data sets. Interestingly, OSemiBoost is able to match the performance of OSERBoost on g241c, a cluster-based set, with l = 10, and OSERBoost is able to outperform OSemiBoost on Digit1, a manifold-based set, with l = 10.

5.5 Summary and Conclusion

In this chapter, we introduced a novel on-line semi-supervised boosting algorithm based on SemiBoost; the algorithm is thus called on-line SemiBoost. We further illustrated that one of the major drawbacks of current boosting algorithms, both off-line and on-line,

¹ http://www.kyb.tuebingen.mpg.de/ssl-book
