24.11.2013 Views

PhD Thesis Semi-Supervised Ensemble Methods for Computer Vision



Chapter 4

SemiBoost and Visual Similarity Learning

As motivated in the previous chapter, semi-supervised learning (SSL) is an increasingly important learning paradigm with various potential practical applications. Additionally, we have seen that a large number of different SSL approaches already exist. A large subset of SSL methods, especially those based on graphs and the manifold assumption, rely on a priori given similarity or distance measures, which are needed to compare samples (labeled and unlabeled) in feature space. It is clear that the accuracy of these similarities determines the success of the semi-supervised learners. One way to obtain good similarities is to learn similarity or distance functions that can then be used for further reasoning or processing.
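To make the role of an a priori given similarity concrete, the following minimal sketch builds a pairwise similarity matrix over labeled and unlabeled samples from a fixed distance function. The Gaussian (RBF) mapping from distance to similarity, the bandwidth `sigma`, the function names, and the toy histograms are all assumptions made for illustration; they are not taken from the thesis.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi_squared(a, b, eps=1e-12):
    """Chi-squared distance between two non-negative histograms.

    eps only guards against division by zero when both bins are empty.
    """
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


def similarity_matrix(samples, distance, sigma=1.0):
    """Pairwise similarities S[i][j] = exp(-d(x_i, x_j)^2 / sigma^2).

    The RBF form is an assumed, commonly used choice; any other
    monotonically decreasing function of the distance would also work.
    """
    n = len(samples)
    return [
        [math.exp(-distance(samples[i], samples[j]) ** 2 / sigma ** 2)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: two labeled and two unlabeled histograms.
labeled = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
unlabeled = [[0.8, 0.2, 0.0], [0.1, 0.1, 0.8]]
samples = labeled + unlabeled

S = similarity_matrix(samples, chi_squared, sigma=0.5)
```

Here the first unlabeled histogram ends up far more similar to the first labeled sample than to the second, which is the kind of information a graph-based learner would propagate. Note that the matrix has one entry per pair of samples, so a fixed, hand-picked distance also fixes everything the learner can infer from it.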

In the following, we present an approach that combines visual similarity learning and semi-supervised boosting using manifold regularization. In particular, we use a limited amount of labeled samples to, first, train a discriminant distance function and, second, use this distance function as a metric to guide a variant of SemiBoost [Mallapragada et al., 2009] by exploiting a huge set of unlabeled data.

4.1 Learning Distance Functions<br />

In machine learning problems, distance metrics are often given as huge matrices that encode standard distances such as the Euclidean or the chi-squared distance. Although this works for many applications, it has several fundamental problems: First, the data stored in these matrices grows with O(n²), where n is the number of samples. Second, it is often hard to decide which similarity metric is best suited to solve a certain task. Especially in computer vision, it is difficult to determine what makes some digital images similar and

