PhD Thesis Semi-Supervised Ensemble Methods for Computer Vision


Expectation Maximization   In generative SSL methods, one of the most frequently used optimization methods is the expectation maximization (EM) algorithm. If the given training data is D = D_L ∪ D_U, then the missing (hidden) variables are H = {y_{l+1}, …, y_{l+u}}. The EM algorithm is an iterative method for finding the model parameters θ that locally maximize p(D | θ). Each EM iteration consists of two steps, an expectation step (E-step) and a maximization step (M-step), and the algorithm maintains a distribution q_t(H) over the hidden variables. In practice, EM has been used for many SSL problems, e.g., text classification [Nigam et al., 2006]. However, since it is a local optimizer, it can get stuck in local optima. We depict the EM method in detail in Algorithm 3.1.

Algorithm 3.1 Expectation Maximization
Require: Labeled data X_l and unlabeled data X_u
Require: Initial parameters θ_0
repeat
  E-step: compute q_t(H) = p(H | D, θ_t)
  M-step: find θ_{t+1} = arg max_θ ∑_H q_t(H) log p(D, H | θ)
  t = t + 1
until p(D | θ_t) converges
Output the final parameters θ.
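
To make Algorithm 3.1 concrete, the sketch below instantiates the E-step and M-step for a semi-supervised Gaussian naive Bayes model with diagonal variances, using NumPy. This is a minimal illustration under stated assumptions, not the thesis' implementation; the function name em_ssl, the smoothing constant, and the requirement that y_l contains integer labels 0, …, n_classes−1 with every class present in the labeled set are ours.

```python
# A minimal sketch of Algorithm 3.1 for a semi-supervised Gaussian naive
# Bayes model (diagonal variances). Assumes y_l holds integer labels
# 0..n_classes-1 and that every class occurs in the labeled set.
import numpy as np


def em_ssl(X_l, y_l, X_u, n_classes, n_iter=100, tol=1e-6):
    X = np.vstack([X_l, X_u])
    n_l, n = len(X_l), len(X)

    # theta_0: initialize priors, means, and variances from the labeled data.
    prior = np.array([(y_l == c).mean() for c in range(n_classes)])
    mean = np.array([X_l[y_l == c].mean(axis=0) for c in range(n_classes)])
    var = np.array([X_l[y_l == c].var(axis=0) for c in range(n_classes)]) + 1e-3

    prev_ll = -np.inf
    for t in range(n_iter):
        # Class-conditional joint log-likelihoods log p(x, c | theta_t).
        log_p = np.zeros((n, n_classes))
        for c in range(n_classes):
            log_p[:, c] = (np.log(prior[c] + 1e-12)
                           - 0.5 * np.sum(np.log(2 * np.pi * var[c]))
                           - 0.5 * np.sum((X - mean[c]) ** 2 / var[c], axis=1))

        # E-step: q_t(H) = p(H | D, theta_t); labeled points keep their labels.
        q = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
        q[:n_l] = np.eye(n_classes)[y_l]

        # M-step: theta_{t+1} maximizes sum_H q_t(H) log p(D, H | theta).
        w = q.sum(axis=0)
        prior = w / n
        mean = (q.T @ X) / w[:, None]
        var = np.array([(q[:, c, None] * (X - mean[c]) ** 2).sum(axis=0) / w[c]
                        for c in range(n_classes)]) + 1e-3

        # Stop when the observed-data log-likelihood p(D | theta_t) converges.
        ll = (log_p[np.arange(n_l), y_l].sum()
              + np.logaddexp.reduce(log_p[n_l:], axis=1).sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return prior, mean, var
```

After convergence, a new sample x can be classified with the returned parameters by taking arg max_c p(c | x, θ).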

3.5 Co-Training and Multi-View Learning

Co-training¹ [Blum and Mitchell, 1998], which exploits the redundancy of the unlabeled input data, is another popular SSL method. In co-training, two initial classifiers h_1, h_2 are trained on some labeled data D_L using different redundant "views". Different views can be, for instance, different types of uncorrelated features. Then, each classifier updates the other on the samples of the unlabeled data set D_U on which it is most confident. Co-training is a wrapper method, which means it does not matter which learning algorithms are applied as long as they are able to deliver confidence-rated predictions. We depict the algorithmic steps in Algorithm 3.2; a minimal code sketch of the loop is given after the conditions below.

The approach has been shown to converge if two conditions hold:

1. There exist two separate views x = [x^(1), x^(2)] and the task is solvable under each view.

2. The views are conditionally independent given the class label, i.e., P(x^(1) | y, x^(2)) = P(x^(1) | y) and P(x^(2) | y, x^(1)) = P(x^(2) | y).

¹ Also known as collaborative bootstrapping or multi-view learning.
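
As a rough sketch of the wrapper loop described above (Algorithm 3.2 itself is not reproduced here), the following code trains two scikit-learn-style classifiers on two feature views and lets each one pseudo-label the unlabeled samples it is most confident about. The function co_train, the GaussianNB default base learner, and the pool sizes are illustrative assumptions rather than the thesis' implementation; for simplicity, the pseudo-labeled samples are added to a shared labeled pool used by both views.

```python
# A minimal co-training sketch, assuming each sample has two feature views
# (X1, X2) and a probabilistic scikit-learn base learner. All names and
# defaults here are illustrative, not taken from the thesis.
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB


def co_train(X1_l, X2_l, y_l, X1_u, X2_u,
             base=GaussianNB(), n_rounds=10, n_per_round=5):
    h1, h2 = clone(base), clone(base)
    X1_l, X2_l, y_l = X1_l.copy(), X2_l.copy(), y_l.copy()
    pool = np.arange(len(X1_u))                # indices of still-unlabeled samples

    for _ in range(n_rounds):
        if len(pool) == 0:
            break
        h1.fit(X1_l, y_l)                      # view 1 classifier
        h2.fit(X2_l, y_l)                      # view 2 classifier

        # Each classifier picks the unlabeled samples it is most confident on
        # and contributes its predicted (pseudo-)labels.
        new_idx, new_y = [], []
        for h, X_view in ((h1, X1_u), (h2, X2_u)):
            proba = h.predict_proba(X_view[pool])
            top = np.argsort(-proba.max(axis=1))[:n_per_round]
            new_idx.append(pool[top])
            new_y.append(h.classes_[proba[top].argmax(axis=1)])

        chosen = np.concatenate(new_idx)
        labels = np.concatenate(new_y)
        # If both classifiers pick the same sample, keep one pseudo-label.
        chosen, keep = np.unique(chosen, return_index=True)
        labels = labels[keep]

        # Grow the (shared) labeled pool and shrink the unlabeled one.
        X1_l = np.vstack([X1_l, X1_u[chosen]])
        X2_l = np.vstack([X2_l, X2_u[chosen]])
        y_l = np.concatenate([y_l, labels])
        pool = np.setdiff1d(pool, chosen)

    h1.fit(X1_l, y_l)
    h2.fit(X2_l, y_l)
    return h1, h2
```

At prediction time the two classifiers can be combined in any convenient way, e.g., by multiplying their predicted class probabilities.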
