
5.2. On Robustness of On-line Boosting

On-line Learning

Based on these derivations, we propose an on-line version of GradientBoost, which we show in Algorithm 5.2. As in [Grabner and Bischof, 2006], we keep a fixed set of weak learners and perform boosting on the selectors. The $m$-th selector maintains a set of $K$ weak learners $S_m = \{f_m^1(x), \ldots, f_m^K(x)\}$ and at each stage selects the best-performing weak learner. The optimization step of Equation (5.11) is then performed iteratively by propagating the samples through the selectors and updating the weight estimate $\lambda_m$ according to the negative derivative of the loss function. Thus, the algorithm is independent of the particular loss function used.

Algorithm 5.2 On-line GradientBoost

Require: A training sample $(x_n, y_n)$.
Require: A differentiable loss function $l(\cdot)$.
Require: Number of selectors $M$.
Require: Number of weak learners per selector $K$.
1: Set $F_0(x_n) = 0$.
2: Set the initial weight $w_n = -l'(0)$.
3: for $m = 1$ to $M$ do
4:   for $k = 1$ to $K$ do
5:     Train the $k$-th weak learner $f_m^k(x)$ with sample $(x_n, y_n)$ and weight $w_n$.
6:     // Compute the error
7:     $e_m^k \leftarrow e_m^k + w_n \, I\big(\mathrm{sign}(f_m^k(x_n)) \neq y_n\big)$.
8:   end for
9:   Find the best weak learner with the least total weighted error: $j = \arg\min_k e_m^k$.
10:   Set $f_m(x_n) = f_m^j(x_n)$.
11:   Set $F_m(x_n) = F_{m-1}(x_n) + f_m(x_n)$.
12:   Set the weight $w_n = -l'(y_n F_m(x_n))$.
13: end for
14: Output the final model: $F(x)$.
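To make the control flow of Algorithm 5.2 concrete, the following minimal sketch implements the per-sample update in Python. It assumes hypothetical weak learners exposing `update(x, y, w)` and `predict(x)` methods (returning a real-valued score) and takes the loss derivative $l'(\cdot)$ as a callable; these interfaces are illustrative assumptions, not the implementation used in this thesis.

```python
import numpy as np


class OnlineGradientBoost:
    """Sketch of Algorithm 5.2 (On-line GradientBoost), one sample at a time.

    Weak learners are assumed to expose `update(x, y, w)` and `predict(x)`;
    both the factory and this interface are illustrative assumptions.
    """

    def __init__(self, weak_learner_factory, loss_derivative, M=50, K=10):
        self.M, self.K = M, K
        self.loss_derivative = loss_derivative               # callable l'(z)
        # M selectors, each holding K weak learners and their weighted error counters
        self.selectors = [[weak_learner_factory() for _ in range(K)] for _ in range(M)]
        self.errors = np.zeros((M, K))
        self.selected = [0] * M                              # index j chosen by each selector

    def update(self, x, y):
        """Process one labelled sample (x, y), with y in {-1, +1}."""
        F = 0.0                                              # F_0(x_n) = 0
        w = -self.loss_derivative(0.0)                       # w_n = -l'(0)
        for m in range(self.M):
            for k, f in enumerate(self.selectors[m]):
                f.update(x, y, w)                            # train k-th weak learner
                if np.sign(f.predict(x)) != y:               # accumulate weighted error
                    self.errors[m, k] += w
            j = int(np.argmin(self.errors[m]))               # least total weighted error
            self.selected[m] = j
            F += self.selectors[m][j].predict(x)             # F_m = F_{m-1} + f_m
            w = -self.loss_derivative(y * F)                 # w_n = -l'(y_n F_m(x_n))

    def predict(self, x):
        """Final model F(x): sum of the currently selected weak learners."""
        return sum(self.selectors[m][self.selected[m]].predict(x) for m in range(self.M))
```

For example, for the Logit loss $l(z) = \log(1 + e^{-z})$ one would pass `loss_derivative=lambda z: -1.0 / (1.0 + np.exp(z))`, so that the sample weight becomes $1/(1 + e^{y_n F_m(x_n)})$.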

In Figure 5.4 we plot the weight-update functions for different popular loss functions. As can be seen, the exponential loss has an unbounded weight-update function, while all other loss functions are bounded within $[0, 1]$. Most importantly, the Logit and Hinge weight-update functions saturate at 1 as the margin decreases, while the weights of Doom and Savage fade out. This also illustrates the success of SavageBoost on noisy samples; i.e., in contrast to AdaBoost, if a sample is misclassified with high confidence it is not incorporated into the learning with an exponentially growing weight but it is considered
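The boundedness claims above are easy to verify numerically. The sketch below evaluates the weight-update function $w(z) = -l'(z)$ at a strongly negative margin, using common textbook parameterizations of the losses; the exact Doom II and Savage definitions may differ slightly from those used for Figure 5.4, so treat them as assumptions for illustration.

```python
import numpy as np

# Weight-update functions w(z) = -l'(z) of the margin z = y F(x), using common
# parameterizations (assumed here; the thesis' exact definitions may differ).
weight_updates = {
    "Exponential": lambda z: np.exp(-z),                                          # unbounded as z -> -inf
    "Logit":       lambda z: 1.0 / (1.0 + np.exp(z)),                             # saturates at 1
    "Hinge":       lambda z: np.where(z < 1.0, 1.0, 0.0),                         # subgradient, saturates at 1
    "Savage":      lambda z: 4.0 * np.exp(2 * z) / (1.0 + np.exp(2 * z)) ** 3,    # fades out
    "Doom II":     lambda z: 1.0 - np.tanh(z) ** 2,                               # fades out
}

# A heavily misclassified sample (large negative margin): the exponential weight
# explodes, Logit and Hinge stay at (or near) 1, Savage and Doom II vanish.
for name, w in weight_updates.items():
    print(f"{name:12s} w(-5) = {float(w(-5.0)):.4f}")
```

At $z = -5$ the exponential weight is already about $148$, Logit and Hinge are at (or close to) $1$, and Savage and Doom II fall below $10^{-3}$, mirroring the behaviour described above.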
