
5.2. On Robustness of On-line Boosting

On-line Learning

Based on these derivations, we propose an on-line version of GradientBoost, which we show in Algorithm 5.2. As in [Grabner and Bischof, 2006], we keep a fixed set of weak learners and perform boosting on the selectors. The $m$-th selector maintains a set of $K$ weak learners $S_m = \{f_m^1(x), \ldots, f_m^K(x)\}$ and at each stage selects the best-performing weak learner. The optimization step of Equation (5.11) is then performed iteratively by propagating the samples through the selectors and updating the weight estimate $\lambda_m$ according to the negative derivative of the loss function. Thus, the algorithm is independent of the particular loss function used.

Algorithm 5.2 On-line GradientBoost

Require: A training sample $(x_n, y_n)$.
Require: A differentiable loss function $l(\cdot)$.
Require: Number of selectors $M$.
Require: Number of weak learners per selector $K$.
1: Set $F_0(x_n) = 0$.
2: Set the initial weight $w_n = -l'(0)$.
3: for $m = 1$ to $M$ do
4:   for $k = 1$ to $K$ do
5:     Train the $k$-th weak learner $f_m^k(x)$ with sample $(x_n, y_n)$ and weight $w_n$.
6:     // Compute the error
7:     $e_m^k \leftarrow e_m^k + w_n \, I\big(\mathrm{sign}(f_m^k(x_n)) \neq y_n\big)$.
8:   end for
9:   Find the best weak learner with the least total weighted error: $j = \arg\min_k e_m^k$.
10:   Set $f_m(x_n) = f_m^j(x_n)$.
11:   Set $F_m(x_n) = F_{m-1}(x_n) + f_m(x_n)$.
12:   Set the weight $w_n = -l'(y_n F_m(x_n))$.
13: end for
14: Output the final model: $F(x)$.
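To make the control flow of Algorithm 5.2 concrete, the following minimal sketch implements the per-sample update in Python. It assumes hypothetical weak learners exposing `update(x, y, w)` and `predict(x)` methods (returning a real-valued score) and takes the loss derivative $l'(\cdot)$ as a callable; these interfaces are illustrative assumptions, not the implementation used in this thesis.

```python
import numpy as np


class OnlineGradientBoost:
    """Sketch of Algorithm 5.2 (On-line GradientBoost), one sample at a time.

    Weak learners are assumed to expose `update(x, y, w)` and `predict(x)`;
    both the factory and this interface are illustrative assumptions.
    """

    def __init__(self, weak_learner_factory, loss_derivative, M=50, K=10):
        self.M, self.K = M, K
        self.loss_derivative = loss_derivative               # callable l'(z)
        # M selectors, each holding K weak learners and their weighted error counters
        self.selectors = [[weak_learner_factory() for _ in range(K)] for _ in range(M)]
        self.errors = np.zeros((M, K))
        self.selected = [0] * M                              # index j chosen by each selector

    def update(self, x, y):
        """Process one labelled sample (x, y), with y in {-1, +1}."""
        F = 0.0                                              # F_0(x_n) = 0
        w = -self.loss_derivative(0.0)                       # w_n = -l'(0)
        for m in range(self.M):
            for k, f in enumerate(self.selectors[m]):
                f.update(x, y, w)                            # train k-th weak learner
                if np.sign(f.predict(x)) != y:               # accumulate weighted error
                    self.errors[m, k] += w
            j = int(np.argmin(self.errors[m]))               # least total weighted error
            self.selected[m] = j
            F += self.selectors[m][j].predict(x)             # F_m = F_{m-1} + f_m
            w = -self.loss_derivative(y * F)                 # w_n = -l'(y_n F_m(x_n))

    def predict(self, x):
        """Final model F(x): sum of the currently selected weak learners."""
        return sum(self.selectors[m][self.selected[m]].predict(x) for m in range(self.M))
```

For example, for the Logit loss $l(z) = \log(1 + e^{-z})$ one would pass `loss_derivative=lambda z: -1.0 / (1.0 + np.exp(z))`, so that the sample weight becomes $1/(1 + e^{y_n F_m(x_n)})$.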

In Figure 5.4 we plot the weight-update functions for different popular loss functions. As can be seen, the exponential loss has an unbounded weight-update function, while all other loss functions are bounded within $[0, 1]$. Most importantly, the Logit and Hinge weight-update functions saturate at 1 as the margin decreases, while the weights of Doom and Savage fade out. This also illustrates the success of SavageBoost on noisy samples; i.e., in contrast to AdaBoost, if a sample is misclassified with high confidence it is not incorporated into the learning with an exponentially growing weight but it is considered
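The boundedness claims above are easy to verify numerically. The sketch below evaluates the weight-update function $w(z) = -l'(z)$ at a strongly negative margin, using common textbook parameterizations of the losses; the exact Doom II and Savage definitions may differ slightly from those used for Figure 5.4, so treat them as assumptions for illustration.

```python
import numpy as np

# Weight-update functions w(z) = -l'(z) of the margin z = y F(x), using common
# parameterizations (assumed here; the thesis' exact definitions may differ).
weight_updates = {
    "Exponential": lambda z: np.exp(-z),                                          # unbounded as z -> -inf
    "Logit":       lambda z: 1.0 / (1.0 + np.exp(z)),                             # saturates at 1
    "Hinge":       lambda z: np.where(z < 1.0, 1.0, 0.0),                         # subgradient, saturates at 1
    "Savage":      lambda z: 4.0 * np.exp(2 * z) / (1.0 + np.exp(2 * z)) ** 3,    # fades out
    "Doom II":     lambda z: 1.0 - np.tanh(z) ** 2,                               # fades out
}

# A heavily misclassified sample (large negative margin): the exponential weight
# explodes, Logit and Hinge stay at (or near) 1, Savage and Doom II vanish.
for name, w in weight_updates.items():
    print(f"{name:12s} w(-5) = {float(w(-5.0)):.4f}")
```

At $z = -5$ the exponential weight is already about $148$, Logit and Hinge are at (or close to) $1$, and Savage and Doom II fall below $10^{-3}$, mirroring the behaviour described above.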
