
4.2 SemiBoost

Similar to standard boosting, we can look at the expected value of the loss function [Friedman et al., 2000]; compared to Equation (2.10), we get for the combined classifier

P(y = 1 \mid x) = \frac{e^{F^P(x) + F(x)}}{e^{F^P(x) + F(x)} + e^{-F^P(x) - F(x)}} .   (4.19)

If we are only interested in the decision, a sample is classified as positive if P(y = 1 | x) ≥ 0.5, and after some mathematical rewriting we get

\hat{y} = \mathrm{sign}\big( \sinh(F^P(x) + F(x)) \big) = \mathrm{sign}\big( F^P(x) + F(x) \big) .   (4.20)
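To make Equations (4.19) and (4.20) concrete, the following is a minimal numerical sketch (not part of the thesis); the function names posterior_positive and decision are illustrative, and F_P, F stand for the scores F^P(x) and F(x) of a single sample.

```python
import numpy as np

def posterior_positive(F_P, F):
    """P(y = 1 | x) as in Eq. (4.19): a logistic function of the combined score.

    F_P -- score of the prior classifier F^P(x)
    F   -- score of the newly trained classifier F(x)
    """
    s = F_P + F
    return np.exp(s) / (np.exp(s) + np.exp(-s))  # equals sigmoid(2 * s)

def decision(F_P, F):
    """Decision rule of Eq. (4.20): positive iff P(y = 1 | x) >= 0.5."""
    return np.sign(F_P + F)

# A confident prior with a weakly disagreeing F(x): the prior still wins.
print(posterior_positive(1.5, -0.4))  # ~0.90
print(decision(1.5, -0.4))            # 1.0
```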

Discussion The interpretation is as follows. A label switch can happen, i.e., F(x) can overrule F^P(x), if the combined term has a different label than the prior F^P(x). As can easily be seen, this is the case if |F(x)| > |F^P(x)|. Therefore, the more confident the prior is, the harder it is for the label to change. We do not make any statement about whether this is a correct or an incorrect label switch. Note that the prior classifier can be wrong overall, but it has to provide an "honest" decision: if it is highly confident, the decision must be correct. There are also relations to the co-training assumptions [Balcan et al., 2004], i.e., a classifier should never be "confident but wrong". By rewriting Equation (4.20) as

\hat{y} = \mathrm{sign}\big( \sinh(F^P(x) + F(x)) \big) = \mathrm{sign}\big( \cosh(F(x)) \sinh(F^P(x)) + \cosh(F^P(x)) \sinh(F(x)) \big)   (4.21)

one sees that the two classifiers weight each other using hyperbolic functions. The factor obtained by cosh(·) ≥ 1 weights the decision of the corresponding classifier passed through the asymmetric sinh(·) function. With an additional scaling factor, more emphasis can be put either on the prior or on the newly trained classifier; however, this is not explored in this thesis.
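As a small sanity check (again a sketch with made-up scores, not from the thesis), the hyperbolic decomposition of Equation (4.21) and the label-switch condition |F(x)| > |F^P(x)| can be verified numerically:

```python
import numpy as np

# Illustrative scores: a moderately confident prior, a stronger disagreeing F(x).
F_P, F = 0.8, -1.2

# Eq. (4.21): sinh(F^P + F) = cosh(F) sinh(F^P) + cosh(F^P) sinh(F)
lhs = np.sinh(F_P + F)
rhs = np.cosh(F) * np.sinh(F_P) + np.cosh(F_P) * np.sinh(F)
assert np.isclose(lhs, rhs)

# Label switch: the combined sign differs from the prior's sign exactly when
# F(x) opposes the prior and |F(x)| > |F^P(x)|.
print(np.sign(F_P), np.sign(F_P + F))  # 1.0 -1.0 -> the prior is overruled
```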

Summarizing, after training F(x) the expected target of an example is obtained by a combined decision. The combined classifier can now be interpreted as improving F^P(x) using labeled and unlabeled samples. We train F(x) with SemiBoost using labeled and unlabeled data. Since F^P(x) is used to calculate the similarity (via Equation (4.18) and Equation (4.14)), the two classifiers are tightly coupled through the training process, and Equation (4.21) is not just a simple sum rule. If we use a complex classifier, i.e., one consisting of many weak classifiers, and have a lot of training data, F(x) will "absorb" the entire knowledge of F^P(x); therefore, the usual setting is a rather small F(x) that only corrects F^P(x).
