PhD Thesis Semi-Supervised Ensemble Methods for Computer Vision

Chapter 3. Overview of Semi-Supervised Learning

highly susceptible to class-label noise, such as boosting methods (see also Section 5.2).

3.4 Generative Methods

In principle, unlabeled data alone yields an estimate of the marginal data distribution $P(x)$. If we know how the instances of each class are distributed, we can decompose this mixture into its individual class components, which is how mixture models are applied to semi-supervised learning. Generative models perform classification by finding good estimates for $p(x|y)$ and $p(y)$. The class-conditional density $p(x|y)$ can be estimated using some model parameters, e.g., the mean $\mu$ and covariance matrix $\Sigma$ of a Gaussian distribution, and $p(y)$ has to be estimated for each of the $K$ classes. All parameters of $p(x|y)$ and $p(y)$ can be summarized in one vector $\theta$. Given the training data $D$, generative models try to find a good estimate of $\theta$ during training using the maximum likelihood estimate (MLE)

\[ \hat{\theta} = \arg\max_{\theta}\, p(D|\theta) = \arg\max_{\theta}\, \log p(D|\theta). \tag{3.4} \]

Note that maximizing the log likelihood $\log p(D|\theta)$ is often preferred to maximizing the likelihood directly because it is easier to handle. When we rewrite the log likelihood as

\[ \log p(D|\theta) = \log \prod_{i=1}^{l} p(x_i, y_i|\theta) = \sum_{i=1}^{l} \log p(y_i|\theta)\, p(x_i|y_i, \theta), \tag{3.5} \]

the MLE can be easily found using constrained optimization. For the semi-supervised learning problem, where $D = D_L \cup D_U$, the log likelihood function changes to

\[ \log p(D|\theta) = \log \left( \prod_{i=1}^{l} p(x_i, y_i|\theta) \prod_{i=l+1}^{l+u} p(x_i|\theta) \right) = \sum_{i=1}^{l} \log p(y_i|\theta)\, p(x_i|y_i, \theta) + \sum_{i=l+1}^{l+u} \log p(x_i|\theta). \tag{3.6} \]

The task of a semi-supervised algorithm is now to find the MLE of Equation 3.6, which has to fit both the labeled and the unlabeled instances. Note that since the labels of the unlabeled samples are not given, they become additional optimization variables, which makes the overall optimization problem non-convex and thus difficult.
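To make the objective in Equation 3.6 concrete, the following minimal sketch evaluates it for one-dimensional Gaussian class conditionals and improves it with a few expectation maximization updates (the procedure described in the following section). The function names (`labeled_mle`, `log_likelihood`, `em_step`), the synthetic data, and the variance floor are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; broadcasts over its arguments."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def labeled_mle(x_l, y_l, n_classes):
    """Closed-form MLE of theta = (p(y), mean, variance) from labeled data only."""
    prior = np.array([np.mean(y_l == k) for k in range(n_classes)])
    mean = np.array([x_l[y_l == k].mean() for k in range(n_classes)])
    var = np.array([x_l[y_l == k].var() for k in range(n_classes)])
    return prior, mean, var

def log_likelihood(theta, x_l, y_l, x_u):
    """Equation 3.6: labeled term plus marginal term for unlabeled samples."""
    prior, mean, var = theta
    labeled = np.sum(np.log(prior[y_l] * gaussian_pdf(x_l, mean[y_l], var[y_l])))
    # p(x | theta) = sum_y p(y | theta) p(x | y, theta)
    marginal = (prior * gaussian_pdf(x_u[:, None], mean, var)).sum(axis=1)
    return labeled + np.sum(np.log(marginal))

def em_step(theta, x_l, y_l, x_u, n_classes):
    """One EM update: soft labels for unlabeled data, then weighted MLE."""
    prior, mean, var = theta
    # E-step: posterior over the hidden labels of the unlabeled samples.
    joint = prior * gaussian_pdf(x_u[:, None], mean, var)
    resp_u = joint / joint.sum(axis=1, keepdims=True)
    resp = np.vstack([np.eye(n_classes)[y_l], resp_u])  # known labels are one-hot
    x = np.concatenate([x_l, x_u])
    # M-step: responsibility-weighted estimates of all parameters.
    weight = resp.sum(axis=0)
    new_mean = (resp * x[:, None]).sum(axis=0) / weight
    new_var = (resp * (x[:, None] - new_mean) ** 2).sum(axis=0) / weight
    new_var = np.maximum(new_var, 1e-6)  # assumed floor to avoid degenerate variances
    return weight / weight.sum(), new_mean, new_var

# Synthetic two-class data (assumption): few labeled, many unlabeled samples.
rng = np.random.default_rng(0)
x_l = np.concatenate([rng.normal(-2.0, 1.0, 5), rng.normal(2.0, 1.0, 5)])
y_l = np.array([0] * 5 + [1] * 5)
x_u = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])

theta = labeled_mle(x_l, y_l, 2)
history = [log_likelihood(theta, x_l, y_l, x_u)]
for _ in range(20):
    theta = em_step(theta, x_l, y_l, x_u, 2)
    history.append(log_likelihood(theta, x_l, y_l, x_u))
```

The list `history` records the value of Equation 3.6 after each update, which makes it easy to check that the objective does not decrease from one iteration to the next. The result still depends on the initial parameters, because the objective is non-convex.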
Expectation Maximization

In generative SSL methods, one of the most frequently used optimization methods is the expectation maximization (EM) algorithm. If the given training data is $D = D_L \cup D_U$, then the missing (hidden) variables are $H = \{y_{l+1}, \dots, y_{l+u}\}$. The EM algorithm is an iterative method to find model parameters $\theta$ that locally maximize $p(D|\theta)$. Each EM iteration consists of two steps, an expectation step (E-step) and a maximization step (M-step), and the algorithm maintains a distribution $q_t(H)$ over the hidden variables. In practice, EM has been used for many SSL problems, e.g., text classification [Nigam et al., 2006]. However, since it is a local optimizer, it can get stuck in local optima. We depict the EM method in detail in Algorithm 3.1.

Algorithm 3.1 Expectation Maximization
Require: Labeled data $X_l$ and unlabeled data $X_u$
Require: Initial parameter $\theta_0$
  Set $t = 0$
  repeat
    E-step: compute $q_t(H) = p(H|D, \theta_t)$
    M-step: find $\theta_{t+1}$ that maximizes $\sum_{H} q_t(H) \log p(D, H|\theta_{t+1})$
    $t = t + 1$
  until $p(D|\theta_t)$ converges
  Output the final parameters $\theta_t$.

3.5 Co-Training and Multi-View Learning

Co-training 1 [Blum and Mitchell, 1998], which exploits the redundancy of unlabeled input data, is another popular SSL method. In co-training, two initial classifiers $h_1$, $h_2$ are trained on some labeled data $D_L$ using different redundant "views". Different views can be, for instance, different types of uncorrelated features. Then, each classifier updates the other one on those samples of the unlabeled data set $D_U$ where it is most confident. Co-training is a wrapper method, which means it does not matter which learning algorithms are applied as long as they are able to deliver confidence-rated predictions. We depict the algorithmic steps in Algorithm 3.2. The approach has been shown to converge if two conditions hold:

1. There exist two separate views $x = [x^{(1)}, x^{(2)}]$ and the task is solvable under each view.
2. The views are conditionally independent given the class label, i.e., $P(x^{(1)}|y, x^{(2)}) = P(x^{(1)}|y)$ and $P(x^{(2)}|y, x^{(1)}) = P(x^{(2)}|y)$.

1 Also known as collaborative bootstrapping or multi-view learning.
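The co-training scheme described above can be sketched as follows. This is a minimal illustration of the general idea and not a transcription of Algorithm 3.2, which is not reproduced on this page. The nearest-centroid learner `CentroidClassifier`, the function `co_train`, the parameters `rounds` and `per_round`, and the synthetic two-view data are all assumptions chosen to keep the example self-contained. Any learner that delivers confidence-rated predictions could take the place of the centroid classifier.

```python
import numpy as np

class CentroidClassifier:
    """Hypothetical stand-in for any learner with confidence-rated predictions."""

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([x[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, x):
        # Softmax over negative squared distances to the class centroids.
        dist = ((x[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        logits = -dist
        logits -= logits.max(axis=1, keepdims=True)
        prob = np.exp(logits)
        return prob / prob.sum(axis=1, keepdims=True)

    def predict(self, x):
        return self.classes_[self.predict_proba(x).argmax(axis=1)]

def co_train(x1_l, x2_l, y_l, x1_u, x2_u, rounds=10, per_round=5):
    """Each classifier pseudo-labels its most confident samples for the other one."""
    set1_x, set1_y = x1_l.copy(), y_l.copy()  # training set of h1 (view 1)
    set2_x, set2_y = x2_l.copy(), y_l.copy()  # training set of h2 (view 2)
    pool = np.arange(len(x1_u))               # indices of still-unused unlabeled samples
    h1 = CentroidClassifier().fit(set1_x, set1_y)
    h2 = CentroidClassifier().fit(set2_x, set2_y)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        p1 = h1.predict_proba(x1_u[pool])
        p2 = h2.predict_proba(x2_u[pool])
        top1 = np.argsort(-p1.max(axis=1))[:per_round]  # positions in pool
        top2 = np.argsort(-p2.max(axis=1))[:per_round]
        # h1 teaches h2 using the second view of the samples h1 is sure about.
        set2_x = np.vstack([set2_x, x2_u[pool[top1]]])
        set2_y = np.concatenate([set2_y, h1.classes_[p1[top1].argmax(axis=1)]])
        # h2 teaches h1 symmetrically.
        set1_x = np.vstack([set1_x, x1_u[pool[top2]]])
        set1_y = np.concatenate([set1_y, h2.classes_[p2[top2].argmax(axis=1)]])
        pool = np.delete(pool, np.union1d(top1, top2))
        h1 = CentroidClassifier().fit(set1_x, set1_y)
        h2 = CentroidClassifier().fit(set2_x, set2_y)
    return h1, h2

# Synthetic data (assumption): two views with independent noise given the label.
rng = np.random.default_rng(0)

def make_views(n):
    y = np.arange(n) % 2
    centers = np.where(y[:, None] == 1, 1.5, -1.5)
    return centers + rng.normal(size=(n, 2)), centers + rng.normal(size=(n, 2)), y

x1_l, x2_l, y_l = make_views(10)
x1_u, x2_u, _ = make_views(200)   # labels of the unlabeled pool are discarded
x1_t, x2_t, y_t = make_views(500)

h1, h2 = co_train(x1_l, x2_l, y_l, x1_u, x2_u)
acc1 = np.mean(h1.predict(x1_t) == y_t)
acc2 = np.mean(h2.predict(x2_t) == y_t)
```

The synthetic views are generated with independent noise given the label, so they satisfy the conditional independence condition by construction. On real data, where the views are usually correlated, confident mistakes of one classifier can propagate to the other.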