text classification, showed that even with this relatively simple method, weakly related unlabeled data can help improve the classification accuracy.
Semi-Supervised Learning from Weakly-Related Unlabeled Data. Building on ideas similar to self-taught learning, Yang et al. [Yang et al., 2008] recently presented an improved version of STL called "Semi-Supervised Learning with Weakly-Related Unlabeled Data" (SSLW). In particular, Yang et al. highlight that many SSL approaches are based on the cluster assumption, which, however, is violated if the unlabeled data is only weakly related to the target classes. Like STL, SSLW tries to find a better data representation that is both informative for the target classes and consistent with the feature coherence patterns of the weakly related unlabeled data.
In more detail, from the labeled data $D_L$, SSLW uses a document-word matrix $M_D = (d_1, d_2, \ldots, d_l)$, where $d_i \in \mathbb{N}^V$ is the word-frequency vector of the $i$-th document and $V$ is the size of the vocabulary. Additionally, a second matrix, the word-document matrix $G = (g_1, g_2, \ldots, g_V)$, is built from both the labeled and the unlabeled data; here $g_i = (g_{i,1}, g_{i,2}, \ldots, g_{i,n})$ records the occurrences of the $i$-th word in all $n$ documents.
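To make the two matrices concrete, the following minimal numpy sketch builds $M_D$ and $G$ for a toy corpus. The corpus, the helper word_freq, and all variable names are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

# Toy corpus: l labeled documents plus u unlabeled, weakly related ones
# (all counts are invented for illustration).
labeled_docs = ["svm margin kernel", "kernel trick margin"]
unlabeled_docs = ["cooking recipe kernel", "corn kernel recipe"]

vocab = sorted({w for doc in labeled_docs + unlabeled_docs for w in doc.split()})
V = len(vocab)
index = {w: i for i, w in enumerate(vocab)}

def word_freq(doc):
    """Word-frequency vector d_i in N^V for a single document."""
    v = np.zeros(V, dtype=int)
    for w in doc.split():
        v[index[w]] += 1
    return v

# Document-word matrix M_D: one column per labeled document (V x l).
M_D = np.stack([word_freq(d) for d in labeled_docs], axis=1)

# Word-document matrix G over ALL n documents: row g_i holds the
# occurrences of the i-th word across the n documents (V x n).
G = np.stack([word_freq(d) for d in labeled_docs + unlabeled_docs], axis=1)

print(M_D.shape, G.shape)  # (V, l) and (V, n)
```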
For an SVM formulation, one could now use $M_D$ to build the kernel $K = M_D^T M_D$ for the SVM's dual formulation. However, such a kernel would discard weakly related documents, i.e., set their similarity to zero. Therefore, Yang et al. augment the kernel with a word-correlation matrix $R \in \mathbb{R}^{V \times V}$, yielding $K = M_D^T R M_D$, where $R_{ij}$ represents the correlation between the $i$-th and the $j$-th word. The goal is to find the optimal $R$ that maximizes the categorization margin. This is done by regularizing $R$ according to $G$: an internal representation of words $W = (w_1, w_2, \ldots, w_V)$ is introduced, where $w_i$ is the internal representation of the $i$-th word, so that the word-correlation matrix can be written as $R = W^T W$. The dual formulation of the SVM then becomes a min-max problem that maximizes over $\alpha$ while minimizing over $R$:
$$\min_{R \in \Delta, U, W} \; \max_{\alpha} \;\; \alpha^T e - \frac{1}{2} (\alpha \circ y)^T (M_D^T R M_D)(\alpha \circ y) \qquad (3.16)$$
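To illustrate how the augmented kernel enters the dual objective, the sketch below builds $K = M_D^T R M_D$ from a random internal representation $W$ and evaluates the inner term of Equation 3.16 for a fixed $\alpha$. All dimensions and values are toy assumptions, and the random $W$ merely stands in for the regularized solution that the actual optimization would find.

```python
import numpy as np

rng = np.random.default_rng(0)

V, l, k = 6, 4, 3            # vocabulary size, labeled docs, internal dim (toy values)
M_D = rng.integers(0, 3, size=(V, l)).astype(float)
y = np.array([1., -1., 1., -1.])

# Internal word representations W = (w_1, ..., w_V), here random.
W = rng.normal(size=(k, V))
R = W.T @ W                  # word-correlation matrix, R_ij = <w_i, w_j>

K_plain = M_D.T @ M_D        # ignores correlations between different words
K_sslw = M_D.T @ R @ M_D     # augmented kernel K = M_D^T R M_D

def dual_objective(alpha, K, y):
    """Inner objective of Eq. 3.16: alpha^T e - 1/2 (alpha o y)^T K (alpha o y)."""
    ay = alpha * y           # Hadamard product alpha ∘ y
    return alpha.sum() - 0.5 * ay @ K @ ay

alpha = np.full(l, 0.1)
print(dual_objective(alpha, K_plain, y), dual_objective(alpha, K_sslw, y))
```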
Equation 3.16 can be solved efficiently using Second Order Cone Programming (SOCP) [Boyd and Vandenberghe, 2004]. For text categorization, SSLW has demonstrated that it can leverage both labeled and weakly related unlabeled data to improve the generalization performance, significantly outperforming self-taught learning and state-of-the-art SSL methods such as TSVM [Bennett and Demiriz, 1999] and manifold regularization [Belkin et al., 2006].
EigenTransfer. Self-taught learning and SSLW can both also be regarded as transfer learning problems, albeit ones in which the class labels of the source data are unknown.