3.4 Semi-supervised Learning

    While unlabeled data U ≠ ∅ do
        Train the classifier c based on L;
        Use c to classify the data in U;
        X = ∅;
        For each data d ∈ U do
            If d.confident ≥ t then X = X ∪ {d};
        end
        L = L ∪ X;
        U = U − X;
    end

In brief summary, the advantages of the self-training approach are: (1) the method is intuitive and simple, and can be wrapped around other, more complex classifiers; and (2) it can be applied to many real applications, e.g., word sense disambiguation [265]. The disadvantage of self-training is that an early error can be accumulated and amplified in later iterations.

3.4.2 Co-Training

Blum et al. [37] first proposed the co-training algorithm, which builds on the idea of self-training by employing two classifiers. In the paper, the authors assume that the features of the data can be partitioned into two subsets that are independent given the class, and that each subset alone can be used to train a good classifier. The algorithm can thus proceed as a dual training process in which the two classifiers teach each other.

The pseudo code of co-training is shown in Algorithm 3.11 [53].

Algorithm 3.11: The co-training algorithm
Input: A labeled dataset L, an unlabeled dataset U, a confidence threshold t
Output: All data with labeled class

    Split the features into two sets, F1 and F2;
    Initialize two classifiers c1 and c2;
    For each data d ∈ L do
        Project d onto the two feature sets, F1 and F2, resulting in d1 and d2;
        Train c1 based on d1 and c2 based on d2;
    end
    While unlabeled data U ≠ ∅ do
        // Train c2 by c1;
        Use c1 to classify U in the feature space F1;
        X1 = ∅;
        For each data d ∈ U do
            If d.confident ≥ t then X1 = X1 ∪ {d};
        end
        d2 = d2 ∪ X1;
        // Train c1 by c2;
        Use c2 to classify U in the feature space F2;
        X2 = ∅;
        For each data d ∈ U do
            If d.confident ≥ t then X2 = X2 ∪ {d};
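The self-training loop above can be sketched in code. The following is a minimal, illustrative sketch, not the book's implementation: it stands in a simple one-dimensional nearest-centroid model for the classifier c, and uses the margin between the two nearest centroids as the confidence score compared against the threshold t. The function names (`self_train`, `nearest_centroid_fit`, `predict_with_confidence`) are assumptions for this example.

```python
# Self-training sketch (illustrative): a nearest-centroid "classifier" is
# retrained on a growing labeled set L; an unlabeled point moves from U to L
# only when its prediction confidence clears the threshold t.

def nearest_centroid_fit(labeled):
    """Compute one centroid per class from (x, y) pairs (x is a float here)."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_confidence(centroids, x):
    """Label = nearest centroid; confidence = margin between the two nearest."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return label, margin

def self_train(L, U, t):
    """The loop of the self-training algorithm: grow L from U until U is
    empty or no remaining point is confident enough (then stop)."""
    L, U = list(L), list(U)
    while U:
        model = nearest_centroid_fit(L)          # train c based on L
        X = []
        for x in U:                              # use c to classify U
            label, conf = predict_with_confidence(model, x)
            if conf >= t:                        # d.confident >= t
                X.append((x, label))
        if not X:                                # nothing confident: terminate
            break
        L += X                                   # L = L U X
        taken = {xi for xi, _ in X}
        U = [x for x in U if x not in taken]     # U = U - X
    return L, U
```

Note the extra termination check when no point clears the threshold; without it, the loop in the pseudocode would never empty U. The sketch also illustrates the stated disadvantage: a wrong confident label at an early iteration shifts the centroids and propagates into later iterations.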
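Algorithm 3.11 can likewise be sketched in code. This is again an illustrative sketch under assumptions, not the authors' implementation: each example carries two scalar feature views standing in for the projections onto F1 and F2, each view gets its own nearest-centroid classifier, and confident predictions from one view's classifier augment the other view's training set, mirroring how X1 is added to d2 and X2 to d1. All function names here are hypothetical.

```python
# Co-training sketch (illustrative): two feature views, one simple
# classifier per view; each classifier labels unlabeled points for the
# OTHER view whenever its confidence margin clears the threshold t.

def fit(view_labeled):
    """One centroid per class from (feature, label) pairs in a single view."""
    sums, counts = {}, {}
    for x, y in view_labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Label = nearest centroid; confidence = margin to the runner-up."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return label, margin

def co_train(L, U, t):
    """L: list of ((f1, f2), y); U: list of (f1, f2).
    Returns the two per-view training sets (d1, d2) after the loop."""
    d1 = [(f1, y) for (f1, _f2), y in L]      # project L onto F1
    d2 = [(f2, y) for (_f1, f2), y in L]      # project L onto F2
    U = list(U)
    while U:
        c1, c2 = fit(d1), fit(d2)             # train c1 on d1, c2 on d2
        moved = []
        for f1, f2 in U:
            y1, conf1 = predict(c1, f1)       # c1 teaches c2 (X1 -> d2)
            y2, conf2 = predict(c2, f2)       # c2 teaches c1 (X2 -> d1)
            if conf1 >= t:
                d2.append((f2, y1))
            if conf2 >= t:
                d1.append((f1, y2))
            if conf1 >= t or conf2 >= t:
                moved.append((f1, f2))
        if not moved:                          # nothing confident: terminate
            break
        U = [u for u in U if u not in moved]
    return d1, d2
```

The point of the cross-over is the one the prose makes: a point that is ambiguous in view F2 can still receive a label from the confident classifier on view F1, which is exactly what makes the two-view assumption useful.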
