3.2.3 Bayesian Classifiers

Bayesian classifiers are another commonly used and popular approach to supervised learning. In this section the strategy is briefly introduced; for more comprehensive detail about Bayesian learning models, refer to [182, 33, 53].

The key idea of Bayesian learning is that the learning process applies Bayes rule to update (or train) the prior distribution of the parameters in the model and computes the posterior distribution for prediction. Bayes rule can be represented as

    Pr(s|D) = \frac{Pr(D|s)\,Pr(s)}{Pr(D)}    (3.2)

where Pr(s) is the prior probability of the sample s, Pr(D) is the prior probability of the training data D, Pr(s|D) is the probability of s given D, and Pr(D|s) is the probability of D given s.

We give an example to illustrate the Bayesian learning process with analysis [53]. Note that, for simplicity and without loss of generality, we assume that each sample can be labeled with one class label or a set of class labels. The sample generation process can be modeled as follows: (1) each class c has an associated prior probability Pr(c), with \sum_{c} Pr(c) = 1, and the sample s first randomly chooses a class label with the corresponding probability; and (2) given the chosen class label c and the class-conditional distribution Pr(s|c), the overall probability of generating the sample is Pr(c)\,Pr(s|c). The posterior probability that s was generated from class c can thus be deduced by Bayes rule:

    Pr(c|s) = \frac{Pr(c)\,Pr(s|c)}{\sum_{\gamma} Pr(\gamma)\,Pr(s|\gamma)}    (3.3)

where \gamma ranges over all the classes. Note that the probability Pr(s|c) is determined by two kinds of parameters: (1) prior domain knowledge unrelated to the training data; and (2) parameters estimated from the training samples. For simplicity, the overall set of parameters is denoted \Theta, and Equation 3.3 can be extended as

    Pr(c|s) = \sum_{\Theta} Pr(c|s,\Theta)\,Pr(\Theta|S) = \sum_{\Theta} \frac{Pr(c|\Theta)\,Pr(s|c,\Theta)}{\sum_{\gamma} Pr(\gamma|\Theta)\,Pr(s|\gamma,\Theta)}\,Pr(\Theta|S)    (3.4)

The sum becomes an integral in the limit of a continuous parameter space, which is the common case. In effect, because we know only the training data for certain and are unsure of the parameter values, we average over all possible parameter values. Such a classification framework is called Bayes optimal.

Generally, it is very difficult to compute the full posterior Pr(\Theta|S) because of limited computing ability. Therefore a practical strategy, the maximum likelihood estimate (MLE), is commonly applied: the single parameter setting \hat{\Theta} = argmax_{\Theta} Pr(S|\Theta) (the mode of Pr(\Theta|S) under a uniform prior) is computed and used in place of the sum over all \Theta.

The advantages of Bayesian classifiers are: (1) they can outperform other classifiers when the training data set is small; and (2) being based on probability theory, they are robust to noise in real data. Their limitation is that they are typically restricted to learning classes that occupy contiguous regions of the instance space.

Naive Bayes Learners
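As a concrete illustration of the ideas above, the sketch below implements a minimal multinomial naive Bayes learner in Python. It is illustrative code written for this section, not taken from the book; the names NaiveBayes, fit, and predict are hypothetical. Training estimates Pr(c) and the class-conditional word probabilities Pr(w|c) by maximum likelihood (with add-one smoothing, an assumption made here so that unseen words do not zero out the product), and prediction returns argmax_c Pr(c) \prod_w Pr(w|c), dropping the denominator of Equation 3.3 since it is identical for every class.

    import math
    from collections import Counter

    class NaiveBayes:
        """Multinomial naive Bayes: a minimal illustrative sketch."""

        def fit(self, docs, labels):
            # Pr(c): fraction of training samples with each class label (MLE).
            self.classes = sorted(set(labels))
            n = len(labels)
            self.prior = {c: labels.count(c) / n for c in self.classes}
            # Per-class word counts, used to estimate Pr(w|c).
            self.counts = {c: Counter() for c in self.classes}
            self.vocab = set()
            for words, c in zip(docs, labels):
                self.counts[c].update(words)
                self.vocab.update(words)
            self.totals = {c: sum(self.counts[c].values()) for c in self.classes}

        def _log_posterior(self, words, c):
            # log Pr(c) + sum_w log Pr(w|c), with add-one (Laplace) smoothing.
            # The denominator of Equation 3.3 is the same for every class,
            # so it is omitted for the argmax.
            lp = math.log(self.prior[c])
            v = len(self.vocab)
            for w in words:
                lp += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            return lp

        def predict(self, words):
            return max(self.classes, key=lambda c: self._log_posterior(words, c))

    # Tiny usage example with made-up data.
    docs = [["free", "prize", "win"], ["meeting", "agenda"], ["agenda", "minutes"]]
    labels = ["spam", "ham", "ham"]
    nb = NaiveBayes()
    nb.fit(docs, labels)
    print(nb.predict(["free", "win"]))  # -> "spam" on this toy data

Working in log space avoids floating-point underflow when a document contains many words; the smoothing constant 1 is a conventional default, not a tuned value.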
