3.2.3 Bayesian Classifiers

Bayesian classifiers are another commonly used and popular approach to supervised learning. In this section the strategy is briefly introduced; for more comprehensive detail about Bayesian learning models, refer to [182, 33, 53].

The key idea of Bayesian learning is that the learning process applies Bayes rule to update (or train) the prior distribution of the parameters in the model and computes the posterior distribution for prediction. Bayes rule can be represented as

    Pr(s|D) = \frac{Pr(D|s)\,Pr(s)}{Pr(D)}    (3.2)

where Pr(s) is the prior probability of the sample s, Pr(D) is the prior probability of the training data D, Pr(s|D) is the probability of s given D, and Pr(D|s) is the probability of D given s.

We give an example to illustrate the Bayesian learning process with analysis [53]. Note that, for simplicity and without loss of generality, we assume that each sample can be labeled with one class label or a set of class labels. The sample generation process can be modeled as follows: (1) each class c has an associated prior probability Pr(c), with \sum_{c} Pr(c) = 1, and the sample s first randomly chooses a class label with the corresponding probability; and (2) given the chosen class label c and the class-conditional distribution Pr(s|c), the overall probability of generating the sample is Pr(c)\,Pr(s|c). The posterior probability that s was generated from class c can thus be deduced by Bayes rule:

    Pr(c|s) = \frac{Pr(c)\,Pr(s|c)}{\sum_{\gamma} Pr(\gamma)\,Pr(s|\gamma)}    (3.3)

where \gamma ranges over all the classes. Note that the probability Pr(s|c) is determined by two kinds of parameters: (1) prior domain knowledge unrelated to the training data; and (2) parameters estimated from the training samples. For simplicity, the overall set of parameters is denoted \Theta, and Equation 3.3 can be extended as

    Pr(c|s) = \sum_{\Theta} Pr(c|s,\Theta)\,Pr(\Theta|S) = \sum_{\Theta} \frac{Pr(c|\Theta)\,Pr(s|c,\Theta)}{\sum_{\gamma} Pr(\gamma|\Theta)\,Pr(s|\gamma,\Theta)}\,Pr(\Theta|S)    (3.4)

The sum becomes an integral in the limit of a continuous parameter space, which is the common case. In effect, because we know only the training data for certain and are unsure of the parameter values, we average over all possible parameter values. Such a classification framework is called Bayes optimal.

Generally, it is very difficult to compute the full posterior Pr(\Theta|S) because of limited computing ability. Therefore a practical strategy, the maximum likelihood estimate (MLE), is commonly applied: the single parameter setting \hat{\Theta} = argmax_{\Theta} Pr(S|\Theta) (the mode of Pr(\Theta|S) under a uniform prior) is computed and used in place of the sum over all \Theta.

The advantages of Bayesian classifiers are: (1) they can outperform other classifiers when the training data set is small; and (2) being based on probability theory, they are robust to noise in real data. Their limitation is that they are typically restricted to learning classes that occupy contiguous regions of the instance space.

Naive Bayes Learners
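As a concrete illustration of the ideas above, the sketch below implements a minimal multinomial naive Bayes learner in Python. It is illustrative code written for this section, not taken from the book; the names NaiveBayes, fit, and predict are hypothetical. Training estimates Pr(c) and the class-conditional word probabilities Pr(w|c) by maximum likelihood (with add-one smoothing, an assumption made here so that unseen words do not zero out the product), and prediction returns argmax_c Pr(c) \prod_w Pr(w|c), dropping the denominator of Equation 3.3 since it is identical for every class.

    import math
    from collections import Counter

    class NaiveBayes:
        """Multinomial naive Bayes: a minimal illustrative sketch."""

        def fit(self, docs, labels):
            # Pr(c): fraction of training samples with each class label (MLE).
            self.classes = sorted(set(labels))
            n = len(labels)
            self.prior = {c: labels.count(c) / n for c in self.classes}
            # Per-class word counts, used to estimate Pr(w|c).
            self.counts = {c: Counter() for c in self.classes}
            self.vocab = set()
            for words, c in zip(docs, labels):
                self.counts[c].update(words)
                self.vocab.update(words)
            self.totals = {c: sum(self.counts[c].values()) for c in self.classes}

        def _log_posterior(self, words, c):
            # log Pr(c) + sum_w log Pr(w|c), with add-one (Laplace) smoothing.
            # The denominator of Equation 3.3 is the same for every class,
            # so it is omitted for the argmax.
            lp = math.log(self.prior[c])
            v = len(self.vocab)
            for w in words:
                lp += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            return lp

        def predict(self, words):
            return max(self.classes, key=lambda c: self._log_posterior(words, c))

    # Tiny usage example with made-up data.
    docs = [["free", "prize", "win"], ["meeting", "agenda"], ["agenda", "minutes"]]
    labels = ["spam", "ham", "ham"]
    nb = NaiveBayes()
    nb.fit(docs, labels)
    print(nb.predict(["free", "win"]))  # -> "spam" on this toy data

Working in log space avoids floating-point underflow when a document contains many words; the smoothing constant 1 is a conventional default, not a tuned value.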
