6.2 Web Usage Mining using Probabilistic Latent Semantic Analysis

In the previous section, we partially discussed the use of latent semantic analysis in Web usage mining. The capability of that approach, latent semantic indexing, is limited, however: although it is able to map the original user sessions onto a latent semantic space, it does not reveal the semantic space itself. In contrast, another variant of LSI, Probabilistic Latent Semantic Analysis (PLSA), is a promising paradigm which can not only reveal the underlying correlations hidden in Web co-occurrence observations, but also identify the latent task factors associated with usage knowledge. In this section, the PLSA model is introduced into Web usage mining to generate Web user groups and Web page clusters based on latent usage analysis [257, 127].

6.2.1 Probabilistic Latent Semantic Analysis Model

The PLSA model was first presented and successfully applied in text mining by [118]. In contrast to the standard LSI algorithm, which utilizes the Frobenius norm as an optimization criterion, the PLSA model is based on the maximum likelihood principle, which is derived from uncertainty theory in statistics.

Basically, the PLSA model is based on a statistical model called the aspect model, which can be utilized to identify the hidden semantic relationships among general co-occurrence activities. In the context of Web usage mining, we can conceptually view the user sessions over the Web page space as such co-occurrence activities and thereby infer the latent usage patterns. Given the aspect model over user access patterns, it is first assumed that there is a latent
factor space Z = (z_1, z_2, ..., z_k), and that each co-occurrence observation (s_i, p_j) (i.e. the visit of page p_j in user session s_i) is associated with each factor z_k ∈ Z to a varying degree. According to the viewpoint of the aspect model, it can be inferred that there exist different relationships among Web users or pages corresponding to the different factors, and that the different factors can be considered to represent corresponding user access patterns. For example, during a Web usage mining process on an e-commerce website, we can define k latent factors associated with k kinds of navigational behavior patterns, such as factor z_1 standing for an interest in a sports-specific product category, z_2 for an interest in sale products, z_3 for browsing through a variety of product pages in different categories, and so on. In this manner, each co-occurrence observation (s_i, p_j) may convey a user navigational interest by mapping the observation into the k-dimensional latent factor space. The degree to which such relationships are "explained" by each factor is derived from a conditional probability distribution associated with the Web usage data. The goal of employing the PLSA model, therefore, is to determine these conditional probability distributions and, in turn, to reveal the intrinsic relationships among Web users or pages via probabilistic inference.
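To make the factor-space idea concrete, consider a toy sketch of one session's conditional distribution over factors. The four factors and all probability values below are hypothetical illustrations, not data from this chapter:

```python
# Hypothetical conditional distribution P(z_k | s_i) for one user session
# s_i over four assumed latent factors (e.g. sports products, sale items,
# cross-category browsing, ...). All values are made up for illustration.
P_z_given_s = {"z1": 0.55, "z2": 0.25, "z3": 0.15, "z4": 0.05}

# A valid conditional distribution must sum to one.
assert abs(sum(P_z_given_s.values()) - 1.0) < 1e-9

# The dominant factor indicates the navigational pattern that best
# "explains" this session.
dominant = max(P_z_given_s, key=P_z_given_s.get)
print(dominant)  # -> z1
```

Here the session would be attributed mainly to the first navigational pattern, while the remaining factors explain it to a lesser degree.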
In short, the PLSA model is used to model and infer user navigational behavior in a latent semantic space, and to identify the associated latent factors. Before we propose the PLSA-based algorithm for Web usage mining, it is necessary to introduce the mathematical background of the PLSA model and the algorithm used to estimate the conditional probability distributions. First, let's introduce the following probability definitions:

• P(s_i) denotes the probability that a particular user session s_i will be observed in the occurrence data,
• P(z_k | s_i) denotes a user-session-specific probability distribution over the latent class factors z_k,
• P(p_j | z_k) denotes the class-conditional probability distribution of pages given a specific latent variable z_k.

Based on these definitions, the probabilistic latent semantic model can be expressed as the following generative process:

• Select a user session s_i with probability P(s_i);
• Pick a hidden factor z_k with probability P(z_k | s_i);
• Generate a page p_j with probability P(p_j | z_k).

As a result, we obtain the occurrence probability of an observed pair (s_i, p_j) by marginalizing over the latent factor variable z_k. Translating this process into a probability model results in the expression:

P(s_i, p_j) = P(s_i) \cdot P(p_j | s_i)    (6.13)

where P(p_j | s_i) = \sum_{z \in Z} P(p_j | z) \cdot P(z | s_i).

By applying Bayes' formula, a re-parameterized version can be derived from the above equations:

P(s_i, p_j) = \sum_{z \in Z} P(z) P(s_i | z) P(p_j | z)    (6.14)

Following the maximum likelihood principle, the total likelihood of the observations is

L = \sum_{s_i \in S, p_j \in P} m(s_i, p_j) \cdot \log P(s_i, p_j)    (6.15)

where m(s_i, p_j) corresponds to the entry of the session-pageview matrix associated with session s_i and pageview p_j, which was discussed in the previous section.

In order to maximize the total likelihood, the conditional probabilities P(z), P(s_i | z) and P(p_j | z) must be re-estimated repeatedly from the usage observation data. As is known from statistics, the Expectation-Maximization (EM) algorithm is an efficient procedure for performing maximum likelihood estimation in latent variable models [72].
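As a minimal numerical sketch of Eqs. (6.14) and (6.15), assuming toy dimensions (4 sessions, 5 pages, 2 factors) and randomly initialized parameters — all names and values here are illustrative, not the book's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical session-pageview matrix m(s_i, p_j): visit counts.
m = rng.integers(0, 3, size=(4, 5)).astype(float)

k = 2
P_z = np.full(k, 1.0 / k)                 # P(z), uniform initialization
P_s_given_z = rng.random((4, k))          # P(s_i | z)
P_s_given_z /= P_s_given_z.sum(axis=0)    # each column is a distribution
P_p_given_z = rng.random((5, k))          # P(p_j | z)
P_p_given_z /= P_p_given_z.sum(axis=0)

# Eq. (6.14): P(s_i, p_j) = sum_z P(z) P(s_i|z) P(p_j|z)
P_sp = (P_s_given_z * P_z) @ P_p_given_z.T

# Eq. (6.15): total log-likelihood L = sum_{i,j} m(s_i,p_j) log P(s_i,p_j)
L = np.sum(m * np.log(P_sp))
```

Since the factor mixture in Eq. (6.14) marginalizes proper distributions, the entries of `P_sp` sum to one over all session-page pairs, and L is the quantity the EM iterations below seek to maximize.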
Generally, two steps are carried out alternately in the procedure: (1) the Expectation (E) step, where posterior probabilities of the latent factors are calculated based on the current estimates of the conditional probabilities; and (2) the Maximization (M) step, where the estimated conditional probabilities are updated so as to maximize the likelihood based on the posterior probabilities computed in the previous E step.

The whole procedure is as follows. First, randomized initial values of P(z), P(s_i | z) and P(p_j | z) are given. Then, in the E-step, we simply apply Bayes' formula to compute the following posterior based on the usage observations:

P(z_k | s_i, p_j) = \frac{P(z_k) P(s_i | z_k) P(p_j | z_k)}{\sum_{z_k \in Z} P(z_k) P(s_i | z_k) P(p_j | z_k)}    (6.16)

Furthermore, in the M-step, we compute:

P(p_j | z_k) = \frac{\sum_{s_i \in S} m(s_i, p_j) P(z_k | s_i, p_j)}{\sum_{s_i \in S, p'_j \in P} m(s_i, p'_j) P(z_k | s_i, p'_j)}    (6.17)
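One E/M round of Eqs. (6.16)-(6.17) can be sketched with NumPy as follows. This is a toy illustration with made-up data and random initialization; the variable names are our own, and the updates for P(s_i | z) and P(z) are written by symmetry with the update for P(p_j | z) in Eq. (6.17):

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_p, k = 4, 5, 2                      # toy sizes: sessions, pages, factors
m = rng.integers(0, 3, size=(n_s, n_p)).astype(float)  # m(s_i, p_j)

# Randomized initial parameters; each column is a proper distribution.
P_z = np.full(k, 1.0 / k)
P_s_given_z = rng.random((n_s, k)); P_s_given_z /= P_s_given_z.sum(axis=0)
P_p_given_z = rng.random((n_p, k)); P_p_given_z /= P_p_given_z.sum(axis=0)

# E-step, Eq. (6.16): posterior P(z_k | s_i, p_j), shape (n_s, n_p, k),
# normalized over the factor axis.
joint = P_z[None, None, :] * P_s_given_z[:, None, :] * P_p_given_z[None, :, :]
post = joint / joint.sum(axis=2, keepdims=True)

# M-step, Eq. (6.17) and its analogues: weight posteriors by the counts
# m(s_i, p_j), then normalize over the appropriate axis.
weighted = m[:, :, None] * post                        # m(s_i,p_j) P(z|s_i,p_j)
P_p_given_z = weighted.sum(axis=0) / weighted.sum(axis=(0, 1))
P_s_given_z = weighted.sum(axis=1) / weighted.sum(axis=(0, 1))
P_z = weighted.sum(axis=(0, 1)) / m.sum()
```

Iterating these two steps until the likelihood of Eq. (6.15) converges yields the final estimates of P(z), P(s_i | z) and P(p_j | z).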