Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

More documents

Recommendations

Info

128 6 Web Usage MiningAlgorithm 6.7: Variational EM AlgorithmStep 1: (E-step) For each document, find the optimizing values of variational parameters θ ∗ mand φ ∗ m .Step 2: (M-step) Maximize the resulting low bound on the likelihood with respect to modelparameters α and β. This corresponds to finding the maximum likelihood estimate with theapproximate posterior which is computed in the E-step.The E-step and M-step are executed iteratively until a maximum likelihood value reaches.Meanwhile, the calculated estimation parameters can be used to infer topic distribution ofa new document by performing the variational inference. More details with respect to thevariational EM algorithm are referred to [35].Web Usage Data ModelIn section 6.1.2, we give a Web data model of session-pageview matrix. Here we use the samedata expression, which summarized as:• S = {s i ,i = 1,...,m}: a set of m user sessions.• P = {p j , j = 1,...,n} : a set of n Web pageviews.• For each user, a user session is represented as a vector of visited pageviews with correspondingweights: s i = {a ij , j = 1,...,n}, where a ij denotes the weight for page p j visitedin s i user session. The corresponding weight is usually determined by the number of hit orthe amount time spent on the specific page. Here, we use the latter to construct the usagedata for the dataset used in experimental study.• SP m×n = {a ij ,i = 1,...,m, j = 1,...,n}: the ultimate usage data in the form of a weightmatrix with a dimensionality of m × n .6.3.2 Modeling User Navigational Task via LDASimilar to capturing the underlying topics over the word vocabulary and each document’sprobability distribution over the mixing topic space, LDA is also able to be used to discoverthe hidden access tasks and the user preference mixtures over the uncovered task space fromuser navigational observations. That is, from the usage data, LDA identifies the hidden tasksand represent each task as a simplex of Web pages. Furthermore, LDA characterizes each Webuser session as a simplex of these discovered tasks. In other words, LDA reveals two aspectsof underlying usage information to us: the hidden task space and the task mixture distributionof each Web user session, which reflect the underlying correlations between Web pages aswell as Web user sessions. With the discovered task-simplex expressions, it is viable to modelthe user access patterns in the forms of weighted page vectors, in turn, to predict the targetuser’s potentially interested pages by employing a collaborative recommendation algorithm.In the following part, we first discuss how to discover the user access patterns in terms oftask-simplex expression based on LDA model and how to infer a target user’s navigationalpreference by using the estimated task-simplex space, and then present a collaborative recommendationalgorithm by incorporating the inferred navigational task distribution of the targetuser with the discovered user access patterns to make Web recommendations.Similar to the implementation of document-topic expression in text mining discussedabove, viewing Web user sessions as mixtures of tasks makes it possible to formulate theproblem of identifying the set of underlying tasks hidden in a collection of user clicks. Givenm Web user sessions containing t hidden tasks expressed over n distinctive pageviews, we can
6.3 Finding User Access Pattern via Latent Dirichlet Allocation Model 129represent P r (p|z) with a set of t multinomial distributions φ over the n Web pages, such that fora hidden task z k , the associative likelihood on the page p j is P r (p j |z k )=φ k, j , j = 1,...,n,k =1,...,t, and P r (z|s) with a set of m multinomial distributions θ over the t tasks, such that for aWeb session s i , the associative likelihood on the task z k is P r (z k |s i )=θ i,k .Here we use the LDA model described above to generate φ and θ resulting in the maximumlikelihood estimates of LDA model in the Web usage data. The complete probabilitymodel is as follows:θ i ∼ Dirichlet(α)z k |θ i ∼ Discrete(θ i )(6.28)φ k ∼ Dirichlet(β)p j |φ k ∼ Discrete(φ k )Here, z k stands for a specific task, θ i denotes the Web session s i ’s navigational preferencedistribution over the tasks and φ k represents the specific task z k ’s association distribution overthe pages. Parameters α and β are the hyper-parameters of the prior variables of θ and φ.We use the variational inference algorithm described above to estimate each Web session’scorrelation with the multiple tasks (θ), and the associations between the tasks and Web pages(φ), with which we can capture the user visit preference distribution exhibited by each usersession and identify the semantics of the task space. Interpreting the contents of the prominentpages related to each task based on φ will eventually result in defining the nature of each task.Meanwhile, the task-based user access patterns are constructed by examining the calculateduser session’s associations with the multiple tasks and aggregating all sessions whose associativedegrees with a specific task are greater than a threshold. We describe our approach todiscovering the task-based user access patterns below.Given this representation, for each latent task, (1) we can consider user sessions with θ iexceeding a threshold as the “prototypical” user sessions associated with that task. In otherwords, these top user sessions contribute significantly to the navigational pattern of this task,in turn, are used to construct this task-specific user access pattern; (2) we select all Web pagesas contributive pages of each task dependent on the values of φ k , and capture the semantics ofthe task by interpreting the contents of those pages.Thus, for each latent task corresponding to one access pattern, we choose all user sessionswith θ i exceeding a certain threshold as the candidates of this specific access pattern. As eachuser session is represented by a weighted page vector in the original usage data space, we cancreate an aggregate of user sessions to represent this task-specific access pattern in the form ofweighted page vector. The algorithm of generating the task-specific access pattern is describedas follows:Algorithm 6.8: Building Task-Specific User Access PatternsInput: the discovered session-task preference distribution matrix θ, m user sessions S = {s i ,i =1,...,m}, and the predefined threshold μ.Output: task-specific user access patterns, TAP = {ap k ,k = 1,···,t}Step 1: For each latent task z k , choose all user sessions with θ i,k > μ to construct a user sessionaggregation R k corresponding to z k ;R k = { s i∣ ∣ θ i,k > μ,k = 1,···,t } (6.29)Step 2: Within the R k , compute the aggregated task-specific user access pattern in terms of aweighted page vector by taking the sessions’ associations with z k , i.e. θ i,k , into account:
Page 2 and 3:
Web Mining and Social Networking
Page 4:
Guandong Xu • Yanchun Zhang • L
Page 8 and 9:
VIIIPrefacefollowing characteristic
Page 11:
Acknowledgements: We would like to
Page 14 and 15:
XIVContents3.1.2 Basic Algorithms f
Page 16 and 17:
XVIContentsPart III Social Networki
Page 19:
Part IFoundation
Page 22 and 23:
4 1 Introduction(3). Learning usefu
Page 24 and 25:
6 1 Introductioncalled computationa
Page 26 and 27:
8 1 Introduction• The data on the
Page 28 and 29:
10 1 Introductionin a broad range t
Page 31 and 32:
2Theoretical BackgroundsAs discusse
Page 33 and 34:
2.2 Textual, Linkage and Usage Expr
Page 35 and 36:
2.4 Eigenvector, Principal Eigenvec
Page 37 and 38:
2.5 Singular Value Decomposition (S
Page 39 and 40:
2.6 Tensor Expression and Decomposi
Page 41 and 42:
2.7 Information Retrieval Performan
Page 43 and 44:
2.8 Basic Concepts in Social Networ
Page 45:
2.8 Basic Concepts in Social Networ
Page 48 and 49:
30 3 Algorithms and TechniquesTable
Page 50 and 51:
32 3 Algorithms and TechniquesSpeci
Page 52 and 53:
34 3 Algorithms and Techniquesa sub
Page 54 and 55:
36 3 Algorithms and TechniquesMetho
Page 56 and 57:
38 3 Algorithms and TechniquesCusto
Page 58 and 59:
40 3 Algorithms and TechniquesTable
Page 60 and 61:
42 3 Algorithms and Techniquesa bSI
Page 62 and 63:
44 3 Algorithms and Techniques{a}10
Page 64 and 65:
46 3 Algorithms and Techniques3.2 S
Page 66 and 67:
48 3 Algorithms and TechniquesConce
Page 68 and 69:
50 3 Algorithms and TechniquesNaive
Page 70 and 71:
52 3 Algorithms and Techniquesuses
Page 72 and 73:
54 3 Algorithms and Techniquesin th
Page 74 and 75:
56 3 Algorithms and Techniques// Fu
Page 76 and 77:
58 3 Algorithms and Techniquesendd
Page 78 and 79:
60 3 Algorithms and Techniquesstart
Page 80 and 81:
62 3 Algorithms and TechniquesHere
Page 82 and 83:
64 3 Algorithms and Techniques3.8.2
Page 84 and 85:
66 3 Algorithms and Techniquesfor e
Page 86 and 87:
68 3 Algorithms and Techniquesthat
Page 89 and 90:
4Web Content MiningIn recent years
Page 91 and 92:
score(q,d)=4.2 Web Search 73V(q) ·
Page 93 and 94:
4.2 Web Search 75algorithm. The Web
Page 95 and 96: 4.3 Feature Enrichment of Short Tex
Page 97 and 98: 4.4 Latent Semantic Indexing 794.4
Page 99 and 100: Notation4.5 Automatic Topic Extract
Page 101 and 102: 4.5 Automatic Topic Extraction from
Page 103 and 104: 4.6 Opinion Search and Opinion Spam
Page 105: 4.6 Opinion Search and Opinion Spam
Page 108 and 109: 90 5 Web Linkage Mining5.2 Co-citat
Page 110 and 111: 92 5 Web Linkage Mining{ /1 out deg
Page 112 and 113: 94 5 Web Linkage Mininga =(a(1),·
Page 114 and 115: 96 5 Web Linkage Mining5.4.1 Bipart
Page 116 and 117: 98 5 Web Linkage MiningNext, consid
Page 118 and 119: 100 5 Web Linkage Mining(5) Creatin
Page 120 and 121: 102 5 Web Linkage Miningpower-law d
Page 122 and 123: 104 5 Web Linkage MiningFig. 5.10.
Page 124 and 125: 106 5 Web Linkage Miningbetween use
Page 126 and 127: 6Web Usage MiningIn previous chapte
Page 129 and 130: 6.1 Modeling Web User Interests usi
Page 137 and 138: 6.2 Web Usage Mining using Probabil
Page 143 and 144: 6.3 Finding User Access Pattern via
Page 145: 6.3 Finding User Access Pattern via
Page 149 and 150: 6.4 Co-Clustering Analysis of weblo
Page 151 and 152: 6.5 Web Usage Mining Applications 1
Page 161: Part IIISocial Networking and Web R
Page 164 and 165: 146 7 Extracting and Analyzing Web
Page 188 and 189: 170 8 Web Mining and Recommendation
Page 196 and 197:
178 8 Web Mining and Recommendation
Page 198 and 199:
Page 200 and 201:
Page 202 and 203:
Page 204 and 205:
Page 206 and 207:
Page 208 and 209:
190 9 Conclusionsries commonly used
Page 210 and 211:
192 9 Conclusionsas computer scienc
Page 212 and 213:
194 9 Conclusionsresearches have de
Page 214 and 215:
196 References14. J. Ayres, J. Gehr
Page 216 and 217:
198 References49. D. Chakrabarti, R
Page 218 and 219:
200 References82. C. Dwork, R. Kuma
Page 220 and 221:
202 References119. J. Hou and Y. Zh
Page 222 and 223:
204 References151. A. N. Langville
Page 224 and 225:
206 References186. J. K. Mui and K.
Page 226 and 227:
208 References223. C. Shahabi, A. M
Page 228:
210 References260. G.-R. Xue, D. Sh
show all

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Create successful ePaper yourself

Delete template?

Save as template?