Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

$$ap_k = \frac{\sum_{s_i \in R_k} \theta_{i,k} \cdot s_i}{|R_k|} \qquad (6.30)$$

where $|R_k|$ is the number of chosen user sessions in $R_k$.

Step 3: Output a set of task-specific user access patterns TAP corresponding to the t tasks, $TAP = \{ap_k,\ k = 1, \cdots, t\}$. In this expression, each user access pattern is represented by a weighted page vector, where the weights indicate the relative visit preferences for pages exhibited by all user sessions associated with this task-specific access pattern.

6.4 Co-Clustering Analysis of weblogs using Bipartite Spectral Projection Approach

In previous sections, we broadly discussed Web clustering in Web usage mining. Basically, Web clustering can be performed on either Web pages or user sessions in the context of Web usage mining. Web page clustering is one of the popular topics in Web clustering; it aims to discover groups of Web pages that share similar functionality or semantics. For example, [114] proposed an LSH (Locality-Sensitive Hashing) technique for clustering the entire Web, concentrating on the scalability of clustering. Snippet-based clustering is well studied in [92]. [147] reported using hierarchical monothetic document clustering to summarize search results. [121] proposed a Web page clustering algorithm that measures page similarity in terms of correlation. In contrast to Web page clustering, Web usage clustering aims to discover Web user behavior patterns and associations between Web pages and users from the perspective of the Web user. In practice, Mobasher et al. [184] combined user transaction and pageview clustering techniques, employing the traditional k-means clustering algorithm to characterize user access patterns for Web personalization based on mining Web usage data. In [258], Xu et al. attempted to discover user access patterns and Web page segments from Web log files by utilizing the Probabilistic Latent Semantic Analysis (PLSA) model.

The clustering algorithms described above mainly operate on only one dimension (or attribute) of the Web usage data, i.e. users or pages alone, rather than taking into account the correlation between Web users and pages. However, in most cases Web object clusters exist in the form of co-occurrences of pages and users: the users from the same group are particularly interested in one subset of Web pages. For example, in the context of customer behavior analysis in e-commerce, this observation corresponds to the phenomenon that one specific group of customers shows strong interest in one particular category of goods. In this scenario, Web co-clustering is probably an effective means to address the challenge. Co-clustering was first proposed for the joint clustering of documents and words in digital libraries [73], and it has since been widely used in studies that involve multiple-attribute analysis, such as social tagging systems [98] and genetic representation [108]. In this section, we propose a co-clustering algorithm for Web usage mining based on bipartite spectral clustering.
6.4.1 Problem Formulation

Bipartite Graph Model

As Web usage data reflects a set of Web users visiting a number of Web pages, it is intuitive to introduce a graph model to represent the visiting relationship between them. In particular, we use the bipartite graph model here.

Definition 6.1. Given a graph $G = (V, E)$, where $V = \{v_1, \cdots, v_n\}$ is a set of vertices and E is a set of edges $\{i, j\}$ with edge weight $E_{ij}$, the adjacency matrix M of the graph G is defined by

$$M_{ij} = \begin{cases} E_{ij} & \text{if there is an edge } (i, j) \\ 0 & \text{otherwise} \end{cases}$$

Definition 6.2. (Cut of Graph): Given a partition of the vertex set V into multiple subsets $V_1, \cdots, V_k$, the cut of the graph is the sum of the weights of all edges whose two vertices are assigned to different subsets:

$$\mathrm{cut}(V_1, V_2, \cdots, V_k) = \sum_{a < b} \; \sum_{i \in V_a,\, j \in V_b} M_{ij}$$

As discussed above, the usage data consists of the visits of Web users to various Web pages. In this case, there are no edges between user sessions or between Web pages; there are only edges between user sessions and Web pages. Thus the bipartite graph model is an appropriate graphical representation of their mutual relationships.

Definition 6.3. (Bipartite Graph Representation): Consider a graph $G = (S, P; E)$ consisting of a set of vertices $V = \{s_i, p_j : s_i \in S,\ p_j \in P;\ i = 1, \cdots, m,\ j = 1, \cdots, n\}$, where S and P are the user session collection and the Web page collection, respectively, and a set of edges $\{s_i, p_j\}$, each with weight $a_{ij}$, where $s_i \in S$ and $p_j \in P$. The links between user sessions and Web pages represent the visits of users to specific Web pages, and their weights indicate the visit preference or significance of the respective pages.

Furthermore, given the $m \times n$ session-by-pageview matrix A such that $a_{ij}$ equals the edge weight $E_{ij}$, it is easy to formulate the adjacency matrix M of the bipartite graph G as

$$M = \begin{bmatrix} 0 & A \\ A^t & 0 \end{bmatrix}$$

In this manner, the first m rows of the constructed matrix M correspond to the user sessions, while the last n rows correspond to the Web pages. The element values of M are determined by the number of clicks or the visit duration. Because the ultimate goal is to extract subsets of user sessions and Web pageviews that form co-clusters with close cohesion within the same cluster and strong separation from other clusters, it is necessary to model the user session and Web page vectors in a single unified space. In the coming section, we will discuss how to perform co-clustering on them.
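
To make Definitions 6.1-6.3 concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It builds M from a small session-by-pageview matrix A and evaluates the cut of one candidate co-clustering. The helper names bipartite_adjacency and graph_cut, the toy click counts, and the label assignment are all illustrative assumptions.

```python
import numpy as np

def bipartite_adjacency(A):
    """Build M = [[0, A], [A^T, 0]] from an m x n session-by-pageview matrix A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    return np.block([[np.zeros((m, m)), A],
                     [A.T, np.zeros((n, n))]])

def graph_cut(M, labels):
    """Sum of edge weights whose two endpoints carry different labels (Definition 6.2)."""
    labels = np.asarray(labels)
    crossing = labels[:, None] != labels[None, :]
    # M is symmetric, so every undirected edge is stored twice; halve the total.
    return M[crossing].sum() / 2.0

# Hypothetical toy data: 3 sessions x 4 pages, weights are click counts.
A = np.array([[2, 1, 0, 0],
              [3, 2, 0, 1],
              [0, 0, 4, 2]])
M = bipartite_adjacency(A)

# Vertex order follows M: the 3 sessions first, then the 4 pages.
# Candidate co-clusters: {s1, s2, p1, p2} (label 0) and {s3, p3, p4} (label 1).
labels = [0, 0, 1, 0, 0, 1, 1]
cut_value = graph_cut(M, labels)
print(M.shape, cut_value)
```

Only the single click of the second session on the fourth page crosses the two groups here, so a small cut value signals that the candidate partition keeps most visits inside the co-clusters.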