Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

6 Web Usage Mining

In previous chapters, we discussed techniques for Web content mining and Web linkage mining. These mining processes are performed mainly on the Web pages themselves, from the perspective of either textual content or link structure. Since the Web is an interaction medium between Web users and Web pages, users' actions on the Web also need to be taken fully into account during Web mining. Mining user navigational behavior, i.e., Web usage mining, can capture the underlying correlations between users and pages during browsing, which in turn provides complementary support for advanced Web applications such as adaptive Web design and Web recommendation. This chapter concentrates on this aspect. We start with the theoretical background of Web usage in terms of data models and algorithmic issues, then present a number of Web usage mining solutions along with experimental studies. Finally, we present a number of applications of Web usage mining.

6.1 Modeling Web User Interests using Clustering

Web clustering is one of the most widely used techniques in Web mining. It aggregates similar Web objects, such as Web pages or user sessions, into a number of object groups by measuring their mutual vector distance. Clustering can be performed on either of these two types of Web objects, which results in clusters of Web users or of Web pages, respectively. The resulting Web user session groups are considered representatives of user navigational behavior patterns, while Web page clusters are used to generate task-oriented functionality aggregations of the Web organization. Moreover, the mined usage knowledge, in terms of Web usage patterns and page aggregate properties, can be used to improve Web site structure design, for example through adaptive Web design and Web personalization.
In this section, we present two studies on clustering Web users.

6.1.1 Measuring Similarity of Interest for Clustering Web Users

Capturing the characteristics of Web users is an important task for Web site designers. By mining Web users' historical access patterns, one can determine not only how the Web is being used, but also some demographic and behavioral characteristics of Web users [56]. The navigation path of a Web user, if available to the server, carries valuable information about the user's interests.

G. Xu et al., Web Mining and Social Networking, DOI 10.1007/978-1-4419-7735-9_6, © Springer Science+Business Media, LLC 2011

The purpose of finding similar interests among Web users is to discover knowledge from the user profiles. If a Web site is well designed, there will be a strong correlation between the similarity of navigation paths and the similarity of user interests. Therefore, clustering the former can be used to cluster the latter.

The definition of similarity is application dependent. The similarity function can be based on visits to the same or similar pages, on the frequency of access to a page [77, 140], or even on the visiting order of links (i.e., clients' navigation paths). In the last case, two users who access the same pages can be mapped into different interest groups if they access the pages in different orders. In [256], Xiao et al. propose several similarity measures to capture users' interests. A matrix-based algorithm is then developed to cluster Web users such that the users in the same cluster are closely related with respect to the similarity measure.

Problem Definitions

The structure of an Internet server site S can be abstracted as a directed graph called the connectivity graph. The node set of the graph consists of all Web pages of the site. The hypertext links between pages are the directed edges of the graph, since each link has a starting page and an ending page. For some links, the starting point or end point may be a page outside the site. As one can imagine, the connectivity graph can be quite complicated. For simplicity, the discussion here is limited to the part of a client's navigation path inside a particular site.
From the Internet browsing logs, the following information about a Web user can be gathered: the frequency of hyper-page usage, the list of links she/he selected, the elapsed time between two links, and the order in which pages were accessed.

Similarity Measures

Suppose that, for a given Web site S, there are m users $U = \{u_1, u_2, \ldots, u_m\}$ who accessed n different Web pages $P = \{p_1, p_2, \ldots, p_n\}$ in some time interval. Each page $p_i$ and each user $u_j$ are associated with a usage value, denoted $use(p_i, u_j)$ and defined as

$$
use(p_i, u_j) =
\begin{cases}
1 & \text{if } p_i \text{ is accessed by } u_j \\
0 & \text{otherwise}
\end{cases}
$$

The use vector can be obtained from the access logs of the site. If two users accessed the same pages, they may have similar interests in the sense that they are interested in the same information (e.g., news, electrical products, etc.). The number of common pages they accessed can measure this similarity. The measure is defined by

$$
Sim1(u_i, u_j) = \frac{\sum_k \left( use(p_k, u_i) \ast use(p_k, u_j) \right)}{\sqrt{\sum_k use(p_k, u_i) \ast \sum_k use(p_k, u_j)}} \tag{6.1}
$$

where $\sum_k use(p_k, u_i)$ is the total number of pages accessed by user $u_i$, and $\sum_k \left( use(p_k, u_i) \ast use(p_k, u_j) \right)$ is the number of common pages accessed by both users $u_i$ and $u_j$. If two users access exactly the same pages, their similarity is 1. The similarity measure defined in this way is called the Usage Based (UB) measure.
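The Usage Based measure can be illustrated with a minimal Python sketch. The function names `use_matrix` and `sim1`, the toy access data, and the convention that a user with no recorded accesses has similarity 0 are all assumptions made for this illustration; they are not taken from [256].

```python
import math


def use_matrix(accesses, pages):
    """Build binary use vectors: use[u][k] is 1 if user u accessed pages[k], else 0."""
    return {
        user: [1 if page in visited else 0 for page in pages]
        for user, visited in accesses.items()
    }


def sim1(use_i, use_j):
    """Usage Based (UB) similarity of two binary use vectors, following Eq. (6.1)."""
    common = sum(a * b for a, b in zip(use_i, use_j))  # pages accessed by both users
    total_i = sum(use_i)  # pages accessed by the first user
    total_j = sum(use_j)  # pages accessed by the second user
    if total_i == 0 or total_j == 0:
        # Assumption for this sketch: Eq. (6.1) is undefined here, so return 0.
        return 0.0
    return common / math.sqrt(total_i * total_j)


# Hypothetical access data: user id -> set of pages accessed in the time interval.
accesses = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "c"},
    "u3": {"c", "d"},
}
pages = sorted(set().union(*accesses.values()))
use = use_matrix(accesses, pages)

same_pages = sim1(use["u1"], use["u2"])  # identical page sets
one_common = sim1(use["u1"], use["u3"])  # exactly one page in common
```

In this toy data, `u1` and `u2` visit exactly the same pages, so their similarity is 1, while `u1` and `u3` share one page out of three and two pages respectively, which gives 1/√6. Because the use values are binary, the measure coincides with the cosine similarity of the two use vectors.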