Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

8.2 A Hybrid User-based and Item-based Web Recommendation System

Model Building

• Select the number of user clusters k, considering its effect on recommendation accuracy and resource requirements.
• Perform BISECTING k-MEANS clustering on the user-preference data.
• Build the model with k surrogate users, derived directly from the k centroids {c_1, c_2, ..., c_k}, where each c_i is a vector of size m, the number of items. That is,

  c_i = (\tilde{R}_{c_i,a_1}, \tilde{R}_{c_i,a_2}, \ldots, \tilde{R}_{c_i,a_m}),

  where \tilde{R}_{c_i,a_j} is the element of the centroid vector c_i corresponding to the item a_j. Further, since \tilde{R}_{c_i,a_j} is essentially an average value, it is 0 if nobody in the i-th cluster has rated a_j.

Prediction Generation

In order to compute the rating prediction \hat{R}_{u_t,a_t} for the target (user, item) pair (u_t, a_t), the following steps are taken.

• Compute the similarity of the target user with each of the surrogate model users who have rated a_t, using the Pearson correlation coefficient:

  w_{u_t,c_i} = \frac{\sum_{a \in \tau} (R_{u_t,a} - \bar{R}_{u_t})(\tilde{R}_{c_i,a} - \bar{R}_{c_i})}{\sqrt{\sum_{a \in \tau} (R_{u_t,a} - \bar{R}_{u_t})^2 \, \sum_{a \in \tau} (\tilde{R}_{c_i,a} - \bar{R}_{c_i})^2}}    (8.6)

  where \tau is the set of items rated by both the target user and the i-th surrogate user.
• Find up to l surrogate users most similar to the target user.
• Compute the prediction using the adjusted weighted average:

  \hat{R}_{u_t,a_t} = \bar{R}_{u_t} + \frac{\sum_{i=1}^{l} (\tilde{R}_{c_i,a_t} - \bar{R}_{c_i}) \, w_{u_t,c_i}}{\sum_{i=1}^{l} w_{u_t,c_i}}    (8.7)

Note that any partitional clustering [122] technique can be used for model building in this hybrid approach. BISECTING k-MEANS is an extension of, and an improved version of, the basic k-MEANS algorithm. The algorithm starts by considering all data points as a single cluster. Then it repeats the following steps (k-1) times to produce k clusters.

1. Pick the largest cluster to split.
2. Apply the basic k-MEANS (2-MEANS, to be exact) clustering to produce two sub-clusters.
3. Repeat step 2 j times and take the best split; one way of determining the best split is to look for the best intra-cluster similarity.

At this stage, it is straightforward to derive the time complexity. Note that the time complexity of the CF algorithm can be divided into two parts: one for the offline model building, and the other for the online generation of recommendations.
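The prediction-generation steps above can be sketched in Python with NumPy. This is a minimal illustration, not the book's implementation: the function name, the convention that a rating of 0 means "unrated" (consistent with the centroid definition above), and the toy data are all assumptions.

```python
import numpy as np

def predict_rating(target_ratings, centroids, item, l=5):
    """Predict the target user's rating for `item` from k surrogate users.

    target_ratings : 1-D array of the target user's ratings (0 = unrated)
    centroids      : (k, m) array of cluster centroids (the surrogate users)
    item           : index of the target item a_t
    l              : maximum number of most-similar surrogates to use
    """
    r_bar_t = target_ratings[target_ratings > 0].mean()  # target user's mean rating
    candidates = []
    for c in centroids:
        if c[item] == 0:                        # surrogate must have rated a_t
            continue
        tau = (target_ratings > 0) & (c > 0)    # items rated by both (the set tau)
        if tau.sum() < 2:
            continue
        r_bar_c = c[c > 0].mean()               # surrogate's mean rating
        du = target_ratings[tau] - r_bar_t
        dc = c[tau] - r_bar_c
        denom = np.sqrt((du ** 2).sum() * (dc ** 2).sum())
        if denom == 0:
            continue
        w = (du * dc).sum() / denom             # Pearson correlation, Eq. (8.6)
        candidates.append((w, c[item] - r_bar_c))
    # keep the (up to) l surrogate users most similar to the target user
    top = sorted(candidates, key=lambda t: t[0], reverse=True)[:l]
    if not top:
        return r_bar_t
    num = sum(w * dev for w, dev in top)
    den = sum(w for w, _ in top)
    # adjusted weighted average, Eq. (8.7)
    return r_bar_t + num / den if den else r_bar_t

# Toy example (hypothetical data): 2 surrogate users, 4 items; 0 means "unrated".
centroids = np.array([[4., 2., 5., 3.],
                      [1., 5., 2., 4.]])
target = np.array([5., 1., 4., 0.])
print(predict_rating(target, centroids, 3, l=1))  # r_bar_t - 0.5 = 2.833...
```

With l = 1 the weight cancels out of Eq. (8.7), so the prediction reduces to the target user's mean plus the single neighbour's deviation on a_t; this makes the toy result easy to verify by hand.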
