Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

170 8 Web Mining and Recommendation Systems

one. Recently, collaborative filtering has been widely adopted in Web recommendation applications and has achieved great success as well [116, 139, 224]. In the following sections, we first introduce collaborative filtering based recommender systems.

8.1.1 User-based Collaborative Filtering

The basic idea of a collaborative filtering (CF) based recommendation algorithm is to provide item recommendations or predictions based on the common opinion of other like-minded users. Users' opinions are usually expressed explicitly through ratings or implicitly through other measures. As such, user rating data is the most common data format used in CF-based algorithms. Below we first look at how user rating data is organized.

User Rating Data Format and Collaborative Filtering Process

Since a collaborative filtering algorithm makes recommendations based on the user's previous preferences and the choices of other like-minded users, the data format used in CF-based algorithms is a natural starting point for understanding the collaborative filtering process. In a typical CF scenario, there is a collection of m users U = {u_1, u_2, ..., u_m} and a collection of n items I = {i_1, i_2, ..., i_n}. The degree to which user u_i favors item i_j is given explicitly by a rating score, generally on a fixed numerical scale, e.g. 1-5. In this fashion, the relationships between users and items are modeled as a two-way matrix, where each row corresponds to a user u_i and each column to an item i_j. The entry for user u_i and item i_j denotes the degree of preference (i.e. the rating score). Figure 8.1 gives a schematic diagram of the rating data format as well as the collaborative filtering process [218]. Note that in the rating matrix, if the user has not yet rated an item, the corresponding score is set to 0.
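The rating matrix described above can be sketched in a few lines of Python. This is a minimal illustration only: the (user, item, rating) triples, the identifiers such as "u1" and "i1", and the helper name build_rating_matrix are all assumptions made for the example and do not come from the text. The only conventions taken from the text are one row per user, one column per item, and 0 for an unrated item.

```python
# Hypothetical toy data: (user, item, rating) triples on a 1-5 scale.
ratings = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i3", 2),
    ("u3", "i2", 1), ("u3", "i3", 5),
]


def build_rating_matrix(triples):
    """Return (users, items, matrix), where matrix[r][c] is the rating that
    users[r] gave to items[c], or 0 if that user has not rated that item."""
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    row = {u: r for r, u in enumerate(users)}
    col = {i: c for c, i in enumerate(items)}
    # Start from an all-zero m x n matrix, i.e. "nothing rated yet".
    matrix = [[0] * len(items) for _ in users]
    for u, i, score in triples:
        matrix[row[u]][col[i]] = score
    return users, items, matrix


users, items, matrix = build_rating_matrix(ratings)
for user, scores in zip(users, matrix):
    print(user, scores)
```

A dense list of lists is used here only for readability; with real data, where most entries are 0, a sparse representation would be the more likely choice.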
Fig. 8.1. The user rating data and CF process [218]

As shown in the figure, the aim of the collaborative filtering process can be prediction: assigning a numerical score that expresses the predicted likeliness of each item the target user has not rated yet; or recommendation: presenting a list of the top-N items that the target user will like the most but has not rated before. In the context of collaborative filtering, there are usually two main categories of algorithms: memory-based (user-based) and model-based (item-based). In this part, we first briefly discuss the principle of the former.
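The two goals named above, prediction and top-N recommendation, can be illustrated with a small memory-based (user-based) sketch. Everything specific here is an assumption made for illustration: the toy matrix R, the use of cosine similarity between raw rating rows (with unrated entries counted as zeros), the neighborhood size k, and the function names. The text does not prescribe a particular similarity measure or aggregation rule, so this should be read as one simple possibility, not as the method of [218].

```python
import math

# Hypothetical toy rating matrix: rows are users, columns are items, 0 = unrated.
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4],
]


def cosine(a, b):
    """Cosine similarity between two users' rating rows (unrated entries are 0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(R, user, item, k=2):
    """Prediction: similarity-weighted average of the ratings given to `item`
    by the k users most similar to `user` among those who rated it."""
    neighbors = sorted(
        ((cosine(R[user], R[v]), R[v][item])
         for v in range(len(R)) if v != user and R[v][item] > 0),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    return sum(sim * r for sim, r in neighbors) / total if total else 0.0


def top_n(R, user, n=2, k=2):
    """Recommendation: the n unrated items with the highest predicted score."""
    unrated = [j for j, r in enumerate(R[user]) if r == 0]
    return sorted(unrated, key=lambda j: predict(R, user, j, k), reverse=True)[:n]


print(predict(R, 0, 2))  # predicted score of item 2 for user 0
print(top_n(R, 1))       # indices of the top-2 unrated items for user 1
```

Because the prediction is a weighted average of neighbors' ratings, it always stays within the rating scale used in the matrix.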
8.1 User-based and Item-based Collaborative Filtering Recommender Systems 171

A memory-based algorithm starts from the whole user-item database to make predictions. These systems employ statistical learning techniques to determine a set of users, known as neighbors, who have similar intent or preferences to the target user (i.e. like-minded users). After the neighborhood of the target user is determined, the historic preferences of the like-minded users are used to make a prediction or to recommend a list of top-N items to the target user. Since the main idea of such techniques is based on the preferences of nearest neighbors (or like-minded users), the approach is also called nearest-neighbor or user-based collaborative filtering. User-based CF algorithms are popular and widely used in a variety of applications [212].

Major Challenges of Memory-based CF Algorithms

Although user-based collaborative filtering algorithms have been successful in practice, they face some challenges in real applications, such as:

• Sparsity: In practice, the rating datasets used in recommender systems are extremely large, possibly containing several hundred thousand users and millions of items. In these systems, however, even active users may have rated or purchased only a very small fraction of the total items or products, say well below 1%. This makes the collected rating datasets very sparse, which poses a major challenge to such recommender systems. A recommender system based on user-based CF may therefore be unable to recommend any item to a particular user, and the performance of recommendations may be quite limited.

• Scalability: User-based CF algorithms need to compute the nearest neighbors in a huge space of millions of users and millions of items. The steady growth in the number of users and items due to commercial expansion adds further computational difficulty. Scalability problems therefore significantly affect user-based CF recommender systems.

8.1.2 Item-based Collaborative Filtering Algorithm

The item-based (or model-based) algorithms were first proposed by Sarwar et al. [218]. The main idea of the approach is to calculate the similarity between different items based on the user-item rating data and then to compute the prediction score for a given item based on the calculated similarity scores. The intuition behind the approach is that a user would be interested in items that are similar to the items in which the user has shown interest or which the user has purchased before. Thus, in order to make recommendations, an item similarity matrix (or an equivalent model) needs to be learned first; in other words, the recommendation is based on computation over items. From the perspective of the data analysis target, such an approach is named item-based or model-based collaborative filtering.

Since the item similarity matrix is computed in advance and no nearest-neighbor computation is needed, the item-based CF approach suffers less from scalability problems and produces recommendations quickly. Thus the item-based CF approach has shown strength in practice [41, 185, 212]. In the following part, we discuss the item-based collaborative filtering algorithm reported in [218].

Item Similarity Computation

There are a number of different measures for computing the similarity between two items, such as cosine-based, correlation-based and adjusted cosine-based similarity. Their definitions are given below [17, 218].
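As a rough illustration of the item-based idea, the sketch below precomputes an item similarity matrix and then scores an item from the target user's own ratings. The formal definitions are not part of this excerpt, so the code follows the commonly cited forms of cosine-based and adjusted cosine-based similarity and a similarity-weighted sum for prediction; treat these as assumptions, not as the exact formulas of [218]. The toy matrix R and all function names are hypothetical, and correlation-based similarity is left out for brevity.

```python
import math

# Hypothetical toy rating matrix: rows are users, columns are items, 0 = unrated.
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
    [0, 1, 5, 4],
]


def column(R, j):
    """Ratings of item j by every user (one column of the matrix)."""
    return [row[j] for row in R]


def cosine_sim(R, i, j):
    """Cosine-based similarity: angle between the rating columns of items i and j."""
    a, b = column(R, i), column(R, j)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_mean(row):
    """Average of the ratings a user has actually given (0 entries are skipped)."""
    rated = [r for r in row if r > 0]
    return sum(rated) / len(rated) if rated else 0.0


def adjusted_cosine_sim(R, i, j):
    """Adjusted cosine: cosine over users who rated both items, after
    subtracting each user's mean rating to offset different rating habits."""
    pairs = [(row[i] - user_mean(row), row[j] - user_mean(row))
             for row in R if row[i] > 0 and row[j] > 0]
    dot = sum(x * y for x, y in pairs)
    norm = (math.sqrt(sum(x * x for x, _ in pairs))
            * math.sqrt(sum(y * y for _, y in pairs)))
    return dot / norm if norm else 0.0


def similarity_matrix(R, sim):
    """Item-item similarity matrix, computed once in advance."""
    n = len(R[0])
    return [[sim(R, i, j) for j in range(n)] for i in range(n)]


def predict(R, S, user, item):
    """Weighted sum of the user's own ratings, weighted by item similarity."""
    rated = [j for j, r in enumerate(R[user]) if r > 0 and j != item]
    denom = sum(abs(S[item][j]) for j in rated)
    return sum(S[item][j] * R[user][j] for j in rated) / denom if denom else 0.0


S = similarity_matrix(R, cosine_sim)
print(predict(R, S, 0, 2))           # score of item 2 for user 0
print(adjusted_cosine_sim(R, 0, 3))  # negative: items 0 and 3 attract opposite tastes
```

Only the lookup in S and a short sum over the user's rated items happen at recommendation time, which reflects why the precomputed similarity matrix eases the scalability problem described above.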