Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

8.2 A Hybrid User-based and Item-based Web Recommendation System

Model Building

• Select the number of user clusters k, considering its effect on recommendation accuracy and resource requirements.
• Perform BISECTING k-MEANS clustering on the user-preference data.
• Build the model with k surrogate users, derived directly from the k centroids {c_1, c_2, ..., c_k}, where each c_i is a vector of size m, the number of items. That is,

  c_i = (\tilde{R}_{c_i,a_1}, \tilde{R}_{c_i,a_2}, \ldots, \tilde{R}_{c_i,a_m}),

  where \tilde{R}_{c_i,a_j} is the element of the centroid vector c_i corresponding to the item a_j. Further, since \tilde{R}_{c_i,a_j} is essentially an average value, it is 0 if nobody in the i-th cluster has rated a_j.

Prediction Generation

In order to compute the rating prediction \hat{R}_{u_t,a_t} for the target (user, item) pair (u_t, a_t), the following steps are taken.

• Compute the similarity of the target user with each of the surrogate model users who have rated a_t, using the Pearson correlation coefficient:

  w_{u_t,c_i} = \frac{\sum_{a \in \tau} (R_{u_t,a} - \bar{R}_{u_t})(\tilde{R}_{c_i,a} - \bar{R}_{c_i})}{\sqrt{\sum_{a \in \tau} (R_{u_t,a} - \bar{R}_{u_t})^2 \, \sum_{a \in \tau} (\tilde{R}_{c_i,a} - \bar{R}_{c_i})^2}}    (8.6)

  where \tau is the set of items rated by both the target user and the i-th surrogate user.
• Find up to l surrogate users most similar to the target user.
• Compute the prediction using the adjusted weighted average:

  \hat{R}_{u_t,a_t} = \bar{R}_{u_t} + \frac{\sum_{i=1}^{l} (\tilde{R}_{c_i,a_t} - \bar{R}_{c_i}) \, w_{u_t,c_i}}{\sum_{i=1}^{l} w_{u_t,c_i}}    (8.7)

Note that any partitional clustering [122] technique can be used for model building in this hybrid approach. BISECTING k-MEANS is an extension of, and an improved version of, the basic k-MEANS algorithm. The algorithm starts by considering all data points as a single cluster. Then it repeats the following steps (k-1) times to produce k clusters.

1. Pick the largest cluster to split.
2. Apply the basic k-MEANS (2-MEANS, to be exact) clustering to produce two sub-clusters.
3. Repeat step 2 j times and take the best split; one way of determining the best split is to look for the best intra-cluster similarity.

At this stage, it is straightforward to derive the time complexity. Note that the time complexity of the CF algorithm can be divided into two parts: one for the offline model building, and the other for the online generation of recommendations.
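The prediction-generation steps above can be sketched in Python with NumPy. This is a minimal illustration, not the book's implementation: the function name, the convention that a rating of 0 means "unrated" (consistent with the centroid definition above), and the toy data are all assumptions.

```python
import numpy as np

def predict_rating(target_ratings, centroids, item, l=5):
    """Predict the target user's rating for `item` from k surrogate users.

    target_ratings : 1-D array of the target user's ratings (0 = unrated)
    centroids      : (k, m) array of cluster centroids (the surrogate users)
    item           : index of the target item a_t
    l              : maximum number of most-similar surrogates to use
    """
    r_bar_t = target_ratings[target_ratings > 0].mean()  # target user's mean rating
    candidates = []
    for c in centroids:
        if c[item] == 0:                        # surrogate must have rated a_t
            continue
        tau = (target_ratings > 0) & (c > 0)    # items rated by both (the set tau)
        if tau.sum() < 2:
            continue
        r_bar_c = c[c > 0].mean()               # surrogate's mean rating
        du = target_ratings[tau] - r_bar_t
        dc = c[tau] - r_bar_c
        denom = np.sqrt((du ** 2).sum() * (dc ** 2).sum())
        if denom == 0:
            continue
        w = (du * dc).sum() / denom             # Pearson correlation, Eq. (8.6)
        candidates.append((w, c[item] - r_bar_c))
    # keep the (up to) l surrogate users most similar to the target user
    top = sorted(candidates, key=lambda t: t[0], reverse=True)[:l]
    if not top:
        return r_bar_t
    num = sum(w * dev for w, dev in top)
    den = sum(w for w, _ in top)
    # adjusted weighted average, Eq. (8.7)
    return r_bar_t + num / den if den else r_bar_t

# Toy example (hypothetical data): 2 surrogate users, 4 items; 0 means "unrated".
centroids = np.array([[4., 2., 5., 3.],
                      [1., 5., 2., 4.]])
target = np.array([5., 1., 4., 0.])
print(predict_rating(target, centroids, 3, l=1))  # r_bar_t - 0.5 = 2.833...
```

With l = 1 the weight cancels out of Eq. (8.7), so the prediction reduces to the target user's mean plus the single neighbour's deviation on a_t; this makes the toy result easy to verify by hand.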
