10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

8.1 User-based and Item-based Collaborative Filtering Recommender Systems

Memory-based algorithms start from the whole user-item database to make predictions. These systems employ statistical learning techniques to determine a set of users, known as neighbors, who have similar intents or preferences to the target user (i.e. like-minded users). After the neighborhood of the target user is determined, the historical preferences of the like-minded users are used to make a prediction or to recommend a list of top-N items to the target user. Since the main idea of such techniques is based on the preferences of nearest neighbors (or like-minded users), the approach is also called nearest-neighbor or user-based collaborative filtering. User-based CF algorithms are popular and widely used in a variety of applications [212].

Major Challenges of Memory-based CF Algorithms

Although user-based collaborative filtering algorithms have been successful in practice, they face several challenges in real applications, such as:

• Sparsity: In practice, the rating datasets used in recommender systems are extremely large, possibly containing several hundred thousand users and millions of items. In these systems, however, even active users may have rated or purchased only a very small fraction of the total items or products, say well below 1%. This makes the collected rating datasets very sparse, which poses a major challenge to such recommender systems. As a result, a recommender system based on user-based CF may be unable to make any item recommendation for a given user, and the quality of its recommendations may be quite limited.
• Scalability: User-based CF algorithms need to compute the nearest neighbors in a huge space of millions of users and millions of items. The continued growth in the number of users and items due to commercial expansion adds further computational difficulty. Scalability problems therefore affect user-based CF recommender systems significantly.

8.1.2 Item-based Collaborative Filtering Algorithm

The item-based (or model-based) algorithms were first proposed by Sarwar et al. [218]. The main idea of the approach is to calculate the similarity between different items based on the user-item rating data, and then to compute the prediction score for a given item based on the calculated similarity scores. The intuition behind the approach is that a user would be interested in items that are similar to the items in which the user has shown interest or which the user has purchased before. Thus, in order to make recommendations, an item similarity matrix (or an equivalent model) needs to be learned first; in other words, the recommendation is based on item-level computation. From the perspective of the data analysis target, such an approach is named item-based or model-based collaborative filtering.

Since the item similarity matrix is computed in advance and no nearest-neighbor computation is needed, the item-based CF approach has less significant scalability problems and is fast in producing recommendations. Thus the item-based CF approach has shown strength in practice [41, 185, 212]. In the following part, we discuss the item-based collaborative filtering algorithm reported in [218].

Item Similarity Computation

There are a number of different measures for computing the similarity between two items, such as cosine-based, correlation-based and adjusted cosine-based similarity. Their definitions are given below [17, 218].
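
Before turning to those definitions, the two-phase workflow of item-based CF described above (precompute an item similarity matrix, then score an item from the user's own ratings) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy rating matrix, the function names, the choice of plain cosine similarity with unrated entries treated as 0, and the similarity-weighted average used for prediction. It is not the exact procedure of [218].

```python
import math

# Hypothetical toy user-item rating matrix: rows are users, columns are items.
# A value of 0 means "not rated" (a simplifying assumption of this sketch).
RATINGS = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def item_similarity_matrix(ratings):
    """Offline phase: similarity between every pair of item columns."""
    columns = list(zip(*ratings))
    n = len(columns)
    return [[cosine(columns[i], columns[j]) for j in range(n)] for i in range(n)]


def predict(ratings, sim, user, item):
    """Online phase: similarity-weighted average of the user's existing ratings."""
    num = den = 0.0
    for j, r in enumerate(ratings[user]):
        if j != item and r > 0:  # use only other items this user has rated
            num += sim[item][j] * r
            den += abs(sim[item][j])
    return num / den if den else 0.0


SIM = item_similarity_matrix(RATINGS)  # computed once, reused for every request
print(round(predict(RATINGS, SIM, user=1, item=1), 2))
```

In this sketch the similarity matrix is built once and only looked up at prediction time, which mirrors the reason given above for the better scalability of the item-based approach: no neighbor search over users is performed when a recommendation is requested.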