SEKE 2012 Proceedings - Knowledge Systems Institute


3.3. Evaluation Method

We use top-N precision to evaluate the performance of the algorithms on the recommendation task. To measure precision, we first split the original data set described in section 3.1 into two parts: a training set containing 80 percent of the data and a testing set containing the remaining 20 percent. Because the recommendation algorithms described above have different requirements, we use two methods to split the data set. Content-based, item-to-item-CF and user-based-CF cannot make any recommendation when a user visits our site for the first time. Therefore, when we randomly select 20 percent of the original data set for testing, we exclude the records of the products that users visit on their first visit to our site. For the last three algorithms (most-popular, cheapest and newest), the recommendation is based on the overall data rather than on each individual user, so we select the 20 percent of testing data completely at random.
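The two splitting methods can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes each visit record is a hypothetical `(user_id, product_id, visit_index)` tuple in which `visit_index == 0` marks the user's first visit to the site, and the function name `split_records` is likewise hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (user_id, product_id, visit_index),
# where visit_index 0 marks the user's first visit to the site.
Record = Tuple[str, str, int]


def split_records(records: List[Record], personalized: bool,
                  test_ratio: float = 0.2, seed: int = 0
                  ) -> Tuple[List[Record], List[Record]]:
    """Split records into (training, testing) sets.

    personalized=True mimics the split for content-based, item-to-item-CF
    and user-based-CF: first-visit records are never used for testing.
    personalized=False mimics the fully random split used for the
    non-personalized methods.
    """
    rng = random.Random(seed)
    n_test = round(len(records) * test_ratio)
    if personalized:
        # Only records from a user's later visits are eligible for testing.
        candidates = [i for i, r in enumerate(records) if r[2] > 0]
    else:
        candidates = list(range(len(records)))
    # Cannot draw more test records than there are eligible candidates.
    n_test = min(n_test, len(candidates))
    test_idx = set(rng.sample(candidates, n_test))
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test
```

Sampling by index rather than by value keeps duplicate records (the same user visiting the same product twice) correctly apportioned. Whether "first visit" means the first session or only the first recorded page view is an assumption the sketch leaves to the `visit_index` field.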

Precision is computed as follows. For any single test case, if the test product is one of the N recommended products, the case counts as a hit. The overall precision is then defined by the following formula:

$$\mathrm{precision}(N) = \frac{\text{hit-times}}{N \cdot |U_T|} \qquad (3)$$

where hit-times is the number of test cases in which the N recommended products contain the test product, T is the test set, and $|U_T|$ is the number of users in T.
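Equation (3) can be computed directly, as in the Python sketch below. It assumes test cases are `(user, test_product)` pairs and that a recommender is any callable returning a ranked product list for a user; `precision_at_n` and the `recommend` signature are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def precision_at_n(test_cases: Iterable[Tuple[str, str]],
                   recommend: Callable[[str, int], List[str]],
                   n: int) -> float:
    """Top-N precision as in Eq. (3): hit_times / (N * |U_T|)."""
    cases = list(test_cases)
    users = {user for user, _ in cases}  # U_T: distinct users in the test set
    hit_times = 0
    for user, test_product in cases:
        # A hit: the test product is among the N recommended products.
        if test_product in recommend(user, n)[:n]:
            hit_times += 1
    return hit_times / (n * len(users)) if users else 0.0
```

For example, with a hypothetical non-personalized recommender that always returns the same ranked list, two hits among three test cases from two users give 2 / (5 × 2) = 0.2 at N = 5.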

Besides precision, we also evaluate the cost of each recommendation method against the following five criteria: 1. requires item features; 2. requires rating information; 3. requires user behavior history data; 4. requires statistical information about the data set; 5. requires relatively complex calculation. Each criterion a method meets adds 1 point to its total cost.
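The cost score is a simple count of the criteria a method meets. The Python sketch below encodes it with one boolean flag per criterion; the `CostProfile` class and its field names are hypothetical, and the flags set in any example are illustrative only, since this section does not list which criteria each method meets.

```python
from dataclasses import dataclass, fields


@dataclass
class CostProfile:
    """One flag per cost criterion; each True flag adds 1 point."""
    item_features: bool = False        # 1. requires item features
    rating_info: bool = False          # 2. requires rating information
    behavior_history: bool = False     # 3. requires user behavior history data
    dataset_statistics: bool = False   # 4. requires statistics of the data set
    complex_calculation: bool = False  # 5. requires relatively complex calculation

    def cost(self) -> int:
        # Sum the flags: each criterion met contributes exactly 1 point.
        return sum(int(getattr(self, f.name)) for f in fields(self))
```

A hypothetical method that needs only user behavior history and a relatively complex calculation would score `CostProfile(behavior_history=True, complex_calculation=True).cost() == 2`.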

4. Results and Discussion

4.1. Experimental Results

In this section, we present the experimental results of the recommender algorithms described in the previous section on our data set. We apply the evaluation method described in section 3.3 to evaluate six recommender algorithms.

Figure 1 shows that item-to-item-CF achieves the best performance, with a precision of 19.50%, outperforming the other methods by a wide margin; its cost of 3 points is relatively high but not the highest. The most-popular method, used as the benchmark, achieves a precision of 8.86%, the second-best performance, and its cost of 2 points is relatively low compared with other methods of similar performance. This shows that most-popular offers a very good trade-off between performance and cost. Content-based, with a precision of 6.85%, and user-based-CF, with a precision of 7.39%, do not outperform most-popular, which we selected as the benchmark. These two methods also have the highest cost, 4 points. This does not mean that these two main recommendation methods are ineffective, but they may have some limitations in the vertical B2C E-commerce case. Cheapest and newest brought no surprises: although their cost is very low at 1 point, their precisions of 6.12% (cheapest) and 3.23% (newest) are not acceptable.

Figure 1. Top-5 precision and cost of methods

4.2. Discussion

Content-based: Content-based does not outperform the benchmark in our experiment, and we think there are two main reasons: the low quality of the product feature information, and the lack of item ratings for detecting users' preferences. The low quality of the bags' feature information has two causes: some suppliers do not provide complete feature information, and different suppliers use different words for the same features. To preprocess the feature information from different suppliers, we need to fill in missing features and unify different words that express the same meaning. In our experiments, we assume that all visited items are of equal interest to the user. If we knew the degree of a user's interest in each item, however, we could give more weight to the features of the most preferred items.
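Two of the remedies discussed above, unifying supplier vocabularies and weighting features by the user's interest, can be sketched in Python as below. This is an illustrative sketch only: the `SYNONYMS` table, the function names and the additive feature-overlap scoring are hypothetical choices, not the paper's content-based algorithm, and filling in missing features is not covered.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical mapping from supplier-specific words to one canonical term.
SYNONYMS: Dict[str, str] = {"leather": "genuine leather",
                            "real leather": "genuine leather",
                            "tote bag": "tote"}


def normalize(features: Set[str]) -> Set[str]:
    """Unify different words that express the same feature."""
    return {SYNONYMS.get(f.lower(), f.lower()) for f in features}


def build_profile(visited: List[Set[str]],
                  weights: Optional[List[float]] = None) -> Counter:
    """Accumulate normalized features of visited items into a user profile.

    weights=None treats every visited item as equally interesting, as in
    the experiment; per-item interest weights emphasize preferred items.
    """
    if weights is None:
        weights = [1.0] * len(visited)
    if len(weights) != len(visited):
        raise ValueError("need exactly one weight per visited item")
    profile: Counter = Counter()
    for features, w in zip(visited, weights):
        for f in normalize(features):
            profile[f] += w
    return profile


def score(profile: Counter, candidate: Set[str]) -> float:
    """Score a candidate item by the profile weight of its features."""
    # Counter returns 0 for features the user has never seen.
    return sum(profile[f] for f in normalize(candidate))
```

With uniform weights, a user who visited a "Leather" tote and a canvas "tote" gives a candidate described as "real leather" a score of 1.0; weighting the first item three times as heavily raises that score to 3.0, which is the effect of weighting the most preferred items' features more.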

Item-to-item-CF: Item-to-item-CF achieves the best performance; this algorithm has been applied successfully in Amazon's recommender system. Its main cost is that we need to count the co-visit or co-purchase times of all item pairs. Although a single user visits only about 5 items on average, we have sufficient visiting history from different users for each single item, which is exactly what this algorithm needs to find similar items from the item perspective. Also, this method does not need item feature information, which is one advantage to

