
3.3. Evaluation Method

We use top-N precision to evaluate the performance of the algorithms on the recommendation task. To measure precision, we first separate the original data set described in Section 3.1 into two parts: the training set contains 80 percent of the data, and the testing set 20 percent. To accommodate the different recommendation algorithms described above, we apply two methods of separating the data set. Content-based, item-to-item-CF, and user-based-CF cannot make any recommendation when a user visits our site for the first time, so when we randomly select 20 percent of the data from the original data set for these methods, we exclude the records of products visited by users on their first visit. For the last three algorithms, the recommendation is based on the overall data and is not specific to each user, so we select the 20 percent of testing data completely at random.
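As an illustration only, a minimal Python sketch of the two splitting schemes; the (user, item, timestamp) record format, the function name, and the use of the earliest timestamp to identify a first visit are our assumptions rather than details taken from the paper:

```python
import random

def split_dataset(records, test_ratio=0.2, exclude_first_visit=True, seed=42):
    """Split visit records into a training and a testing set (80/20).

    Assumes `records` is a list of (user_id, item_id, timestamp) tuples.
    When `exclude_first_visit` is True, a user's earliest record is never
    placed in the test set, because content-based, item-to-item-CF, and
    user-based-CF cannot recommend anything to a user with no history.
    """
    random.seed(seed)
    # Earliest record per user, identified by timestamp.
    first_visit = {}
    for rec in records:
        user, _, ts = rec
        if user not in first_visit or ts < first_visit[user][2]:
            first_visit[user] = rec

    eligible = [rec for rec in records
                if not (exclude_first_visit and rec == first_visit[rec[0]])]
    test_size = min(int(len(records) * test_ratio), len(eligible))
    test = set(random.sample(eligible, test_size))
    train = [rec for rec in records if rec not in test]
    return train, list(test)
```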

The computation of precision proceeds as follows. For any single test case, if the test product is among the N recommended products, it counts as a hit. The overall precision is then defined by the following formula:

\[ \text{precision}(N) = \frac{\textit{hit-times}}{N \cdot |U_T|} \qquad (3) \]

where hit-times is the number of test cases in which the N recommended products contain the test product, T is the test set, and |U_T| is the number of users contained in T.
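A minimal sketch of this computation, assuming test cases are (user, test_item) pairs and a hypothetical recommend(user, n) function that returns the N recommendations for a user:

```python
def top_n_precision(test_cases, recommend, n=5):
    """Top-N precision as in Eq. (3).

    Assumes `test_cases` is a list of (user_id, test_item) pairs and that
    `recommend(user_id, n)` returns the list of N recommended items.
    """
    hit_times = 0
    test_users = set()
    for user, test_item in test_cases:
        test_users.add(user)
        if test_item in recommend(user, n):  # hit: test product is among the N recommendations
            hit_times += 1
    return hit_times / (n * len(test_users))  # hit-times / (N * |U_T|)
```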

In addition to precision, we also evaluate the cost of each recommendation method according to the following five criteria: 1) it requires item feature information; 2) it requires rating information; 3) it requires user behavior history data; 4) it requires statistical information about the data set; 5) it requires relatively complex calculation. Each criterion that a method meets adds 1 point to its total cost.

4. Results and Discussion

4.1. Experimental Results

In this section, we present the experimental results of the recommender algorithms described in the previous section on our data set. We apply the evaluation method described in Section 3.3 to evaluate the six recommender algorithms.

Figure 1 shows that item-to-item-CF achieves the best performance with a precision of 19.50%, outperforming the other methods by a wide margin; its cost of 3 points is relatively high but not the highest. Most-popular, used as the benchmark method, achieves a precision of 8.86%, the second best performance, and its cost of 2 points is relatively low compared with other methods of similar performance; most-popular therefore offers a very good cost-performance ratio. Content-based, with a precision of 6.85%, and user-based-CF, with a precision of 7.39%, do not outperform most-popular, which we selected as the benchmark. Moreover, these two methods have the highest cost of 4 points. This does not mean that these two main recommendation methods are ineffective, but they may have some limitations in the vertical B2C e-commerce case. Cheapest and newest brought no surprises: although their cost is very low at 1 point, the performance of cheapest, with a precision of 6.12%, and of newest, with a precision of 3.23%, is not acceptable.

Figure 1. Top-5 precision and cost of methods

4.2. Discussion

Content-based: Content-based does not outperform the benchmark in our experiment, and we attribute this to two main reasons: the low quality of the product feature information and the lack of item ratings with which to detect users' preferences. The low quality of the bag feature information has two main causes: some suppliers do not provide all feature information, and different suppliers use different words for the same features. To preprocess the feature information from different suppliers, we need to fill in some missing features and unify different words that express the same meaning. In our experiments, we assume that all visited items are of equal interest to the user; if we instead knew the degree of a user's interest in each item, we could give more weight to the features of the most preferred items.
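To illustrate this weighting idea (not the implementation used in the paper), a small sketch that builds a user profile as a weighted bag of feature words; the function names and data layout are assumptions:

```python
from collections import defaultdict

def build_user_profile(visited_items, item_features, interest=None):
    """Aggregate the features of a user's visited items into a weighted profile.

    `item_features` maps item_id -> set of normalized feature words; `interest`
    optionally maps item_id -> a weight for the user's degree of interest.
    Without `interest`, every visited item counts equally, which matches the
    assumption made in the experiments.
    """
    profile = defaultdict(float)
    for item in visited_items:
        weight = 1.0 if interest is None else interest.get(item, 1.0)
        for feature in item_features.get(item, set()):
            profile[feature] += weight
    return profile

def score_candidate(profile, candidate_features):
    """Score a candidate item by the weighted overlap with the user profile."""
    return sum(profile.get(f, 0.0) for f in candidate_features)
```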

Item-to-item-CF: Item-to-item-CF achieves the best performance; this algorithm has been successfully applied in Amazon's recommender system. In this algorithm, the main cost is counting the co-visited or co-purchased times of all item pairs. Although the average number of items visited by a single user is only about 5, we have sufficient visiting history from different users for each single item, which is exactly what this algorithm needs in order to identify similar items from the item perspective. In addition, this method does not need item feature information, which is an advantage.