
Proceedings - Österreichische Gesellschaft für Artificial Intelligence


develop a metric for positivity of terms, and examine their relative distributions. This is followed by an examination of the relation between ratings and texts in the two data sets. We show that the hypothesis is strongly confirmed in all three of its variants. Finally, we observe that these results could have far-reaching implications for the interpretation of recommender systems and user ratings, the use of which has exploded in recent years.


2 Data

The Danish data was downloaded from the Danish movie website scope.dk and contains rated user reviews from 829 films and has a total size of 1,624,049 words. The U.S. data was downloaded from The Internet Movie Database (imdb.com) and contains rated user reviews from 678 films and has a total size of 34,599,486 words.

A search function on www.imdb.com was used to create a list of films and matching IMDb ID tags for films produced in the years 1920-2011. 678 films on the list had a match in the Scope data on title and production year. The IMDb ID tags were used to find the page containing data for each of these films, and all reviews that had an associated rating were downloaded for those 678 films. The U.S. IMDb reviews are rated on a scale of 1 to 10, while the Danish Scope reviews are rated on a scale of 1 to 6.
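The matching step described above can be sketched as follows. This is a minimal illustration only: the paper does not say how titles were compared, so the `normalize_title` helper (lower-casing and whitespace collapsing), the tuple layout, and the function names are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

# (film id, title, production year); this layout is a hypothetical choice
Film = Tuple[str, str, int]


def normalize_title(title: str) -> str:
    """Lower-case and collapse whitespace (an assumed normalization)."""
    return " ".join(title.lower().split())


def match_films(imdb_films: List[Film], scope_films: List[Film]) -> Dict[str, str]:
    """Map IMDb ids to Scope ids for films that agree on title and year."""
    # Index the Scope films by (normalized title, year) for direct lookup.
    scope_index = {
        (normalize_title(title), year): scope_id
        for scope_id, title, year in scope_films
    }
    matches: Dict[str, str] = {}
    for imdb_id, title, year in imdb_films:
        key = (normalize_title(title), year)
        if key in scope_index:
            matches[imdb_id] = scope_index[key]
    return matches
```

A film is kept only when both title and production year agree, which mirrors the criterion stated in the text; any fuzzier matching would be a further assumption.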

3 Ratings

Figure 1 gives the number of reviews in each category for IMDb.

For IMDb, the top category of 10 has by far the most reviews. For the most part, the number of reviews decreases from category 10 downward, with a modest increase in the number of reviews for the lowest category, 1. This distribution makes intuitive sense: it is not surprising that people would be most motivated to write reviews of films they are most enthusiastic about and, to a lesser extent, also be motivated in cases where they have strong negative feelings. This has been noted in the literature: Wu and Huberman (2010) point out that the so-called “brag and moan” view of ratings is fairly typical (as also mentioned by Hu et al., 2006, and Dellarocas and Narayan, 2006). The tendency of the top category to be the most frequent is also mentioned on the yelp.com site, where the top category of 5 is the most frequent: “The numbers don’t lie: people love to talk about the things they love!” (FAQ, 2012).

Figure 1: IMDb reviews per category (x-axis: rating category, 1 to 10; y-axis: number of reviews, 0 to 45,000)

Figure 2: Scope reviews per category (x-axis: rating category, 1 to 6; y-axis: number of reviews, 0 to 5,000)

There is a very different distribution in the Danish Scope data, as shown in Figure 2. Here, category 4 (out of 6) is the most frequent. This supports the general prediction that highly positive evaluations are over-represented in the U.S. data compared to the Danish data.
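Because the two sites use different scales (1 to 10 and 1 to 6), the comparison above rests on the shape of each distribution rather than on raw counts. A minimal sketch of two scale-independent summaries, the most frequent category and the share of reviews in the top category, is given below. The function names are assumptions, and the counts in the usage note are toy values, not figures from the paper.

```python
from typing import Dict


def category_shares(counts: Dict[int, int]) -> Dict[int, float]:
    """Convert per-category review counts into proportions of all reviews."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reviews to summarize")
    return {category: n / total for category, n in counts.items()}


def modal_category(counts: Dict[int, int]) -> int:
    """Return the rating category with the most reviews."""
    return max(counts, key=lambda category: counts[category])


def top_category_share(counts: Dict[int, int]) -> float:
    """Return the proportion of reviews in the highest rating category."""
    top = max(counts)  # highest category label present on this scale
    return category_shares(counts)[top]
```

For toy counts such as `{1: 2, 2: 1, 3: 1, 4: 1, 5: 5}`, the mode coincides with the top category, whereas for `{1: 1, 2: 2, 3: 4, 4: 2, 5: 1}` it falls in the middle of the scale; this is the kind of contrast the text describes between Figures 1 and 2.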

4 Text

We turn now to a second version of our hypothesis: that highly positive terms are over-represented in the U.S. data. We consider highly positive terms to be those that tend to occur in the most positive category and tend not to occur in the other categories.
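One possible way to operationalize this notion is sketched below. It is not the metric developed in the paper, which is not given in this excerpt; it is an assumed stand-in that scores each term by the ratio of its smoothed relative frequency in the top category to its smoothed relative frequency in all other categories. The function name, the input layout, and the add-one smoothing are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def positivity_scores(
    reviews: Iterable[Tuple[int, List[str]]],
    top_rating: int,
    smoothing: float = 1.0,
) -> Dict[str, float]:
    """Score terms by (relative frequency in top category) / (elsewhere).

    `reviews` is a sequence of (rating, tokens) pairs. Scores above 1 mean
    the term is relatively more frequent in the top category.
    """
    top_counts: Counter = Counter()
    rest_counts: Counter = Counter()
    for rating, tokens in reviews:
        target = top_counts if rating == top_rating else rest_counts
        target.update(tokens)

    vocabulary = set(top_counts) | set(rest_counts)
    top_total = sum(top_counts.values())
    rest_total = sum(rest_counts.values())
    size = len(vocabulary)

    scores: Dict[str, float] = {}
    for term in vocabulary:
        # Additive smoothing keeps the ratio finite for unseen terms.
        p_top = (top_counts[term] + smoothing) / (top_total + smoothing * size)
        p_rest = (rest_counts[term] + smoothing) / (rest_total + smoothing * size)
        scores[term] = p_top / p_rest
    return scores
```

Under this sketch, a term that appears mostly in top-rated reviews receives a score well above 1, and a term spread evenly across categories receives a score near 1. For IMDb `top_rating` would be 10 and for Scope it would be 6.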


Proceedings of KONVENS 2012 (PATHOS 2012 workshop), Vienna, September 21, 2012
