08.04.2013 Views

Extraction and Integration of MovieLens and IMDb Data - APMD



Extraction and Integration of MovieLens and IMDb Data – Technical Report

2.1.2. Source files of the small data set

The small data set consists of 5 text files in tabular format, describing 100000 anonymous ratings of 1682 movies made by 943 users during the seven-month period from September 19th, 1997 through April 22nd, 1998. In the following, we describe the contents of each text file.

u.data

This file contains data about 100000 ratings, one per line; each line is one evaluation of one movie by one user. It is a tab-separated list of UserID, MovieID, Rating and Timestamp, where:

− UserID is an integer, ranging from 1 to 943, that identifies a user. Each user has rated at least 20 movies.

− MovieID is an integer, ranging from 1 to 1682, that identifies a movie.

− Rating is an integer, ranging from 1 to 5, made on a 5-star scale (whole-star ratings only).

− Timestamp is expressed in seconds since 1/1/1970 UTC.
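The field list above can be illustrated with a minimal parsing sketch. It is not part of the report's tooling: the `Rating` type and the `parse_rating` function are hypothetical names introduced here, and the sample line is the first rating from the report's u.data extract, with tabs written explicitly.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    """One line of u.data (hypothetical container, not from the report)."""
    user_id: int
    movie_id: int
    rating: int
    timestamp: datetime


def parse_rating(line: str) -> Rating:
    """Parse one tab-separated u.data line: UserID, MovieID, Rating, Timestamp."""
    user_id, movie_id, rating, seconds = line.rstrip("\n").split("\t")
    rating_value = int(rating)
    # The report states ratings are whole stars between 1 and 5.
    if not 1 <= rating_value <= 5:
        raise ValueError(f"rating out of range: {rating_value}")
    # Timestamps are seconds since 1/1/1970 UTC, so convert with an explicit UTC zone.
    when = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return Rating(int(user_id), int(movie_id), rating_value, when)


sample = "196\t242\t3\t881250949\n"
parsed = parse_rating(sample)
print(parsed.user_id, parsed.movie_id, parsed.rating, parsed.timestamp.isoformat())
```

Converting the timestamp with an explicit UTC time zone avoids results that depend on the local time zone of the machine running the script.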

Data is randomly ordered. Figure 5 shows an extract of the file corresponding to 5 ratings.

196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596

Figure 5 – Extract of the u.data file

u.item

This file contains data about 1682 movies (1 movie in each line). This is a pipe-separated list of MovieID, MovieTitle, ReleaseDate, VideoReleaseDate, IMDbURL, unknown, Action, Adventure, Animation, Children’s, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War and Western, where:

− MovieID is an integer, ranging from 1 to 1682, that identifies a movie.

− MovieTitle is a String that concatenates the movie title and the year of release (in parentheses).

− ReleaseDate is a DD-Mon-YYYY date indicating the movie release date.

− VideoReleaseDate was intended to hold the video release date, but it is always NULL.

− IMDbURL indicates the URL of the movie on the IMDb site.

− The last 19 fields correspond to genres; a 1 indicates the movie is of that genre, a 0 indicates it is not; movies can be in several genres at once.
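As an illustration of the MovieTitle and ReleaseDate formats described above, the following sketch splits a title from its year and parses a DD-Mon-YYYY date. The helper names `split_title` and `parse_release_date` are hypothetical, and the sketch assumes English month abbreviations (the default C locale); titles that do not end in a four-digit year in parentheses are returned unchanged with no year.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# A title followed by a four-digit year in parentheses, e.g. "Toy Story (1995)".
TITLE_YEAR = re.compile(r"^(?P<title>.*)\s+\((?P<year>\d{4})\)\s*$")


def split_title(movie_title: str) -> Tuple[str, Optional[int]]:
    """Split a MovieTitle value into (title, year); year is None if absent."""
    match = TITLE_YEAR.match(movie_title)
    if match is None:
        return movie_title.strip(), None
    return match.group("title"), int(match.group("year"))


def parse_release_date(text: str) -> Optional[date]:
    """Parse a DD-Mon-YYYY value; an empty field yields None."""
    if not text:
        return None
    # %b matches abbreviated month names and depends on the active locale.
    return datetime.strptime(text, "%d-%b-%Y").date()


print(split_title("Toy Story (1995)"))
print(parse_release_date("01-Jan-1995"))
```

Returning `None` for an empty date keeps the always-empty VideoReleaseDate field from raising an error if the same helper is applied to it.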

Figure 6 shows an extract of the file corresponding to 5 movies.

1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0

Figure 6 – Extract of the u.item file
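To show how the 19 genre flags map back to genre names, here is a small sketch that parses one u.item line. The `Movie` type and `parse_movie` function are hypothetical names, the genre order is the one listed in the field description, and the sample is the first row of Figure 6 (written as two string pieces only to keep the line short).

```python
from typing import List, NamedTuple

# Genre columns in the order given by the field list of u.item.
GENRES = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


class Movie(NamedTuple):
    """One line of u.item (hypothetical container, not from the report)."""
    movie_id: int
    title: str
    release_date: str
    video_release_date: str
    imdb_url: str
    genres: List[str]


def parse_movie(line: str) -> Movie:
    """Parse one pipe-separated u.item line into a Movie."""
    fields = line.rstrip("\n").split("|")
    # 5 descriptive fields followed by one 0/1 flag per genre.
    if len(fields) != 5 + len(GENRES):
        raise ValueError(f"expected {5 + len(GENRES)} fields, got {len(fields)}")
    flags = fields[5:]
    genres = [name for name, flag in zip(GENRES, flags) if flag == "1"]
    return Movie(int(fields[0]), fields[1], fields[2], fields[3], fields[4], genres)


sample = (
    "1|Toy Story (1995)|01-Jan-1995||"
    "http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|"
    "0|0|0|0|0|0|0|0|0|0|0"
)
movie = parse_movie(sample)
print(movie.movie_id, movie.title, movie.genres)
```

Checking the field count first makes a truncated or wrapped line fail loudly instead of silently shifting the genre flags.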
