Extraction and Integration of MovieLens and IMDb Data - APMD
Extraction and Integration of MovieLens and IMDb Data – Technical Report
2.1.2. Source files of the small data set
The small data set consists of 5 text files, in tabular format, describing 100000 anonymous ratings of 1682 movies made by 943 users during the seven-month period from September 19th, 1997 through April 22nd, 1998.
In the following, we describe the contents <strong>of</strong> each text file.<br />
u.data<br />
This file contains data about 100000 ratings (one rating per line, i.e. one user's evaluation of one movie). It is a tab-separated list of UserID, MovieID, Rating and Timestamp, where:
− UserID is an integer, ranging from 1 to 943, that identifies a user. Each user has rated at least 20 movies.
− MovieID is an integer, ranging from 1 to 1682, that identifies a movie.
− Rating is an integer from 1 to 5, given on a 5-star scale (whole-star ratings only).
− Timestamp is expressed in seconds since 1/1/1970 UTC.
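As an illustration only (not the authors' extraction code), the record layout described above can be read with a few lines of Python. The names `Rating`, `parse_udata_line`, `load_udata` and `summarize` are hypothetical, the Latin-1 encoding passed to `open` is an assumption, and the parser splits on tab characters as the description states.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Rating:
    user_id: int    # 1..943
    movie_id: int   # 1..1682
    stars: int      # whole-star rating, 1..5
    when: datetime  # Timestamp converted from seconds since 1970-01-01 UTC


def parse_udata_line(line: str) -> Rating:
    """Parse one tab-separated u.data line: UserID, MovieID, Rating, Timestamp."""
    user_id, movie_id, rating, ts = line.rstrip("\n").split("\t")
    stars = int(rating)
    if not 1 <= stars <= 5:
        raise ValueError(f"rating outside the 1-5 scale: {stars}")
    return Rating(int(user_id), int(movie_id), stars,
                  datetime.fromtimestamp(int(ts), tz=timezone.utc))


def load_udata(path: str) -> List[Rating]:
    """Read every non-empty line of a u.data file (encoding is an assumption)."""
    with open(path, encoding="latin-1") as f:
        return [parse_udata_line(line) for line in f if line.strip()]


def summarize(ratings: Iterable[Rating]) -> dict:
    """Counts that can be compared with the figures quoted for the small data set."""
    ratings = list(ratings)
    per_user = Counter(r.user_id for r in ratings)
    return {
        "ratings": len(ratings),
        "users": len(per_user),
        "movies": len({r.movie_id for r in ratings}),
        "min_ratings_per_user": min(per_user.values(), default=0),
    }
```

For example, the first line of Figure 5 (`196`, `242`, `3`, `881250949`) parses to a 3-star rating dated 1997-12-04 15:55:49 UTC. On a complete file that matches the description, `summarize(load_udata(path))` should report 100000 ratings from 943 users, with no user below 20 ratings.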
Data is randomly ordered. Figure 5 shows an extract of the file containing 5 ratings.
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
Figure 5 – Extract of the u.data file
u.item
This file contains data about 1682 movies (one movie per line). It is a pipe-separated list of MovieID, MovieTitle, ReleaseDate, VideoReleaseDate, IMDbURL, unknown, Action, Adventure, Animation, Children’s, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War and Western, where:
− MovieID is an integer, ranging from 1 to 1682, that identifies a movie.
− MovieTitle is a string that concatenates the movie title and its year of release (in parentheses).
− ReleaseDate is a DD-Mon-YYYY date indicating the movie's release date.
− VideoReleaseDate was intended to hold the video release date, but it is always NULL (empty).
− IMDbURL gives the URL of the movie on the IMDb site.
− The last 19 fields correspond to genres: 1 indicates that the movie belongs to that genre, 0 that it does not; a movie can belong to several genres at once.
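A minimal Python sketch of reading one u.item record under the layout just described might look as follows. It is not the report's own code: `Movie`, `parse_uitem_line`, `GENRES` and `TITLE_YEAR` are hypothetical names, and it assumes that no title contains a `|` character, that the year is the last `(YYYY)` group of MovieTitle, and that an empty ReleaseDate should become `None`.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional, Tuple

# Genre columns in the order listed for u.item (the last 19 fields).
GENRES = (
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
)
N_FIELDS = 5 + len(GENRES)  # MovieID..IMDbURL, then the 19 genre flags

# MovieTitle is "Title (YYYY)"; the anchored pattern takes the last (YYYY) group.
TITLE_YEAR = re.compile(r"^(.*) \((\d{4})\)$")


@dataclass(frozen=True)
class Movie:
    movie_id: int
    title: str                    # title without the "(YYYY)" suffix
    year: Optional[int]           # None if MovieTitle has no "(YYYY)" suffix
    release_date: Optional[date]  # None if ReleaseDate is empty (assumption)
    imdb_url: str
    genres: Tuple[str, ...]       # genres whose flag is 1


def parse_uitem_line(line: str) -> Movie:
    """Parse one pipe-separated u.item line into a Movie record."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(fields)}")
    # VideoReleaseDate is always NULL, so it is read and ignored.
    movie_id, raw_title, release, _video_release, imdb_url = fields[:5]
    match = TITLE_YEAR.match(raw_title.strip())
    if match:
        title, year = match.group(1), int(match.group(2))
    else:
        title, year = raw_title.strip(), None
    # DD-Mon-YYYY; %b expects English month abbreviations (default C locale).
    release_date = datetime.strptime(release, "%d-%b-%Y").date() if release else None
    genres = tuple(g for g, flag in zip(GENRES, fields[5:]) if flag == "1")
    return Movie(int(movie_id), title, year, release_date, imdb_url, genres)
```

Decoding the 19 flags by position against a fixed genre tuple keeps the multi-genre case simple: Toy Story (1995), the first record of Figure 6, yields `("Animation", "Children's", "Comedy")`.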
Figure 6 shows an extract of the file containing 5 movies.
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0
Figure 6 – Extract of the u.item file
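Because IMDbURL is the field that connects a MovieLens movie to IMDb, it may help to see how the title encoded in that URL can be recovered. The sketch below simply percent-decodes the query part of the URL with Python's standard `urllib.parse`. Whether the integration described in this report relies on this string as a lookup key is not stated here, and `imdb_title_from_url` and `url_matches_title` are hypothetical helpers.

```python
from urllib.parse import unquote, urlsplit


def imdb_title_from_url(imdb_url: str) -> str:
    """Percent-decode the query string of an IMDbURL (the part after '?')."""
    return unquote(urlsplit(imdb_url).query)


def url_matches_title(movie_title: str, imdb_url: str) -> bool:
    """Hypothetical consistency check: does the URL encode the same MovieTitle?"""
    return imdb_title_from_url(imdb_url) == movie_title.strip()
```

For the five records in Figure 6, the decoded query equals the MovieTitle field exactly (for example, `Toy%20Story%20(1995)` decodes to `Toy Story (1995)`). A check like `url_matches_title` could therefore flag records where the two fields disagree before they are matched against IMDb.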