08.04.2013 Views

Extraction and Integration of MovieLens and IMDb Data - APMD


Verónika Peralta

u.genre

This file contains data about 19 movie genres (one genre per line). Each line is a pipe-separated pair of GenreName and GenreID, where:

− GenreID is an integer, ranging from 0 to 18, that identifies a genre.

− GenreName is a String describing the genre. Genre names correspond to the genre columns of the u.item file.
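As a minimal sketch of how such lines could be read, the Python snippet below builds a GenreID-to-GenreName mapping. It assumes the field order seen in the Figure 7 extract (name first, then ID); the function name `parse_genres` and the in-memory sample are illustrative only and are not part of the data set.

```python
from typing import Dict, Iterable


def parse_genres(lines: Iterable[str]) -> Dict[int, str]:
    """Map GenreID -> GenreName from pipe-separated 'GenreName|GenreID' lines."""
    genres: Dict[int, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        # Split on the last pipe so the numeric ID is always the final field.
        name, genre_id = line.rsplit("|", 1)
        genres[int(genre_id)] = name
    return genres


# Sample lines copied from the Figure 7 extract.
sample = ["unknown|0", "Action|1", "Adventure|2", "Animation|3", "Children's|4"]
genres = parse_genres(sample)
print(genres[4])
```

To read the real file, the same function can be passed an open file handle in place of the sample list, since it only needs an iterable of lines.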

Figure 7 shows an extract of the file corresponding to 5 genres.

unknown|0

Action|1

Adventure|2

Animation|3

Children's|4

Figure 7 – Extract of the u.genre file

u.user

This file contains data about 943 users (one user per line). Each line is a pipe-separated list of UserID, Age, Gender, Occupation and Zip-code, where:

− UserID is an integer, ranging from 1 to 943, that identifies a user.

− Age is an integer indicating the user’s age.

− Gender is denoted by "M" for male and "F" for female.

− Occupation is a String indicating the user’s occupation.

− Zip-code is a five-digit integer indicating the user’s ZIP code.
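The field list above can be sketched as a small typed record in Python. This is an illustration under stated assumptions, not the extraction procedure used in this report: the `User` class and `parse_user` function are hypothetical names, and the ZIP code is kept as text (an assumption made here so that leading zeros are not lost).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    age: int
    gender: str      # "M" or "F"
    occupation: str
    zip_code: str    # kept as text so leading zeros survive (assumption)


def parse_user(line: str) -> User:
    """Parse one 'UserID|Age|Gender|Occupation|Zip-code' line."""
    fields = line.strip().split("|")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    user_id, age, gender, occupation, zip_code = fields
    if gender not in ("M", "F"):
        raise ValueError(f"unexpected gender value: {gender!r}")
    return User(int(user_id), int(age), gender, occupation, zip_code)


# Sample lines copied from the Figure 8 extract.
sample = [
    "1|24|M|technician|85711",
    "2|53|F|other|94043",
    "3|23|M|writer|32067",
    "4|24|M|technician|43537",
    "5|33|F|other|15213",
]
users = [parse_user(line) for line in sample]
print(users[0])
```

Rejecting malformed lines early is a design choice of this sketch; because the demographic values were not checked for accuracy, a more permissive loader might log and skip such lines instead.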

All demographic information was provided voluntarily by the users and was not checked for accuracy. Only users who have provided some demographic information are included in this data set. Figure 8 shows an extract of the file corresponding to 5 users.

1|24|M|technician|85711

2|53|F|other|94043

3|23|M|writer|32067

4|24|M|technician|43537

5|33|F|other|15213

Figure 8 – Extract of the u.user file

u.occupation

This file lists 21 user occupations (one occupation per line). Figure 9 shows an extract of the file corresponding to 5 occupations.

administrator

artist

doctor

educator

engineer

Figure 9 – Extract of the u.occupation file

2.2. MovieLens target schemas

Both MovieLens data sets were extracted to a Microsoft Access® database. The following sub-sections describe the target schemas for both data sets.
