Extraction and Integration of MovieLens and IMDb Data - APMD
u.genre<br />
Verónika Peralta<br />
This file contains data about 19 movie genres (one genre per line). Each line is a pipe-separated pair of GenreName and<br />
GenreID, where:<br />
− GenreID is an integer, ranging from 0 to 18, that identifies a genre.<br />
− GenreName is a String describing the genre. Genre names correspond to the genre columns of the u.item<br />
file.<br />
Figure 7 shows an extract of the file corresponding to 5 genres.<br />
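As a minimal sketch of how this file could be read, the Python snippet below turns pipe-separated lines into a mapping from GenreID to GenreName. It follows the field order seen in the file extract (name first, then ID). The function name parse_genres and the inline sample list are illustrative assumptions, not part of the MovieLens distribution.<br />

```python
def parse_genres(lines):
    """Map GenreID (int) to GenreName (str) from 'GenreName|GenreID' lines."""
    genres = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing empty line
            continue
        # Split on the last pipe so the genre name is kept whole.
        name, genre_id = line.rsplit("|", 1)
        genres[int(genre_id)] = name
    return genres


# Sample rows copied from the extract in Figure 7.
sample = ["unknown|0", "Action|1", "Adventure|2", "Animation|3", "Children's|4"]
genres = parse_genres(sample)
print(genres[4])
```

Keying the mapping by GenreID makes it straightforward to label the genre columns of u.item by position.<br />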
unknown|0<br />
Action|1<br />
Adventure|2<br />
Animation|3<br />
Children's|4<br />
Figure 7 – Extract of the u.genre file<br />
u.user<br />
This file contains data about 943 users (one user per line). Each line is a pipe-separated list of UserID, Age, Gender,<br />
Occupation and Zip-code, where:<br />
− UserID is an integer, ranging from 1 to 943, that identifies a user.<br />
− Age is an integer indicating the user’s age.<br />
− Gender is denoted by "M" for male and "F" for female.<br />
− Occupation is a String indicating the user’s occupation.<br />
− Zip-code is a five-digit integer indicating the user’s ZIP code.<br />
All demographic information was provided voluntarily by the users and was not checked for accuracy. Only<br />
users who provided some demographic information are included in this data set. Figure 8 shows an extract<br />
of the file corresponding to 5 users.<br />
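The record layout described above can be sketched in Python as follows. The User type and the parse_user function are hypothetical names introduced for illustration. Storing the ZIP code as text rather than as an integer is a design assumption of this sketch, made so that leading zeros are not lost.<br />

```python
from typing import NamedTuple


class User(NamedTuple):
    user_id: int
    age: int
    gender: str      # "M" or "F"
    occupation: str
    zip_code: str    # kept as text (assumption) so leading zeros survive


def parse_user(line):
    """Parse one 'UserID|Age|Gender|Occupation|Zip-code' line."""
    user_id, age, gender, occupation, zip_code = line.strip().split("|")
    if gender not in ("M", "F"):
        raise ValueError(f"unexpected gender value: {gender!r}")
    return User(int(user_id), int(age), gender, occupation, zip_code)


# Sample rows copied from the extract in Figure 8.
sample = [
    "1|24|M|technician|85711",
    "2|53|F|other|94043",
    "3|23|M|writer|32067",
]
users = [parse_user(line) for line in sample]
print(users[0].occupation)
```

Because the demographic values were not checked for accuracy, a loader along these lines may prefer to report malformed rows rather than silently coerce them.<br />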
1|24|M|technician|85711<br />
2|53|F|other|94043<br />
3|23|M|writer|32067<br />
4|24|M|technician|43537<br />
5|33|F|other|15213<br />
Figure 8 – Extract of the u.user file<br />
u.occupation<br />
This file lists 21 user occupations (one occupation per line). Figure 9 shows an extract of the file corresponding<br />
to 5 occupations.<br />
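One possible use of this list is as a lookup for checking the Occupation field of u.user. The sketch below, with the hypothetical helper unknown_occupations, reports occupations that appear in user rows but not in the occupation list. It assumes that Occupation is the fourth pipe-separated field of a user row, as described for u.user.<br />

```python
def unknown_occupations(user_lines, occupation_lines):
    """Return occupations used in user rows but absent from the occupation list."""
    known = {line.strip() for line in occupation_lines if line.strip()}
    # Occupation is the fourth pipe-separated field of a u.user row.
    used = {line.strip().split("|")[3] for line in user_lines if line.strip()}
    return sorted(used - known)


# Rows copied from the extracts in Figures 8 and 9 (not the complete files).
occupations = ["administrator", "artist", "doctor", "educator", "engineer"]
users = [
    "1|24|M|technician|85711",
    "2|53|F|other|94043",
    "3|23|M|writer|32067",
    "4|24|M|technician|43537",
    "5|33|F|other|15213",
]
missing = unknown_occupations(users, occupations)
print(missing)
```

With these extracts, all three occupations used by the sample users are reported, because the extract covers only 5 of the 21 occupations. Run against the complete files, the expectation would be an empty list.<br />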
administrator<br />
artist<br />
doctor<br />
educator<br />
engineer<br />
Figure 9 – Extract of the u.occupation file<br />
2.2. MovieLens target schemas<br />
Both MovieLens data sets were extracted to a Microsoft Access® database. The following sub-sections describe<br />
the target schemas for both data sets.<br />