Extraction and Integration of MovieLens and IMDb Data - APMD
4.2. Integrated schema
Verónika Peralta
The integrated schema consists of 52 tables describing movies, the companies and persons related to movies, and the users who rated movies. Figure 23 shows the tables of the integrated schema, their primary keys (underlined attributes) and their foreign keys (arrows between tables). Additional dotted lines relate some tables to a fictitious (not implemented) relation describing persons. Shadow tables were used to build the referential but are not visible to queries. Table 13 describes each table, its attributes and its constraints.
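As a hedged illustration of how primary and foreign keys constrain the data, the Python sketch below creates two tables from Table 13 (I_Movies and I_Actors) in an in-memory SQLite database using the standard sqlite3 module. The link table I_MovieActors is hypothetical: it is invented here only to show a foreign key, whereas the schema's actual foreign keys are the arrows in Figure 23. Column types are approximated (String(n) as VARCHAR(n)), and SQLite does not enforce those lengths; the inserted rows are placeholders, not real data.

```python
import sqlite3

# Minimal sketch, not the report's implementation: I_Movies and I_Actors
# follow Table 13; I_MovieActors is a HYPOTHETICAL link table used only
# to illustrate a foreign key.
SCHEMA = """
CREATE TABLE I_Movies (
    MovieId        NUMERIC(4) PRIMARY KEY,   -- MovieLens id
    TitleMovieLens VARCHAR(100),
    TitleImdb      VARCHAR(250) UNIQUE
);
CREATE TABLE I_Actors (
    Actor         VARCHAR(75) PRIMARY KEY,
    MovieQuantity NUMERIC(3)
);
CREATE TABLE I_MovieActors (                 -- hypothetical
    MovieId NUMERIC(4) REFERENCES I_Movies(MovieId),
    Actor   VARCHAR(75) REFERENCES I_Actors(Actor),
    PRIMARY KEY (MovieId, Actor)
);
"""


def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run one INSERT; return False if a key constraint rejects the row."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_schema()
    # Placeholder rows, not real MovieLens/IMDb data.
    try_insert(conn, "INSERT INTO I_Movies VALUES (?, ?, ?)", (1, "Movie A", "Movie A (1995)"))
    try_insert(conn, "INSERT INTO I_Actors VALUES (?, ?)", ("Doe, John", 1))
    link = "INSERT INTO I_MovieActors VALUES (?, ?)"
    print(try_insert(conn, link, (1, "Doe, John")))    # True: movie and actor exist
    print(try_insert(conn, link, (999, "Doe, John")))  # False: no movie 999
    movie = "INSERT INTO I_Movies VALUES (?, ?, ?)"
    print(try_insert(conn, movie, (2, "Movie B", "Movie A (1995)")))  # False: TitleImdb is unique
```

Catching `sqlite3.IntegrityError` covers primary-key, unique and foreign-key violations alike, which is enough to show that the keys reject inconsistent rows.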
I_Movies: join between IMDb and MovieLens
− MovieId: Numeric(4); MovieLens’ id
− TitleMovieLens: String(100)
− TitleImdb: String(250)
Constraints: Primary key: MovieId; Unique: TitleImdb

I_Actors: master table of actors
− Actor: String(75)
− MovieQuantity: Numeric(3); number of movies the actor played in
Constraints: Primary key: Actor

I_Actresses: master table of actresses
− Actress: String(75)
− MovieQuantity: Numeric(3); number of movies the actress played in
Constraints: Primary key: Actress

I_Ages: age intervals
− AgeId: Numeric(2)
− MinAge: Numeric(2)
− MaxAge: Numeric(2)
Constraints: Primary key: AgeId

I_Biographies: biographies of actors, actresses, directors, producers and other people involved in movies
− Name: String(70)
− RealName: String(220)
− Birth: String(130); date and place of birth
− Decease: String(160); date, place and cause of death
− Height: String(15)
Constraints: Primary key: Name

I_Budgets: master table of budget intervals
− BudgetUSD: Numeric(15,2); start of an interval (internal use)
Constraints: Primary key: BudgetUSD

I_Colors: master table of colors
− Color: String(20)
Constraints: Primary key: Color

I_Countries: master table of countries, providing several aggregation criteria and international country codes
− Country: String(40)
− LongName: String(110)
− DomainCode: String(2); internet domain
− ISO2Code: String(2); ISO 3166-1 alpha-2 code
− ISO3Code: String(3); ISO 3166-1 alpha-3 code
− UNnumericalCode: Numeric(3); United Nations numerical country code
− IsCurrent: Numeric(1); 1 for current countries, 0 for former ones
− IsSovereign: Numeric(1); 1 for sovereign UN nations, 2 for sovereign non-UN nations, 3 for sovereign non-recognized nations, 4 for dependent territories and 5 for areas of special sovereignty
− Sovereign: String(40); name of the sovereign nation (the current country in the case of former countries)
− Continent: String(20)
− SecondaryContinent: String(20); for countries with territory on two continents (the continent holding the larger share of the country's area is taken as the main continent)
− Area: Numeric(8)
− Inhabitants: Numeric(10)
Constraints: Primary key: Country
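I_Ages and I_Budgets store intervals rather than single values. The Python sketch below shows one way a raw age or budget could be mapped onto those keys. The sample rows are hypothetical (Table 13 lists the columns, not the contents), and two semantics are assumptions not stated in the table: age bounds are treated as inclusive, and each BudgetUSD value is read as the start of an interval that ends at the next larger BudgetUSD.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

# HYPOTHETICAL sample rows: Table 13 gives the columns of I_Ages and
# I_Budgets but not their contents.
AGE_ROWS: List[Tuple[int, int, int]] = [  # (AgeId, MinAge, MaxAge)
    (1, 0, 17),
    (2, 18, 24),
    (3, 25, 34),
    (4, 35, 99),
]
BUDGET_STARTS: List[float] = [0.0, 1_000_000.0, 10_000_000.0, 50_000_000.0]  # BudgetUSD


def age_interval(age: int, rows: Sequence[Tuple[int, int, int]] = AGE_ROWS) -> Optional[int]:
    """Return the AgeId whose [MinAge, MaxAge] range contains age (bounds assumed inclusive)."""
    for age_id, min_age, max_age in rows:
        if min_age <= age <= max_age:
            return age_id
    return None


def budget_interval(budget_usd: float, starts: Sequence[float] = BUDGET_STARTS) -> Optional[float]:
    """Return the BudgetUSD key of the interval containing budget_usd.

    Assumes `starts` is sorted ascending and that each interval runs from its
    start up to, but excluding, the next start (the last one is open-ended).
    """
    # bisect_right counts the starts <= budget_usd; the last of those is the key.
    i = bisect_right(starts, budget_usd)
    return starts[i - 1] if i > 0 else None


if __name__ == "__main__":
    print(age_interval(20))            # 2
    print(budget_interval(5_000_000))  # 1000000.0
    print(budget_interval(-1))         # None
```

Storing only the start of each budget interval keeps I_Budgets minimal, and a binary search over the sorted starts finds the matching key; a linear scan is adequate for the few age ranges.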
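The coded columns of I_Countries lend themselves to simple validation. The Python sketch below decodes IsSovereign with the values listed in Table 13 and checks a row against the declared column sizes. The CountryCodes class and check_country function are hypothetical helpers, and storing ISO codes as uppercase letters is an assumption; the sample row uses France's ISO 3166-1 codes (FR, FRA, 250).

```python
from dataclasses import dataclass
from typing import List

# Meaning of I_Countries.IsSovereign, transcribed from Table 13.
SOVEREIGNTY = {
    1: "sovereign UN nation",
    2: "sovereign non-UN nation",
    3: "sovereign non-recognized nation",
    4: "dependent territory",
    5: "area of special sovereignty",
}


@dataclass
class CountryCodes:
    """A subset of I_Countries columns (names adapted from Table 13)."""
    country: str            # Country: String(40)
    iso2_code: str          # ISO2Code: String(2)
    iso3_code: str          # ISO3Code: String(3)
    un_numerical_code: int  # UNnumericalCode: Numeric(3)
    is_current: int         # IsCurrent: Numeric(1)
    is_sovereign: int       # IsSovereign: Numeric(1)


def _is_upper_alpha(code: str, length: int) -> bool:
    # Assumption: codes are stored as uppercase letters, as ISO 3166-1 writes them.
    return len(code) == length and code.isalpha() and code.isupper()


def check_country(row: CountryCodes) -> List[str]:
    """Return a list of constraint violations (empty if the row looks valid)."""
    problems = []
    if not row.country or len(row.country) > 40:
        problems.append("Country must be 1 to 40 characters")
    if not _is_upper_alpha(row.iso2_code, 2):
        problems.append("ISO2Code must be an ISO 3166-1 alpha-2 code")
    if not _is_upper_alpha(row.iso3_code, 3):
        problems.append("ISO3Code must be an ISO 3166-1 alpha-3 code")
    if not 0 <= row.un_numerical_code <= 999:
        problems.append("UNnumericalCode must fit in Numeric(3)")
    if row.is_current not in (0, 1):
        problems.append("IsCurrent must be 0 or 1")
    if row.is_sovereign not in SOVEREIGNTY:
        problems.append("IsSovereign must be between 1 and 5")
    return problems


if __name__ == "__main__":
    france = CountryCodes("France", "FR", "FRA", 250, 1, 1)
    print(check_country(france))             # []
    print(SOVEREIGNTY[france.is_sovereign])  # sovereign UN nation
```

Returning a list of messages rather than raising on the first error makes it easy to report every problem in a row at once when loading the referential.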