23.01.2014 Views

An Object Oriented Multidimensional Data Model for OLAP - CiteSeerX

An Object Oriented Multidimensional Data Model for OLAP - CiteSeerX

An Object Oriented Multidimensional Data Model for OLAP - CiteSeerX

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>An</strong> <strong>Object</strong> <strong>Oriented</strong> <strong>Multidimensional</strong> <strong>Data</strong> <strong>Model</strong> <strong>for</strong><br />

<strong>OLAP</strong><br />

Thanh Binh Nguyen, A Min Tjoa, and Roland Wagner<br />

Institute of Software Technology (E188), Vienna University of Technology<br />

Favoritenstrasse 9-11/188, A-1040 Vienna, Austria<br />

{binh,tjoa}@ifs.tuwien.ac.at<br />

Institute of Applied Knowledge Processing, University of Linz<br />

Altenberger Strasse 69, A-4040 Linz, Austria<br />

wagner@ifs.uni-linz.ac.at<br />

Abstract. Online <strong>An</strong>alytical Processing (<strong>OLAP</strong>) data is frequently organized in<br />

the <strong>for</strong>m of multidimensional data cubes each of which is used to examine a set<br />

of data values, called measures, associated with multiple dimensions and their<br />

multiple levels. In this paper, we first propose a conceptual multidimensional<br />

data model, which is able to represent and capture natural hierarchical<br />

relationships among members within a dimension as well as the relationships<br />

between dimension members and measure data values. Hereafter, dimensions<br />

and data cubes with their operators are <strong>for</strong>mally introduced. Afterward, we use<br />

UML (Unified <strong>Model</strong>ing Language) to model the conceptual multidimensional<br />

model in the context of object oriented databases.<br />

1. Introduction<br />

<strong>Data</strong> warehouses and <strong>OLAP</strong> are essential elements of decision support [5], they<br />

enable business decision makers to creatively approach, analyze and understand<br />

business problems [16]. While data warehouses are built to store very large amounts<br />

of integrated data used to assist the decision-making process [9], the concept of<br />

<strong>OLAP</strong>, which is first <strong>for</strong>mulated in 1993 by [6] to enable business decision makers to<br />

work with data warehouses, supports dynamic synthesis, analysis, and consolidation<br />

of large volumes of multidimensional data [7]. <strong>OLAP</strong> systems organize data using the<br />

multidimensional paradigm in the <strong>for</strong>m of data cubes, each of which is a combination<br />

of multiple dimensions with multiple levels per dimension. Summarized data is preaggregated<br />

and stored with the main purpose to explore the relationship between<br />

independent, static variables, dimensions, and dependent, dynamic variables,<br />

measures [3]. Moreover, dimensions always have structures and are linguistic<br />

categories that describe different ways of looking at the in<strong>for</strong>mation [4]. These<br />

dimensions contain one or more natural hierarchies, together with other attributes that<br />

do not have a hierarchy’s relationship to any of the attributes in the dimensions [10].<br />

Having and handling the predefined hierarchy or hierarchies within dimensions<br />

provide the foundation of two typical operations like rolling up and drilling down.<br />

Because unbalanced and multiple hierarchical structures (Fig. 1,2) are the common


structures of dimensions, the two current <strong>OLAP</strong> technologies, namely R<strong>OLAP</strong> and<br />

M<strong>OLAP</strong>, have limitations in the handling of dimensions with these structures [15].<br />

all<br />

All<br />

1999<br />

Year<br />

Q1.1999<br />

Quater<br />

Jan.1999<br />

Feb.1999<br />

Mar.1999<br />

W1.1999<br />

W5.1999 W9.1999<br />

Month<br />

Week<br />

Day<br />

1.Jan.1999 6.Jan.1999 1.Feb.1999 3.Feb.1999 3.Mar.1999<br />

Fig. 1. <strong>An</strong> instance of the dimension Time with unbalanced and<br />

multiple hierarchical structure<br />

Fig. 2. A schema of the<br />

dimension Time with<br />

multihierarchical<br />

structure.<br />

R<strong>OLAP</strong> (Relational <strong>OLAP</strong>) products are set on top of existing relational database<br />

management systems (RDBMS), which are well standardized and meet the needs of<br />

storing large amounts of data. Dimensions and facts are mapped into relational tables,<br />

called fact and dimension tables, organized as Star Schema and/or Snowflake Schema<br />

[10]. There<strong>for</strong>e in many cases, R<strong>OLAP</strong> products are not suitable <strong>for</strong> handling<br />

dimensions with multihierarchical and unbalanced structures. Furthermore, existing<br />

relational query languages (e.g. SQL) are not sufficiently powerful or flexible enough<br />

to support true <strong>OLAP</strong> capabilities [19]. [3] clearly demonstrated the mismatch<br />

between multidimensional operations and SQL.<br />

Although M<strong>OLAP</strong> (<strong>Multidimensional</strong> <strong>OLAP</strong>) easily supports dimensions with<br />

multiple and unbalanced hierarchical structures and M<strong>OLAP</strong> queries are very<br />

powerful and flexible in terms of <strong>OLAP</strong> processing [14], there are still several<br />

challenges <strong>for</strong> these products. First, the underlying data structures are limited in their<br />

ability to support multiple subject areas and to provide access to detailed data.<br />

Navigation and analysis of data is limited because the data is designed according to<br />

previously determined requirements [7]. In addition, with products that require<br />

complete pre-calculation, the dimensional explosion could result in physical database<br />

that is unmanageable [14].<br />

The first goal of this paper is the introduction of a conceptual multidimensional<br />

data model that facilitates a precise rigorous conceptualization <strong>for</strong> <strong>OLAP</strong>. First, the<br />

model is able to represent and capture natural hierarchical relationships among<br />

members within a dimension. There<strong>for</strong>e, dimensions with complex structures, such<br />

as: unbalanced and multihierarchical structures, can be handled. Moreover, the data<br />

model is able to represent the relationships between dimension members and measure<br />

data values by mean of cube cells. Hereafter, the data cubes, which are basic<br />

components in multidimensional data analysis, are <strong>for</strong>mally introduced. Furthermore,<br />

cube operators (e.g. jumping, rollingUp and drillingDown) are defined in a very<br />

elegant manner.


The second goal is the modeling of the conceptual multidimensional data model in<br />

term of classes by using UML. Based on the <strong>for</strong>mal representation of the class<br />

specifications in UML, the design and implementation of the data model <strong>for</strong> object<br />

oriented databases are straight<strong>for</strong>ward.<br />

The remainder of this paper is organized as follows. In section 2, we discuss about<br />

related works. Then in section 3, we introduce a conceptual data model that will be<br />

mapped into object-oriented database by means of UML in section 4. The paper<br />

concludes with section 5, which presents our current and future works.<br />

2. Related works<br />

Since Codd’s [6] <strong>for</strong>mulated the term Online <strong>An</strong>alytical Processing (<strong>OLAP</strong>) in 1993,<br />

many commercial products, like Arborsoft (now Hyperion) Essbase, Cognos<br />

Powerplay or MicroStrategy’s DSS Agent have been introduced on the market [2].<br />

But un<strong>for</strong>tunately, sound concepts were not available at the time of the commercial<br />

products being developed. The scientific community struggles hard to deliver a<br />

common basis <strong>for</strong> multidimensional data models ([1], [4], [8], [11], [12], [13], [21]).<br />

The data models presented so far differ in expressive power, complexity and<br />

<strong>for</strong>malism. In the followings, some research works in the field of data warehousing<br />

systems and <strong>OLAP</strong> tools are summarized.<br />

In [12] a multidimensional data model is introduced based on relational elements.<br />

Dimensions are modeled as “dimension relations”, practically annotating attributes<br />

with dimension names. The cubes are modeled as functions from the Cartesian<br />

product of the dimensions to the measure and are mapped to “grouping relations”<br />

through an applicability definition.<br />

In [8] n-dimensional tables are defined and a relational mapping is provided<br />

through the notation of completion. <strong>Multidimensional</strong> database are considered to be<br />

composed from set of tables <strong>for</strong>ming denormalized star schemata. Attribute<br />

hierarchies are modeled through the introduction of functional dependencies in the<br />

attributes of dimension tables.<br />

[4] modeled a multidimensional database through the notations of dimensions and<br />

f-tables. Dimensions are constructed from hierarchies of dimension levels, whereas f-<br />

tables are repositories <strong>for</strong> the factual data. <strong>Data</strong> are characterized from a set of roll-up<br />

functions, mapping the instance of a dimension level to instances of other dimension<br />

level.<br />

In statistical databases, [17] presented a comparison of work done in statistical and<br />

multidimensional databases. The comparison was made with respect to application<br />

areas, conceptual modeling, data structure representation, operations, physical<br />

organization aspects and privacy issues.<br />

In [3], a framework <strong>for</strong> <strong>Object</strong>-<strong>Oriented</strong> <strong>OLAP</strong> is introduced. Two major physical<br />

implementations exist today: R<strong>OLAP</strong> and M<strong>OLAP</strong> and their advantages and<br />

disadvantages due to physical implementation were introduced. The paper also<br />

presented another physical implementation called O3LAP model.<br />

[20] took the concepts and basic ideas of the classical multidimensional model<br />

based on the <strong>Object</strong>-<strong>Oriented</strong> paradigm. The basic elements of their <strong>Object</strong> <strong>Oriented</strong>


<strong>Multidimensional</strong> <strong>Model</strong> are dimension classes and fact classes. They also presented<br />

cube classes as the basic structure to allow a subsequent analysis of the data stored in<br />

the system.<br />

In this paper, we address a suitable mutidimensional data model <strong>for</strong> <strong>OLAP</strong>. The<br />

main contributions are: (a) the introduction of a <strong>for</strong>mal multidimensional data model;<br />

(b) the very elegant manners of definitions of three cube operators, namely jumping,<br />

rollingUp and drillingDown; (c) the modeling of the conceptual multidimensional<br />

data model in term of classes by using UML.<br />

3. A Conceptual <strong>Data</strong> <strong>Model</strong><br />

In our approach, a multidimensional data model is constructed based on a set of<br />

dimensions = { D 1 ,..,D x },<br />

x ∈ N<br />

M = M 1 ,.., M y , y ∈ and a<br />

D , a set of measures { } N<br />

set of data cubes C = { C 1 ,.., C z },<br />

z ∈ N<br />

. The following sections <strong>for</strong>mally introduce the<br />

descriptions of dimensions with their structures, measures and data cubes.<br />

3.1. The Concepts of Dimensions<br />

First, we introduce hierarchical relationships among dimension members by means of<br />

one hierarchical domain per dimension. A hierarchical domain is a set of dimension<br />

members, organized in hierarchy of levels, corresponding to different levels of<br />

granularity. It allows us to consider a dimension schema as a partially ordered set of<br />

levels. In this concept, a hierarchy is a path along the dimension schema, beginning at<br />

the root level and ending at a leaf level. Moreover, the recursive definitions of two<br />

dimension operators, namely ancestor and descendant, provide abilities to navigate<br />

along a dimension structure. In a consequence, dimensions with any complexity in<br />

their structures can be captured with this data model.<br />

Definition 3.1.1. [Dimension Hierarchical Domain] A hierarchical domain of a<br />

dimension D is a non-empty set and denoted by dom ( D) = { all}<br />

∪ { dm1,..,<br />

dm n },<br />

where:<br />

• Each dimension member dmi<br />

is a data item within a dimension. E.g. 1999,<br />

Q1.1999, Jan.1999, and 1.Jan.1999, etc are dimension members within the<br />

dimension Time (Fig. 1).<br />

• Such that the graph G<br />

M<br />

= ( V , E)<br />

, defined as the representation over the binary<br />

relation over the dom (D)<br />

, is a tree and defined as follows:<br />

V = dom(D) ,<br />

E ⊂ dom( D) × dom(D)<br />

. ∀( dmi<br />

, dm j ) ∈ E : dmi<br />

M dm j is an edge in G . The<br />

M<br />

edge is given when there is an ordered relationship in the sense of hierarchy.<br />

• <strong>An</strong>d the two operators {+,-}: ∀ dm i ∈ dom(D) :<br />

− ( dm i ) = { dm j ∈ dom(D)<br />

: dm j M dmi}


+ ( dm i ) = { dmk<br />

∈ dom(D)<br />

: dmi<br />

M dmk}<br />

• The all or root member: ( ∃ ! all ∈ dom(D))(<br />

¬∃dm<br />

∈ dom(D)<br />

: dm M all)<br />

.<br />

• Leaf members: ∀ dm ∈ dom(D))(<br />

¬∃dm<br />

∈ dom(D),<br />

i ≠ j : dm dm ) .<br />

( i<br />

j<br />

i M j<br />

Example: Figure 1 shows a representation in tree term of the dimension Time.<br />

Hereafter, we have:<br />

dom(Time)={all,1999,Q1.1999,..,3.Mar.1999},<br />

all M 1999,1999 M Q1.1999,...,Mar.1999 M 3.Mar.1999,<br />

-(1999)=all; +(1999)={Q1.1999,W1.1999,W5.1999,W9.1999}<br />

Definition 3.1.2. [Dimension Levels] Let Levels ( D) = All ∪ { l1 ,.., lh } , h ∈ N be a finite<br />

set of levels of a dimension D, where:<br />

• The collection of subsets { dom ( l 1 ),.., dom(<br />

l h )}<br />

is a partition of dom(D),<br />

• The All or root level: ∃ ! All ∈ Levels(D)<br />

: dom(<br />

All)<br />

= { all}<br />

,<br />

• Leaf levels: { l ∈ Levels(D)<br />

∀dm<br />

∈ dom(<br />

l ) : dm a leafmember}<br />

.<br />

i<br />

j<br />

Example: The dimension Time has six levels Levels(Time)={All,Year,Quarter,<br />

Month,Week,Day}. <strong>An</strong>d:<br />

dom(All)= {all}, dom(Year)={1999}, dom(Quarter)= {Q1.1999}<br />

dom(Month)= {Jan.1999,Feb.1999,Mar.1999},<br />

dom(Week)={W1.1999,W5.1999,W9.1999},<br />

dom(Day)= {1.Jan.1999,6.Jan.1999,1.Feb.1999,3.Feb.1999,3.Mar.1999}<br />

Definition 3.1.3. [Dimension Schema] A schema of a dimension D, denoted by<br />

DSchema(D)= Levels( D) , L , is a partially ordered set of levels:<br />

• Levels(D) is a finite set of dimension levels,<br />

• <strong>An</strong>d L is an ordered relation over the levels and satisfies the following<br />

condition:<br />

l l if ∃dm<br />

∈ dom(<br />

l )) and ( ∃dm<br />

∈ dom(<br />

l )) : dm dm .<br />

i<br />

L<br />

j<br />

( t i<br />

u<br />

j<br />

All All All<br />

i<br />

j<br />

t<br />

M<br />

u<br />

Category<br />

Country<br />

Year<br />

Type<br />

State<br />

Quarter<br />

Month<br />

Week<br />

Item<br />

Product<br />

City<br />

Geography<br />

Time<br />

Day<br />

Fig. 3. Schemas of three dimensions Product, Geography and Time<br />

Example: Figure 3 is used to describe schemas of three dimensions Product,<br />

Geography, and Time.<br />

DSchema(Product)={All L Category, Category L Type, Type L Item}


DSchema(Geography)={All L Country, Country L State, State L City}<br />

DSchema(Time)={All L Year ,Year L Quarter, Quarter L Month, Month L Day ,<br />

All L Year, Year L Week, Week L Day}<br />

Definition 3.1.4. [Dimension Path] A path within a dimension schema is a linear,<br />

totally ordered list of levels and can be defined as follows:<br />

∀ l , l ∈ Levels(D) :<br />

i<br />

j<br />

{<br />

li<br />

L l j}<br />

<br />

path(<br />

li,<br />

l j ) = i L t u<br />

<br />

<br />

φ<br />

{ l l ,.., l l }<br />

L<br />

j<br />

If l <br />

i<br />

L<br />

l<br />

j<br />

Else if ∃ lt<br />

,.., lu<br />

∈ Levels(D)<br />

: li<br />

L lt<br />

,.., lu<br />

L l j<br />

Definition 3.1.5. [Dimension Hierarchy] A hierarchy is defined by a path ( All,<br />

lleaf<br />

)<br />

within the schema of a dimension D. The path begins at All (root) level and ends at a<br />

leaf level.<br />

Let H ( D) = { h1 ,.., hm }, m∈ N be a set of hierarchies of a dimension D. If m=1 then<br />

the dimension has single hierarchical structure, else the dimension has<br />

multihierarchical structure.<br />

Definition 3.1.6. [Dimension Operators] Two dimension operators (DO), namely<br />

ancestor and descendant, are defined recursively as follows:<br />

∀ li<br />

, la,<br />

ld<br />

∈ Levels(D)<br />

and ∀ dm ∈ dom( li ) ⊂ dom(D)<br />

:<br />

undefined<br />

If ( path ( l a,<br />

l i ) = φ)<br />

<br />

−<br />

−<br />

dm ∈ dom(<br />

l ∈ −<br />

<br />

a ) : dm ( dm)<br />

Else If ( la<br />

L li<br />

)<br />

ancestor(<br />

dm,<br />

la<br />

,D) = <br />

−<br />

ancestor(<br />

dm , la<br />

,D), where :<br />

Else<br />

<br />

−<br />

<br />

dm = ancestor(<br />

dm,<br />

l p ,D) : l p L li<br />

undefined<br />

If ( path ( l i , l d ) = φ)<br />

<br />

+<br />

+<br />

{ dm ∈ dom(<br />

l ∈ +<br />

<br />

d ) : dm ( dm)}<br />

Else If ( li<br />

L ld<br />

)<br />

descendant(<br />

dm,<br />

ld<br />

,D) = <br />

+<br />

descendant(<br />

dm , ld<br />

,D),where :<br />

Else<br />

<br />

+<br />

dm<br />

∈ descentdant(<br />

dm,<br />

ln,D)<br />

: li<br />

L ln<br />

Example: ancestor(Q1.1999,Year,Time)=1999,<br />

descendant(Q1.1999,Month,Time)= {Jan.1999,Feb.1999,Mar.1999},<br />

Else<br />

3.2. The Concepts of Measures<br />

In this section we introduce measures, which are the objects of analysis in the context<br />

of multidimensional data model.


Definition 3.2.1. [Measure Schema] A schema of a measure M is a tuple<br />

MSchema ( M) = Fname,<br />

O , where:<br />

• Fname is a name of a corresponding fact,<br />

• O ∈ Ω ∪{NONE, COMPOSITE} is an operation type applied to a specific fact [2]:<br />

− Ω={SUM, COUNT, MAX, MIN} is a set of aggregation functions,<br />

− COMPOSITE is an operation (e.g. average), where measures cannot be<br />

utilized in order to automatically derive higher aggregations,<br />

− NONE measures are not aggregated. In this case, the measure is the fact.<br />

Definition 3.2.2. [Measure Domain] Let N be a numerical domain where a measure<br />

value is defined (e.g. N, Z, R). The domain of a measure M is a subset of N. We<br />

denote by dom (M) ⊂ N .<br />

3.3. The Concepts of <strong>Data</strong> Cubes<br />

A multidimensional cube is constructed based on a set dimensions and a set of<br />

measures, and consists a collection of cells. Each cell is an intersection among a set of<br />

dimension members and measure data values. Furthermore, cells are grouped into<br />

granular groupbys, each of which expresses a mapping from the domains of x-tuple of<br />

dimension levels (independent variables) to y-numerical domains of y-tuple of<br />

numeric measures (dependent variables).<br />

Product<br />

Mexico<br />

USA<br />

Alcoholic<br />

10<br />

Dairy<br />

50<br />

Beverage<br />

20<br />

Baked Food 12<br />

Meat<br />

15<br />

Seafood<br />

10<br />

Geography<br />

1 2 3 4 5 6<br />

Time<br />

Fig. 4. Sale cube includes dimensions: Geography, Product and Time and a fact: Sale amount.<br />

Given x dimensions D 1 ,.., D x , x ∈ N , and y measures M 1 ,.., M y , y ∈ N .<br />

Definition 3.3.1. [Cube Schema] A cube schema is tuple<br />

CSchema(C)= Cname , DSchemas,<br />

MSchemas :<br />

• Cname is the name of a cube,<br />

• DSchemas are the schemas of x dimensions, denoted by<br />

DSchemas =< DSchema( D1 ),.., DSchema(D<br />

x ) > ,<br />

• MSchemas are the schemas of y measures, denoted by<br />

MSchemas =< MSchema M ),.., MSchema(M<br />

) > .<br />

( 1 y


Definition 3.3.2. [Cube Domain] Given a function:<br />

f : dom(D 1)<br />

× .. × dom(Dx)<br />

× dom(M 1)<br />

× .. × dom(M<br />

y)<br />

→ { true,<br />

false}<br />

,<br />

A cube domain, denoted by { } N<br />

dom ( C) = c1 ,.., ck , k ∈ is determined as follows:<br />

dom (C) = c = ( dms,<br />

fms)<br />

dms ∈ dom(D<br />

) × .. × dom(D<br />

),<br />

{ 1 x<br />

fms ∈ dom( M 1)<br />

× .. × dom(M<br />

y ) : f ( dms,<br />

fms)<br />

= true}<br />

3.4. Operating Group by, Rolling Up, Drilling Down in our model<br />

Let a cube C be constituted from x dimensions<br />

D 1<br />

,.., D x , x ∈ N<br />

, and y measures<br />

M 1 ,.., M y , y ∈ N . We define groupby and three operators, namely jumping,<br />

rollingUp and drillingDown as follows:<br />

Definition 3.4.1. [Groupby] A groupby is triple G= Gname , GSchema(G),<br />

dom(G)<br />

where:<br />

• Gname is the name of this groupby,<br />

• GSchema(G)= GLevels ( G), GMSchemas(G)<br />

:<br />

GLevels( G) =< lD 1<br />

,.., lD<br />

>∈ Levels(D1)<br />

× .. × Levels(D<br />

x ) is a x-tuple of levels of<br />

x<br />

the x dimensions D 1 ,.., D x , x ∈ N .<br />

GMSchemas ( G) =< MSchema(M 1),..,<br />

MSchema(M<br />

y ) > is a y-tuple of measure<br />

schemas of the y measures M 1 ,.., M y , y ∈ N .<br />

• dom ( G) = { c =< dms,<br />

fms >∈dom(C)<br />

| dms ∈ dom(<br />

lD 1<br />

) × .. × dom(<br />

lD<br />

x<br />

),<br />

fms ∈ dom M ) × .. × dom(M<br />

)}<br />

( 1 y<br />

Let h i be a number of levels of each dimension D i (1≤i≤x). The total set of<br />

x<br />

p ∏h i<br />

i=<br />

1<br />

groupbys over a cube C is defined as Groupbys( C) = {G1,..,G<br />

}, p = [18].<br />

Definition 3.4.2. [Cube Operators] The three basic navigational cube operators (CO),<br />

namely jumping, rollingUp and drillingDown, which are applied to navigate along a<br />

data cube C, corresponding to a dimension D i , are defined as follows:<br />

Given a current groupby G c , associated with a level l c of a dimension D, i and<br />

three other levels l j , lr<br />

, ld<br />

∈ Levels(Di<br />

) .<br />

• Jumping:<br />

jumping G , l , D ) = G =< GLevels(G<br />

), GMSchemas(G<br />

) ><br />

( c j i j<br />

j<br />

j<br />

Where:<br />

GMSchemas ( G j ) = GMSchemas(G<br />

c),<br />

GLevels ( G j )( i)<br />

= l j,<br />

GLevels( G j )( k)<br />

= GLevels(Gc)(<br />

k),<br />

∀k<br />

≠ i.<br />

• Rolling Up: ∀ dm ∈ dom( l c ) , Gr<br />

= jumping(Gc,<br />

lr<br />

, Di<br />

) :


sub<br />

sub sub<br />

rollingUp(Gc,<br />

dm,<br />

lr<br />

,Di<br />

) = Gr<br />

=< GSchema(Gr<br />

), dom(Gr<br />

) ><br />

Where:<br />

sub<br />

GSchema (Gr<br />

) = GSchema(Gr<br />

) ,<br />

sub<br />

dom( Gr<br />

) = { cr<br />

∈ dom(Gr<br />

) | ∃c<br />

∈ dom(Gc<br />

) : c . dms(<br />

i)<br />

= dm,<br />

c dms(<br />

i)<br />

= ancestor(<br />

dm,<br />

, D ) , . dms(<br />

j)<br />

= c.<br />

dms(<br />

j),<br />

∀j<br />

≠ i}<br />

r. l r i c r<br />

dm ∈ dom( l c Gd<br />

c d i<br />

sub<br />

sub<br />

sub<br />

drillingDown(Gc,<br />

dm,<br />

ld<br />

, Di<br />

) = G d =< GSchema(G<br />

d ), dom(G<br />

d ) ><br />

• Drilling Down: ∀ ) , = jumping(G<br />

, l , D ) :<br />

Where:<br />

sub<br />

GSchema (Gd<br />

) = GSchema(Gd<br />

),<br />

sub<br />

dom( Gd<br />

) = { cd<br />

∈ dom(Gd<br />

) | ∃c<br />

∈ dom(Gc)<br />

: c . dms(<br />

i)<br />

= dm,<br />

c dms(<br />

i)<br />

∈ descendant(<br />

dm,<br />

,D ), . dms(<br />

j)<br />

= c.<br />

dms(<br />

j),<br />

∀j<br />

≠ i}<br />

d . l d i<br />

c d<br />

4. <strong>Model</strong>ing <strong>Multidimensional</strong> <strong>Data</strong> <strong>Model</strong><br />

In this section, UML is used to model dimensions, measures and data cubes in context<br />

of an object oriented data model. All conceptual components, which are introduced in<br />

section 3, are mapped as classes. Figure 5 illustrates the modeling <strong>for</strong> our data model<br />

in term of class diagrams by using UML.<br />

4.1. The <strong>Model</strong>ing of Dimensions<br />

The dimension concepts, such as: dimension members, levels, dimension schemas,<br />

hierarchy and dimension, are modeled in term of class diagrams by using UML. First,<br />

a hierarchical domain of dimension members within a dimension is handled by means<br />

of the DMember class. The two dimension member operators {+} and {-} are mapped<br />

into the two methods getFathers and getChildren, built-in every instance of this class.<br />

Hereafter, The Level, DSchema, Hierarchy classes are defined to describe the<br />

concepts of level, dimension schema and dimension hierarchy. Afterwards, each<br />

instance of the Dimension class describes a dimension. The dimension operators,<br />

namely ancestor and descendant are mapped as two methods with the same names.<br />

4.1.1. DMember. Each instance of this class describes a dimension member within a<br />

hierarchical domain of a dimension.<br />

• Attributes:<br />

description (String) - That is a data item within a dimension.<br />

• Relationships:<br />

fathers (Set) - A set of referred DMember objects.<br />

children (Set) - A set of referred DMember objects<br />

• Main methods:<br />

getFathers() (return Set) - This method describes the operator {-}.<br />

getChildren() (returns Set) - This method describes the operator {+}.


HasChild<br />

+Father<br />

0..*<br />

Description : String;<br />

DMember<br />

setDescription(descM : String) : void;<br />

getDescription() : string;<br />

addChild(aCM : DMember) : boolean;<br />

removeChild(rCM : DMember) : boolean;<br />

getChildren() : set of DMember;<br />

addFather(aFM : DMember) : boolean;<br />

removeFather(rFM : DMember) : boolean;<br />

getFathers() : set of DMember;<br />

HasFather<br />

+Child 0..*<br />

+Child<br />

1..*<br />

+Father<br />

1..*<br />

refers to<br />

Cell<br />

addDMember(newDM : DMember) : boolean;<br />

removeDMember(index : int) : boolean;<br />

getDMember(index : int) : DMember;<br />

getDMem bers () : s et of D Mem ber;<br />

addMValue(newMV : MeasureValue) : boolean;<br />

removeMValue(index : int) : boolean;<br />

getMValue(index : int) : MeasureValue;<br />

getMValues() : set of MeasureValue;<br />

1..*<br />

contains<br />

IntergerV alue<br />

value : int;<br />

1..*<br />

value : type;<br />

setValue(newV : int) : void;<br />

getValue() : int;<br />

MeasureValue<br />

setValue(newV : Type) : void;<br />

getValue() : Type;<br />

value : float;<br />

floatValue<br />

setValue(newV : float) : void;<br />

getValue() : float;<br />

MS chem a<br />

Fname : String;<br />

AggFunction : String;<br />

1..*<br />

contains<br />

contains<br />

Gname : String;<br />

GSc hema<br />

contains<br />

MSchemas<br />

Lname : String;<br />

Level<br />

setLname(lname : String) : void;<br />

getLname() : String;<br />

addDM(m : DMember) : boolean;<br />

removeDM(m : DMember) : boolean;<br />

getDM(descM : String) : DMember;<br />

getDMs() : s et of DMember;<br />

1..*<br />

refers to<br />

1..*<br />

1..*<br />

1..*<br />

refers to<br />

contains<br />

refers to<br />

getGname() : String;<br />

setGname(newGname : String) : void;<br />

updateMSchemas(newMSs : MSchemas) : boolean;<br />

addLevels(aLevel : Level)<br />

rem oveLevel(index int) : Level;<br />

Dname : String;<br />

DSchema<br />

addLevel(addlevel : Level) : boolean;<br />

removeLevel(lname : String) : boolean;<br />

getLevel(lname : String) : Level;<br />

1..*<br />

contains<br />

Groupby<br />

setGSchema(gschema : GSchema) : void;<br />

getGSchema() : GSchema;<br />

addCell(c : Cell) : boolean;<br />

removeCell(c : Cell) : boolean;<br />

getCell(Dname : String, dm : DMember) : cell;<br />

contains<br />

refers to<br />

1..*<br />

refers to<br />

addMSchem a(ms : MSchema) : boolean;<br />

rem oveMSchema(ms : MSchema) : boolean;<br />

rem oveMSchema(index : int) : boolean;<br />

getMSchema(index : int) : MSchem a;<br />

Cname : String;<br />

contains<br />

CSchema<br />

getCname() : String;<br />

setCname(newName : String) : void;<br />

updateMSchemas(newMS : MSchemas) : void;<br />

updateDSchemas(newDM : DSchem as) : void;<br />

contains<br />

contains<br />

Dimension<br />

Bas icGroupby : Groupby;<br />

Cube<br />

Hname : String;<br />

Hierarchy<br />

Hierarchy() : void;<br />

setHname(hname : String) : void;<br />

insertLevel(position : int, l : Level) : boolean;<br />

removeLevel(lname : String) : boolean;<br />

getLevel(lname : String) : Level;<br />

getPreLevel(lname : String) : Level;<br />

getAfterLevel(lname : String) : Level;<br />

getLevels() : set of Level;<br />

1..*<br />

contains<br />

setDSchema(newDS : DSchema) : void;<br />

getDSchema() : DSchema;<br />

addHierarchy(newH : Hierarchy) : boolean;<br />

rem oveHierarchy(hname : String) : boolean;<br />

getHierarchy(hname : String) : Hierarchy;<br />

getHierarchies () : set of Hierarchy;<br />

addLevel(l : Level) : boolean;<br />

rem oveLevel(lname : String) : boolean;<br />

getLevel(lname : String) : Level;<br />

getLevels() : set of Level;<br />

ances tor(cDM : DMember, lname : String) : DMember;<br />

descendant(cDM : DMember, lname : String) : set of DMember;<br />

1..*<br />

refers to<br />

setCSchema(new CS : CubeSchem a) : void;<br />

getCSchema() : CubeSchema;<br />

addGroupby(newG : Groupby) : boolean;<br />

rem oveGroupby(gnam e : String) : boolean;<br />

getGroupby(gnam e : String) : Groupby;<br />

getGroupbyNames() : set of String;<br />

addDim ension(newD : Dimension) : boolean;<br />

rem oveDim(dnam e : String) : boolean;<br />

getDim(dnam e : String) : Dimension;<br />

getDimens ions () : set of Dimens ion;<br />

jum ping(lnam e : String, dnam e : String) : void;<br />

rollingUp(cDM : DMem ber, lname : String, dname : String) : void;<br />

drillingDown(cDM : DMember, lnam e : String, dnam e : String) : void;<br />

Fig.5. The modeling of the object oriented multidimensional data model


4.1.2. Level. Each instance of the Level class describes a level of a dimension.<br />

Attributes:<br />

dmembers (Set) - A set of contained DMember objects.<br />

4.1.3. DSchema. Each instance of the DSchema describes a dimension schema.<br />

• Attributes:<br />

dname (String) - That is a dimension name.<br />

• Relationships:<br />

levels (Set) - A set of Level objects.<br />

4.1.4. Hierarchy. Each instance of the Hierarchy class describes a hierarchy of levels<br />

within a dimension.<br />

• Attributes:<br />

dname (String) - That is a hierarchy name<br />

• Relationships:<br />

levels (Set) - A set of Level objects<br />

4.1.5. Dimension. Each instance of this class describes a dimension.<br />

• Attributes:<br />

dschema (DSchema) - A DSchema object.<br />

hierarchies (Set) - A set of Hierarchy objects.<br />

levels (Set) - A set of Level objects.<br />

• Main methods:<br />

ancestor(DMember cDM;lname:String) (returns DMember): ancestor operator.<br />

descendant(DMember cDM,lname:String) (Set): descendant operator.<br />

4.2. The <strong>Model</strong>ing of Measures<br />

Measure schemas and measure data values are mapped into classes. First, an<br />

MSchema describes a measure schema. Afterwards, The MValue is an abstract type<br />

that serves as a common super type <strong>for</strong> measure values. It is obvious that the two<br />

classes, namely intMValue and floatValue, are subclasses of MValue.<br />

4.2.1. MSchema. Each instance of this class describes a measure schema.<br />

• Attributes:<br />

fname (String) - That is a fact name.<br />

aggFunction (String)- <strong>An</strong> aggregation function, such as Max, Min, Sum, Count,<br />

None and Composite.<br />

4.2.2. MValue. MValue is an abstract type that serves as common super type <strong>for</strong><br />

measure values.<br />

Attributes:<br />

value (Type) – That describes a measure value.<br />

4.2.3. intMValue. The intMValue is a subclass of MValue. Each instance of this class<br />

describes an integer measure value.<br />

• Specializes:<br />

MValue: The intMValue class inherits from MValue class


• Attributes:<br />

value (int) – The value is overridden as integer.<br />

4.2.4. floatMValue. The floatMValue is a subclass of MValue. Each instance of this<br />

class describes a float measure value.<br />

• Specializes:<br />

MValue: The floatMValue class inherits from MValue class<br />

• Attributes:<br />

value (float) – The value is overridden as float.<br />

4.3. The <strong>Model</strong>ing of <strong>Multidimensional</strong> Components<br />

First, each instance of the CSchema, which contains an x-tuple of DSchemas and a y-<br />

tuple of MSchemas, describes a cube schema. Then, a Cell refers to x DMembers of x<br />

dimensions and y MValues of y measures. In addition, a GSchema contains x Levels of<br />

x dimensions and the y-tuple MSchemas. Afterwards, a Groupby contains a GSchema<br />

and a subset of Cells. As a consequence, a Cube contains a CSchema and a<br />

BasicGroupby (Groupby), is associated with a set of Dimensions, and a set of<br />

Groupbys. The three methods, i.e. jumping, rollingUp and drillingDown, are the<br />

mappings of the three operators with the same names.<br />

4.3.1. CSchema. Each instance of the CSchema describes a cube schema.<br />

• Attributes:<br />

dschemas (Set) - A set of x DSchema objects,<br />

mschemas (Set) - A set of y MSchema objects.<br />

4.3.2. Cell. Each instance of this class describes a cube cell.<br />

• Relationships:<br />

dmembers (Set) - A set of x DMember objects.<br />

mvalues (Set) - A set of y MValue objects.<br />

4.3.3. GSchema. Each instance of the GSchema class describes a groupby schema.<br />

• Relationships:<br />

levels (Set) - A set of x Level objects.<br />

mschemas (Set) - A set of y MSchema objects.<br />

4.3.4. Groupby. Each instance of this class describes a groupby.<br />

• Attributes:<br />

gschema (GSchema): A GSchema object that describes a groupby schema.<br />

cells (Set): A set of Cell objects.<br />

4.3.5. Cube. Each instance of the Cube class describes a data cube.<br />

• Attributes:<br />

cschema (CSchema) – A CSchema describes a cube schema of a cube.<br />

basicgroupby (Groupby) - A Groupby object.<br />

• Relationships:<br />

dimensions (Set) - Dimensions express <strong>for</strong> x dimensions of a cube.<br />

groupbies (Set) – A set of Groupby objects,


• Main methods:<br />

jumping(lname:String;dname:String) (void) - jumping operator.<br />

rollingUp(cDM:DMember;lname:String;dname:String) (void) - rollingUp operator.<br />

drillingDown(cDM:DMember;lname:String;dname:String) (void) - drillingDown<br />

operator.<br />

5. Conclusion and future works<br />

In this paper, we have introduced the conceptual multidimensional data model, which<br />

facilitates even sophisticated constructs based on multidimensional data units or<br />

members such as dimension members, measure data values and then cells. The model<br />

is able to represent and capture natural hierarchical relationships among dimension<br />

members. Dimensions with complexity of their structures, such as: unbalanced and<br />

multihierarchical structures, can be modeled in an elegant and consistent way.<br />

Moreover, the data model represents the relationships between dimension members<br />

and measure data values by mean of cube cells. In consequence, data cubes, which are<br />

basic components in multidimensional data analysis, and their operators are <strong>for</strong>mally<br />

introduced in a very elegant manner. We have also proposed a modeling of the<br />

conceptual multidimensional data model in term of classes by means of UML, which<br />

is an object oriented standard analysis and design notation.<br />

In context of future works, we are investigating two approaches <strong>for</strong><br />

implementation: pure object-oriented orientation and object-relational approach. With<br />

the first model, dimensions and cube are mapped into an object-oriented database in<br />

term of classes. In the other alternative, dimensions, measure schema, and cube<br />

schema are grouped into a term of metadata, which will be mapped into objectoriented<br />

database in term of classes. Some useful methods built in those classes are<br />

used to give the required Ids within those dimensions. The given Ids will be joined to<br />

the fact table, which is implemented in relational database.<br />

Acknowledgment<br />

This work is partly supported by the EU-4 th Framework Project GOAL (Geographic<br />

In<strong>for</strong>mation Online <strong>An</strong>alysis)-EU Project #977071 and by the ASEAN European<br />

Union Academic Network (ASEA-Uninet), Project EZA 894/98.<br />

References<br />

1. Agrawal, R., Gupta, A., Sarawagi, A.: <strong>Model</strong>ing <strong>Multidimensional</strong> <strong>Data</strong>bases. IBM<br />

Research Report, IBM Almaden Research Center, September 1995.<br />

2. Albrecht, J., Guenzel, H., Lehner, W.: Set-Derivability of Multidimensiona Aggregates.<br />

First International Conference on <strong>Data</strong> Warehousing and Knowledge Discovery. DaWaK'99,<br />

Florence, Italy, August 30 - September 1.


3. Buzydlowski, J. W., Song, II-Y., Hassell, L.: A Framework <strong>for</strong> <strong>Object</strong>-<strong>Oriented</strong> On-Line<br />

<strong>An</strong>alytic Processing. D<strong>OLAP</strong> 1998<br />

4. Cabibbo, L., Torlone, R.: A Logical Approach to <strong>Multidimensional</strong> <strong>Data</strong>bases. EDBT 1998<br />

5. Chaudhuri, S., Dayal, U.: <strong>An</strong> Overview of <strong>Data</strong> Warehousing and <strong>OLAP</strong> Technology.<br />

SIGMOD Record Volume 26, Number 1, September 1997.<br />

6. Codd, E. F., Codd, S.B., Salley, C. T.: Providing <strong>OLAP</strong> (On-Line <strong>An</strong>alytical Processing) to<br />

user-analysts: <strong>An</strong> IT mandate. Technical report, 1993.<br />

7. Connolly, T., Begg, C.: <strong>Data</strong>base system: a practical approach to design, implementation,<br />

and management. Addison-Wesley Longman, Inc., 1999.<br />

8. Gyssens, M., Lakshmanan, L.V.S.: A foundation <strong>for</strong> multi-dimensional databases, Proc.<br />

VLDB'97.<br />

9. Hurtado, C., Mendelzon, A., Vaisman, A.: Maintaining <strong>Data</strong> Cubes under Dimension<br />

Updates. Proc IEEE/ICDE '99.<br />

10.Kimball, R.: The <strong>Data</strong> Warehouse Lifecycle Toolkit. John Wiley & Sons, Inc., 1998.<br />

11.Lehner, W.: <strong>Model</strong>ing Large Scale <strong>OLAP</strong> Scenarios. 6th International Conference on<br />

Extending <strong>Data</strong>base Technology (EDBT'98), Valencia, Spain, 23-27, March 1998.<br />

12.Li, C., Wang, X.S.: A <strong>Data</strong> <strong>Model</strong> <strong>for</strong> Supporting On-Line <strong>An</strong>alytical Processing. CIKM<br />

1996.<br />

13.Mangisengi, O., Tjoa, A M., Wagner, R.R.: <strong>Multidimensional</strong> <strong>Model</strong>ling Approaches <strong>for</strong><br />

<strong>OLAP</strong>. Proceedings of the Ninth International <strong>Data</strong>base Conference “Heterogeneous and<br />

Internet <strong>Data</strong>bases” 1999, ISBN 962-937-046-8. Ed. J. Fong, Hong Kong, 1999<br />

14.McGuff, F., Kador, J.: Developing <strong>An</strong>alytical <strong>Data</strong>base Applications. Prentice Hall PTR,<br />

1999.<br />

15.Nguyen, T.B., Tjoa, A M., Wagner, R.R.: Conceptual <strong>Object</strong> <strong>Oriented</strong> <strong>Multidimensional</strong><br />

<strong>Data</strong> <strong>Model</strong> <strong>for</strong> <strong>OLAP</strong>. Technical Report, IFS, Vienna 1999.<br />

16.Samtani, S., Mohania, M.K., Kumar, V., Kambayashi, Y.: Recent Advances and Research<br />

Problems in <strong>Data</strong> Warehousing. ER Workshops 1998.<br />

17.Shoshani, A.: <strong>OLAP</strong> and Statistical <strong>Data</strong>bases: Similarities and Differences. Tutorials of<br />

PODS 1997.<br />

18.Shukla A., Deshpande, P., Naughton, J. F., Ramasamy, K.: Storage Estimation <strong>for</strong><br />

<strong>Multidimensional</strong> Aggregates in the Presence of Hierarchies. VLDB 1996: 522-531<br />

19.Thomsen, E.: <strong>OLAP</strong> solutions: Building <strong>Multidimensional</strong> In<strong>for</strong>mation Systems. John<br />

Wiley& Sons, Inc., 1997.<br />

20.Trujillo, J., Palomar, M.: <strong>An</strong> <strong>Object</strong> <strong>Oriented</strong> Approach to <strong>Multidimensional</strong> <strong>Data</strong>base.<br />

Conceptual <strong>Model</strong>ing (OOMD), D<strong>OLAP</strong> 1998.<br />

21.Vassiliadis, P.: <strong>Model</strong>ing <strong>Multidimensional</strong> <strong>Data</strong>bases, Cubes and Cube operations. In Proc.<br />

10th Scientific and Statistical <strong>Data</strong>base Management Conference (SSDBM '98), Capri, Italy,<br />

June 1998.<br />

22.Wang, M., Iyer, B.: Efficient roll-up and drill-down analysis in relational database. In 1997<br />

SIGMOD Workshop on Research Issues on <strong>Data</strong> Mining and Knowledge Discovery, 1997.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!