An Object Oriented Multidimensional Data Model for OLAP - CiteSeerX
An Object Oriented Multidimensional Data Model for OLAP - CiteSeerX
An Object Oriented Multidimensional Data Model for OLAP - CiteSeerX
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>An</strong> <strong>Object</strong> <strong>Oriented</strong> <strong>Multidimensional</strong> <strong>Data</strong> <strong>Model</strong> <strong>for</strong><br />
<strong>OLAP</strong><br />
Thanh Binh Nguyen, A Min Tjoa, and Roland Wagner<br />
Institute of Software Technology (E188), Vienna University of Technology<br />
Favoritenstrasse 9-11/188, A-1040 Vienna, Austria<br />
{binh,tjoa}@ifs.tuwien.ac.at<br />
Institute of Applied Knowledge Processing, University of Linz<br />
Altenberger Strasse 69, A-4040 Linz, Austria<br />
wagner@ifs.uni-linz.ac.at<br />
Abstract. Online <strong>An</strong>alytical Processing (<strong>OLAP</strong>) data is frequently organized in<br />
the <strong>for</strong>m of multidimensional data cubes each of which is used to examine a set<br />
of data values, called measures, associated with multiple dimensions and their<br />
multiple levels. In this paper, we first propose a conceptual multidimensional<br />
data model, which is able to represent and capture natural hierarchical<br />
relationships among members within a dimension as well as the relationships<br />
between dimension members and measure data values. Hereafter, dimensions<br />
and data cubes with their operators are <strong>for</strong>mally introduced. Afterward, we use<br />
UML (Unified <strong>Model</strong>ing Language) to model the conceptual multidimensional<br />
model in the context of object oriented databases.<br />
1. Introduction<br />
<strong>Data</strong> warehouses and <strong>OLAP</strong> are essential elements of decision support [5], they<br />
enable business decision makers to creatively approach, analyze and understand<br />
business problems [16]. While data warehouses are built to store very large amounts<br />
of integrated data used to assist the decision-making process [9], the concept of<br />
<strong>OLAP</strong>, which is first <strong>for</strong>mulated in 1993 by [6] to enable business decision makers to<br />
work with data warehouses, supports dynamic synthesis, analysis, and consolidation<br />
of large volumes of multidimensional data [7]. <strong>OLAP</strong> systems organize data using the<br />
multidimensional paradigm in the <strong>for</strong>m of data cubes, each of which is a combination<br />
of multiple dimensions with multiple levels per dimension. Summarized data is preaggregated<br />
and stored with the main purpose to explore the relationship between<br />
independent, static variables, dimensions, and dependent, dynamic variables,<br />
measures [3]. Moreover, dimensions always have structures and are linguistic<br />
categories that describe different ways of looking at the in<strong>for</strong>mation [4]. These<br />
dimensions contain one or more natural hierarchies, together with other attributes that<br />
do not have a hierarchy’s relationship to any of the attributes in the dimensions [10].<br />
Having and handling the predefined hierarchy or hierarchies within dimensions<br />
provide the foundation of two typical operations like rolling up and drilling down.<br />
Because unbalanced and multiple hierarchical structures (Fig. 1,2) are the common
structures of dimensions, the two current <strong>OLAP</strong> technologies, namely R<strong>OLAP</strong> and<br />
M<strong>OLAP</strong>, have limitations in the handling of dimensions with these structures [15].<br />
all<br />
All<br />
1999<br />
Year<br />
Q1.1999<br />
Quater<br />
Jan.1999<br />
Feb.1999<br />
Mar.1999<br />
W1.1999<br />
W5.1999 W9.1999<br />
Month<br />
Week<br />
Day<br />
1.Jan.1999 6.Jan.1999 1.Feb.1999 3.Feb.1999 3.Mar.1999<br />
Fig. 1. <strong>An</strong> instance of the dimension Time with unbalanced and<br />
multiple hierarchical structure<br />
Fig. 2. A schema of the<br />
dimension Time with<br />
multihierarchical<br />
structure.<br />
R<strong>OLAP</strong> (Relational <strong>OLAP</strong>) products are set on top of existing relational database<br />
management systems (RDBMS), which are well standardized and meet the needs of<br />
storing large amounts of data. Dimensions and facts are mapped into relational tables,<br />
called fact and dimension tables, organized as Star Schema and/or Snowflake Schema<br />
[10]. There<strong>for</strong>e in many cases, R<strong>OLAP</strong> products are not suitable <strong>for</strong> handling<br />
dimensions with multihierarchical and unbalanced structures. Furthermore, existing<br />
relational query languages (e.g. SQL) are not sufficiently powerful or flexible enough<br />
to support true <strong>OLAP</strong> capabilities [19]. [3] clearly demonstrated the mismatch<br />
between multidimensional operations and SQL.<br />
Although M<strong>OLAP</strong> (<strong>Multidimensional</strong> <strong>OLAP</strong>) easily supports dimensions with<br />
multiple and unbalanced hierarchical structures and M<strong>OLAP</strong> queries are very<br />
powerful and flexible in terms of <strong>OLAP</strong> processing [14], there are still several<br />
challenges <strong>for</strong> these products. First, the underlying data structures are limited in their<br />
ability to support multiple subject areas and to provide access to detailed data.<br />
Navigation and analysis of data is limited because the data is designed according to<br />
previously determined requirements [7]. In addition, with products that require<br />
complete pre-calculation, the dimensional explosion could result in physical database<br />
that is unmanageable [14].<br />
The first goal of this paper is the introduction of a conceptual multidimensional<br />
data model that facilitates a precise rigorous conceptualization <strong>for</strong> <strong>OLAP</strong>. First, the<br />
model is able to represent and capture natural hierarchical relationships among<br />
members within a dimension. There<strong>for</strong>e, dimensions with complex structures, such<br />
as: unbalanced and multihierarchical structures, can be handled. Moreover, the data<br />
model is able to represent the relationships between dimension members and measure<br />
data values by mean of cube cells. Hereafter, the data cubes, which are basic<br />
components in multidimensional data analysis, are <strong>for</strong>mally introduced. Furthermore,<br />
cube operators (e.g. jumping, rollingUp and drillingDown) are defined in a very<br />
elegant manner.
The second goal is the modeling of the conceptual multidimensional data model in<br />
term of classes by using UML. Based on the <strong>for</strong>mal representation of the class<br />
specifications in UML, the design and implementation of the data model <strong>for</strong> object<br />
oriented databases are straight<strong>for</strong>ward.<br />
The remainder of this paper is organized as follows. In section 2, we discuss about<br />
related works. Then in section 3, we introduce a conceptual data model that will be<br />
mapped into object-oriented database by means of UML in section 4. The paper<br />
concludes with section 5, which presents our current and future works.<br />
2. Related works<br />
Since Codd’s [6] <strong>for</strong>mulated the term Online <strong>An</strong>alytical Processing (<strong>OLAP</strong>) in 1993,<br />
many commercial products, like Arborsoft (now Hyperion) Essbase, Cognos<br />
Powerplay or MicroStrategy’s DSS Agent have been introduced on the market [2].<br />
But un<strong>for</strong>tunately, sound concepts were not available at the time of the commercial<br />
products being developed. The scientific community struggles hard to deliver a<br />
common basis <strong>for</strong> multidimensional data models ([1], [4], [8], [11], [12], [13], [21]).<br />
The data models presented so far differ in expressive power, complexity and<br />
<strong>for</strong>malism. In the followings, some research works in the field of data warehousing<br />
systems and <strong>OLAP</strong> tools are summarized.<br />
In [12] a multidimensional data model is introduced based on relational elements.<br />
Dimensions are modeled as “dimension relations”, practically annotating attributes<br />
with dimension names. The cubes are modeled as functions from the Cartesian<br />
product of the dimensions to the measure and are mapped to “grouping relations”<br />
through an applicability definition.<br />
In [8] n-dimensional tables are defined and a relational mapping is provided<br />
through the notation of completion. <strong>Multidimensional</strong> database are considered to be<br />
composed from set of tables <strong>for</strong>ming denormalized star schemata. Attribute<br />
hierarchies are modeled through the introduction of functional dependencies in the<br />
attributes of dimension tables.<br />
[4] modeled a multidimensional database through the notations of dimensions and<br />
f-tables. Dimensions are constructed from hierarchies of dimension levels, whereas f-<br />
tables are repositories <strong>for</strong> the factual data. <strong>Data</strong> are characterized from a set of roll-up<br />
functions, mapping the instance of a dimension level to instances of other dimension<br />
level.<br />
In statistical databases, [17] presented a comparison of work done in statistical and<br />
multidimensional databases. The comparison was made with respect to application<br />
areas, conceptual modeling, data structure representation, operations, physical<br />
organization aspects and privacy issues.<br />
In [3], a framework <strong>for</strong> <strong>Object</strong>-<strong>Oriented</strong> <strong>OLAP</strong> is introduced. Two major physical<br />
implementations exist today: R<strong>OLAP</strong> and M<strong>OLAP</strong> and their advantages and<br />
disadvantages due to physical implementation were introduced. The paper also<br />
presented another physical implementation called O3LAP model.<br />
[20] took the concepts and basic ideas of the classical multidimensional model<br />
based on the <strong>Object</strong>-<strong>Oriented</strong> paradigm. The basic elements of their <strong>Object</strong> <strong>Oriented</strong>
<strong>Multidimensional</strong> <strong>Model</strong> are dimension classes and fact classes. They also presented<br />
cube classes as the basic structure to allow a subsequent analysis of the data stored in<br />
the system.<br />
In this paper, we address a suitable mutidimensional data model <strong>for</strong> <strong>OLAP</strong>. The<br />
main contributions are: (a) the introduction of a <strong>for</strong>mal multidimensional data model;<br />
(b) the very elegant manners of definitions of three cube operators, namely jumping,<br />
rollingUp and drillingDown; (c) the modeling of the conceptual multidimensional<br />
data model in term of classes by using UML.<br />
3. A Conceptual <strong>Data</strong> <strong>Model</strong><br />
In our approach, a multidimensional data model is constructed based on a set of<br />
dimensions = { D 1 ,..,D x },<br />
x ∈ N<br />
M = M 1 ,.., M y , y ∈ and a<br />
D , a set of measures { } N<br />
set of data cubes C = { C 1 ,.., C z },<br />
z ∈ N<br />
. The following sections <strong>for</strong>mally introduce the<br />
descriptions of dimensions with their structures, measures and data cubes.<br />
3.1. The Concepts of Dimensions<br />
First, we introduce hierarchical relationships among dimension members by means of<br />
one hierarchical domain per dimension. A hierarchical domain is a set of dimension<br />
members, organized in hierarchy of levels, corresponding to different levels of<br />
granularity. It allows us to consider a dimension schema as a partially ordered set of<br />
levels. In this concept, a hierarchy is a path along the dimension schema, beginning at<br />
the root level and ending at a leaf level. Moreover, the recursive definitions of two<br />
dimension operators, namely ancestor and descendant, provide abilities to navigate<br />
along a dimension structure. In a consequence, dimensions with any complexity in<br />
their structures can be captured with this data model.<br />
Definition 3.1.1. [Dimension Hierarchical Domain] A hierarchical domain of a<br />
dimension D is a non-empty set and denoted by dom ( D) = { all}<br />
∪ { dm1,..,<br />
dm n },<br />
where:<br />
• Each dimension member dmi<br />
is a data item within a dimension. E.g. 1999,<br />
Q1.1999, Jan.1999, and 1.Jan.1999, etc are dimension members within the<br />
dimension Time (Fig. 1).<br />
• Such that the graph G<br />
M<br />
= ( V , E)<br />
, defined as the representation over the binary<br />
relation over the dom (D)<br />
, is a tree and defined as follows:<br />
V = dom(D) ,<br />
E ⊂ dom( D) × dom(D)<br />
. ∀( dmi<br />
, dm j ) ∈ E : dmi<br />
M dm j is an edge in G . The<br />
M<br />
edge is given when there is an ordered relationship in the sense of hierarchy.<br />
• <strong>An</strong>d the two operators {+,-}: ∀ dm i ∈ dom(D) :<br />
− ( dm i ) = { dm j ∈ dom(D)<br />
: dm j M dmi}
+ ( dm i ) = { dmk<br />
∈ dom(D)<br />
: dmi<br />
M dmk}<br />
• The all or root member: ( ∃ ! all ∈ dom(D))(<br />
¬∃dm<br />
∈ dom(D)<br />
: dm M all)<br />
.<br />
• Leaf members: ∀ dm ∈ dom(D))(<br />
¬∃dm<br />
∈ dom(D),<br />
i ≠ j : dm dm ) .<br />
( i<br />
j<br />
i M j<br />
Example: Figure 1 shows a representation in tree term of the dimension Time.<br />
Hereafter, we have:<br />
dom(Time)={all,1999,Q1.1999,..,3.Mar.1999},<br />
all M 1999,1999 M Q1.1999,...,Mar.1999 M 3.Mar.1999,<br />
-(1999)=all; +(1999)={Q1.1999,W1.1999,W5.1999,W9.1999}<br />
Definition 3.1.2. [Dimension Levels] Let Levels ( D) = All ∪ { l1 ,.., lh } , h ∈ N be a finite<br />
set of levels of a dimension D, where:<br />
• The collection of subsets { dom ( l 1 ),.., dom(<br />
l h )}<br />
is a partition of dom(D),<br />
• The All or root level: ∃ ! All ∈ Levels(D)<br />
: dom(<br />
All)<br />
= { all}<br />
,<br />
• Leaf levels: { l ∈ Levels(D)<br />
∀dm<br />
∈ dom(<br />
l ) : dm a leafmember}<br />
.<br />
i<br />
j<br />
Example: The dimension Time has six levels Levels(Time)={All,Year,Quarter,<br />
Month,Week,Day}. <strong>An</strong>d:<br />
dom(All)= {all}, dom(Year)={1999}, dom(Quarter)= {Q1.1999}<br />
dom(Month)= {Jan.1999,Feb.1999,Mar.1999},<br />
dom(Week)={W1.1999,W5.1999,W9.1999},<br />
dom(Day)= {1.Jan.1999,6.Jan.1999,1.Feb.1999,3.Feb.1999,3.Mar.1999}<br />
Definition 3.1.3. [Dimension Schema] A schema of a dimension D, denoted by<br />
DSchema(D)= Levels( D) , L , is a partially ordered set of levels:<br />
• Levels(D) is a finite set of dimension levels,<br />
• <strong>An</strong>d L is an ordered relation over the levels and satisfies the following<br />
condition:<br />
l l if ∃dm<br />
∈ dom(<br />
l )) and ( ∃dm<br />
∈ dom(<br />
l )) : dm dm .<br />
i<br />
L<br />
j<br />
( t i<br />
u<br />
j<br />
All All All<br />
i<br />
j<br />
t<br />
M<br />
u<br />
Category<br />
Country<br />
Year<br />
Type<br />
State<br />
Quarter<br />
Month<br />
Week<br />
Item<br />
Product<br />
City<br />
Geography<br />
Time<br />
Day<br />
Fig. 3. Schemas of three dimensions Product, Geography and Time<br />
Example: Figure 3 is used to describe schemas of three dimensions Product,<br />
Geography, and Time.<br />
DSchema(Product)={All L Category, Category L Type, Type L Item}
DSchema(Geography)={All L Country, Country L State, State L City}<br />
DSchema(Time)={All L Year ,Year L Quarter, Quarter L Month, Month L Day ,<br />
All L Year, Year L Week, Week L Day}<br />
Definition 3.1.4. [Dimension Path] A path within a dimension schema is a linear,<br />
totally ordered list of levels and can be defined as follows:<br />
∀ l , l ∈ Levels(D) :<br />
i<br />
j<br />
{<br />
li<br />
L l j}<br />
<br />
path(<br />
li,<br />
l j ) = i L t u<br />
<br />
<br />
φ<br />
{ l l ,.., l l }<br />
L<br />
j<br />
If l <br />
i<br />
L<br />
l<br />
j<br />
Else if ∃ lt<br />
,.., lu<br />
∈ Levels(D)<br />
: li<br />
L lt<br />
,.., lu<br />
L l j<br />
Definition 3.1.5. [Dimension Hierarchy] A hierarchy is defined by a path ( All,<br />
lleaf<br />
)<br />
within the schema of a dimension D. The path begins at All (root) level and ends at a<br />
leaf level.<br />
Let H ( D) = { h1 ,.., hm }, m∈ N be a set of hierarchies of a dimension D. If m=1 then<br />
the dimension has single hierarchical structure, else the dimension has<br />
multihierarchical structure.<br />
Definition 3.1.6. [Dimension Operators] Two dimension operators (DO), namely<br />
ancestor and descendant, are defined recursively as follows:<br />
∀ li<br />
, la,<br />
ld<br />
∈ Levels(D)<br />
and ∀ dm ∈ dom( li ) ⊂ dom(D)<br />
:<br />
undefined<br />
If ( path ( l a,<br />
l i ) = φ)<br />
<br />
−<br />
−<br />
dm ∈ dom(<br />
l ∈ −<br />
<br />
a ) : dm ( dm)<br />
Else If ( la<br />
L li<br />
)<br />
ancestor(<br />
dm,<br />
la<br />
,D) = <br />
−<br />
ancestor(<br />
dm , la<br />
,D), where :<br />
Else<br />
<br />
−<br />
<br />
dm = ancestor(<br />
dm,<br />
l p ,D) : l p L li<br />
undefined<br />
If ( path ( l i , l d ) = φ)<br />
<br />
+<br />
+<br />
{ dm ∈ dom(<br />
l ∈ +<br />
<br />
d ) : dm ( dm)}<br />
Else If ( li<br />
L ld<br />
)<br />
descendant(<br />
dm,<br />
ld<br />
,D) = <br />
+<br />
descendant(<br />
dm , ld<br />
,D),where :<br />
Else<br />
<br />
+<br />
dm<br />
∈ descentdant(<br />
dm,<br />
ln,D)<br />
: li<br />
L ln<br />
Example: ancestor(Q1.1999,Year,Time)=1999,<br />
descendant(Q1.1999,Month,Time)= {Jan.1999,Feb.1999,Mar.1999},<br />
Else<br />
3.2. The Concepts of Measures<br />
In this section we introduce measures, which are the objects of analysis in the context<br />
of multidimensional data model.
Definition 3.2.1. [Measure Schema] A schema of a measure M is a tuple<br />
MSchema ( M) = Fname,<br />
O , where:<br />
• Fname is a name of a corresponding fact,<br />
• O ∈ Ω ∪{NONE, COMPOSITE} is an operation type applied to a specific fact [2]:<br />
− Ω={SUM, COUNT, MAX, MIN} is a set of aggregation functions,<br />
− COMPOSITE is an operation (e.g. average), where measures cannot be<br />
utilized in order to automatically derive higher aggregations,<br />
− NONE measures are not aggregated. In this case, the measure is the fact.<br />
Definition 3.2.2. [Measure Domain] Let N be a numerical domain where a measure<br />
value is defined (e.g. N, Z, R). The domain of a measure M is a subset of N. We<br />
denote by dom (M) ⊂ N .<br />
3.3. The Concepts of <strong>Data</strong> Cubes<br />
A multidimensional cube is constructed based on a set dimensions and a set of<br />
measures, and consists a collection of cells. Each cell is an intersection among a set of<br />
dimension members and measure data values. Furthermore, cells are grouped into<br />
granular groupbys, each of which expresses a mapping from the domains of x-tuple of<br />
dimension levels (independent variables) to y-numerical domains of y-tuple of<br />
numeric measures (dependent variables).<br />
Product<br />
Mexico<br />
USA<br />
Alcoholic<br />
10<br />
Dairy<br />
50<br />
Beverage<br />
20<br />
Baked Food 12<br />
Meat<br />
15<br />
Seafood<br />
10<br />
Geography<br />
1 2 3 4 5 6<br />
Time<br />
Fig. 4. Sale cube includes dimensions: Geography, Product and Time and a fact: Sale amount.<br />
Given x dimensions D 1 ,.., D x , x ∈ N , and y measures M 1 ,.., M y , y ∈ N .<br />
Definition 3.3.1. [Cube Schema] A cube schema is tuple<br />
CSchema(C)= Cname , DSchemas,<br />
MSchemas :<br />
• Cname is the name of a cube,<br />
• DSchemas are the schemas of x dimensions, denoted by<br />
DSchemas =< DSchema( D1 ),.., DSchema(D<br />
x ) > ,<br />
• MSchemas are the schemas of y measures, denoted by<br />
MSchemas =< MSchema M ),.., MSchema(M<br />
) > .<br />
( 1 y
Definition 3.3.2. [Cube Domain] Given a function:<br />
f : dom(D 1)<br />
× .. × dom(Dx)<br />
× dom(M 1)<br />
× .. × dom(M<br />
y)<br />
→ { true,<br />
false}<br />
,<br />
A cube domain, denoted by { } N<br />
dom ( C) = c1 ,.., ck , k ∈ is determined as follows:<br />
dom (C) = c = ( dms,<br />
fms)<br />
dms ∈ dom(D<br />
) × .. × dom(D<br />
),<br />
{ 1 x<br />
fms ∈ dom( M 1)<br />
× .. × dom(M<br />
y ) : f ( dms,<br />
fms)<br />
= true}<br />
3.4. Operating Group by, Rolling Up, Drilling Down in our model<br />
Let a cube C be constituted from x dimensions<br />
D 1<br />
,.., D x , x ∈ N<br />
, and y measures<br />
M 1 ,.., M y , y ∈ N . We define groupby and three operators, namely jumping,<br />
rollingUp and drillingDown as follows:<br />
Definition 3.4.1. [Groupby] A groupby is triple G= Gname , GSchema(G),<br />
dom(G)<br />
where:<br />
• Gname is the name of this groupby,<br />
• GSchema(G)= GLevels ( G), GMSchemas(G)<br />
:<br />
GLevels( G) =< lD 1<br />
,.., lD<br />
>∈ Levels(D1)<br />
× .. × Levels(D<br />
x ) is a x-tuple of levels of<br />
x<br />
the x dimensions D 1 ,.., D x , x ∈ N .<br />
GMSchemas ( G) =< MSchema(M 1),..,<br />
MSchema(M<br />
y ) > is a y-tuple of measure<br />
schemas of the y measures M 1 ,.., M y , y ∈ N .<br />
• dom ( G) = { c =< dms,<br />
fms >∈dom(C)<br />
| dms ∈ dom(<br />
lD 1<br />
) × .. × dom(<br />
lD<br />
x<br />
),<br />
fms ∈ dom M ) × .. × dom(M<br />
)}<br />
( 1 y<br />
Let h i be a number of levels of each dimension D i (1≤i≤x). The total set of<br />
x<br />
p ∏h i<br />
i=<br />
1<br />
groupbys over a cube C is defined as Groupbys( C) = {G1,..,G<br />
}, p = [18].<br />
Definition 3.4.2. [Cube Operators] The three basic navigational cube operators (CO),<br />
namely jumping, rollingUp and drillingDown, which are applied to navigate along a<br />
data cube C, corresponding to a dimension D i , are defined as follows:<br />
Given a current groupby G c , associated with a level l c of a dimension D, i and<br />
three other levels l j , lr<br />
, ld<br />
∈ Levels(Di<br />
) .<br />
• Jumping:<br />
jumping G , l , D ) = G =< GLevels(G<br />
), GMSchemas(G<br />
) ><br />
( c j i j<br />
j<br />
j<br />
Where:<br />
GMSchemas ( G j ) = GMSchemas(G<br />
c),<br />
GLevels ( G j )( i)<br />
= l j,<br />
GLevels( G j )( k)<br />
= GLevels(Gc)(<br />
k),<br />
∀k<br />
≠ i.<br />
• Rolling Up: ∀ dm ∈ dom( l c ) , Gr<br />
= jumping(Gc,<br />
lr<br />
, Di<br />
) :
sub<br />
sub sub<br />
rollingUp(Gc,<br />
dm,<br />
lr<br />
,Di<br />
) = Gr<br />
=< GSchema(Gr<br />
), dom(Gr<br />
) ><br />
Where:<br />
sub<br />
GSchema (Gr<br />
) = GSchema(Gr<br />
) ,<br />
sub<br />
dom( Gr<br />
) = { cr<br />
∈ dom(Gr<br />
) | ∃c<br />
∈ dom(Gc<br />
) : c . dms(<br />
i)<br />
= dm,<br />
c dms(<br />
i)<br />
= ancestor(<br />
dm,<br />
, D ) , . dms(<br />
j)<br />
= c.<br />
dms(<br />
j),<br />
∀j<br />
≠ i}<br />
r. l r i c r<br />
dm ∈ dom( l c Gd<br />
c d i<br />
sub<br />
sub<br />
sub<br />
drillingDown(Gc,<br />
dm,<br />
ld<br />
, Di<br />
) = G d =< GSchema(G<br />
d ), dom(G<br />
d ) ><br />
• Drilling Down: ∀ ) , = jumping(G<br />
, l , D ) :<br />
Where:<br />
sub<br />
GSchema (Gd<br />
) = GSchema(Gd<br />
),<br />
sub<br />
dom( Gd<br />
) = { cd<br />
∈ dom(Gd<br />
) | ∃c<br />
∈ dom(Gc)<br />
: c . dms(<br />
i)<br />
= dm,<br />
c dms(<br />
i)<br />
∈ descendant(<br />
dm,<br />
,D ), . dms(<br />
j)<br />
= c.<br />
dms(<br />
j),<br />
∀j<br />
≠ i}<br />
d . l d i<br />
c d<br />
4. <strong>Model</strong>ing <strong>Multidimensional</strong> <strong>Data</strong> <strong>Model</strong><br />
In this section, UML is used to model dimensions, measures and data cubes in context<br />
of an object oriented data model. All conceptual components, which are introduced in<br />
section 3, are mapped as classes. Figure 5 illustrates the modeling <strong>for</strong> our data model<br />
in term of class diagrams by using UML.<br />
4.1. The <strong>Model</strong>ing of Dimensions<br />
The dimension concepts, such as: dimension members, levels, dimension schemas,<br />
hierarchy and dimension, are modeled in term of class diagrams by using UML. First,<br />
a hierarchical domain of dimension members within a dimension is handled by means<br />
of the DMember class. The two dimension member operators {+} and {-} are mapped<br />
into the two methods getFathers and getChildren, built-in every instance of this class.<br />
Hereafter, The Level, DSchema, Hierarchy classes are defined to describe the<br />
concepts of level, dimension schema and dimension hierarchy. Afterwards, each<br />
instance of the Dimension class describes a dimension. The dimension operators,<br />
namely ancestor and descendant are mapped as two methods with the same names.<br />
4.1.1. DMember. Each instance of this class describes a dimension member within a<br />
hierarchical domain of a dimension.<br />
• Attributes:<br />
description (String) - That is a data item within a dimension.<br />
• Relationships:<br />
fathers (Set) - A set of referred DMember objects.<br />
children (Set) - A set of referred DMember objects<br />
• Main methods:<br />
getFathers() (return Set) - This method describes the operator {-}.<br />
getChildren() (returns Set) - This method describes the operator {+}.
HasChild<br />
+Father<br />
0..*<br />
Description : String;<br />
DMember<br />
setDescription(descM : String) : void;<br />
getDescription() : string;<br />
addChild(aCM : DMember) : boolean;<br />
removeChild(rCM : DMember) : boolean;<br />
getChildren() : set of DMember;<br />
addFather(aFM : DMember) : boolean;<br />
removeFather(rFM : DMember) : boolean;<br />
getFathers() : set of DMember;<br />
HasFather<br />
+Child 0..*<br />
+Child<br />
1..*<br />
+Father<br />
1..*<br />
refers to<br />
Cell<br />
addDMember(newDM : DMember) : boolean;<br />
removeDMember(index : int) : boolean;<br />
getDMember(index : int) : DMember;<br />
getDMem bers () : s et of D Mem ber;<br />
addMValue(newMV : MeasureValue) : boolean;<br />
removeMValue(index : int) : boolean;<br />
getMValue(index : int) : MeasureValue;<br />
getMValues() : set of MeasureValue;<br />
1..*<br />
contains<br />
IntergerV alue<br />
value : int;<br />
1..*<br />
value : type;<br />
setValue(newV : int) : void;<br />
getValue() : int;<br />
MeasureValue<br />
setValue(newV : Type) : void;<br />
getValue() : Type;<br />
value : float;<br />
floatValue<br />
setValue(newV : float) : void;<br />
getValue() : float;<br />
MS chem a<br />
Fname : String;<br />
AggFunction : String;<br />
1..*<br />
contains<br />
contains<br />
Gname : String;<br />
GSc hema<br />
contains<br />
MSchemas<br />
Lname : String;<br />
Level<br />
setLname(lname : String) : void;<br />
getLname() : String;<br />
addDM(m : DMember) : boolean;<br />
removeDM(m : DMember) : boolean;<br />
getDM(descM : String) : DMember;<br />
getDMs() : s et of DMember;<br />
1..*<br />
refers to<br />
1..*<br />
1..*<br />
1..*<br />
refers to<br />
contains<br />
refers to<br />
getGname() : String;<br />
setGname(newGname : String) : void;<br />
updateMSchemas(newMSs : MSchemas) : boolean;<br />
addLevels(aLevel : Level)<br />
rem oveLevel(index int) : Level;<br />
Dname : String;<br />
DSchema<br />
addLevel(addlevel : Level) : boolean;<br />
removeLevel(lname : String) : boolean;<br />
getLevel(lname : String) : Level;<br />
1..*<br />
contains<br />
Groupby<br />
setGSchema(gschema : GSchema) : void;<br />
getGSchema() : GSchema;<br />
addCell(c : Cell) : boolean;<br />
removeCell(c : Cell) : boolean;<br />
getCell(Dname : String, dm : DMember) : cell;<br />
contains<br />
refers to<br />
1..*<br />
refers to<br />
addMSchem a(ms : MSchema) : boolean;<br />
rem oveMSchema(ms : MSchema) : boolean;<br />
rem oveMSchema(index : int) : boolean;<br />
getMSchema(index : int) : MSchem a;<br />
Cname : String;<br />
contains<br />
CSchema<br />
getCname() : String;<br />
setCname(newName : String) : void;<br />
updateMSchemas(newMS : MSchemas) : void;<br />
updateDSchemas(newDM : DSchem as) : void;<br />
contains<br />
contains<br />
Dimension<br />
Bas icGroupby : Groupby;<br />
Cube<br />
Hname : String;<br />
Hierarchy<br />
Hierarchy() : void;<br />
setHname(hname : String) : void;<br />
insertLevel(position : int, l : Level) : boolean;<br />
removeLevel(lname : String) : boolean;<br />
getLevel(lname : String) : Level;<br />
getPreLevel(lname : String) : Level;<br />
getAfterLevel(lname : String) : Level;<br />
getLevels() : set of Level;<br />
1..*<br />
contains<br />
setDSchema(newDS : DSchema) : void;<br />
getDSchema() : DSchema;<br />
addHierarchy(newH : Hierarchy) : boolean;<br />
rem oveHierarchy(hname : String) : boolean;<br />
getHierarchy(hname : String) : Hierarchy;<br />
getHierarchies () : set of Hierarchy;<br />
addLevel(l : Level) : boolean;<br />
rem oveLevel(lname : String) : boolean;<br />
getLevel(lname : String) : Level;<br />
getLevels() : set of Level;<br />
ances tor(cDM : DMember, lname : String) : DMember;<br />
descendant(cDM : DMember, lname : String) : set of DMember;<br />
1..*<br />
refers to<br />
setCSchema(new CS : CubeSchem a) : void;<br />
getCSchema() : CubeSchema;<br />
addGroupby(newG : Groupby) : boolean;<br />
rem oveGroupby(gnam e : String) : boolean;<br />
getGroupby(gnam e : String) : Groupby;<br />
getGroupbyNames() : set of String;<br />
addDim ension(newD : Dimension) : boolean;<br />
rem oveDim(dnam e : String) : boolean;<br />
getDim(dnam e : String) : Dimension;<br />
getDimens ions () : set of Dimens ion;<br />
jum ping(lnam e : String, dnam e : String) : void;<br />
rollingUp(cDM : DMem ber, lname : String, dname : String) : void;<br />
drillingDown(cDM : DMember, lnam e : String, dnam e : String) : void;<br />
Fig.5. The modeling of the object oriented multidimensional data model
4.1.2. Level. Each instance of the Level class describes a level of a dimension.<br />
Attributes:<br />
dmembers (Set) - A set of contained DMember objects.<br />
4.1.3. DSchema. Each instance of the DSchema describes a dimension schema.<br />
• Attributes:<br />
dname (String) - That is a dimension name.<br />
• Relationships:<br />
levels (Set) - A set of Level objects.<br />
4.1.4. Hierarchy. Each instance of the Hierarchy class describes a hierarchy of levels<br />
within a dimension.<br />
• Attributes:<br />
dname (String) - That is a hierarchy name<br />
• Relationships:<br />
levels (Set) - A set of Level objects<br />
4.1.5. Dimension. Each instance of this class describes a dimension.<br />
• Attributes:<br />
dschema (DSchema) - A DSchema object.<br />
hierarchies (Set) - A set of Hierarchy objects.<br />
levels (Set) - A set of Level objects.<br />
• Main methods:<br />
ancestor(DMember cDM;lname:String) (returns DMember): ancestor operator.<br />
descendant(DMember cDM,lname:String) (Set): descendant operator.<br />
4.2. The <strong>Model</strong>ing of Measures<br />
Measure schemas and measure data values are mapped into classes. First, an<br />
MSchema describes a measure schema. Afterwards, The MValue is an abstract type<br />
that serves as a common super type <strong>for</strong> measure values. It is obvious that the two<br />
classes, namely intMValue and floatValue, are subclasses of MValue.<br />
4.2.1. MSchema. Each instance of this class describes a measure schema.<br />
• Attributes:<br />
fname (String) - That is a fact name.<br />
aggFunction (String)- <strong>An</strong> aggregation function, such as Max, Min, Sum, Count,<br />
None and Composite.<br />
4.2.2. MValue. MValue is an abstract type that serves as common super type <strong>for</strong><br />
measure values.<br />
Attributes:<br />
value (Type) – That describes a measure value.<br />
4.2.3. intMValue. The intMValue is a subclass of MValue. Each instance of this class<br />
describes an integer measure value.<br />
• Specializes:<br />
MValue: The intMValue class inherits from MValue class
• Attributes:<br />
value (int) – The value is overridden as integer.<br />
4.2.4. floatMValue. The floatMValue is a subclass of MValue. Each instance of this<br />
class describes a float measure value.<br />
• Specializes:<br />
MValue: The floatMValue class inherits from MValue class<br />
• Attributes:<br />
value (float) – The value is overridden as float.<br />
4.3. The <strong>Model</strong>ing of <strong>Multidimensional</strong> Components<br />
First, each instance of the CSchema, which contains an x-tuple of DSchemas and a y-<br />
tuple of MSchemas, describes a cube schema. Then, a Cell refers to x DMembers of x<br />
dimensions and y MValues of y measures. In addition, a GSchema contains x Levels of<br />
x dimensions and the y-tuple MSchemas. Afterwards, a Groupby contains a GSchema<br />
and a subset of Cells. As a consequence, a Cube contains a CSchema and a<br />
BasicGroupby (Groupby), is associated with a set of Dimensions, and a set of<br />
Groupbys. The three methods, i.e. jumping, rollingUp and drillingDown, are the<br />
mappings of the three operators with the same names.<br />
4.3.1. CSchema. Each instance of the CSchema describes a cube schema.<br />
• Attributes:<br />
dschemas (Set) - A set of x DSchema objects,<br />
mschemas (Set) - A set of y MSchema objects.<br />
4.3.2. Cell. Each instance of this class describes a cube cell.<br />
• Relationships:<br />
dmembers (Set) - A set of x DMember objects.<br />
mvalues (Set) - A set of y MValue objects.<br />
4.3.3. GSchema. Each instance of the GSchema class describes a groupby schema.<br />
• Relationships:<br />
levels (Set) - A set of x Level objects.<br />
mschemas (Set) - A set of y MSchema objects.<br />
4.3.4. Groupby. Each instance of this class describes a groupby.<br />
• Attributes:<br />
gschema (GSchema): A GSchema object that describes a groupby schema.<br />
cells (Set): A set of Cell objects.<br />
4.3.5. Cube. Each instance of the Cube class describes a data cube.<br />
• Attributes:<br />
cschema (CSchema) – A CSchema describes a cube schema of a cube.<br />
basicgroupby (Groupby) - A Groupby object.<br />
• Relationships:<br />
dimensions (Set) - Dimensions express <strong>for</strong> x dimensions of a cube.<br />
groupbies (Set) – A set of Groupby objects,
• Main methods:<br />
jumping(lname:String;dname:String) (void) - jumping operator.<br />
rollingUp(cDM:DMember;lname:String;dname:String) (void) - rollingUp operator.<br />
drillingDown(cDM:DMember;lname:String;dname:String) (void) - drillingDown<br />
operator.<br />
5. Conclusion and future works<br />
In this paper, we have introduced the conceptual multidimensional data model, which<br />
facilitates even sophisticated constructs based on multidimensional data units or<br />
members such as dimension members, measure data values and then cells. The model<br />
is able to represent and capture natural hierarchical relationships among dimension<br />
members. Dimensions with complexity of their structures, such as: unbalanced and<br />
multihierarchical structures, can be modeled in an elegant and consistent way.<br />
Moreover, the data model represents the relationships between dimension members<br />
and measure data values by mean of cube cells. In consequence, data cubes, which are<br />
basic components in multidimensional data analysis, and their operators are <strong>for</strong>mally<br />
introduced in a very elegant manner. We have also proposed a modeling of the<br />
conceptual multidimensional data model in term of classes by means of UML, which<br />
is an object oriented standard analysis and design notation.<br />
In context of future works, we are investigating two approaches <strong>for</strong><br />
implementation: pure object-oriented orientation and object-relational approach. With<br />
the first model, dimensions and cube are mapped into an object-oriented database in<br />
term of classes. In the other alternative, dimensions, measure schema, and cube<br />
schema are grouped into a term of metadata, which will be mapped into objectoriented<br />
database in term of classes. Some useful methods built in those classes are<br />
used to give the required Ids within those dimensions. The given Ids will be joined to<br />
the fact table, which is implemented in relational database.<br />
Acknowledgment<br />
This work is partly supported by the EU-4 th Framework Project GOAL (Geographic<br />
In<strong>for</strong>mation Online <strong>An</strong>alysis)-EU Project #977071 and by the ASEAN European<br />
Union Academic Network (ASEA-Uninet), Project EZA 894/98.<br />
References<br />
1. Agrawal, R., Gupta, A., Sarawagi, A.: <strong>Model</strong>ing <strong>Multidimensional</strong> <strong>Data</strong>bases. IBM<br />
Research Report, IBM Almaden Research Center, September 1995.<br />
2. Albrecht, J., Guenzel, H., Lehner, W.: Set-Derivability of Multidimensiona Aggregates.<br />
First International Conference on <strong>Data</strong> Warehousing and Knowledge Discovery. DaWaK'99,<br />
Florence, Italy, August 30 - September 1.
3. Buzydlowski, J. W., Song, II-Y., Hassell, L.: A Framework <strong>for</strong> <strong>Object</strong>-<strong>Oriented</strong> On-Line<br />
<strong>An</strong>alytic Processing. D<strong>OLAP</strong> 1998<br />
4. Cabibbo, L., Torlone, R.: A Logical Approach to <strong>Multidimensional</strong> <strong>Data</strong>bases. EDBT 1998<br />
5. Chaudhuri, S., Dayal, U.: <strong>An</strong> Overview of <strong>Data</strong> Warehousing and <strong>OLAP</strong> Technology.<br />
SIGMOD Record Volume 26, Number 1, September 1997.<br />
6. Codd, E. F., Codd, S.B., Salley, C. T.: Providing <strong>OLAP</strong> (On-Line <strong>An</strong>alytical Processing) to<br />
user-analysts: <strong>An</strong> IT mandate. Technical report, 1993.<br />
7. Connolly, T., Begg, C.: <strong>Data</strong>base system: a practical approach to design, implementation,<br />
and management. Addison-Wesley Longman, Inc., 1999.<br />
8. Gyssens, M., Lakshmanan, L.V.S.: A foundation <strong>for</strong> multi-dimensional databases, Proc.<br />
VLDB'97.<br />
9. Hurtado, C., Mendelzon, A., Vaisman, A.: Maintaining <strong>Data</strong> Cubes under Dimension<br />
Updates. Proc IEEE/ICDE '99.<br />
10.Kimball, R.: The <strong>Data</strong> Warehouse Lifecycle Toolkit. John Wiley & Sons, Inc., 1998.<br />
11.Lehner, W.: <strong>Model</strong>ing Large Scale <strong>OLAP</strong> Scenarios. 6th International Conference on<br />
Extending <strong>Data</strong>base Technology (EDBT'98), Valencia, Spain, 23-27, March 1998.<br />
12.Li, C., Wang, X.S.: A <strong>Data</strong> <strong>Model</strong> <strong>for</strong> Supporting On-Line <strong>An</strong>alytical Processing. CIKM<br />
1996.<br />
13.Mangisengi, O., Tjoa, A M., Wagner, R.R.: <strong>Multidimensional</strong> <strong>Model</strong>ling Approaches <strong>for</strong><br />
<strong>OLAP</strong>. Proceedings of the Ninth International <strong>Data</strong>base Conference “Heterogeneous and<br />
Internet <strong>Data</strong>bases” 1999, ISBN 962-937-046-8. Ed. J. Fong, Hong Kong, 1999<br />
14.McGuff, F., Kador, J.: Developing <strong>An</strong>alytical <strong>Data</strong>base Applications. Prentice Hall PTR,<br />
1999.<br />
15.Nguyen, T.B., Tjoa, A M., Wagner, R.R.: Conceptual <strong>Object</strong> <strong>Oriented</strong> <strong>Multidimensional</strong><br />
<strong>Data</strong> <strong>Model</strong> <strong>for</strong> <strong>OLAP</strong>. Technical Report, IFS, Vienna 1999.<br />
16.Samtani, S., Mohania, M.K., Kumar, V., Kambayashi, Y.: Recent Advances and Research<br />
Problems in <strong>Data</strong> Warehousing. ER Workshops 1998.<br />
17.Shoshani, A.: <strong>OLAP</strong> and Statistical <strong>Data</strong>bases: Similarities and Differences. Tutorials of<br />
PODS 1997.<br />
18.Shukla A., Deshpande, P., Naughton, J. F., Ramasamy, K.: Storage Estimation <strong>for</strong><br />
<strong>Multidimensional</strong> Aggregates in the Presence of Hierarchies. VLDB 1996: 522-531<br />
19.Thomsen, E.: <strong>OLAP</strong> solutions: Building <strong>Multidimensional</strong> In<strong>for</strong>mation Systems. John<br />
Wiley& Sons, Inc., 1997.<br />
20.Trujillo, J., Palomar, M.: <strong>An</strong> <strong>Object</strong> <strong>Oriented</strong> Approach to <strong>Multidimensional</strong> <strong>Data</strong>base.<br />
Conceptual <strong>Model</strong>ing (OOMD), D<strong>OLAP</strong> 1998.<br />
21.Vassiliadis, P.: <strong>Model</strong>ing <strong>Multidimensional</strong> <strong>Data</strong>bases, Cubes and Cube operations. In Proc.<br />
10th Scientific and Statistical <strong>Data</strong>base Management Conference (SSDBM '98), Capri, Italy,<br />
June 1998.<br />
22.Wang, M., Iyer, B.: Efficient roll-up and drill-down analysis in relational database. In 1997<br />
SIGMOD Workshop on Research Issues on <strong>Data</strong> Mining and Knowledge Discovery, 1997.