
SEKE 2012 Proceedings - Knowledge Systems Institute


Gray et al. [2] present the problems of the standard SQL GROUP BY operator and define the data cube operator, which generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs. An N-dimensional cube is a set of points, each of which is the aggregate over a particular set of attribute values. Since computing multidimensional aggregates is the performance bottleneck for OLAP, Agarwal et al. [3] proposed faster algorithms for computing sets of group-bys, especially the cube operator, by extending sort-based and hash-based grouping methods with optimizations such as using a precomputed group-by to compute other group-bys.
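The cube operator returns every group-by over the N dimensions, 2^N of them in total. The following minimal Python sketch illustrates this together with the idea of deriving a group-by from a precomputed one: each group-by is re-aggregated from the smallest already-computed group-by that contains its dimensions, instead of from the raw rows. It assumes a SUM measure stored after the dimension values in each row; the function names and sample rows are hypothetical, and it does not reproduce the sort- and hash-based algorithms of [3].

```python
from collections import defaultdict
from itertools import combinations


def group_by(rows, dims, measure_index):
    """SUM of the measure, grouped on the given dimension positions."""
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[d] for d in dims)] += row[measure_index]
    return dict(out)


def cube(rows, n_dims):
    """Compute all 2^n_dims group-bys (valid for a distributive SUM measure).

    Only the finest group-by is computed from the raw rows; every other one
    is derived from its smallest already-computed parent.
    """
    full = tuple(range(n_dims))
    results = {full: group_by(rows, full, n_dims)}
    for size in range(n_dims - 1, -1, -1):
        for dims in combinations(range(n_dims), size):
            # Candidate parents: computed group-bys containing all of dims.
            parents = [p for p in results if set(dims) <= set(p)]
            parent = min(parents, key=lambda p: len(results[p]))
            out = defaultdict(int)
            for key, total in results[parent].items():
                # Project the parent's key onto the dimensions in dims.
                out[tuple(key[parent.index(d)] for d in dims)] += total
            results[dims] = dict(out)
    return results


rows = [("2012", "Lisbon", 10), ("2012", "Porto", 5), ("2013", "Lisbon", 7)]
results = cube(rows, n_dims=2)
print(results[(0,)])  # {('2012',): 15, ('2013',): 7}
print(results[()])    # {(): 22}
```

Re-aggregating from a parent is only safe for distributive functions such as SUM, COUNT, MIN, and MAX; a measure like a median would have to be recomputed from the raw rows.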

Harinarayan et al. [4] investigated which cells of the data cube to materialize when working with large data cubes. They used a lattice framework to express dependencies among views and a greedy algorithm to determine a good set of views to materialize, based on the trade-off between the space used and the average time to answer a query, using a proposed benchmark database.
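A greedy selection of this kind can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each view is represented as the set of dimensions it groups by, a query on a view costs the row count of the smallest materialized view it can be computed from (a linear cost model), and the number of extra views k is fixed rather than a space budget. The view sizes and function names are hypothetical.

```python
def greedy_select(sizes, k):
    """Pick up to k views to materialize besides the top view, greedily by benefit.

    sizes maps each view (a frozenset of dimension names) to its estimated
    row count. A query on view w can be answered from any materialized view
    v with w <= v; its cost is the size of the smallest such v.
    """
    top = max(sizes, key=len)  # the base view must always be materialized
    chosen = [top]

    def cost(w):
        return min(sizes[v] for v in chosen if w <= v)

    for _ in range(k):
        best, best_benefit = None, 0
        for v in sizes:
            if v in chosen:
                continue
            # Benefit of v: total cost saved over all views answerable from v.
            benefit = sum(max(0, cost(w) - sizes[v]) for w in sizes if w <= v)
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:  # no remaining view would lower any query cost
            break
        chosen.append(best)
    return chosen[1:]


# Hypothetical row counts for a three-dimensional cube over a, b and c.
sizes = {frozenset(name): rows for name, rows in {
    "abc": 100, "ab": 50, "ac": 75, "bc": 20,
    "a": 30, "b": 10, "c": 5, "": 1,
}.items()}

picked = greedy_select(sizes, k=2)
print(["".join(sorted(v)) for v in picked])  # ['bc', 'ab']
```

In the first round, materializing `bc` saves 80 rows for each of its four descendants; in the second, `ab` becomes the best choice because `b` and the grand total are already cheap to answer from `bc`.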

The standard method for optimizing the execution of OLAP queries is often to precompute some of the queries into subcubes and then build indexes on these summary tables. Gupta et al. [5] were pioneers in proposing the automated selection of summary tables and their indexes with near-optimal algorithms. Going further, and using a logical reconstruction of multidimensional schema design, [6] proposed multidimensional forms to ensure summarizability within the whole application schema, reducing the sparsity of the underlying data cube and supporting reasoning about the quality of conceptual data warehouse schemas.

The application of production rules in OLAP systems has not been deeply explored, and only a few papers on the subject have been published recently. Vasilecas et al. [7] propose a backward-chaining approach to transform production rules into executable MDX instructions¹, representing rules in XML format and automating decisions and decision support according to internal and external influences. Prat et al. [8] argue that multidimensional models poorly represent aggregation knowledge, and propose production rules to better represent knowledge of how aggregation may be performed on a given data cube, based on the additive nature (non-, semi-, or fully additive) of the attributes involved. This minimizes the risk of introducing errors during aggregation, since such errors may accumulate and lead to seriously flawed analyses.
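As an illustration of aggregation knowledge expressed as production rules, the following Python sketch encodes a few IF-THEN rules keyed on the additivity of a measure and on whether the dimension being aggregated is time. The rule set and all names are hypothetical and are not the rules of [8]; the sketch assumes the common convention that a semi-additive measure, such as an account balance, may be summed along every dimension except time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """IF the measure has `additivity` AND the dimension matches `on_time`,
    THEN `functions` are the aggregations allowed along that dimension."""
    additivity: str          # "fully", "semi" or "non"
    on_time: Optional[bool]  # True: time only, False: non-time only, None: any
    functions: frozenset


ALL = frozenset({"SUM", "AVG", "MIN", "MAX", "COUNT"})

# Hypothetical rule base; the first matching rule fires.
RULES = [
    Rule("fully", None, ALL),
    Rule("semi", False, ALL),
    Rule("semi", True, ALL - {"SUM"}),       # e.g. never sum balances over months
    Rule("non", None, frozenset({"MIN", "MAX", "COUNT"})),  # e.g. ratios
]


def allowed(function, additivity, dimension_is_time):
    """Fire the first rule whose conditions hold; reject if no rule matches."""
    for rule in RULES:
        if rule.additivity == additivity and rule.on_time in (None, dimension_is_time):
            return function in rule.functions
    return False


print(allowed("SUM", "semi", dimension_is_time=True))   # False
print(allowed("SUM", "semi", dimension_is_time=False))  # True
```

Keeping the knowledge in a rule list rather than in query code means a new measure type only requires a new rule, which is the kind of separation production rules are meant to provide.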

3. Foundations

The proposed method relies on using production rules to replace query languages, anticipating the summarization of data into a dimensional structure and providing a subset of all OLAP features. OLAP was created as a database processing solution to address issues not tackled by relational database systems. Production rules are a way to represent knowledge about reasoning on data. In this section, we give a general introduction to these two foundational concepts before discussing our solution.

¹ MDX is a standard query language for OLAP.

3.1. OLAP

OLAP applications help analysts and executives gain insight into the performance of an enterprise through fast access to a wide variety of data, organized to reflect the multidimensional nature of the information [9]. However, this type of analysis cannot be performed directly on operational databases [1], since they are designed only for online transaction processing. Performing OLAP operations requires extracting data from the operational database, then transforming, integrating, and loading it into a multidimensional database.
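The extract, transform, and load steps described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical operational `orders` table (the column names, sample rows, and `etl` function are not from the original); the transform step derives a month level and a revenue measure, and the load step assigns surrogate keys to new dimension members and accumulates the measure in a fact table.

```python
from collections import defaultdict

# Hypothetical rows extracted from an operational (OLTP) orders table.
orders = [
    {"order_id": 1, "date": "2012-03-01", "product": "pen", "qty": 3, "unit_price": 2.0},
    {"order_id": 2, "date": "2012-03-15", "product": "ink", "qty": 1, "unit_price": 5.0},
    {"order_id": 3, "date": "2012-04-02", "product": "pen", "qty": 2, "unit_price": 2.0},
]


def etl(rows):
    """Load rows into one fact table and two dimension tables."""
    product_dim, month_dim = {}, {}   # natural key -> surrogate key
    facts = defaultdict(float)        # (product_key, month_key) -> revenue
    for row in rows:
        # Transform: derive the month level and the revenue measure.
        month = row["date"][:7]
        revenue = row["qty"] * row["unit_price"]
        # Load: give each new dimension member the next surrogate key.
        p_key = product_dim.setdefault(row["product"], len(product_dim) + 1)
        m_key = month_dim.setdefault(month, len(month_dim) + 1)
        facts[(p_key, m_key)] += revenue
    return product_dim, month_dim, dict(facts)


products, months, facts = etl(orders)
print(facts)  # {(1, 1): 6.0, (2, 1): 5.0, (1, 2): 4.0}
```

A real ETL process would also clean and reconcile data coming from several sources; the sketch only shows the reshaping from transaction rows into dimensional form.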

Multidimensional databases are used in data warehousing to support OLAP operations and to separate structural aspects from contents. Relational database technology is better suited to transaction management and ad hoc querying [10]. Data warehouses maintain data from operational databases, transformed and integrated according to the demands of analysts. The data is usually organized into a relational model of multidimensional data called a star schema [11], shown in Figure 1, in which tables are divided into dimension tables, which contain identifying information about the dimensions themselves, and a fact table, which correlates the dimensions with the information of interest.

[Figure 1: star schema diagram with a central Fact Table linked to several Dimension Tables]

Figure 1. In the star schema, dimension tables contain identifying information about the dimensions themselves, and the fact table correlates them with the information of interest to analysts.
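To make the star schema concrete, the following Python sketch builds a tiny one in an in-memory SQLite database and runs a star join that rolls revenue up by product category and month. The table names, columns, and rows are hypothetical; the schema is only an illustration of the layout in Figure 1, with each dimension table keyed by a surrogate key that the fact table references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: identifying information about each dimension.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    -- Fact table: a foreign key to every dimension plus the measure of interest.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        revenue REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'pen', 'stationery'), (2, 'ink', 'stationery'), (3, 'mug', 'kitchen');
    INSERT INTO dim_date VALUES (1, '2012-03', 2012), (2, '2012-04', 2012);
    INSERT INTO fact_sales VALUES (1, 1, 6.0), (2, 1, 5.0), (1, 2, 4.0), (3, 2, 8.0);
""")

# Star join: join the fact table to its dimensions, then group by their attributes.
query = """
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('kitchen', '2012-04', 8.0)
# ('stationery', '2012-03', 11.0)
# ('stationery', '2012-04', 4.0)
```

Because every dimension joins to the fact table through a single key, any combination of dimension attributes can be used for grouping without changing the schema.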

The conversion of data from a relational database into a multidimensional schema is not an easy task, and the procedure normally has to be assisted by an expert in business modeling. Some organizations opt for data marts instead,
