11.03.2014 Views

Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University

4. Answering Basic Aggregate Queries Using Pre-Aggregated Data

Cost of aggregating sub-partitions of the closest dominant pre-aggregate

The cost $C_{agg}$ can be calculated as follows:

$$C_{agg}(p_{cd}) = C_{dec}(p_{cd}) + \sum_{i=0}^{|SP|} C_r(s_i), \tag{4.12}$$

where $C_{dec}$ is the cost of decomposing $p_{cd}$ into a set $SP$ of sub-partitions, and $C_r$ is the cost of aggregating each resulting sub-partition $s_i \in SP$ from raw data.
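As a rough illustration of Equation (4.12), the following Python sketch adds a decomposition cost to the sum of per-sub-partition raw aggregation costs. The function names, the representation of a sub-partition as a tuple of extents, and the use of the cell count as the raw cost $C_r$ are assumptions made for this example only; they are not taken from the cost model itself.

```python
from math import prod
from typing import Callable, Sequence, Tuple

# Hypothetical representation: a sub-partition is a tuple of per-axis extents.
SubPartition = Tuple[int, ...]


def cell_count(sub_partition: SubPartition) -> float:
    """Assumed stand-in for C_r: cost proportional to the number of cells read."""
    return float(prod(sub_partition))


def aggregation_cost(
    decomposition_cost: float,
    sub_partitions: Sequence[SubPartition],
    raw_cost: Callable[[SubPartition], float],
) -> float:
    """C_agg = C_dec + sum of C_r(s) over every sub-partition s in SP."""
    return decomposition_cost + sum(raw_cost(s) for s in sub_partitions)


# Three sub-partitions with 100, 100, and 6 cells, plus a decomposition cost of 4.
sp = [(10, 10), (5, 20), (2, 3)]
total = aggregation_cost(4.0, sp, cell_count)  # 4 + 100 + 100 + 6
```

The sketch sums over every element of $SP$ exactly once, so it does not depend on the index convention used in the summation.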

4.3 Implementation

This section describes the application of a query optimization technique that transforms an input query written in terms of arrays so that it can be executed faster using pre-aggregated data. The query processing module of an array database management system (RasDaMan) has been extended with our pre-aggregation framework for query rewriting, which is implemented as part of the optimization and evaluation phases.

As discussed earlier in this chapter, there are two problems related to the computation of an aggregate query using pre-aggregated data. First, we must find all pre-aggregates that can be used to compute the query, including those that provide only partial answers. Next, from all candidate pre-aggregates, we must find the one that minimizes the execution time (or cost) of computing the query. Our solution is based on an existing approach for answering queries using views in OLAP applications. Halevy et al. [95] showed that all possible rewritings of a query can be obtained by considering containment mappings from the bodies of the views to the body of the query. They also showed that such a characterization is an NP-complete problem.

The QUERYCOMPUTATION procedure returns either the result of a given query Q or an execution plan for it. An execution plan is an indicator of the kind of data that must be used to compute the query. The procedure returns a raw indicator if the query must be computed from the original data. Other valid indicators include IPAS, OPAS, and DPAS, which indicate that the query will be answered using one or more partial pre-aggregates.

The input of the algorithm is a query tree $Q_t$ of an aggregate query. The algorithm first verifies whether the conditions for a PERFECT-MATCHING between the query and the pre-aggregated queries are satisfied. If a perfect matching is found, it returns the result of the pre-aggregated query. Otherwise, the algorithm verifies whether the conditions for a PARTIALMATCHING between the query and the set of pre-aggregated queries are satisfied. It then uses our cost model to determine the cost of computing the query with the pre-aggregates that satisfy the partial-matching conditions, and the cost of computing it from the original data. Finally, the algorithm picks the plan with the least cost in terms of execution time. The algorithm makes use of the following auxiliary procedures:

• DECOMPOSEQUERY($Q_t$) examines the nodes of the query tree $Q_t$ and generates a standardized representation $S_{qt}$ that can be manipulated via SQL statements.
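The control flow of the algorithm described above (perfect matching first, then partial matching, then a cost comparison against the raw-data plan) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the RasDaMan implementation: the `Plan` type, the callback parameters `perfect_match`, `partial_matches`, and `raw_cost`, and the use of a string as a stand-in for the query tree are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence, Union

Query = str  # hypothetical stand-in for the query tree Q_t


class Indicator(Enum):
    """Execution-plan indicators named in the text."""
    RAW = "raw"
    IPAS = "IPAS"
    OPAS = "OPAS"
    DPAS = "DPAS"


@dataclass(frozen=True)
class Plan:
    indicator: Indicator
    cost: float  # estimated execution time, in arbitrary units


def query_computation(
    query: Query,
    perfect_match: Callable[[Query], Optional[object]],
    partial_matches: Callable[[Query], Sequence[Plan]],
    raw_cost: Callable[[Query], float],
) -> Union[object, Plan]:
    # Step 1: on a perfect match, return the stored pre-aggregated result.
    result = perfect_match(query)
    if result is not None:
        return result
    # Step 2: gather plans for pre-aggregates that satisfy partial matching,
    # then add the plan that computes the query from the original data.
    candidates = list(partial_matches(query))
    candidates.append(Plan(Indicator.RAW, raw_cost(query)))
    # Step 3: pick the plan with the least estimated cost.
    return min(candidates, key=lambda plan: plan.cost)


# Usage with toy inputs: one stored result and two partial-match plans.
stored = {"q1": 42}
chosen = query_computation(
    "q2",
    stored.get,
    lambda q: [Plan(Indicator.DPAS, 3.0), Plan(Indicator.IPAS, 5.0)],
    lambda q: 10.0,
)
```

Because the raw-data plan is always added to the candidate list, the sketch returns a plan even when no pre-aggregate satisfies the partial-matching conditions.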

