11.03.2014 Views

Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University


6. Conclusion

query using pre-aggregated data is influenced by the structural characteristics of the query and the pre-aggregate. Thus, by comparing the query tree structures of the two, one can determine whether the pre-aggregated result contributes fully or partially to the final answer of the query. The best case occurs when the query and the pre-aggregate match fully, since the time taken to compute the query is reduced to the time it takes to retrieve the stored result. In the case of partial matching, however, several pre-aggregates can be considered for computing the answer to a query, so a decision has to be made as to which pre-aggregates provide the best performance in terms of execution time. To this end, we distinguished between different types of pre-aggregates and presented a cost model to calculate the cost of using each qualifying pre-aggregate. We then presented an algorithm that selects the best execution plan for evaluating a query in the presence of pre-aggregated data. Tests performed on real-life raster image datasets showed that our distinction between different types of pre-aggregates is useful for determining the pre-aggregate that provides the highest benefit (in terms of execution time) for computing a given query.
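The choice between fully and partially matching pre-aggregates can be illustrated with a minimal sketch. It is not the cost model or the selection algorithm of this work: the `Candidate` fields, the two cost terms, and the function names are hypothetical, chosen only to show that a full match costs retrieval alone, while a partial match adds the cost of the remaining computation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A pre-aggregate that qualifies for a query (hypothetical model)."""
    name: str
    full_match: bool      # stored result is exactly the answer of the query
    retrieve_cost: float  # assumed cost of reading the stored result
    residual_cost: float  # assumed cost of the work the pre-aggregate does not cover


def plan_cost(candidate):
    # Full match: retrieval only. Partial match: retrieval plus residual work.
    if candidate.full_match:
        return candidate.retrieve_cost
    return candidate.retrieve_cost + candidate.residual_cost


def choose_plan(candidates, base_cost):
    """Return (name, cost) of the cheapest plan.

    Falls back to computing the query from the base data when no
    candidate is cheaper than base_cost.
    """
    best_name, best_cost = "base data", base_cost
    for candidate in candidates:
        cost = plan_cost(candidate)
        if cost < best_cost:
            best_name, best_cost = candidate.name, cost
    return best_name, best_cost
```

With candidates costing 7.0 and 5.0 (partial matches) and 3.0 (full match), `choose_plan` returns the full match. If no candidate beats `base_cost`, the plan falls back to the base data.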

We then described the issues involved in generalizing our pre-aggregation framework to support more complex aggregate operations, and justified our decision to focus on one particular operation: scaling. Traditionally, 2D scaling operations have been performed using image pyramids. In practice, pyramids are typically constructed with scale levels at powers of 2, thus yielding scale vectors 2, 4, 8, 16, 32, 64, 128, 256, and 512. Materializing the pyramid requires an estimated 33% additional storage space. Our pre-aggregation selection algorithm is similar to the pyramid approach in that it selects a set of queries for materialization, where each level corresponds to a scaling operation with a defined scale factor. However, the selection of such queries is not restricted to a fixed number of levels separated by powers of two. Instead, our selection algorithm considers the frequency of each query in the workload and how the result of each individual query can help to reduce the overall cost of computing the workload. We compared the performance of our pre-aggregation algorithm against that of image pyramids. For workloads with uniformly distributed scale vectors, our algorithm computes the workload at a cost 36% lower than image pyramids and requires 7% more space than image pyramids. For scale vectors following a Poisson distribution, our algorithm computes the workload at a cost 55% lower than the pyramid approach. Further, our algorithm can be applied to datasets of higher dimensions, a feature not supported by traditional image pyramids.
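Two points in this comparison can be sketched in code. First, the 33% storage estimate follows from a 2D level with scale factor s holding 1/s² of the base cells, so the nine power-of-two levels sum to just under 1/3. Second, a frequency-aware selection can be illustrated with a simple greedy heuristic. The heuristic and its cost model are assumptions made for this illustration, not the algorithm evaluated in this work: a query with scale factor s is assumed to be answered from the coarsest materialized level whose factor does not exceed s, at a cost proportional to the number of cells read. All function names are hypothetical.

```python
def pyramid_overhead(levels=9, dims=2):
    """Extra storage of a power-of-two pyramid as a fraction of the base data."""
    return sum((2 ** k) ** -dims for k in range(1, levels + 1))


def workload_cost(workload, materialized, dims=2):
    """Assumed cost of a workload {scale factor: frequency}, factors >= 1.

    Each query reads the coarsest materialized level with factor <= its own;
    the base data (factor 1) is always available.
    """
    total = 0.0
    for factor, freq in workload.items():
        source = max(m for m in (1, *materialized) if m <= factor)
        total += freq * source ** -dims
    return total


def greedy_select(workload, budget, dims=2):
    """Greedily pick scale factors to materialize.

    budget is the allowed extra storage as a fraction of the base data.
    Each step adds the factor with the largest cost reduction that still fits.
    """
    chosen, used = [], 0.0
    while True:
        current = workload_cost(workload, chosen, dims)
        best, best_gain = None, 0.0
        for factor in workload:
            size = factor ** -dims
            if factor in chosen or used + size > budget:
                continue
            gain = current - workload_cost(workload, chosen + [factor], dims)
            if gain > best_gain:
                best, best_gain = factor, gain
        if best is None:
            return sorted(chosen)
        chosen.append(best)
        used += best ** -dims


# Illustrative workload (made-up frequencies, not measured data).
example_workload = {3: 50, 4: 10, 6: 30, 8: 10}
selected = greedy_select(example_workload, budget=0.2)  # [3, 6, 8]
```

Under this assumed cost model, the example workload costs less with the selected levels `[3, 6, 8]` than with the pyramid levels `[2, 4, 8]`, because the most frequent query (factor 3) no longer has to be derived from the much larger level 2. The numbers only illustrate the mechanism and do not reproduce the 36% or 55% figures reported above.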

6.1 Future Work

There are natural extensions to this work that would help expand and strengthen the results. One area of further work is adding self-management capabilities, so that the DBMS maintains statistics about each scaling operation appearing in the incoming queries and, at some suitable time, adjusts the pre-aggregate set accordingly. OLAP dynamic pre-aggregation addresses a similar problem. Another area is applying the results studied here to the many real-world situations where data cubes contain one or
