Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University
6. Conclusion
query using pre-aggregated data is influenced by the structural characteristics of the query and the pre-aggregate. Thus, by comparing the query tree structures of the two, one can determine whether the pre-aggregated result contributes fully or partially to the final answer of the query. The best case occurs when the query and the pre-aggregate match fully, since the time taken to compute the query is then reduced to the time it takes to retrieve the stored result. In the case of partial matching, however, several pre-aggregates may qualify for computing the answer of a query, so a decision has to be made as to which of them provides the best performance in terms of execution time. To this end, we distinguished between different types of pre-aggregates and presented a cost model to calculate the cost of using each qualifying pre-aggregate. We then presented an algorithm that selects the best execution plan for evaluating a query in the presence of pre-aggregated data. Tests performed on real-life raster image datasets showed that our distinction between different types of pre-aggregates is useful for determining the pre-aggregate that provides the highest benefit (in terms of execution time) for computing a given query.
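The full-versus-partial matching idea and cost-based plan selection summarized above can be illustrated with a small Python sketch. It is not the cost model from this work: it assumes a simplified setting in which a query and a pre-aggregate are each described by an aggregate operation and a rectangular 2D region (the hypothetical `Box` and `AggQuery` types) instead of full query trees, the operation is distributive (such as sum or max) so a stored partial result can be combined with an aggregate over the remaining cells, each plan uses at most one pre-aggregate, and cost is counted as cells read plus a hypothetical fixed `RETRIEVE_COST` for fetching a stored result.

```python
from dataclasses import dataclass
from typing import Optional

RETRIEVE_COST = 1.0  # hypothetical fixed cost of fetching one stored result


@dataclass(frozen=True)
class Box:
    """Hypothetical half-open 2D region [x0, x1) x [y0, y1) of a raster."""
    x0: int
    y0: int
    x1: int
    y1: int

    def cells(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)

    def contains(self, other: "Box") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


@dataclass(frozen=True)
class AggQuery:
    """An aggregate over a region; `op` is assumed distributive (sum, max, ...)."""
    op: str
    region: Box


def plan_cost(query: AggQuery, pre: Optional[AggQuery]) -> Optional[float]:
    """Estimated cost of answering `query` using `pre` (None = from raw cells).
    Returns None when `pre` cannot contribute to the answer."""
    if pre is None:
        return float(query.region.cells())
    if pre.op != query.op or not query.region.contains(pre.region):
        return None
    if pre.region == query.region:
        return RETRIEVE_COST  # full match: only the stored result is read
    # Partial match: read the stored result, then aggregate the remaining cells.
    return RETRIEVE_COST + query.region.cells() - pre.region.cells()


def best_plan(query: AggQuery,
              pre_aggregates: list[AggQuery]) -> tuple[Optional[AggQuery], float]:
    """Pick the cheapest single-pre-aggregate plan, or None for a full scan."""
    best, best_cost = None, plan_cost(query, None)
    for pre in pre_aggregates:
        cost = plan_cost(query, pre)
        if cost is not None and cost < best_cost:
            best, best_cost = pre, cost
    return best, best_cost
```

For a sum over a 100 × 100 region, a stored sum over an 80 × 100 sub-region is preferred to one over a 50 × 100 sub-region (1 + 2000 versus 1 + 5000 cells read), and an exact match reduces the cost to the retrieval alone, which mirrors the full-matching best case. Combining several disjoint pre-aggregates in one plan would require a richer search than this single-candidate loop.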
We then described the issues involved in generalizing our pre-aggregation framework to support more complex aggregate operations, and justified our decision to focus on one particular operation: scaling. Traditionally, 2D scaling operations have been performed using image pyramids. In practice, pyramids are typically constructed with scale levels at powers of 2, yielding scale vectors 2, 4, 8, 16, 32, 64, 128, 256, and 512. Materializing the pyramid requires an estimated 33% additional storage space. Our pre-aggregation selection algorithm is similar to the pyramid approach in that it selects a set of queries for materialization, where each level corresponds to a scaling operation with a defined scale factor. However, the selection of such queries is not restricted to a fixed number of levels spaced by powers of two. Instead, our selection algorithm considers the frequency of each query in the workload, and how the result of each individual query can help to reduce the overall cost of computing the workload. We compared the performance of our pre-aggregation algorithm against that of image pyramids. For workloads with uniformly distributed scale vectors, our algorithm computes the workload at a 36% lower cost than image pyramids, while requiring 7% more storage space than image pyramids. For scale vectors following a Poisson distribution, our algorithm computes the workload at a 55% lower cost than the pyramid approach. Furthermore, our algorithm can be applied to datasets of higher dimensions, a feature not supported by traditional image pyramids.
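The 33% figure is consistent with a geometric series: each factor-2 level of a 2D image holds one quarter of the cells of the level below it, so nine levels (factors 2 to 512) add $\sum_{k=1}^{9} 4^{-k} = (1 - 4^{-9})/3 \approx 0.333$ of the base image, and in $d$ dimensions the overhead tends to $1/(2^d - 1)$, about 14% for $d = 3$. The Python sketch below reuses that pyramid storage as a budget and contrasts it with workload-driven selection. The selection shown is a generic greedy heuristic that repeatedly materializes the scale factor with the largest reduction in frequency-weighted workload cost per stored cell; it is a swapped-in textbook technique (in the style of greedy view selection), not the selection algorithm of this work. The cost model (a query with scale factor s is answered from the coarsest materialized level p ≤ s at a cost of N/p² cells) and the names `level_size`, `workload_cost`, `greedy_select`, and `pyramid_levels` are hypothetical.

```python
from math import fsum

# Hypothetical cost model (an assumption, not the one used in this work): a 2D
# scaling query with scale factor s is answered from the coarsest materialized
# level p <= s and costs as many cells as that level stores (N / p**2);
# p = 1 stands for the raw, unscaled data.


def level_size(n_cells: float, factor: int) -> float:
    """Number of cells stored by a 2D level downscaled by `factor` per axis."""
    return n_cells / factor**2


def workload_cost(n_cells: float, workload: dict[int, int],
                  materialized: set[int]) -> float:
    """Frequency-weighted cost of answering every query in `workload`."""
    total = 0.0
    for s, freq in workload.items():
        usable = max(p for p in (materialized | {1}) if p <= s)
        total += freq * level_size(n_cells, usable)
    return total


def greedy_select(n_cells: float, workload: dict[int, int],
                  candidates: set[int], budget: float) -> set[int]:
    """Greedily add the scale factor with the best cost reduction per stored
    cell until no affordable candidate lowers the workload cost."""
    chosen: set[int] = set()
    used = 0.0
    while True:
        base = workload_cost(n_cells, workload, chosen)
        best, best_ratio = None, 0.0
        for p in sorted(candidates - chosen):
            size = level_size(n_cells, p)
            if used + size > budget:
                continue  # would exceed the storage budget
            gain = base - workload_cost(n_cells, workload, chosen | {p})
            if gain / size > best_ratio:
                best, best_ratio = p, gain / size
        if best is None:
            return chosen
        chosen.add(best)
        used += level_size(n_cells, best)


def pyramid_levels(num_levels: int = 9) -> set[int]:
    """Scale factors 2, 4, ..., 2**num_levels of a classic image pyramid."""
    return {2**k for k in range(1, num_levels + 1)}
```

With a 1000 × 1000 image, a toy workload of scale factors 3, 5, 12, and 100 (frequencies 10, 10, 5, 1), and the pyramid's storage as budget, this heuristic picks exactly {3, 5, 12, 100}, uses less storage than the pyramid, and yields a lower workload cost under the assumed model. This toy run only illustrates why frequency-aware selection can beat fixed power-of-two levels; it does not reproduce the measured 36% and 55% improvements.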
6.1 Future Work
There are natural extensions to this work that would help expand and strengthen the results. One area of further work is adding self-management capabilities, so that the DBMS maintains statistics about each scaling operation appearing in the incoming queries and, at some suitable time, adjusts the pre-aggregate set accordingly. OLAP dynamic pre-aggregation addresses a similar problem. Another area is applying the results studied here to the many real-world situations where data cubes contain one or