Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University
Algorithm 3 PRE-AGGREGATESSELECTION
Require: a workload Q and a storage space constraint c
P = {top scaling operation}
while (c > 0 and |P| ≠ |Q|) do
    p = highestBenefit(Q, P)
    if (c − |p| > 0) then
        c = c − |p|
        P = P ∪ {p}
    else
        c = 0
    end if
end while
return P
of selected pre-aggregates and n is the number of vertices in the lattice), which arises
from the cost of sorting the pre-aggregates by benefit per unit size.
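The control flow of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work. It assumes a linear cost model in which answering a query costs the size of the smallest selected pre-aggregate able to answer it, and it assumes that a scaling operation with a smaller scale factor can answer one with a larger factor. The names `answering_cost`, `benefit`, `select_pre_aggregates`, and `finer_or_equal`, as well as the toy sizes, are hypothetical.

```python
from typing import Callable, Dict, List


def answering_cost(q: int, selected: List[int], sizes: Dict[int, int],
                   can_answer: Callable[[int, int], bool]) -> int:
    # Assumed linear cost model: the cost of q is the size of the smallest
    # already selected pre-aggregate that can answer it.
    return min(sizes[p] for p in selected if can_answer(p, q))


def benefit(p: int, selected: List[int], sizes: Dict[int, int],
            can_answer: Callable[[int, int], bool]) -> int:
    # Total cost reduction over the workload if p were added to the selection.
    total = 0
    for q in sizes:
        if can_answer(p, q):
            saved = answering_cost(q, selected, sizes, can_answer) - sizes[p]
            total += max(0, saved)
    return total


def select_pre_aggregates(sizes: Dict[int, int],
                          can_answer: Callable[[int, int], bool],
                          top: int, capacity: int) -> List[int]:
    selected = [top]          # the top scaling operation is mandatory
    c = capacity
    while c > 0 and len(selected) != len(sizes):
        candidates = [q for q in sorted(sizes) if q not in selected]
        p = max(candidates,
                key=lambda q: benefit(q, selected, sizes, can_answer))
        if c - sizes[p] > 0:  # same strict test as in Algorithm 3
            c -= sizes[p]
            selected.append(p)
        else:
            c = 0             # best candidate does not fit: stop
    return selected


def finer_or_equal(p: int, q: int) -> bool:
    # Assumption: a smaller scale factor can answer a larger one.
    return p <= q


# Toy workload: scale factors of a 1024 x 1024 object, sizes in cells.
sizes = {f: (1024 // f) ** 2 for f in (2, 4, 8, 16, 32)}
small = select_pre_aggregates(sizes, finer_or_equal, top=2, capacity=70_000)
large = select_pre_aggregates(sizes, finer_or_equal, top=2, capacity=100_000)
```

With the smaller capacity only the factor-4 pre-aggregate fits next to the top operation; with the larger one the whole toy workload is materialized.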
5.3.1 Complexity Analysis
Let m be the number of queries in the lattice. Suppose that no queries are selected
except for the top query, which is mandatory. The time to answer a given query in the
workload is then the time taken to compute it from the top query, as estimated by
our cost model. We denote the average of this time over the workload by T_o. Suppose that, in
addition to the top query, we choose a set of queries P. Denote the resulting average time to
answer a query by T_p. The benefit of the set of queries P is the reduction in average time to
answer a query, that is, T_o − T_p. Thus, minimizing the average time to answer a query
is equivalent to maximizing the benefit of a set of queries.
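The definitions above can be written compactly as follows. The notation is an assumption introduced only for this sketch: q_1, ..., q_m are the queries, t(q, S) is the time the cost model assigns to answering q from the selected set S, B(P) is the benefit of P, and T_o is read as an average over the queries.

```latex
T_o = \frac{1}{m} \sum_{j=1}^{m} t\bigl(q_j, \{\mathrm{top}\}\bigr), \qquad
T_p = \frac{1}{m} \sum_{j=1}^{m} t\bigl(q_j, \{\mathrm{top}\} \cup P\bigr), \qquad
B(P) = T_o - T_p .
```

Since T_o does not depend on P, maximizing B(P) and minimizing T_p select the same set.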
Let p_1, p_2, ..., p_k be the k queries selected by the PRE-AGGREGATESSELECTION
algorithm. Let b_i be the benefit achieved by the selection of p_i, for i = 1, 2, ..., k.
That is, b_i is the benefit of p_i with respect to the set consisting of the top query and
p_1, p_2, ..., p_{i−1}. Let P = {p_1, p_2, ..., p_k}.
Let O = {o_1, o_2, ..., o_k} be an optimal set of k queries, i.e., those queries giving
the maximum benefit. Let m_i be the benefit achieved by the selection of o_i, for i =
1, 2, ..., k. That is, m_i is the benefit of o_i with respect to the set consisting of the top
query and o_1, o_2, ..., o_{i−1}.
Harinarayan et al. [92] proved that the benefit of the greedy algorithm can never
be less than (e − 1)/e ≈ 0.63 times the benefit of the optimal choice of pre-aggregated
queries.
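In terms of the benefits b_i and m_i defined above, the bound can be sketched as follows. The symbols B_greedy and B_opt are introduced here only for illustration, and the intermediate term is the form in which the result of [92] is usually quoted for a selection of k queries; it is stated from memory of that result rather than derived here.

```latex
B_{\mathrm{greedy}} = \sum_{i=1}^{k} b_i, \qquad
B_{\mathrm{opt}} = \sum_{i=1}^{k} m_i, \qquad
\frac{B_{\mathrm{greedy}}}{B_{\mathrm{opt}}}
  \;\ge\; 1 - \Bigl(\frac{k-1}{k}\Bigr)^{k}
  \;\ge\; \frac{e-1}{e} \approx 0.63 .
```

For example, k = 2 gives a ratio of at least 0.75, and the guarantee decreases towards (e − 1)/e as k grows.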
5.4 Answering Scaling Operations Using Pre-Aggregated Data
We say that a pre-aggregate p answers query q if there exists some other query q′
which, when executed on the result of p, provides the result of q. The result can be
either exact with respect to q (q′ ∘ p ≡ q), or only an approximation (q′ ∘ p ≈ q).
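The two cases can be illustrated on a one-dimensional toy signal. This is a sketch under stated assumptions, not the resampling used in this work: `downscale_mean` (block averaging) and `resample_linear` (linear interpolation) are hypothetical helpers standing in for a scaling operation, and the data values are made up.

```python
from typing import List


def downscale_mean(data: List[float], factor: int) -> List[float]:
    # Average consecutive blocks of `factor` samples.
    # Assumes len(data) is a multiple of factor.
    return [sum(data[i:i + factor]) / factor
            for i in range(0, len(data), factor)]


def resample_linear(data: List[float], n: int) -> List[float]:
    # Resample to n >= 2 points by linear interpolation between neighbours.
    out = []
    for j in range(n):
        x = j * (len(data) - 1) / (n - 1)   # position in the source signal
        i = min(int(x), len(data) - 2)      # left neighbour index
        t = x - i                           # weight of the right neighbour
        out.append((1 - t) * data[i] + t * data[i + 1])
    return out


data = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
p = downscale_mean(data, 2)                 # pre-aggregate: scale factor 2

# Exact case: q scales by 4, q' scales the pre-aggregate by 2.
exact_direct = downscale_mean(data, 4)
exact_via_p = downscale_mean(p, 2)

# Approximate case: q resamples to 3 points, q' does the same on p.
approx_direct = resample_linear(data, 3)
approx_via_p = resample_linear(p, 3)
```

In this toy setting the block averages compose exactly, whereas the interpolated result computed from p differs from the one computed from the original data at the end points.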
In practice, the result is often an approximation because of the effect of resampling
the original dataset. The same effect is observed in the traditional image pyramids