11.03.2014 Views

Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University



Algorithm 3 PRE-AGGREGATESSELECTION

Require: A workload Q and a storage space constraint c

    P = {top scaling operation}
    while (c > 0 and |P| != |Q|) do
        p = highestBenefit(Q, P)
        if (c − |p| > 0) then
            c = c − |p|
            P = P ∪ {p}
        else
            c = 0
        end if
    end while
    return P

of selected pre-aggregates and n is the number of vertices in the lattice), which arises from the cost of sorting the pre-aggregates by benefit per unit size.
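The selection loop of Algorithm 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated here: the helper names (`query_cost`, `benefit`, `image_size`, `can_answer`), the linear cost model, the ranking by benefit per unit size, and the toy workload of integer scale factors on a 1024 × 1024 image are all illustrative choices that do not come from the text.

```python
def query_cost(q, selected, size, answers):
    # Assumed linear cost model: answering q costs the size of the
    # smallest selected pre-aggregate that is able to answer it.
    return min(size(p) for p in selected if answers(p, q))


def benefit(candidate, workload, selected, size, answers):
    # Total cost reduction over the workload if `candidate` were added.
    total = 0
    for q in workload:
        if answers(candidate, q):
            saved = query_cost(q, selected, size, answers) - size(candidate)
            total += max(0, saved)
    return total


def select_pre_aggregates(workload, top, capacity, size, answers):
    # Greedy loop following Algorithm 3; `top` is always materialized.
    selected = {top}
    while capacity > 0 and len(selected) != len(workload):
        candidates = [q for q in workload if q not in selected]
        # Assumption: highestBenefit ranks by benefit per unit size.
        best = max(
            candidates,
            key=lambda p: benefit(p, workload, selected, size, answers) / size(p),
        )
        if capacity - size(best) > 0:
            capacity -= size(best)
            selected.add(best)
        else:
            capacity = 0  # best candidate does not fit: stop selecting
    return selected


# Hypothetical workload: scale-down factors applied to a 1024 x 1024 image.
def image_size(scale):
    return (1024 // scale) ** 2


def can_answer(p, q):
    # Assumed rule: p answers q if q is an integer multiple of p's factor.
    return q % p == 0


chosen = select_pre_aggregates([1, 2, 4, 8], 1, 300_000, image_size, can_answer)
print(sorted(chosen))
```

In this toy setting the factor-2 pre-aggregate is the last candidate considered and no longer fits into the remaining space, so the loop sets c to 0 and terminates, mirroring the else branch of the algorithm.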

5.3.1 Complexity Analysis

Let m be the number of queries in the lattice. Suppose that no queries are selected except for the top query, which is mandatory. The time to answer a given query in the workload is then the time taken to compute it from the top query, as estimated by our cost model. We denote this time by T_o. Suppose that, in addition to the top query, we choose a set of queries P. Denote the average time to answer a query by T_p. The benefit of the set of queries P is the reduction in average time to answer a query, that is, T_o − T_p. Since T_o does not depend on P, minimizing the average time to answer a query is equivalent to maximizing the benefit of a set of queries.

Let p_1, p_2, ..., p_k be the k queries selected by the PRE-AGGREGATESSELECTION algorithm. Let b_i be the benefit achieved by the selection of p_i, for i = 1, 2, ..., k. That is, b_i is the benefit of p_i with respect to the set consisting of the top query and p_1, p_2, ..., p_{i−1}. Let P = {p_1, p_2, ..., p_k}.

Let O = {o_1, o_2, ..., o_k} be an optimal set of k queries, i.e., those queries giving the maximum benefit. Let m_i be the benefit achieved by the selection of o_i, for i = 1, 2, ..., k. That is, m_i is the benefit of o_i with respect to the set consisting of the top query and o_1, o_2, ..., o_{i−1}.

Harinarayan et al. [92] proved that the benefit of the greedy algorithm can never be less than (e − 1)/e ≈ 0.63 times the benefit of the optimal choice of pre-aggregated queries.
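To give a feel for where the constant 0.63 comes from, the sketch below evaluates the finite-k form of the bound, 1 − ((k − 1)/k)^k, which is how the guarantee is usually stated for k selected views. This finite-k form is quoted from the general literature on greedy view selection rather than from the text above, and the function name `greedy_bound` is hypothetical.

```python
import math


def greedy_bound(k: int) -> float:
    # Assumed finite-k guarantee: the greedy benefit is at least this
    # fraction of the optimal benefit when k pre-aggregates are chosen.
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - ((k - 1) / k) ** k


# Limit of the bound as k grows: (e - 1) / e.
limit = (math.e - 1) / math.e

for k in (1, 2, 5, 10, 100):
    print(k, round(greedy_bound(k), 4), round(limit, 4))
```

The bound decreases with k and approaches (e − 1)/e from above, so 0.63 is the worst-case guarantee regardless of how many pre-aggregates are selected.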

5.4 Answering Scaling Operations Using Pre-Aggregated Data

We say that a pre-aggregate p answers query q if there exists some other query q′ which, when executed on the result of p, provides the result of q. The result can be either exact with respect to q (q′ ∘ p ≡ q), or only an approximation (q′ ∘ p ≈ q).
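The exact case can be illustrated with a small sketch. It assumes a hypothetical scaling operation, `scale_down`, that block-averages a one-dimensional sequence by an integer factor; neither the function nor the sample data comes from the text. Here p scales down by 2, q scales down by 4, and q′ scales the result of p down by 2 again.

```python
def scale_down(values, factor):
    # Assumed scaling operation: average consecutive blocks of `factor` cells.
    if len(values) % factor != 0:
        raise ValueError("length must be a multiple of the scale factor")
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values), factor)
    ]


data = [1, 3, 5, 7, 2, 4, 6, 8]

p_result = scale_down(data, 2)      # pre-aggregate p, stored in advance
q_direct = scale_down(data, 4)      # query q evaluated on the original data
q_from_p = scale_down(p_result, 2)  # q' evaluated on the result of p

print(q_direct, q_from_p)
```

Because every block of four cells splits into two equally sized blocks of two, averaging the averages reproduces q exactly. When the ratio between the two scale factors is not an integer, q′ has to resample the result of p, and the answer is in general only approximate.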

In practice, the result is often an approximation because of the effect of resampling the original dataset. The same effect is observed in the traditional image pyramids
