Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University


5. Pre-Aggregation Support Beyond Basic Aggregate Operations

5.3 Pre-Aggregates Selection

Pre-aggregating all distinct scaling operations in the workload is not always possible because of space limitations. This is similar to the problem of selecting views for materialization in OLAP. One approach to finding the optimal set of scaling operations to pre-compute is to enumerate all possible combinations and pick the one that yields the minimum average query cost, or equivalently the maximum benefit. Finding the optimal set of pre-aggregates in this way has a complexity of $O(2^n)$, where $n$ is the number of queries in the workload. If the number of scaling operations on a given raster object is 50, there are $2^{50}$ possible sets of pre-aggregates for that object. Therefore, computing the optimal set of pre-aggregates exhaustively is not feasible. In fact, it is an NP-hard problem [92, 17].

We therefore treat the selection of pre-aggregates as an optimization problem whose input includes multidimensional datasets, a query workload, and an upper bound on available disk space. The output is a set of queries that minimizes the total cost of evaluating the query workload subject to the storage limit. We present an algorithm that uses the benefit per unit space of a scaling operation.

We model the expected queries by a query workload, which is a set of scaling operations $Q = \{q_i \mid 0 < i \le n\}$ [...] to 1:

$$\left(\sum_{i=1}^{n} q_i\right) \tag{5.5}$$

Based on this setup we study different workload patterns.

The PRE-AGGREGATESSELECTION procedure returns a set $P = \{p_i \mid 0 < i \le k\}$ [...] to be pre-aggregated. Its input is a workload $Q$ and a storage space constraint $c$. The workload contains a number of queries, each corresponding to a scaling operation as defined in Eq. 5.1. Frequency, storage space, and benefit per unit space are calculated for each distinct query in the workload. When calculating the benefit, we initially assume that each query is evaluated using the root (top) node, which is the first selected pre-aggregate, $p_1$.
The second chosen pre-aggregate $p_2$ is the one with the highest benefit per unit space. The algorithm then recalculates the benefit of each scaling operation, given that it is computed either from the root, if the scaling operation is above $p_2$, or from $p_2$ otherwise. Subsequent selections are performed in a similar manner: the benefit is recalculated each time a scaling operation is selected for pre-aggregation. The algorithm stops selecting pre-aggregates when the storage space constraint is reached, or when there are no more queries in the workload to be considered for pre-aggregation, i.e., all scaling operations in the workload have already been selected. The function highestBenefit(Q, P) returns the scaling operation in Q with the highest benefit per unit space, given the pre-aggregates already in P. The complexity of the algorithm is $O(k \cdot n^2)$, where $k$ is the number of selected pre-aggregates and $n$ is the number of vertices in the lattice; it arises from the cost of sorting the pre-aggregates by benefit per unit size.

Algorithm 3 PRE-AGGREGATESSELECTION
Require: a workload Q and a storage space constraint c
    P = {top scaling operation}
    while (c > 0 and |P| != |Q|) do
        p = highestBenefit(Q, P)
        if (c - |p| > 0) then
            c = c - |p|
            P = P ∪ {p}
        else
            c = 0
        end if
    end while
    return P

5.3.1 Complexity Analysis

Let $m$ be the number of queries in the lattice. Suppose we have no queries selected except for the top query, which is mandatory. The time to answer a given query in the workload is then the time taken to compute it from the top query, calculated according to our cost model. We denote this time by $T_o$. Suppose that, in addition to the top query, we choose a set of queries $P$, and denote the resulting average time to answer a query by $T_p$. The benefit of the set $P$ is the reduction in the average time to answer a query, that is, $T_o - T_p$. Thus, minimizing the average time to answer a query is equivalent to maximizing the benefit of a set of queries.

Let $p_1, p_2, \ldots, p_k$ be the $k$ queries selected by the PRE-AGGREGATESSELECTION algorithm. Let $b_i$ be the benefit achieved by the selection of $p_i$, for $i = 1, 2, \ldots, k$. That is, $b_i$ is the benefit of $p_i$ with respect to the set consisting of the top query and $p_1, p_2, \ldots, p_{i-1}$. Let $P = \{p_1, p_2, \ldots, p_k\}$.

Let $O = \{o_1, o_2, \ldots, o_k\}$ be an optimal set of $k$ queries, i.e., those queries giving the maximum benefit. Let $m_i$ be the benefit achieved by the selection of $o_i$, for $i = 1, 2, \ldots, k$. That is, $m_i$ is the benefit of $o_i$ with respect to the set consisting of the top query and $o_1, o_2, \ldots, o_{i-1}$. Harinarayan et al. [92] proved that the benefit of the greedy algorithm can never be less than $(e-1)/e \approx 0.63$ times the benefit of the optimal choice of pre-aggregated queries.
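To make the greedy selection of Section 5.3 concrete, the following Python sketch mirrors the structure of PRE-AGGREGATESSELECTION. It is an illustration under simplifying assumptions, not the implementation evaluated in this thesis. Each scaling operation is reduced to a single scale factor, so the lattice collapses to a total order. A pre-aggregate is assumed to answer every query of equal or coarser scale, and the cost of a query is assumed to be the number of cells of the smallest usable pre-aggregate. The names `ScalingOp`, `can_answer`, `query_cost`, `benefit_per_unit_space`, and `select_pre_aggregates` are hypothetical, and the workload values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingOp:
    name: str
    scale: float   # resolution relative to the original object (1.0 = root)
    cells: int     # storage space |p| of the materialized result
    freq: float    # relative frequency of this operation in the workload


def can_answer(p, q):
    # Assumption: p can answer q when p is at least as fine as q.
    return p.scale >= q.scale


def query_cost(q, selected):
    # Assumed linear cost model: read the smallest usable pre-aggregate.
    return min(p.cells for p in selected if can_answer(p, q))


def benefit_per_unit_space(v, workload, selected):
    # Frequency-weighted cost reduction if v were materialized, per cell of v.
    saved = 0.0
    for q in workload:
        if can_answer(v, q):
            saved += q.freq * max(0, query_cost(q, selected) - v.cells)
    return saved / v.cells


def select_pre_aggregates(workload, root, capacity):
    selected = [root]                      # the top operation is mandatory
    remaining = [q for q in workload if q != root]
    while capacity > 0 and remaining:
        # Benefits are recomputed against the current selection each round.
        best = max(remaining,
                   key=lambda v: benefit_per_unit_space(v, workload, selected))
        if capacity - best.cells > 0:
            capacity -= best.cells
            selected.append(best)
            remaining.remove(best)
        else:
            capacity = 0                   # best candidate does not fit: stop
    return selected


# Made-up workload: three scaling operations on a 1024-cell object.
root = ScalingOp("root", 1.0, 1024, 0.0)
workload = [
    ScalingOp("half", 0.5, 256, 0.2),
    ScalingOp("quarter", 0.25, 64, 0.5),
    ScalingOp("eighth", 0.125, 16, 0.3),
]
chosen = select_pre_aggregates(workload, root, capacity=100)
print([p.name for p in chosen])  # ['root', 'eighth', 'quarter']
```

With a budget of 100 cells, the sketch first picks the smallest operation, whose benefit per unit space is highest, then the quarter-scale operation, and stops when the half-scale operation no longer fits. As in Algorithm 3, the space taken by the root is not charged against the constraint.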
5.4 Answering Scaling Operations Using Pre-Aggregated Data

We say that a pre-aggregate $p$ answers a query $q$ if there exists some other query $q'$ which, when executed on the result of $p$, provides the result of $q$. The result can be either exact with respect to $q$ ($q' \circ p \equiv q$), or only an approximation ($q' \circ p \approx q$). In practice, the result is often an approximation because of the effect of resampling the original dataset. The same effect is observed in the traditional image pyramids