Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University

4. Answering Basic Aggregate Queries Using Pre-Aggregated Data

Cost of aggregating sub-partitions of the closest dominant pre-aggregate

The cost C_agg can be calculated as follows:

\[ C_{agg}(p_{cd}) = C_{dec}(p_{cd}) + \sum_{i=0}^{|SP|} C_r(s_i), \qquad (4.12) \]

where C_dec is the cost of decomposing p_cd into a set SP of sub-partitions, and C_r is the cost of aggregating each resulting sub-partition s_i ∈ SP from raw data.

4.3 Implementation

This section describes the application of a query optimization technique that transforms an input query written in terms of arrays so that it can be executed faster using pre-aggregated data. The query processing module of an array database management system (RasDaMan) has been extended with our pre-aggregation framework for query rewriting; the extension is implemented as part of the optimization and evaluation phases.

As discussed earlier in this chapter, there are two problems related to the computation of an aggregate query using pre-aggregated data. First, we must find all pre-aggregates that can be used to compute an aggregate query, including those that provide partial answers. Next, from all candidate pre-aggregates, we must find the one that minimizes the execution time (or cost) for computing the query.

Our solution is based on an existing approach for answering queries using views in OLAP applications. Halevy et al. [95] showed that all possible rewritings of a query can be obtained by considering containment mappings from the bodies of the views to the body of the query. They also showed that such a characterization is an NP-complete problem.

The QUERYCOMPUTATION procedure returns either the result of a given query Q or an execution plan for it. An execution plan is an indicator of the kind of data that must be used to compute the query. The procedure returns a raw indicator if the query must be computed from the original data. Other valid indicators include IPAS, OPAS, and DPAS, which indicate that the query will be answered using one or more partial pre-aggregates.

The input of the algorithm is a query tree Q_t of an aggregate query. The algorithm first checks whether the conditions for a PERFECT-MATCHING between the query and the pre-aggregated queries are satisfied. If a perfect matching is found, it returns the result of the pre-aggregated query. Otherwise, the algorithm checks whether the conditions for a PARTIAL-MATCHING between the query and a set of pre-aggregated queries are satisfied. Then, the algorithm uses our cost model to determine the cost of computing the query with the pre-aggregates that satisfy the partial-matching conditions, and the cost of computing the query from the original data. Finally, the algorithm picks the plan with the least cost in terms of execution time.

The algorithm makes use of the following auxiliary procedures:
• DECOMPOSEQUERY(Q_t) examines the nodes of the query tree Q_t and generates a standardized representation S_qt that can be manipulated via SQL statements.
• PERFECTMATCHING(S_qt) compares a standardized representation of the query tree S_qt against the k existing pre-aggregates. The output is the key of the matched pre-aggregated query. A null value is returned if no perfect matching is found.
• FETCHRESULT(key) retrieves the result R of the pre-aggregated query identified by key.

Algorithm 1 QUERYCOMPUTATION
Require: A query tree Q_t, a set P of k pre-aggregate queries
  initialize R = 0, key = false
  S_qt = decomposeQuery(Q_t)
  key = perfectMatching(S_qt, P)
  if key then
    R = fetchResult(key)
    return R
  end if
  if !key then
    plan = partialMatching(S_qt, P)
    return plan
  end if

The algorithm PARTIALMATCHING identifies an aggregate sub-expression in a query tree Q_t and finds pre-aggregated queries that satisfy conditions 1, 2 and 3, but not condition 4, as defined in Section 4.1.2. It considers pre-aggregates that contribute partially to the answer of a query sub-expression and that are either independent, overlapped, or dominant. The algorithm calculates the cost of using each pre-aggregate for computing the query, and returns an indicator of the type of query that provides the least cost.

The aggregateOp() procedure compares a node n of a given query tree Q_t against a list of pre-defined aggregate operations, e.g., add_cells, count_cells, avg_cells, max_cells, and min_cells. If the node matches any such operation, it returns true.

The getSubtree() procedure receives as parameters a query tree Q_t and a pointer to an aggregate node. If the aggregate node has children, it creates a subtree Q' whose root node is the aggregate node.

The findPreaggregate() procedure receives as parameters an aggregate operation op, an object identifier ro, and a spatial domain sd. It then determines whether the values of these parameters match those of any existing pre-aggregate. If a match is found, the result of the matched pre-aggregate is returned.

The findIpasPreaggregates() procedure receives as a parameter a subtree Q' and checks whether any pre-aggregates satisfy conditions 1, 2 and 3, as defined in Section 4.1.2, for equivalence between a query and a pre-aggregate. For those pre-aggregates
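
The control flow of QUERYCOMPUTATION and the cost comparison built on Equation 4.12 can be illustrated with a minimal Python sketch. It is not the RasDaMan implementation: the function names (aggregation_cost, partial_matching, query_computation), the dictionary that stands in for the pre-aggregate store, the tuple used as a standardized query key, and all numeric costs are assumptions made for illustration. The sketch also takes the cost estimates of the candidate pre-aggregates as given, since only the cost of the dominant case (Equation 4.12) is stated in this section.

```python
from typing import Dict, Hashable, Sequence, Union


def aggregation_cost(c_dec: float, c_r: Sequence[float]) -> float:
    """Equation 4.12: decomposition cost plus the cost of aggregating
    every resulting sub-partition from raw data."""
    return c_dec + sum(c_r)


def partial_matching(candidate_costs: Dict[str, float], raw_cost: float) -> str:
    """Return the indicator of the cheapest plan.

    candidate_costs maps an indicator ("IPAS", "OPAS", "DPAS") to an
    estimated cost; how those estimates are derived is outside this sketch.
    """
    plans = {"raw": raw_cost, **candidate_costs}
    # On ties, min() keeps the first entry, so "raw" wins a tie.
    return min(plans, key=lambda indicator: plans[indicator])


def query_computation(
    key: Hashable,
    stored_results: Dict[Hashable, float],
    candidate_costs: Dict[str, float],
    raw_cost: float,
) -> Union[float, str]:
    """Return a stored result on a perfect match, otherwise a plan indicator."""
    if key in stored_results:        # stands in for perfectMatching
        return stored_results[key]   # stands in for fetchResult(key)
    return partial_matching(candidate_costs, raw_cost)


# Illustrative values only; none of these numbers come from the thesis.
stored = {("add_cells", "img", (0, 0, 99, 99)): 1234.0}
dpas_cost = aggregation_cost(2.0, [1.0, 1.5, 0.5])
hit = query_computation(("add_cells", "img", (0, 0, 99, 99)), stored, {}, 10.0)
plan = query_computation(
    ("add_cells", "img", (0, 0, 49, 49)),
    stored,
    {"IPAS": 7.0, "DPAS": dpas_cost},
    9.0,
)
print(hit, plan)
```

In this sketch the first call finds a perfect match and returns the stored value, while the second falls through to the cost comparison and returns the indicator of the cheapest plan. Keeping the raw plan in the same dictionary as the candidates means the choice between raw data and pre-aggregates is a single minimum over estimated costs.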
