Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University

6. Conclusion

... query using pre-aggregated data is influenced by the structural characteristics of the query and the pre-aggregate. Thus, by comparing the query tree structures of the two, one can determine whether the pre-aggregated result contributes fully or partially to the final answer of the query. The best case occurs when the query and the pre-aggregate match fully, since the time taken to compute the query is then reduced to the time it takes to retrieve the result. In the case of partial matching, however, several pre-aggregates can be considered for computing the answer of a query. A decision therefore has to be made as to which pre-aggregates provide the best performance in terms of execution time. To this end, we distinguished between different types of pre-aggregates and presented a cost model to calculate the cost of using each qualifying pre-aggregate. We then presented an algorithm that selects the best execution plan for evaluating a query in the presence of pre-aggregated data. Tests performed on real-life raster image datasets showed that our distinction between different types of pre-aggregates is useful for determining the pre-aggregate that provides the highest benefit (in terms of execution time) for computing a given query.

We then described the issues that arise when attempting to generalize our pre-aggregation framework to support more complex aggregate operations, and justified our decision to focus on one particular operation: scaling. Traditionally, 2D scaling operations have been performed using image pyramids. Practice shows that pyramids are typically constructed in scale levels of powers of 2, thus yielding scale vectors 2, 4, 8, 16, 32, 64, 128, 256, and 512. The materialization of the pyramid requires an estimated 33% additional storage space.
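The 33% figure follows from a geometric series: in 2D, each pyramid level that halves the resolution along both axes holds one quarter of the cells of the level below it. The short sketch below checks this numerically. The function name `pyramid_overhead` and its parameters are illustrative only and do not come from the thesis; it assumes every level is stored in full and ignores compression and tiling overhead.

```python
def pyramid_overhead(levels, dims=2):
    """Extra storage of a pyramid, as a fraction of the base array size.

    Assumption: level k stores the array downscaled by 2**k along each of
    the `dims` axes, so it holds 1 / (2**k)**dims of the original cells.
    """
    return sum(1.0 / (2 ** k) ** dims for k in range(1, levels + 1))


# Scale factors of a conventional pyramid: 2, 4, 8, ..., 512 (nine levels).
scale_factors = [2 ** k for k in range(1, 10)]

overhead_2d = pyramid_overhead(len(scale_factors), dims=2)
print(f"2D pyramid with {len(scale_factors)} levels: "
      f"{overhead_2d:.2%} additional storage")
```

With nine levels the sum 1/4 + 1/16 + ... is already within a rounding error of its limit of 1/3, which matches the estimate quoted above.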
Our pre-aggregation selection algorithm is similar to the pyramid approach in that it selects a set of queries for materialization, where each level corresponds to a scaling operation with a defined scale factor. However, the selection of such queries is not restricted to a fixed number of levels spaced by powers of two. Instead, our selection algorithm considers the frequency of each query in the workload, and how the results of each individual query can help to reduce the overall cost of computing the workload. We compared the performance of our pre-aggregation algorithm against that of image pyramids. Results showed that, for workloads with uniformly distributed scale vectors, our algorithm computes the workload at a cost 36% lower than image pyramids, while requiring 7% more space than image pyramids. For scale vectors following a Poisson distribution, our algorithm computes the workload at a cost 55% lower than the pyramid approach. Further, our algorithm can be applied to datasets of higher dimensions, a feature not supported by traditional image pyramids.

6.1 Future Work

There are natural extensions to this work that would help expand and strengthen the results. One area of further work is adding self-management capabilities, so that the DBMS maintains statistics about each scaling operation appearing within the incoming queries and, at some suitable time, adjusts the pre-aggregate set accordingly. OLAP dynamic pre-aggregation addresses a similar problem. Another area is applying the results studied here to the many real-world situations where data cubes contain one or more non-spatio-temporal dimensions, such as pressure, which is common in meteorological and oceanographic data sets.

Workload distribution deserves further investigation. While the distributions chosen are practical and relevant, there might be further situations worth considering. Gathering empirical figures from user-exposed services like EarthLook (www.earthlook.org) can be useful for tuning our pre-aggregation selection algorithms.

Further investigation is also necessary in the realm of rewriting scaling operations. In OLAP applications, there is a trade-off between speed and accuracy. But accuracy may be critical for certain Georaster applications, so solutions to the query rewriting problem must weigh these two aspects according to user data analysis requirements. Moreover, they must consider the fact that the same dataset may be accessed by various users with totally different analysis needs.
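To make the idea of workload-aware selection summarized in this chapter more concrete, the following is a minimal sketch of a greedy, frequency-driven choice of scale factors to materialize. It is not the selection algorithm or the cost model of this thesis. It rests on two simplifying assumptions: a scaling query with factor s can be answered from any materialized factor p with p <= s, and the cost of doing so is the number of cells read from that pre-aggregate. All names (`answer_cost`, `workload_cost`, `greedy_select`) and the example workload are hypothetical.

```python
def answer_cost(base_cells, scale, materialized, dims=2):
    """Cells read to answer a scale-by-`scale` query (assumed cost model).

    Uses the coarsest materialized factor not exceeding `scale`;
    a factor of 1.0 stands for the original, unscaled array.
    """
    best = max((p for p in materialized if p <= scale), default=1.0)
    return base_cells / best ** dims


def workload_cost(base_cells, workload, materialized, dims=2):
    """Frequency-weighted cost of a workload {scale factor: frequency}."""
    return sum(freq * answer_cost(base_cells, s, materialized, dims)
               for s, freq in workload.items())


def greedy_select(base_cells, workload, budget, dims=2):
    """Pick up to `budget` scale factors from the workload, each time
    taking the one whose materialization lowers the total cost the most."""
    chosen = set()
    for _ in range(budget):
        current = workload_cost(base_cells, workload, chosen, dims)
        gains = {c: current - workload_cost(base_cells, workload,
                                            chosen | {c}, dims)
                 for c in set(workload) - chosen}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        chosen.add(best)
    return chosen


# Hypothetical workload: scale factor -> number of occurrences.
workload = {3: 50, 5: 30, 16: 1}
base_cells = 1_000_000

selected = greedy_select(base_cells, workload, budget=2)
pyramid = {2, 4}  # first two levels of a power-of-two pyramid

print("selected factors:", sorted(selected))
print("greedy cost :", workload_cost(base_cells, workload, selected))
print("pyramid cost:", workload_cost(base_cells, workload, pyramid))
```

In this toy workload the frequent factors 3 and 5 are chosen, and the resulting cost is lower than with the first two power-of-two levels, because the pyramid forces those queries onto finer (larger) levels. A self-managing variant along the lines of Section 6.1 could rebuild `workload` from observed query statistics and rerun the selection periodically.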