Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University

1. Introduction and Problem Statement

the contents of the BLOB when they wish to operate on the data. The main drawback of this approach is that it requires either that the entire array be passed to the client, or that the client perform a large number of BLOB input/output (I/O) operations to read only the required portions of the array. With databases growing beyond a few tens of terabytes, the analysis of large volumes of array data is severely limited by the relatively low I/O performance of most of today's computing platforms. High-performance numerical simulations are also increasingly affected by the I/O bottleneck.

To improve data management and analytics on large repositories of data, aggregation has been put forward as a key process for describing data at a high level. An example of data aggregation is the computation and storage of statistical parameters, such as count, average, median, and standard deviation. Aggregate computation has been studied in a variety of settings [4, 21, 66]. In particular, On-Line Analytical Processing (OLAP) technology has emerged to address the problem of efficiently computing complex multidimensional aggregate queries on large data warehouses. Most OLAP systems rely on selecting combinations of aggregates, and then precomputing and storing their results so that the database system can use them in subsequent requests. This process is known as pre-aggregation, and it has been shown to speed up aggregate queries by several orders of magnitude in business and statistical applications [31, 41]. While considerable work has been done on efficiently computing aggregate queries in OLAP-based applications, such computations continue to be a data management challenge in scientific applications.
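The idea of pre-aggregation described above can be illustrated on a small 2D array. The following is a minimal sketch, not the method developed in this thesis: it assumes a fixed square tiling and a sum aggregate, and the names `precompute_tile_sums`, `range_sum`, and the parameter `tile` are hypothetical. One sum is stored per tile; a range query then reuses the stored sum for every tile it covers completely and reads raw cells only for partially covered tiles.

```python
from typing import List

Array2D = List[List[int]]


def precompute_tile_sums(data: Array2D, tile: int) -> Array2D:
    """Store one sum per tile x tile block (the pre-aggregate)."""
    rows, cols = len(data), len(data[0])
    tile_rows = (rows + tile - 1) // tile
    tile_cols = (cols + tile - 1) // tile
    sums = [[0] * tile_cols for _ in range(tile_rows)]
    for r in range(rows):
        for c in range(cols):
            sums[r // tile][c // tile] += data[r][c]
    return sums


def range_sum(data: Array2D, sums: Array2D, tile: int,
              r0: int, r1: int, c0: int, c1: int) -> int:
    """Sum over rows r0..r1-1 and columns c0..c1-1 (half-open ranges).

    Tiles fully inside the query are answered from the stored sums;
    partially covered tiles fall back to scanning the raw cells.
    """
    total = 0
    r = r0
    while r < r1:
        tr = r // tile
        r_end = min((tr + 1) * tile, r1)  # end of this tile row within the query
        c = c0
        while c < c1:
            tc = c // tile
            c_end = min((tc + 1) * tile, c1)
            fully_covered = (
                r == tr * tile and r_end == (tr + 1) * tile
                and c == tc * tile and c_end == (tc + 1) * tile
            )
            if fully_covered:
                total += sums[tr][tc]  # reuse the pre-aggregate
            else:
                for i in range(r, r_end):  # scan raw cells of the partial tile
                    for j in range(c, c_end):
                        total += data[i][j]
            c = c_end
        r = r_end
    return total
```

The trade-off in this sketch is the usual one for pre-aggregation: the stored tile sums cost extra space and must be kept consistent with the base data, in exchange for fewer cell reads at query time.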
A relevant example in which advanced data management and efficient query processing are highly desirable is hyper-spectral remote-sensing imaging, in which an imaging spectrometer collects hundreds or even thousands of measurements for the same area of the surface of the Earth. The scenes provided by such sensors are often called data cubes to denote the dimensionality of the data. Notably, efficient query processing and data mining techniques facilitate the exploration of spatio-temporal data patterns, both interactively and in batch on archived data.

A significant fraction of scientific data is image-based and can be naturally represented as multidimensional arrays. These datasets fit poorly into relational databases, which lack efficient support for the concepts of physical proximity and order. They are typically stored in array-friendly formats such as HDF5, netCDF, or FITS. The extremely high computational requirements introduced by image-based scientific applications make them an excellent case study for our research.

Since array databases and OLAP/data warehousing both deal with large multidimensional datasets and aggregate queries, adapting OLAP pre-aggregation techniques to the management and computation of aggregate queries in array databases may provide a substantial benefit. This thesis investigates the application of OLAP pre-aggregation techniques to speed up query processing in array databases. In particular, we focus on enhancing aggregate computation in GIS and remote-sensing imaging applications. However, the results can be generalized to other domains as well.
Relevant and complementary questions to this thesis are:

1. What factors influence the decision to select an aggregate query for pre-aggregation?
2. What formalisms are necessary to establish an efficient and scalable pre-aggregation framework for array databases?
3. What types of constraints are typically considered by existing OLAP pre-aggregation algorithms, and how do they affect performance?

The thesis objectives are outlined as follows:

1. To illustrate the necessity of improving aggregate computation in array databases for GIS and remote-sensing imaging applications.
2. To achieve a solid understanding of OLAP pre-aggregation algorithms and architectural issues when manipulating large amounts of data.
3. To formally describe fundamental operations in GIS and remote-sensing imaging applications and identify those that involve data summarization.
4. To design a theoretical pre-aggregation framework for array databases supporting GIS and remote-sensing imaging applications.
5. To design query selection and query rewriting algorithms using existing OLAP/data warehousing pre-aggregation techniques.
6. To implement the algorithms in an array database management system.
7. To conduct a performance study of the developed algorithms.

The methodological approach employed in this thesis is centered on a three-stage design methodology:

• Identification of fundamental operations in GIS and remote-sensing imaging applications. A literature review helped us identify fundamental operations in GIS that require data summarization. The literature included different classification schemes, international standards, and best practices.

• Design and implementation. Existing OLAP pre-aggregation techniques are used as a basis for the construction of a pre-aggregation framework for array databases. Storage space constraints are considered in the design of the query selection algorithms. The algorithms were developed in the C++ programming language and tested in the RasDaMan multidimensional array database management system.

• Evaluation. The performance of the developed algorithms is measured on 2D, 3D, and 4D datasets. For scaling operations on 2D datasets, we compare our results against those of the traditional image pyramids approach.
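The image pyramids approach used as the baseline for 2D scaling can be sketched as follows. This is a simplified illustration under stated assumptions rather than the implementation evaluated in this thesis: each level is assumed to halve the resolution by averaging 2x2 blocks, odd trailing rows or columns are dropped, and the names `downsample_by_2`, `build_pyramid`, and `pick_level` are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample_by_2(img: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block.

    Assumption: a trailing odd row or column is dropped.
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Precompute up to `levels` coarser versions; level k is scaled by 2**k."""
    pyramid = [img]
    for _ in range(levels):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break  # cannot halve any further
        pyramid.append(downsample_by_2(top))
    return pyramid


def pick_level(pyramid: List[Image], factor: float) -> int:
    """Index of the coarsest stored level whose scale 2**k does not exceed `factor`."""
    level = 0
    while level + 1 < len(pyramid) and 2 ** (level + 1) <= factor:
        level += 1
    return level
```

A scaling request with a factor that is not a power of two would start from the level returned by `pick_level` and resample the remainder at query time, so only part of the work is saved. The pyramid fixes its scale factors in advance, which is the property a query-driven selection of pre-aggregates would be compared against.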