11.03.2014 Views

Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University


1. Introduction and Problem Statement

the contents of the BLOB when they wish to operate on the data. The main drawback
to this approach is that it either requires the entire array to be passed to the client, or it
requires that the client perform a large number of BLOB input/output (I/O) operations
to read only the required portions of the array. With databases growing beyond a few
tens of terabytes, the analysis of large volumes of array datasets is severely limited
by the relatively low I/O performance of most of today's computing platforms.
High-performance numerical simulations are also increasingly feeling the I/O bottleneck.
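The trade-off described above can be made concrete with a small sketch. It assumes, purely for illustration, a two-dimensional array linearized in row-major order into a single BLOB with fixed-size cells; the function name, the array shape, and the cell size are hypothetical and not taken from any particular system. Under that assumption, each row of a requested subarray is a separate contiguous byte range, so the client must either issue one read per row or fetch the whole BLOB.

```python
def subarray_byte_ranges(shape, cell_size, row_range, col_range):
    """Return the (offset, length) byte ranges that cover a 2-D subarray.

    Assumes a row-major array stored as one BLOB with fixed-size cells.
    Row and column ranges are half-open: (start, stop).
    """
    n_rows, n_cols = shape
    r0, r1 = row_range
    c0, c1 = col_range
    if not (0 <= r0 <= r1 <= n_rows and 0 <= c0 <= c1 <= n_cols):
        raise ValueError("requested subarray lies outside the array")
    length = (c1 - c0) * cell_size
    # One contiguous range per requested row.
    return [((r * n_cols + c0) * cell_size, length) for r in range(r0, r1)]


# Hypothetical 10,000 x 10,000 array of 4-byte cells.
shape = (10_000, 10_000)
cell_size = 4
ranges = subarray_byte_ranges(shape, cell_size, (2_000, 3_000), (5_000, 5_100))

blob_bytes = shape[0] * shape[1] * cell_size          # cost of shipping the whole BLOB
needed_bytes = sum(length for _, length in ranges)    # bytes actually required
print(len(ranges), needed_bytes, blob_bytes)
```

In this hypothetical case the client needs only 0.1% of the BLOB, but obtaining it takes 1,000 separate reads; the alternative is a single read that transfers 1,000 times more data than required.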

To improve data management and analytics on large repositories of data, aggregation
has been put forward as a key process for describing data at a high level. An
example of data aggregation is the computation and storage of statistical parameters,
such as count, average, median, and standard deviation. Aggregate computation has
been studied in a variety of settings [4, 21, 66]. In particular, On-Line Analytical
Processing (OLAP) technology has emerged to address the problem of efficiently computing
complex multidimensional aggregate queries on large data warehouses. Most
OLAP systems rely on selecting combinations of aggregates, and then precomputing
and storing their results so that the database system can make use of them in
subsequent requests. This process is known as pre-aggregation, and it has been shown to
speed up aggregate queries by several orders of magnitude in business and statistical
applications [31, 41].
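A minimal sketch of the pre-aggregation idea follows. It is not the mechanism of any specific OLAP product: the fact rows, the `(region, year)` grouping, and the function names are all hypothetical. The sketch scans the data once, stores a count, a sum, and a sum of squares per group, and then answers count, average, and (population) standard deviation queries from the stored values alone. The median is deliberately left out, since it cannot in general be derived from such partial results.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical fact rows: (region, year, sales).
FACTS = [
    ("north", 2012, 10.0),
    ("north", 2012, 14.0),
    ("north", 2013, 12.0),
    ("south", 2012, 8.0),
    ("south", 2013, 6.0),
    ("south", 2013, 10.0),
]


def preaggregate(facts):
    """Scan the facts once; store [count, sum, sum of squares] per (region, year)."""
    table = defaultdict(lambda: [0, 0.0, 0.0])
    for region, year, value in facts:
        entry = table[(region, year)]
        entry[0] += 1
        entry[1] += value
        entry[2] += value * value
    return dict(table)


def query(table, region=None, year=None):
    """Answer count/average/stddev from the stored aggregates, not the raw facts."""
    n, total, squares = 0, 0.0, 0.0
    for (r, y), (count, s, sq) in table.items():
        if (region is None or r == region) and (year is None or y == year):
            n += count
            total += s
            squares += sq
    if n == 0:
        return None
    mean = total / n
    # Population variance; clamp tiny negative values caused by rounding.
    variance = max(squares / n - mean * mean, 0.0)
    return {"count": n, "average": mean, "stddev": sqrt(variance)}


table = preaggregate(FACTS)
north = query(table, region="north")
print(north)
```

The stored table has one entry per distinct group rather than one per fact, so a query touches far fewer records than a scan of the raw data would when groups are large.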

While considerable work has been done on the problem of efficiently computing
aggregate queries in OLAP-based applications, such computations continue to be a
data management challenge in scientific applications. A relevant example in which
advanced data management and efficient query processing are highly desirable
is hyper-spectral remote-sensing imaging, in which an image spectrometer collects
hundreds or even thousands of measurements for the same area of the surface of the
Earth. The scenes provided by such sensors are often called data cubes to denote
the dimensionality of the data. Notably, efficient query processing and data mining
techniques facilitate exploration of spatio-temporal data patterns, both interactively
and in batch mode on archived data.

A significant fraction of scientific data is image-based and can be naturally represented
in multidimensional arrays. These datasets fit poorly into relational databases,
which lack efficient support for the concepts of physical proximity and order. They
are typically stored in array-friendly formats such as HDF5, netCDF, or FITS. The
extremely high computational requirements introduced by image-based scientific applications
make them an excellent case study for our research.

Since array databases and OLAP/data warehousing both deal with large multidimensional
datasets and aggregate queries, adapting OLAP pre-aggregation techniques
to the management and computation of aggregate queries in array databases may provide
a substantial benefit. This thesis investigates the application of OLAP pre-aggregation
techniques in speeding up query processing in array databases. In particular,
we focus on enhancing aggregate computation in GIS and remote-sensing imaging
applications. However, the results can be generalized to other domains as well.
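To illustrate what pre-aggregation can look like on array data, the sketch below uses a prefix-sum (summed-area) table, a well-known technique chosen here only as an example; it is not presented as the method developed in this thesis. The grid values and function names are hypothetical, and the grid is assumed to be non-empty and rectangular. After one pass over the array, the sum over any axis-aligned rectangular region is obtained from four stored values.

```python
def build_prefix_sums(grid):
    """Precompute prefix[r][c] = sum of grid[0:r][0:c] for a non-empty 2-D grid."""
    rows, cols = len(grid), len(grid[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        running = 0  # sum of the current row up to column c
        for c in range(cols):
            running += grid[r][c]
            prefix[r + 1][c + 1] = prefix[r][c + 1] + running
    return prefix


def region_sum(prefix, r0, c0, r1, c1):
    """Sum of the cells in rows r0..r1-1 and columns c0..c1-1 (half-open ranges)."""
    return prefix[r1][c1] - prefix[r0][c1] - prefix[r1][c0] + prefix[r0][c0]


# Hypothetical 3 x 4 raster.
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
prefix = build_prefix_sums(grid)
inner = region_sum(prefix, 1, 1, 3, 3)   # cells 6, 7, 10, 11
print(inner)
```

The design trades extra storage (one additional value per cell) and a one-time build pass for constant-time region sums, which mirrors the storage-versus-query-time trade-off of OLAP pre-aggregation.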
