11.03.2014 Views

Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University


Chapter 1

Introduction and Problem Statement

Scientific computing platforms and infrastructures are making new kinds of experiments possible, resulting in the generation of vast volumes of arrays of data. This is happening in many specialized application areas such as meteorology, oceanography, hydrology, astronomy, medical imaging, and exploration systems for oil, natural gas, coal, and diamonds. These datasets range from uniformly spaced points (cells) along a single dimension to multidimensional arrays containing several different types of data. For example, astronomy and earth sciences operate on two- or three-dimensional spatial grids, often using a plethora of spherical coordinate systems. Furthermore, nearly all sciences must deal with data series over time. It is frequently necessary to understand relationships between consecutive elements in time, or to analyze entire sequences of observations, and such datasets may represent spatial, temporal, or spatio-temporal information. For example, if ocean measurements such as temperature, salinity, and oxygen are recorded every hour at spacings of every one meter in depth, and every ten meters in two horizontal dimensions, the result is a four-dimensional array with three spatial dimensions and one temporal dimension, and three values attached to each cell of the array.
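To get a feel for how quickly such an array grows, the sampling scheme in the ocean example can be turned into a shape and a cell count. The following is only a sketch: the survey extent (24 hours, 100 m of depth, a 1 km by 1 km area) and the names `grid_shape` and `VARIABLES` are assumptions made for illustration and do not come from the text.

```python
from math import prod

# The three values attached to each cell, as named in the text.
VARIABLES = ("temperature", "salinity", "oxygen")


def grid_shape(hours, depth_m, x_m, y_m):
    """Return (time, depth, x, y) sample counts for the sampling in the text:
    hourly, every 1 m in depth, every 10 m in each horizontal dimension.
    Endpoint samples are ignored to keep the arithmetic simple."""
    return (hours, depth_m // 1, x_m // 10, y_m // 10)


# Hypothetical survey extent (an assumption, not taken from the source).
shape = grid_shape(hours=24, depth_m=100, x_m=1000, y_m=1000)

cells = prod(shape)                # number of cells in the 4-D array
values = cells * len(VARIABLES)    # three values are attached to each cell

print("dimensions:", len(shape))   # one temporal + three spatial
print("shape:", shape)
print("cells:", cells)
print("stored values:", values)
```

Even this small, one-day survey yields tens of millions of stored values, which is why storage layout matters for such datasets.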

In the past, arrays were typically stored in files and then manipulated by programs that operated on these files. Nowadays, with science moving toward being computational and data based, the trend is toward a new class of database system that supports not only traditional, or coded, data types such as text and integers, but also richer data types like multidimensional arrays. Systems of this new class are referred to as Array Databases.

Implementing an efficient array database management system (DBMS) can be very challenging. Typically, there are two approaches that can be taken to store array datasets in a DBMS. In the first, the values of each cell are stored in a separate row, along with fields describing the position of the cell in the array. The most obvious drawback of this approach is the need for a large multidimensional index to efficiently find rows in the table. Moreover, the space taken by a multidimensional index is larger than the size of the table itself if all dimensions forming an array are used as the key.
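The row-per-cell layout described above can be sketched with an in-memory SQLite table. This is a minimal illustration under stated assumptions, not the layout of any particular array DBMS: the table name `ocean_cell`, the column names, and the tiny 2 x 2 x 2 x 2 grid with made-up values are all hypothetical. The composite primary key plays the role of the index over all dimensions.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

# One row per cell: four position fields plus the three values of that cell.
# The composite primary key makes SQLite maintain an index over all four
# dimensions, which is what a point lookup by cell position relies on.
conn.execute(
    """
    CREATE TABLE ocean_cell (
        t INTEGER, z INTEGER, x INTEGER, y INTEGER,
        temperature REAL, salinity REAL, oxygen REAL,
        PRIMARY KEY (t, z, x, y)
    )
    """
)

# Hypothetical toy grid (2 x 2 x 2 x 2) with invented measurement values.
rows = [
    (t, z, x, y, 10.0 + z, 35.0, 6.0)
    for t, z, x, y in product(range(2), repeat=4)
]
conn.executemany("INSERT INTO ocean_cell VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM ocean_cell").fetchone()[0]

# Fetching a single cell means locating its row through the key on all
# four dimensions.
cell = conn.execute(
    "SELECT temperature, salinity, oxygen FROM ocean_cell "
    "WHERE t = ? AND z = ? AND x = ? AND y = ?",
    (1, 1, 0, 1),
).fetchone()

print(row_count, cell)
```

Note that every row repeats the full cell position, and the key indexes those same four columns again, which hints at why index space becomes a concern when all dimensions form the key.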

In the second approach, a multidimensional array is written to a Binary Large Object (BLOB), which is stored in a field of a table in the database. Applications then fetch
