Applying OLAP Pre-Aggregation Techniques to ... - Jacobs University
Chapter 1<br />
Introduction and Problem Statement<br />
Scientific computing platforms and infrastructures are making new kinds of experiments<br />
possible, resulting in the generation of vast volumes of arrays of data. This<br />
is happening in many specialized application areas such as meteorology, oceanography,<br />
hydrology, astronomy, medical imaging, and exploration systems for oil, natural<br />
gas, coal, and diamonds. These datasets range from uniformly spaced points<br />
(cells) along a single dimension to multidimensional arrays containing several different<br />
types of data. For example, astronomy and earth sciences operate on two- or<br />
three-dimensional spatial grids, often using a plethora of spherical coordinate systems.<br />
Furthermore, nearly all sciences must deal with data series over time. It is frequently<br />
necessary to understand relationships between consecutive elements in time,<br />
or to analyze entire sequences of observations. Such datasets may represent spatial,<br />
temporal, or spatio-temporal information. For example, if ocean measurements<br />
such as temperature, salinity, and oxygen are recorded every hour at one-meter intervals<br />
in depth and at ten-meter intervals in the two horizontal dimensions, the result is<br />
a four-dimensional array with three spatial dimensions and one temporal dimension,<br />
and three values attached to each cell of the array.<br />
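The ocean example can be made concrete with a minimal Python sketch. Only the sampling steps (hourly, 1 m in depth, 10 m horizontally) and the three measured variables come from the text; the survey extents (one day, 100 m of depth, a 1 km by 1 km area) and the names `AXES`, `VARIABLES`, and `cell_index` are hypothetical, chosen purely for illustration.

```python
from math import prod

# Sampling steps follow the text; the extents are assumed for illustration.
AXES = {
    "time_h": range(0, 24, 1),     # one sample per hour over one day (assumed extent)
    "depth_m": range(0, 100, 1),   # one sample per meter of depth (assumed extent)
    "y_m": range(0, 1000, 10),     # one sample per ten meters (assumed extent)
    "x_m": range(0, 1000, 10),     # one sample per ten meters (assumed extent)
}
VARIABLES = ("temperature", "salinity", "oxygen")

# One temporal and three spatial dimensions give a four-dimensional array.
shape = tuple(len(axis) for axis in AXES.values())
n_cells = prod(shape)
n_values = n_cells * len(VARIABLES)  # three values attached to each cell


def cell_index(time_h, depth_m, y_m, x_m):
    """Map physical coordinates to integer array indices.

    Raises ValueError if a coordinate does not lie on the sampling grid.
    """
    coords = (time_h, depth_m, y_m, x_m)
    return tuple(axis.index(c) for axis, c in zip(AXES.values(), coords))
```

Under these assumed extents, even a single day of observations over a small area yields tens of millions of stored values, which illustrates how quickly such arrays grow.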
In the past, arrays were typically stored in files and then manipulated by programs<br />
that operated on these files. Nowadays, with science becoming increasingly computational<br />
and data driven, the trend is toward a new class of database system that provides<br />
support not only for traditional, or coded, data types such as text and integers,<br />
but also for richer data types like multidimensional arrays. This new class of databases is<br />
referred to as Array Databases.<br />
Implementing an efficient array database management system (DBMS) can be very<br />
challenging. Typically, there are two approaches that can be taken to store array<br />
datasets in a DBMS. In the first, the values of each cell are stored in a separate row,<br />
along with fields describing the position of the cell in the array. The most obvious<br />
drawback of this approach is the need for a large multidimensional index to efficiently<br />
find rows in the table. Moreover, the space taken by a multidimensional index is larger<br />
than the size of the table itself if all dimensions forming an array are used as the key.<br />
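The first approach can be sketched with Python's standard `sqlite3` module. This is an illustration under stated assumptions, not the schema of any particular array DBMS: the table name `ocean_cell`, its columns, and the helper functions are hypothetical, and the composite primary key over all four dimensions merely stands in for the index the text mentions (a composite B-tree key is not a true multidimensional index).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per cell: position fields plus the values attached to that cell.
# Using every dimension in the key is what makes the index so large.
conn.execute("""
    CREATE TABLE ocean_cell (
        t INTEGER, z INTEGER, y INTEGER, x INTEGER,
        temperature REAL, salinity REAL, oxygen REAL,
        PRIMARY KEY (t, z, y, x)
    )
""")

# A tiny 2 x 2 x 2 x 2 array filled with made-up values.
rows = [
    (t, z, y, x, 10.0 + z, 35.0, 6.0 - 0.5 * z)
    for t in range(2)
    for z in range(2)
    for y in range(2)
    for x in range(2)
]
conn.executemany("INSERT INTO ocean_cell VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def read_cell(t, z, y, x):
    """Fetch the values of one cell by its full position."""
    return conn.execute(
        "SELECT temperature, salinity, oxygen FROM ocean_cell "
        "WHERE t = ? AND z = ? AND y = ? AND x = ?",
        (t, z, y, x),
    ).fetchone()


def read_depth_slice(z):
    """Fetch every cell at one depth level, across all times and positions."""
    return conn.execute(
        "SELECT t, y, x, temperature, salinity, oxygen FROM ocean_cell "
        "WHERE z = ? ORDER BY t, y, x",
        (z,),
    ).fetchall()
```

Note that `read_depth_slice` filters on a non-leading column of the composite key, so a one-dimensional index cannot serve it efficiently; this is the kind of access pattern that motivates a multidimensional index in this layout.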
In the second approach, a multidimensional array is written to a Binary Large Object<br />
(BLOB), which is stored in a field of a table in the database. Applications then fetch<br />