
The Data Warehouse System of MeteoSwiss: Concept and First Experiences

Estelle Grüter*, Christian Häberli*, Dimitrios Tombros**, Nadine Tschichold*, Walter Krottendorfer†, Rudolf Höfler†

MeteoSwiss*, STCG**, PSE†

ABSTRACT

In a world where the amount of measured data has increased enormously in recent years, an efficient system for storing and handling data and for assuring its quality is of crucial importance. Large quantities of numerical and multimedia data are collected for meteorological and climatological purposes, coming in from automatic systems as well as from observers. Data from very diverse sources have to be integrated into one central database system, where they undergo various quality control and consistency check procedures. At the same time, the data are continuously used by meteorological and climatological applications. This paper gives an insight into the data warehouse system and its architecture, developed for the special needs of MeteoSwiss (the national weather service of Switzerland).

1. Introduction<br />

Automatic meteorological measurements with high temporal resolution have been carried out in Switzerland since the late seventies. After more than two decades, the need for a new, efficient system to store and process the continually increasing amount of data using state-of-the-art technologies became increasingly urgent. Since data warehouse technology can be advantageous in science for manipulating large quantities of sensor data, performing statistical analyses and extracting meaningful trends, this approach was chosen to meet the various and complex requirements.

2. Concept of the MeteoSwiss Data Warehouse

The data to be stored in this new central system originate from different sources, such as automatic weather stations, observations made by human observers, model output, SYNOP data delivered by the Global Telecommunication System (GTS) and others. They have to be loaded according to their varying structures and afterwards checked for plausibility using a complex system of rules and tests. In a next step, aggregations are carried out to provide customers with data of lower temporal resolution, such as daily, monthly and yearly values. Finally, the data series are homogenized. Applications allow the extraction of data at different quality levels.
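The plausibility checks described above can be pictured as a set of rules applied per parameter. The following Python sketch is purely illustrative. The paper does not specify the rules, so the range test, the step test, their thresholds, the parameter name `air_temperature` and the station code `ABC` are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Measurement:
    station: str                      # hypothetical station code
    parameter: str
    value: float
    previous: Optional[float] = None  # preceding value of the same series, if known


# A rule returns an error message, or None if the value passes.
Rule = Callable[[Measurement], Optional[str]]


def range_rule(lo: float, hi: float) -> Rule:
    """Hypothetical range test: the value must lie within [lo, hi]."""
    def check(m: Measurement) -> Optional[str]:
        if not lo <= m.value <= hi:
            return f"value {m.value} outside [{lo}, {hi}]"
        return None
    return check


def step_rule(max_step: float) -> Rule:
    """Hypothetical temporal consistency test: limits the jump between successive values."""
    def check(m: Measurement) -> Optional[str]:
        if m.previous is not None and abs(m.value - m.previous) > max_step:
            return f"step {abs(m.value - m.previous):.1f} exceeds {max_step}"
        return None
    return check


# Illustrative rule set for air temperature in degrees Celsius (thresholds are assumptions).
RULES: dict[str, list[Rule]] = {
    "air_temperature": [range_rule(-60.0, 50.0), step_rule(10.0)],
}


def plausibility_check(m: Measurement) -> list[str]:
    """Apply every rule registered for the parameter and collect the failure messages."""
    failures = []
    for rule in RULES.get(m.parameter, []):
        message = rule(m)
        if message is not None:
            failures.append(message)
    return failures


if __name__ == "__main__":
    print(plausibility_check(Measurement("ABC", "air_temperature", 12.3, previous=11.8)))
    print(plausibility_check(Measurement("ABC", "air_temperature", 35.0, previous=12.0)))
```

Keeping the rules as small independent functions means new tests can be registered per parameter without touching the checking loop. A value that passes returns an empty list, while a value that fails returns one message per failed test.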


The following figure gives a short insight into the data processing chain at MeteoSwiss:

[Fig. 1: General data flow at MeteoSwiss. Stages: measuring equipment; datalogger, onsite QC (level 1 data); realtime data collection and quality control, communication and network monitoring (level 2 data); quality control, calculation of derived quantities (level 3 data).]
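If Fig. 1 is read as a chain in which each processing stage raises the data to the next level, then extraction "at different quality levels" amounts to filtering on a level attribute. The following hypothetical Python model illustrates this reading and is not MeteoSwiss code. The class names, the station code `ABC` and the mapping of stages to levels 1 to 3 are interpretations of the figure.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataLevel(IntEnum):
    """Processing levels as read from Fig. 1 (an interpretation, not an official definition)."""
    ONSITE_QC = 1    # datalogger output after onsite QC
    REALTIME_QC = 2  # after realtime data collection and quality control
    DERIVED = 3      # after further QC and calculation of derived quantities


@dataclass(frozen=True)
class Value:
    station: str     # hypothetical station code
    parameter: str
    value: float
    level: DataLevel


def extract(values: list[Value], min_level: DataLevel) -> list[Value]:
    """Return the values that have reached at least the requested processing level."""
    return [v for v in values if v.level >= min_level]


if __name__ == "__main__":
    readings = [
        Value("ABC", "air_temperature", 11.8, DataLevel.REALTIME_QC),
        Value("ABC", "air_temperature", 12.1, DataLevel.DERIVED),
    ]
    # Only the second reading has passed the level 3 processing step.
    print(extract(readings, DataLevel.DERIVED))
```

Using an `IntEnum` makes the levels ordered, so a client can ask for "level 2 or better" with a single comparison.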

Since customers have different requirements regarding performance and applications, it was decided to separate the various steps of data processing clearly. This policy was followed when the abstract architecture of the Data Warehouse was designed.

[Fig. 2: Simplified abstract architecture of the MeteoSwiss Data Warehouse System (MeteoSchweiz Data Warehouse System). Data sources: automatic stations, observations, GTS SYNOP, soundings, model output, ... Staging area: quality control and loading process, 4 steps of quality control, database optimized for online transactional processing. Storage area: quality control and error removal (automatic), calculation and aggregation, homogenization, database optimized for online analytical processing. Data analysis: applications, clients, extraction and distribution to private customers, universities, research institutes, companies, ... A meta database controls all processes. Data levels 4 and 5 are also marked.]


As shown in Figure 2, the conceptual architecture consists of four layers. It is a further development of the conventional data warehouse, in that Data Warehouse technology is combined with classical relational database technology.

1. Layer: Data Sources: This layer contains a wide variety of sources of meteorological and climatological data, such as point measurements from observations and automatic stations, profile measurements from upper-air stations, image data, volume data from weather radars, and weather forecast models.

2. Layer: Staging Area: It comprises all databases and tools used to digitize, load, check and integrate data. Special emphasis is placed on data quality control systems. All changes made to values are logged and thus historised.

3. Layer: Storage Area: This is the classical data warehouse, which contains an integrated set of data to support meteorologists and climatologists in weather forecasting and climate research.

4. Layer: Analysis Area: This layer contains various analysis, visualization and data extraction tools.
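The separation between the staging and storage areas and the historisation of changes can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the MeteoSwiss implementation. The paper does not name the database products used, and all table and column names, the station code `ABC` and the daily-mean aggregation are hypothetical. A trigger writes every update of a staged value to a change log, and `rebuild_storage` recomputes daily means from the values that have not been rejected.

```python
import sqlite3

SCHEMA = """
-- Staging area (layer 2): tuned for loading, checking and correcting single values.
CREATE TABLE staging_measurement (
    station   TEXT NOT NULL,
    parameter TEXT NOT NULL,
    obs_time  TEXT NOT NULL,   -- ISO 8601 timestamp
    value     REAL,            -- set to NULL when a value is rejected
    PRIMARY KEY (station, parameter, obs_time)
);

-- History of every change to a staged value, so earlier values are never lost.
CREATE TABLE staging_change_log (
    station    TEXT NOT NULL,
    parameter  TEXT NOT NULL,
    obs_time   TEXT NOT NULL,
    old_value  REAL,
    new_value  REAL,
    changed_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TRIGGER log_value_change
AFTER UPDATE OF value ON staging_measurement
BEGIN
    INSERT INTO staging_change_log (station, parameter, obs_time, old_value, new_value)
    VALUES (OLD.station, OLD.parameter, OLD.obs_time, OLD.value, NEW.value);
END;

-- Storage area (layer 3): pre-aggregated daily values for analytical queries.
CREATE TABLE storage_daily (
    station   TEXT NOT NULL,
    parameter TEXT NOT NULL,
    day       TEXT NOT NULL,
    mean      REAL NOT NULL,
    n_values  INTEGER NOT NULL,
    PRIMARY KEY (station, parameter, day)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def rebuild_storage(conn: sqlite3.Connection) -> None:
    """Recompute daily means in the storage area from all non-rejected staged values."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM storage_daily")
        conn.execute("""
            INSERT INTO storage_daily (station, parameter, day, mean, n_values)
            SELECT station, parameter, date(obs_time), AVG(value), COUNT(value)
            FROM staging_measurement
            WHERE value IS NOT NULL
            GROUP BY station, parameter, date(obs_time)
        """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO staging_measurement VALUES (?, ?, ?, ?)",
        [
            ("ABC", "air_temperature", "2001-06-01T00:00:00", 10.0),
            ("ABC", "air_temperature", "2001-06-01T12:00:00", 20.0),
            ("ABC", "air_temperature", "2001-06-01T18:00:00", 45.0),
        ],
    )
    # A plausibility test rejects the last value; the trigger records the change.
    conn.execute(
        "UPDATE staging_measurement SET value = NULL WHERE obs_time = ?",
        ("2001-06-01T18:00:00",),
    )
    rebuild_storage(conn)
    print(conn.execute("SELECT day, mean, n_values FROM storage_daily").fetchall())
    # [('2001-06-01', 15.0, 2)]
    print(conn.execute("SELECT old_value, new_value FROM staging_change_log").fetchall())
    # [(45.0, None)]
```

Storing the history in a separate table means the staging table always holds the current value, while the full correction history stays queryable. Rebuilding the daily table from scratch keeps the sketch simple, at the cost of recomputing everything on each run.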

4. Realisation <strong>and</strong> first experiences<br />

The main goal of this project is to establish an efficient way of processing data and to produce an integrated, high-quality data set. Since the complexity of this system is both a major challenge and a risk, it was decided to proceed iteratively. First, the architecture of the whole system was designed. Because realizing data warehouses is not a core competence of MeteoSwiss, a call for tenders (WTO) was published and a suitable company was found. In collaboration with this partner, the 'backbone' of the system was developed and, in a first step, implemented for a few automatic stations to prove that the concept meets the demands. In a second step, the set of stations was extended to involve different data sources. Additionally, more features were developed and integrated. The further planning of the project foresees defined releases in which legacy systems will be removed and new tools made operational. The idea of the release planning is to keep the complexity of the system under control.

5. Conclusions<br />

In the described MeteoSwiss Data Warehouse System, we have developed an architecture that uses several elements of Data Warehouse technology as well as other technologies. The division into four layers provides the required flexibility with respect to demands on data quality and performance. The iterative approach proved very worthwhile for keeping the complexity of the system under control and is recommended for other projects as well. The collaboration with a partner company enforced very clear specifications of the demands, which resulted in good documentation as well as early identification of upcoming problems.


