The Data Warehouse System of MeteoSwiss: Concept and First Experiences
Estelle Grüter*, Christian Häberli*, Dimitrios Tombros**, Nadine Tschichold*, Walter Krottendorfer†,
Rudolf Höfler†
MeteoSwiss*, STCG**, PSE†
ABSTRACT
The amount of measured data has increased enormously in recent years, so an efficient system to
store and handle data and to assure its quality is of crucial importance. Large quantities of
numerical and multimedia data are collected for meteorological and climatological purposes, coming
from automatic systems as well as from observers. Data from very diverse sources have to be
integrated into one central database system, where they undergo various quality control and
consistency check procedures. At the same time, the data are continuously used by meteorological
and climatological applications. This paper gives an insight into the data warehouse system and its
architecture developed for the special needs of MeteoSwiss (the national weather service of
Switzerland).
1. Introduction
Automatic meteorological measurements with high temporal resolution have been carried out in
Switzerland since the late seventies. After more than two decades, the need for a new, efficient
system to store and process the continually increasing amount of data using state-of-the-art
technologies became more and more urgent. Data warehouse technology can be advantageous in science
for manipulating large quantities of sensor data, performing statistical analysis and extracting
meaningful trends, so this approach was chosen to meet the various and complex requirements.
2. Concept of MeteoSwiss Data Warehouse
The data to be stored in this new central system originate from different data sources such as
automatic weather stations, observations made by observers, model output, SYNOP data delivered
by the Global Telecommunication System and others. They have to be loaded according to their
varying structure and then checked for plausibility using a complex system of rules and tests.
In a next step, aggregations are carried out to provide customers with data of a lower temporal
resolution such as daily, monthly and yearly values. Finally, homogenization of data series
takes place. Applications allow the extraction of data of different quality levels.
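The plausibility check and the aggregation step can be illustrated with a minimal sketch. It is not the MeteoSwiss implementation: the `Measurement` record, the temperature limits and the function names are assumptions made for illustration, and a single range test stands in for the complex system of rules and tests described above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Measurement:
    """One high-resolution value from a station (hypothetical record layout)."""
    station: str
    timestamp: datetime
    temperature_c: float


# Assumed plausibility limits; real rules would depend on station and parameter.
TEMP_MIN_C = -60.0
TEMP_MAX_C = 60.0


def is_plausible(m: Measurement) -> bool:
    """Simple range test standing in for the full rule system."""
    return TEMP_MIN_C <= m.temperature_c <= TEMP_MAX_C


def daily_means(measurements: Iterable[Measurement]) -> Dict[Tuple[str, date], float]:
    """Aggregate plausible values to one mean per station and calendar day."""
    buckets = defaultdict(list)
    for m in measurements:
        if is_plausible(m):
            buckets[(m.station, m.timestamp.date())].append(m.temperature_c)
    return {key: mean(values) for key, values in buckets.items()}
```

Calling `daily_means` on a list of `Measurement` objects returns a dictionary keyed by station and day; values that fail the range test are left out of the mean. Monthly and yearly values could be derived in the same way by changing the grouping key.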
The following figure gives a brief overview of the data processing chain at MeteoSwiss:
[Fig. 1: General data flow at MeteoSwiss. Labels recoverable from the figure: measuring equipment;
datalogger, onsite QC; level 1 data; realtime data collection and quality control; communication
and network monitoring; level 2 data; quality control, calculation of derived quantities;
level 3 data.]
Since customers have different requirements regarding performance and applications, it was decided
to separate the various data processing steps clearly. This policy was followed when the abstract
architecture of the Data Warehouse was designed.
[Fig. 2: Simplified abstract architecture of MeteoSwiss Data Warehouse System. Labels recoverable
from the figure: Data sources (Automatic Stations, Observations, GTS Synop, Sounding, Model
output, ...); Quality control and Loading Process; Staging area (database optimized for online
transactional processing; 4 steps of quality control); Data processing; MeteoSchweiz Data
Warehouse System; Storage area (quality control & error removal (automatically ...); aggregation;
QC; database optimized for online analytical processing); Data analysis; Meta Database (to control
all processes); Applications; Clients. Further data flow labels in this span: meta data; level 4
data; calculation and aggregation; homogenization; level 5 data; distribution; extraction;
customers (private customers, universities, research institutes, companies, ...).]
As shown in Figure 2, the conceptual architecture consists of four layers. It is a further
development of the conventional data warehouse in that Data Warehouse technology is combined with
classical relational database technology.
Layer 1, Data Sources: This layer contains a wide variety of sources for meteorological and
climatological data such as point measurements from observations and automatic stations, profile
measurements from upper-air stations, image data, and volume data from weather radars and weather
forecast models.
Layer 2, Staging Area: This layer comprises all databases and tools to digitize, load, check and
integrate data. Special emphasis is given to data quality control systems. All changes made to
values are logged and thus historised.
Layer 3, Storage Area: This is the classical Data Warehouse, which contains an integrated set of
data to support meteorologists and climatologists in weather forecasting and climate research.
Layer 4, Analysis Area: This layer contains various analysis, visualization and data extraction
tools.
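The logging of value changes in the staging area can be sketched as follows. This is an illustrative assumption about how such a history might be kept, not the actual MeteoSwiss data model: the class names `StagedValue` and `ChangeRecord` and all their fields are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One logged modification of a value (hypothetical structure)."""
    changed_at: datetime
    old_value: Optional[float]
    new_value: Optional[float]
    reason: str


@dataclass
class StagedValue:
    """A value in the staging area together with its change history."""
    parameter: str
    value: Optional[float]
    history: List[ChangeRecord] = field(default_factory=list)

    def correct(self, new_value: Optional[float], reason: str) -> None:
        """Replace the current value and log the change instead of overwriting silently."""
        record = ChangeRecord(datetime.now(timezone.utc), self.value, new_value, reason)
        self.history.append(record)
        self.value = new_value

    def original_value(self) -> Optional[float]:
        """Return the value as first loaded, before any correction."""
        return self.history[0].old_value if self.history else self.value
```

Because every correction appends a record rather than overwriting the value, the originally loaded value and each later state remain available, for example when an application requests data at an earlier quality level.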
4. Realisation and first experiences
The main goal of this project is to establish an efficient way of processing data and to produce
an integrated data set of high quality. Since the complexity of this system is a huge challenge as
well as a risk, an iterative project approach was chosen. First, the architecture of the whole
system was designed. Because building data warehouses is not a core competence of MeteoSwiss,
a call for tenders (WTO) was published and a company was found. In collaboration with this
partner, the 'backbone' of the system was developed and implemented in a first step for a few
automatic stations to test whether the concept meets the demands. In a second step, the set of
stations was extended to involve different data sources. Additionally, more features were
developed and integrated. The further project plan foresees defined releases in which legacy
systems will be removed and new tools made operational. The idea of the release planning was to
keep the complexity of the system under control.
5. Conclusions
For the MeteoSwiss Data Warehouse System described here, we have developed an architecture which
uses several elements from Data Warehouse as well as other technologies. The separation into four
layers provides the required flexibility with regard to data quality and performance demands.
The iterative approach proved very worthwhile for keeping the complexity of the system under
control and is recommended for other projects as well. The collaboration with a partner company
enforced very clear specifications of the requirements, which resulted in good documentation as
well as early identification of upcoming problems.