
NIKLAS - Automatical quality control of time series data - ERAD 2010


<strong>ERAD</strong> <strong>2010</strong> - THE SIXTH EUROPEAN CONFERENCE ON RADAR IN METEOROLOGY AND HYDROLOGY<br />

NIKLAS - Automatical quality control of time series data

G. Lempio 1, C. Podlasly 1, T. Einfalt 1

1 hydro & meteo GmbH & Co. KG, Breite Str. 6-8, D-23552 Lübeck, Germany

Guido Lempio, g.lempio@hydrometeo.de

1. Introduction

Precipitation amounts measured by radar are relatively imprecise because they are derived through empirical Z-R relationships. For hydrological purposes, the accuracy of these measurements is not sufficient. One option to improve radar data is to adjust them with measurements from rain gauges. For this application it is extremely important to use only high-quality time series (Steiner et al., 1999).

Knowledge about the quality of input data from station measurements for radar adjustment, analysis and forecast tasks is important in order to assess the plausibility of results derived by models. Small amounts of input data can be quality controlled manually by experienced observers, but larger amounts of data and real-time applications do not allow this. Automatic algorithms can reduce the manual effort for quality checks by identifying the suspicious cases that have to be examined by human observers, and they can check data for real-time applications so that a certain data quality is assured.

NIKLAS has been developed as a module for real-time and non-real-time checks of meteorological data, as requested by the Landesamt für Umwelt, Wasserwirtschaft und Gewerbeaufsicht Rheinland-Pfalz, Germany (www.wasser.rlp.de), for the Mosel forecast and warning system (www.timisflood.net). It validates time series for the parameters:

• precipitation<br />

• global radiation<br />

• sunshine duration<br />

• air temperature<br />

• dew point temperature<br />

• relative humidity<br />

• wind speed<br />

• air pressure<br />

The quality check scheme is based on a series of tests. Their choice and order depend on the type of the meteorological parameter. The methodology for the test algorithms was taken from literature and operational experience (Einfalt et al., 2006).

As a result of each test, the examined data records are classified into three quality levels: valid, low/limited quality or bad quality. As a pure quality check tool, NIKLAS excludes records with bad quality from the time series and does not replace them by plausible estimates.

The NIKLAS working package has been applied successfully for real-time (TIMISflood) as well as for non-real-time (ExUS) validation tasks (Quirmbach et al., 2009).

2. <strong>NIKLAS</strong> – tool for automatic <strong>quality</strong> <strong>control</strong><br />

2.1 Control algorithms, for example applied to raingauge <strong>data</strong><br />

Quality control (QC) of raingauge data has been an important topic since the beginning of data collection. Attempts to formalize this task have been started in several countries (see Einfalt et al. (2000) for an overview). However, the conclusion has been that only a check of point rainfall measurements by human eye gives a reliably good result (Jörgensen et al. 1998; Maul-Kötter and Einfalt 1998).

Two aspects may prevent such manual procedures: real-time data that have to be processed further for flood warning, and large amounts of data to be investigated.



An automatic quality check that starts from simple cases before tackling the difficult ones yields surprisingly good results.

This approach has been proposed by the Bavarian agrometeorological service (Vaitl 1988) and refined by MeteoSwiss (Musa et al. 2003). Basically, the raingauge quality check starts with items that are easy to check, followed by more complex ones. In practice, this means that existing gaps in the data are first excluded from further treatment, then features in the data of a single station are investigated, and finally an intercomparison between the data from several raingauges is performed. The structure from simple check elements at the top down to more complex ones comprises the following steps (Einfalt et al. 2006):

1. Detection <strong>of</strong> gaps in the <strong>data</strong><br />

2. Detection <strong>of</strong> physically impossible values<br />

3. Detection <strong>of</strong> constant values<br />

4. Detection <strong>of</strong> values above set thresholds<br />

5. Detection <strong>of</strong> improbable zero values<br />

6. Detection <strong>of</strong> unusually low values (which may be real, though)<br />

7. Detection <strong>of</strong> unusually high values (which may be real, though)<br />

2.1.1 Gaps in the <strong>data</strong> (Completeness)<br />

The detection of gaps in the data mainly serves to exclude the gap interval of a station from being used for comparison with other stations. Furthermore, a gap statistic can be derived, which is an indicator of the reliability of the data from this station. Experience shows that well-maintained stations with good data quality rarely have gaps in their measurement series.
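The gap detection and a simple gap statistic could be sketched as follows. This is a minimal illustration, not the NIKLAS implementation: the function names `find_gaps` and `gap_fraction`, the assumption of a fixed nominal time step, and the definition of the statistic as the share of missing records are all hypothetical.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, step):
    """Return (first_missing, last_missing) pairs for every place where two
    consecutive timestamps are more than one nominal step apart.
    Assumes the timestamps are sorted in ascending order."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > step:
            # the gap covers the missing time steps between the two records
            gaps.append((prev + step, curr - step))
    return gaps

def gap_fraction(timestamps, step):
    """Share of expected records that are missing (assumed gap statistic)."""
    expected = (timestamps[-1] - timestamps[0]) // step + 1
    return 1 - len(timestamps) / expected

if __name__ == "__main__":
    step = timedelta(minutes=5)
    start = datetime(2010, 1, 1)
    # records at 0, 5, 10, 25 and 30 minutes: two time steps are missing
    series = [start + k * step for k in (0, 1, 2, 5, 6)]
    print(find_gaps(series, step))
    print(gap_fraction(series, step))
```

Intervals returned by `find_gaps` would then be excluded before any comparison with neighbouring stations.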

2.1.2 Physically impossible values (Extreme value check)<br />

Physically impossible <strong>data</strong> consist <strong>of</strong> negative rainfall values and very high intensities, e.g., more than 5 mm per<br />

minute in a moderate climate. Such values should be excluded from further evaluation.<br />

Depending on the underlying data and software, very high intensities may be a side effect of digitized paper charts. Such values can still be used if the time step for further analysis is large enough. For example, a value of 5 mm in one minute can in reality be representative of five or ten minutes when analyzed with additional information (e.g., the original paper charts). In such a case, evaluation on a five- or ten-minute time grid is the correct further treatment.
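As a hedged sketch of this check: the 5 mm per minute bound is taken from the text, while the function names `is_physically_possible` and `regrid` and the example values are assumptions made for illustration.

```python
MAX_MM_PER_MIN = 5.0  # upper bound quoted in the text for a moderate climate

def is_physically_possible(value_mm, step_minutes):
    """Reject negative rainfall and intensities above MAX_MM_PER_MIN."""
    if value_mm < 0:
        return False
    return value_mm / step_minutes <= MAX_MM_PER_MIN

def regrid(values_mm, factor):
    """Sum consecutive blocks of `factor` values, e.g. 1-minute values
    to 5-minute sums when factor is 5."""
    return [sum(values_mm[i:i + factor])
            for i in range(0, len(values_mm), factor)]

if __name__ == "__main__":
    one_minute = [0, 6, 0, 0, 0]      # 6 mm in a single minute: implausible
    print([is_physically_possible(v, 1) for v in one_minute])
    five_minute = regrid(one_minute, 5)
    # on the coarser grid the same amount is a plausible intensity
    print([is_physically_possible(v, 5) for v in five_minute])
```

The second call shows why a coarser time grid can make a digitization artefact usable: the same amount spread over five minutes stays below the bound.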

2.1.3 Constant values<br />

Constant values over a certain time are an indicator of unusable data, which may be due either to bad digitization or, in the case of digital registration, to missing values. The “certain time” for digital data with a time step of 1 to 5 minutes is around 15 minutes for intensities above 1 mm/h. For digitized data derived from paper charts, this time interval is a function of the paper chart resolution and may be as long as 60 minutes for old paper charts.
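A constant-value check along these lines might look like the sketch below. The 15-minute duration and the 1 mm/h intensity limit come from the text. The function name `constant_runs` and the strict "longer than" comparison are assumptions, since the text only says "around 15 minutes".

```python
def constant_runs(values_mm, step_minutes, max_minutes=15,
                  min_intensity_mm_h=1.0):
    """Return (start_index, end_index) of runs of identical values that last
    longer than max_minutes and whose intensity exceeds min_intensity_mm_h."""
    runs = []
    start = 0
    for i in range(1, len(values_mm) + 1):
        # a run ends at the end of the series or when the value changes
        if i == len(values_mm) or values_mm[i] != values_mm[start]:
            duration = (i - start) * step_minutes
            intensity = values_mm[start] * 60.0 / step_minutes
            if duration > max_minutes and intensity > min_intensity_mm_h:
                runs.append((start, i - 1))
            start = i
    return runs

if __name__ == "__main__":
    # 5-minute values: a long dry spell, then 0.2 mm repeated four times
    series = [0.0] * 10 + [0.2] * 4 + [0.5] + [0.3] * 2
    print(constant_runs(series, step_minutes=5))
```

Dry periods are not flagged because their intensity is below the limit, which matches the restriction to intensities above 1 mm/h.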

2.1.4 Values above set thresholds (Extreme value check)<br />

For predefined durations, it is useful to indicate when the measurements are above statistically rare values, e.g.,<br />

higher than an event occurring every five or ten years. Useful durations comprise 5 minutes, 15 minutes, 60 minutes<br />

and 1440 minutes (one day).<br />

Such thresholds were defined in order to identify “interesting” events, i.e., events which should be carefully checked<br />

before accepting the measurements.<br />

2.1.5 Improbable zero values (Spatial consistency)<br />

While all <strong>of</strong> the above checks are performed on one station only, this check and the following two use the spatiotemporal<br />

rainfall structure as seen by several gauges. Improbable zero values can be detected at one station if all<br />

surrounding stations have significant rainfall and stations are close enough to each other.<br />

2.1.6 Unusually low daily values (Spatial consistency)<br />

A more sophisticated check is the check on too low values where the daily sum <strong>of</strong> a selected station is compared<br />

to the daily sums <strong>of</strong> the neighboring stations. If the surrounding stations recorded significantly more rainfall than the<br />

selected one, the measurement <strong>of</strong> this station has to be considered as doubtful.



2.1.7 Unusually high daily values (Spatial consistency)<br />

A season dependent check is the one on too high values, where the daily sum <strong>of</strong> a selected station is compared to<br />

the daily sums <strong>of</strong> the neighboring stations. If the surrounding stations recorded significantly less rainfall than the<br />

selected one, the measurement <strong>of</strong> this station has to be considered as doubtful if a convective rainfall can be<br />

excluded. This is usually the case in Germany between October and March.<br />
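The two daily-sum comparisons can be illustrated with a short sketch. All numbers here are hypothetical: the factors `lower_factor` and `upper_factor`, the minimum average `min_average_mm` and the function name `check_daily_sum` are assumptions chosen for the example. Only the October to March window for excluding convective rainfall follows the text.

```python
from statistics import mean

WINTER_MONTHS = (10, 11, 12, 1, 2, 3)  # convective rainfall usually excluded

def check_daily_sum(station_mm, neighbour_mm, month,
                    lower_factor=0.3, upper_factor=3.0, min_average_mm=5.0):
    """Compare a station's daily sum with the mean of its neighbours.
    Returns "ok", "too_low" or "too_high"."""
    avg = mean(neighbour_mm)
    if max(station_mm, avg) < min_average_mm:
        return "ok"  # too little rain for a meaningful comparison
    if station_mm < lower_factor * avg:
        return "too_low"
    if station_mm > upper_factor * avg:
        # an isolated high value is only doubtful when convection is unlikely
        return "too_high" if month in WINTER_MONTHS else "ok"
    return "ok"

if __name__ == "__main__":
    print(check_daily_sum(1.0, [10.0, 12.0, 14.0], month=7))
    print(check_daily_sum(40.0, [5.0, 6.0, 7.0], month=1))
    print(check_daily_sum(40.0, [5.0, 6.0, 7.0], month=7))
```

A flagged value is doubtful rather than wrong, so a real workflow would pass it on for manual inspection.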

Additional check elements in <strong>NIKLAS</strong> are:<br />

- Variability<br />

- Inner consistency<br />

2.1.8 Variability<br />

This check compares two successive values from one <strong>time</strong> <strong>series</strong> in order to examine unusual differences <strong>of</strong><br />

these values.<br />
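A minimal sketch of such a variability check, assuming separate tolerances for rising and falling values. The function name `variability_flags` and the tolerance values in the example are hypothetical.

```python
def variability_flags(values, max_rise, max_fall):
    """Return the indices i where values[i] differs from values[i - 1] by
    more than the allowed upward or downward tolerance."""
    flagged = []
    for i in range(1, len(values)):
        diff = values[i] - values[i - 1]
        if diff > max_rise or -diff > max_fall:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    air_temperature_c = [10.0, 10.5, 18.0, 17.5, 9.0]
    # flags the jump to 18.0 and the drop to 9.0
    print(variability_flags(air_temperature_c, max_rise=5.0, max_fall=5.0))
```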

2.1.9 Inner consistency<br />

The inner consistency refers to the behavior <strong>of</strong> different parameters to each other at the same place. Values <strong>of</strong><br />

different parameters at the same <strong>time</strong> and the same place must be in a well defined relationship.<br />
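One relationship of this kind is that the dew point temperature cannot exceed the air temperature. The sketch below checks only that single relation. The function name `dew_point_violations` and the measurement tolerance are assumptions, not NIKLAS settings.

```python
def dew_point_violations(records, tolerance_c=0.2):
    """Return the indices of (air_temp_c, dew_point_c) pairs in which the
    dew point exceeds the air temperature by more than the tolerance."""
    return [i for i, (air_temp, dew_point) in enumerate(records)
            if dew_point > air_temp + tolerance_c]

if __name__ == "__main__":
    observations = [(12.0, 8.0), (5.0, 5.1), (3.0, 4.5)]
    # only the last pair is inconsistent beyond the tolerance
    print(dew_point_violations(observations))
```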

2.2 Using <strong>NIKLAS</strong> for <strong>quality</strong> <strong>control</strong><br />

<strong>NIKLAS</strong> is a program for the plausibility check <strong>of</strong> meteorological station measurements <strong>of</strong> the parameters<br />

precipitation, air temperature, dew point temperature, relative humidity, global radiation, sunshine duration,<br />

atmospheric pressure and wind speed.<br />

2.2.1 Technical details to <strong>NIKLAS</strong><br />

NIKLAS is delivered as a Windows application. It requires at least Windows® XP as operating system and at least 512 MB of RAM. NIKLAS consists of two executables: the command-line oriented computation program and the graphical user interface (GUI), which supports the determination of the settings.

The required input for NIKLAS consists of the time series of the meteorological parameters to be analyzed, the station information file and the configuration file (both in human-readable ASCII format). The output comprises files with the results in the form of a time series and a check series, a log file and, if selected, extended test result information.

All input data files need to be in one of the following formats commonly used in Germany: HMZ, station file format, ZRX.

2.2.2 How to work with the <strong>NIKLAS</strong>-GUI<br />

The command-line tool NIKLAS uses the configuration file “niklas.inp”, which controls the behavior via control words and parameter settings. The NIKLAS-GUI is a tool to set up the configuration file in an easy way. There are six main windows to perform the configuration:

1. Basic Settings: Selection of …

- the GUI's language,

- input data format,

- output files to be produced,

- detailed check information,

- procedures to be performed

2. Extreme value check: Selection <strong>of</strong> …<br />

- minimum and maximum values for each selected parameter independently for warnings (“Warning value” –<br />

value is flagged) and/or errors (“Extreme value” – value is deleted). Maximum values per 60 minutes, per one<br />

day or per one month can be set for precipitation in order to integrate the cumulative check.<br />

3. Variability: Selection <strong>of</strong> …<br />

- maximum upward/downward tolerance between successive values <strong>of</strong> the parameters temperature, relative<br />

humidity, atmospheric pressure and dew point temperature.<br />

- A test according to Vaitl can be selected for wind speed, but it is only available in the routine mode. This test is only implemented for hourly values over a time interval of a complete day and combines a given value with its predecessor and successor.

- A failed test results in a warning.



FIG. 2-1. <strong>NIKLAS</strong>-GUI Window "Basic Settings".<br />

4. Constant values: Selection <strong>of</strong> …<br />

- maximum allowed <strong>time</strong> interval in hourly values for warnings and extreme values.<br />

- For the relative humidity, a distinction is made between values below 95% and values equal to or larger than 95%. The first case should have stricter test criteria, but both tests produce errors or warnings when not passed.

- For wind speed, there is a similar distinction: values > 0 m/s are checked with stricter criteria than values where the wind speed is 0 m/s.

- For the relative humidity and the wind speed, the offline (routine) mode produces a warning, while the online (operational) mode produces an error when the test fails.

5. Inner consistency: Selection <strong>of</strong> …<br />

- tests to perform (Air temperature – dew point temperature comparison, Relative humidity – Precipitation comparison, Sunshine duration – Precipitation comparison, Check rain gage with radar data) and their outcome (Warning or Error).

6. Spatial consistency: Selection <strong>of</strong> …<br />

- permitted deviation of atmospheric pressure, relative humidity, air temperature, dew point temperature, global radiation, sunshine duration and precipitation,

- maximum distance and height corridor for analyzed stations.

In order to do the processing, a maximum distance for the analysed stations needs to be defined, and a height corridor can be set in order to compare only stations whose altitudes do not differ too much from each other (“General settings”). The precipitation test allows additional parameters to be defined, e.g. the test for daily or monthly values, the consideration of the synoptic situation, the UMGMAX tests A and B, the test on the duration of dry weather periods or the test for doubtful zero values through a neighbourhood check.

The upper value is the factor for the maximally allowed value, to be applied to the average <strong>of</strong> the neighbouring<br />

stations, and the lower value the factor for the permitted minimum value. The minimum value for the check to<br />

be performed is the “minimum average value” which has to be reached either at the tested station or by the<br />

mean <strong>of</strong> the neighbours.<br />

The routine check only produces warnings when a test is not passed. This is identical for the operational check<br />

except for the zero value test, the consideration <strong>of</strong> the general weather situation in the spatial consistency test<br />

and the dry weather <strong>time</strong> period test (duration test) which produce errors.



The zero value test checks whether a station with 0 mm of precipitation is surrounded by stations where there is rainfall. The user can choose whether all neighbours need to have rain or only the majority of the neighbours.

The review interval of the tested station is the time for which this station needs to have measured 0 mm of precipitation to activate this test (intervals T1 to T3 in Figure 2-2). This value is compared to a comparison interval (“review interval”) at the neighbouring stations, which can be shifted by a defined offset (“buffer interval”; the figure shows an offset of one hour). Thus the test can accommodate time differences in the arrival of precipitation at these stations. The test is performed such that interval T2 of the neighbours is compared to T1, T2 and T3 of the tested station. The test flags the value of the tested station as doubtful (warning or error) if all partial intervals of the tested station are zero while interval T2 of each neighbour shows a value above the predefined minimum rainfall. The minimum rainfall is the amount of rainfall above which the neighbouring stations are considered to be “non-zero”.

FIG. 2-2. Comparison intervals for the zero value test.<br />
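The decision logic of the zero value test can be sketched as follows. It assumes that the interval sums for T1 to T3 have already been computed. The function name `zero_value_doubtful`, its parameters and the 1 mm default for the minimum rainfall are hypothetical.

```python
def zero_value_doubtful(tested_sums_mm, neighbour_t2_sums_mm,
                        min_rain_mm=1.0, require_all=True):
    """tested_sums_mm: sums of the tested station over T1, T2 and T3.
    neighbour_t2_sums_mm: sum over T2 for each neighbouring station.
    Returns True if the zero values of the tested station are doubtful."""
    if any(s > 0 for s in tested_sums_mm):
        return False  # the tested station is not dry in all intervals
    if not neighbour_t2_sums_mm:
        return False  # no neighbours, nothing to compare with
    wet = sum(1 for s in neighbour_t2_sums_mm if s > min_rain_mm)
    if require_all:
        return wet == len(neighbour_t2_sums_mm)
    # otherwise a strict majority of wet neighbours is sufficient
    return wet > len(neighbour_t2_sums_mm) / 2

if __name__ == "__main__":
    print(zero_value_doubtful([0, 0, 0], [2.0, 3.5, 1.2]))
    print(zero_value_doubtful([0, 0, 0], [2.0, 0.0, 3.0]))
    print(zero_value_doubtful([0, 0, 0], [2.0, 0.0, 3.0], require_all=False))
```

The `require_all` switch mirrors the user choice between "all neighbours" and "the majority of the neighbours".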

3. Automatic <strong>quality</strong> <strong>control</strong> <strong>of</strong> <strong>data</strong><br />

3.1 Real <strong>time</strong> automatic <strong>quality</strong> <strong>control</strong><br />

The LUWG (Landesamt für Umwelt, Wasserwirtschaft und Gewerbeaufsicht Rheinland-Pfalz) is operating the<br />

Mosel flood warning system. In the context <strong>of</strong> the INTERREG TIMISflood project (www.timisflood.net), <strong>NIKLAS</strong><br />

has been developed. Operationally, it is used after <strong>data</strong> acquisition by WISKI to validate the measurement values.<br />

The result is then handed over to the hydrological simulation model LARSIM.<br />

FIG. 3-1. NIKLAS in operational flood warning.

3.2 Offline automatic quality control

In the project ExUS (Extremwertstatistische Untersuchung von Starkniederschlägen in NRW (1950 - 2008), statistical analysis of extreme heavy precipitation values in North Rhine-Westphalia), meteorological station data were analysed with NIKLAS.

The analysis covered 195 continuously measuring stations and 418 daily value stations.

The selected thresholds for the NIKLAS configuration were:

- 5 mm of precipitation in 1 minute,

- 17.5 mm (*) of precipitation in 5 minutes,

- 48 mm (*) of precipitation in 60 minutes,

- 90 mm (*) of precipitation in 1440 minutes.

(*) corresponds to the 100-year precipitation for Essen (Germany)
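Applied to a one-minute series, these thresholds could be checked with rolling sums as in the sketch below. The threshold values are those listed above. The function name `exceedances` and the choice to report the index of the last minute of each exceeding window are assumptions.

```python
# duration in minutes -> threshold in mm, as listed for the ExUS project
THRESHOLDS_MM = {1: 5.0, 5: 17.5, 60: 48.0, 1440: 90.0}

def exceedances(minute_values_mm, thresholds=THRESHOLDS_MM):
    """Return {duration: [end_index, ...]} for every rolling window whose
    precipitation sum exceeds the threshold of that duration.
    Windows at the start of the series cover fewer minutes than `duration`."""
    hits = {}
    for duration, limit in thresholds.items():
        window = 0.0
        for i, value in enumerate(minute_values_mm):
            window += value
            if i >= duration:
                # drop the value that has left the window
                window -= minute_values_mm[i - duration]
            if window > limit:
                hits.setdefault(duration, []).append(i)
    return hits

if __name__ == "__main__":
    # five consecutive minutes with 4 mm each: 20 mm in 5 minutes
    series = [0.0] * 10 + [4.0] * 5 + [0.0] * 10
    print(exceedances(series))
```

In this example no single minute exceeds 5 mm, but the five-minute sum of 20 mm exceeds 17.5 mm, so only the 5-minute duration is reported.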



FIG. 3-2. Verification results for continuous values and daily values: blue = no remarks; yellow = remarks by NIKLAS but no data error; red = remarks and data corrections. Numbers are the number of stations.

Nearly all continuously measuring stations and a third of the daily stations received warnings from NIKLAS. A third of these warnings led to data corrections (setting gaps or applying a time shift). Many of the warnings caused by too high intensities or amounts occurred during summer and were plausible.

4. Summary<br />

Quality control of data is a task which requires a lot of experience. Although many methods have been developed to perform these checks automatically, they depend on the climate, the density of the network and the quality of the data. The latter is very important, since erroneous data can only be detected from a spatial perspective when the surrounding data are of good quality.

The presented software does not use any additional information such as radar or satellite measurements, or results from numerical weather models, as is done elsewhere (e.g., Musa et al. 2003).

So far, the NIKLAS working package has been applied successfully for real-time (TIMISflood) as well as for non-real-time (ExUS) validation tasks.

References<br />

Einfalt T., Arnbjerg-Nielsen K., Spies S., (2000): Rainfall data measurement and processing for model use in urban hydrology. 5th International Workshop on Precipitation in Urban Areas, Pontresina, 10.-13. December.

Einfalt T., Jessen M., Quirmbach M., (2006): Can we check raingauge data automatically? 7th International Workshop on Precipitation in Urban Areas, St. Moritz, Switzerland, 7.-10. December, ISBN 3-909386-65-2.

Jörgensen H.K., Rosenörn S., Madsen H., Mikkelsen P.S., (1998): Quality control of rain data used for urban runoff systems. Water Sci Technol, vol. 37, No. 11.

Maul-Kötter B., Einfalt T., (1998): Correction and preparation of continuously measured raingauge data: a standard method in North Rhine-Westphalia. Water Sci Technol, vol. 37, No. 11.

Musa M., Grüter E., Abbt M., Häberli C., Häller E., Küng U., Konzelmann T., Dössegger R., (2003): Quality Control Tools for Meteorological Data in the MeteoSwiss Data Warehouse System. In: Proc. ICAM/MAP 2003, Brig, Switzerland, 19.-23. May.

Quirmbach M., Papadakis I., Einfalt T., Langstädtler G., Mehlig B., (2009): Analysis of Precipitation data from climate model calculations for North Rhine-Westphalia. 8th International Workshop on Precipitation in Urban Areas, St. Moritz, Switzerland, 10.-13. December, ISBN 978-3-909386-27-7.

Steiner M., Smith J. A., Burges S. J., Alonso C. V., and Darden R. W., (1999): Effect of bias adjustment and rain gauge data quality control on radar rainfall estimation. Water Resources Research, 35 (8), 2487–2503.

Vaitl W., (1988): Beschreibung der Prüfkriterien für die Qualitätskontrolle stündlicher bzw. 10-minütiger Daten von automatischen agrarmeteorologischen Stationen der Bayerischen Landesanstalt für Bodenkultur und Pflanzenbau. p 16, München-Freising.
