SEKE 2012 Proceedings - Knowledge Systems Institute

Tool Support for Anomaly Detection in Scientific Sensor Data

Irbis Gallegos
The University of Texas at El Paso
Department of Computer Science
El Paso, USA
irbisg@utep.edu

Ann Gates
The University of Texas at El Paso
Department of Computer Science
El Paso, USA
agates@utep.edu

Abstract: Environmental scientists working to understand global changes and their implications for humanity collect data in near-real time at remote locations using a variety of instruments. As the amount and complexity of the collected data increase, so do the time and domain knowledge required to determine whether the data are correct and whether the data collection instruments are working properly. The Sensor Data Verification (SDVe) tool allows scientists to detect anomalies in sensor data. SDVe evaluates whether scientific datasets, provided in near-real time or from file repositories, satisfy reusable data properties specified by scientists.

Keywords: Big Data; Cyberinfrastructure; Data Quality; Eco-Informatics; Software Engineering.

I. INTRODUCTION

Scientists collect and analyze large amounts of data to determine the causes of changes in ecosystems. Scientists use eddy covariance (EC) towers [1] to collect the measurements needed to understand such ecological changes. In particular, scientists use measurements taken by EC towers to monitor carbon dioxide (CO₂), water balance (H₂O), energy balance (irradiance), and other meteorological variables such as temperature and atmospheric pressure.

Anomaly detection in eddy covariance data must not only identify instrument errors and sensor problems but also evaluate how closely conditions fulfill the theoretical assumptions underlying the method [2]. Anomaly detection must be done in real time or shortly after the measurements are taken in order to minimize data loss by reducing the time needed to detect and fix instrument problems. Quality control procedures, instrument malfunctions, and maintenance and calibration periods often remove 20 to 40% of the data. Efficient anomaly detection is an outstanding problem that is only partially addressed in most FLUXNET networks [2].

In most cases, scientists evaluate eddy covariance data manually, using a variety of time-consuming methods and customized software that must be constantly modified and recompiled. In addition, data evaluation is highly dependent on the expertise of the scientist; however, such knowledge is typically neither captured nor reused. The sensor data verification (SDVe) tool mitigates these limitations by allowing scientists to automatically identify anomalies produced by environmental events and equipment malfunctions in scientific sensor datasets collected in near-real time or extracted from file repositories. SDVe evaluates whether a dataset satisfies a set of formally specified data properties. SDVe identifies anomalies, i.e., deviations from expected values due to environmental variability or instrument malfunction, and raises alarms whenever anomalies are detected. SDVe does not require source code recompilation and allows expert-defined data properties to be reused to verify datasets.
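
To make the idea of verifying a dataset against reusable data properties concrete, the following is a minimal Python sketch, not SDVe's implementation. It assumes each reading is a dictionary of named measurements and that a data property is a named Boolean check on a single reading; the field name `co2_ppm`, the bounds of 300 to 700 ppm, and the helpers `DataProperty`, `Alarm`, and `verify` are all hypothetical. Because the properties are plain data passed to `verify`, they can be reused across datasets without recompiling the checker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record layout: one sensor reading as named measurements.
Record = Dict[str, float]


@dataclass
class DataProperty:
    """A reusable, named data property: a Boolean check on one reading (illustrative only)."""
    name: str
    holds: Callable[[Record], bool]


@dataclass
class Alarm:
    """Raised when a reading violates a data property."""
    property_name: str
    index: int
    record: Record


def verify(dataset: List[Record], properties: List[DataProperty]) -> List[Alarm]:
    """Check every property against every reading; collect one alarm per violation."""
    alarms: List[Alarm] = []
    for i, record in enumerate(dataset):
        for prop in properties:
            if not prop.holds(record):
                alarms.append(Alarm(prop.name, i, record))
    return alarms


# Hypothetical property: CO2 concentration stays within a made-up plausible range (ppm).
co2_in_range = DataProperty("co2_in_range", lambda r: 300.0 <= r["co2_ppm"] <= 700.0)

readings: List[Record] = [
    {"t": 0.0, "co2_ppm": 410.2},
    {"t": 1.0, "co2_ppm": 9999.0},  # e.g., a spike from a malfunctioning sensor
    {"t": 2.0, "co2_ppm": 408.7},
]

for alarm in verify(readings, [co2_in_range]):
    print(f"ALARM {alarm.property_name} at index {alarm.index}: {alarm.record}")
```

Running the script prints a single alarm, for the reading at index 1. A per-reading check like this covers only the simplest case; properties that relate several readings over time need the scopes and patterns discussed in Section 2.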

Section 2 provides background on the software engineering techniques adapted and extended by SDVe. Section 3 describes the data property specification approach that scientists use to specify the data properties that are input to SDVe. Section 4 explains the SDVe tool. Section 5 describes the experimental setup and the results of using SDVe to detect anomalies in eddy covariance data, and Section 6 discusses some of the lessons learned from the experiment. Finally, Section 7 describes work related to SDVe, followed by concluding remarks in Section 8.

II. BACKGROUND

SDVe applies software engineering techniques that have been used to provide assurance for critical software systems to the detection of anomalies in scientific sensor datasets. This section provides background information on these techniques.

A. Property Specification

The specification pattern system (SPS) [3] was introduced to assist practitioners in formally specifying software and hardware properties.

In the SPS, a specification consists of scopes and patterns. Scopes define the portion of a program over which the property holds. Patterns describe the structure of specific behaviors and define relationships between propositions. Propositions represent Boolean expressions that are evaluated over the program execution.
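
The sketch below illustrates scopes and propositions under a finite-trace reading, treating a dataset as a sequence of states (one dictionary of variable values per time step). This mapping from program executions to datasets, and every name in the code (`global_scope`, `before_scope`, `after_scope`, `universally`, and the fields `power` and `voltage`), are assumptions for illustration, not SDVe's or the SPS authors' code. Propositions are Python predicates on one state, a scope returns the range of state indices over which a pattern is checked, and only three SPS scopes plus the Universality pattern are shown.

```python
from typing import Callable, Dict, List

# Assumption: a dataset is read as a finite trace of states, one dict of values per time step.
State = Dict[str, float]
Proposition = Callable[[State], bool]  # Boolean expression evaluated on one state


def global_scope(trace: List[State]) -> range:
    """Global scope: every state in the trace."""
    return range(len(trace))


def before_scope(trace: List[State], r: Proposition) -> range:
    """'Before R': states strictly before the first state where R holds.
    Empty if R never holds, so a pattern in this scope is vacuously satisfied."""
    for i, state in enumerate(trace):
        if r(state):
            return range(i)
    return range(0)


def after_scope(trace: List[State], q: Proposition) -> range:
    """'After Q': from the first state where Q holds to the end of the trace.
    Empty if Q never holds."""
    for i, state in enumerate(trace):
        if q(state):
            return range(i, len(trace))
    return range(0)


def universally(trace: List[State], scope: range, p: Proposition) -> bool:
    """Universality pattern: P holds in every state inside the scope."""
    return all(p(trace[i]) for i in scope)


# Hypothetical example: once the logger is powered on, supply voltage stays above 11.5 V.
def powered_on(s: State) -> bool:
    return s["power"] == 1


def voltage_ok(s: State) -> bool:
    return s["voltage"] > 11.5


trace: List[State] = [
    {"power": 0, "voltage": 0.0},
    {"power": 1, "voltage": 12.1},
    {"power": 1, "voltage": 12.0},
]

print(universally(trace, global_scope(trace), voltage_ok))             # False
print(universally(trace, after_scope(trace, powered_on), voltage_ok))  # True
```

The same Universality check fails over the global scope, because the logger reports 0 V before it is switched on, but passes in the "after power-on" scope. Restricting a property to the portion of the data where it is meant to hold is the role scopes play.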

SPS patterns are divided into two groups: Occurrence and Order patterns. Occurrence patterns deal with a single event or condition and specify the rate at which that condition or event occurs. Order patterns relate two conditions or events and specify the order in which they occur. In this context, conditions are propositions that hold in one or more consecutive states. Events are instants at which a proposition changes value between two consecutive states.
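
The distinction between events, Occurrence patterns, and Order patterns can be sketched in a few lines, again as an illustration over a finite sequence of states rather than SDVe's implementation. Here `events` returns the indices at which a proposition changes value; `absence` is one example of an Occurrence pattern (P never holds), and `precedes` is one example of an Order pattern (S precedes P: P does not hold before S has held). The function names and the fields `h2o`, `warm`, and `valid` are hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: a dataset is a finite trace of states (one dict of values per time step).
State = Dict[str, float]
Proposition = Callable[[State], bool]


def events(trace: List[State], p: Proposition) -> List[int]:
    """Indices i where P changes value between states i-1 and i (an 'event')."""
    return [i for i in range(1, len(trace)) if p(trace[i - 1]) != p(trace[i])]


def absence(trace: List[State], p: Proposition) -> bool:
    """Occurrence pattern (Absence): P never holds anywhere in the trace."""
    return not any(p(state) for state in trace)


def precedes(trace: List[State], s: Proposition, p: Proposition) -> bool:
    """Order pattern (Precedence): P never holds in a state before S has held
    (S holding in the same state as P counts as preceding it)."""
    seen_s = False
    for state in trace:
        seen_s = seen_s or s(state)
        if p(state) and not seen_s:
            return False
    return True


# Hypothetical trace: H2O reading, instrument warm-up flag, data-valid flag.
def negative_h2o(st: State) -> bool:
    return st["h2o"] < 0


def warmed_up(st: State) -> bool:
    return st["warm"] == 1


def valid(st: State) -> bool:
    return st["valid"] == 1


trace: List[State] = [
    {"h2o": 10.1, "warm": 0, "valid": 0},
    {"h2o": 10.3, "warm": 1, "valid": 0},
    {"h2o": 10.2, "warm": 1, "valid": 1},
]

print(events(trace, warmed_up))           # [1]: the warm-up condition starts at state 1
print(absence(trace, negative_h2o))       # True
print(precedes(trace, warmed_up, valid))  # True
print(precedes(trace, valid, warmed_up))  # False
```

Note that "warmed up" is a condition (it holds over states 1 and 2), while the change from not warmed up to warmed up at state 1 is the corresponding event.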
