SEKE 2012 Proceedings - Knowledge Systems Institute


with specific data granularity and quantitative values, have to be built for specific parts of the season and diurnal cycles. Data properties can be specified to document scientific knowledge about processes or to identify anomalies in scientific sensor data. Data properties specified to detect anomalies in sensor data are typically specific about the sensor names and the thresholds against which the data should be evaluated; SDVe can interpret them and use them to evaluate data without further manipulation of the property, and they can also be used to document the scientific processes. Data properties that are specified for the sole purpose of documenting processes are typically general in their descriptions: they describe which sensor reading will be evaluated and how it will be used, but do not include the specific sensor name or threshold values to be evaluated. Data properties for documenting processes might also include computational methods that need to be applied to data before the data can be evaluated. These types of data properties are not suitable for the current version of SDVe.
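The distinction between the two kinds of data properties can be illustrated with a minimal Python sketch. It is not SDVe's implementation or its property format: the `DataProperty` class, its field names, the sensor name `air_temp_c`, and the threshold values are all hypothetical and chosen only for illustration. The sketch assumes that a property can be evaluated automatically only when it names a concrete sensor and gives both thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataProperty:
    """A data property; all field names here are hypothetical."""
    description: str
    sensor: Optional[str] = None    # concrete sensor name, if given
    lower: Optional[float] = None   # lower threshold, if given
    upper: Optional[float] = None   # upper threshold, if given

    def is_checkable(self) -> bool:
        """True only when a sensor name and both thresholds are present."""
        return all(v is not None for v in (self.sensor, self.lower, self.upper))

    def violations(self, readings: List[Dict[str, float]]) -> List[int]:
        """Return the indices of readings that fall outside [lower, upper]."""
        if not self.is_checkable():
            raise ValueError("documentation-only property: nothing to evaluate")
        return [
            i for i, row in enumerate(readings)
            if self.sensor in row
            and not (self.lower <= row[self.sensor] <= self.upper)
        ]


# Hypothetical readings and thresholds, for illustration only.
readings = [{"air_temp_c": 21.5}, {"air_temp_c": 78.0}, {"air_temp_c": 19.0}]

anomaly_prop = DataProperty(
    "Air temperature stays within a plausible range",
    sensor="air_temp_c", lower=-40.0, upper=50.0,
)
doc_prop = DataProperty("Air temperature is compared against a plausible range")

flagged = anomaly_prop.violations(readings)  # indices of out-of-range readings
```

In this sketch the anomaly-detection property flags the second reading, while the documentation-only property carries no sensor name or thresholds and is rejected before evaluation, mirroring why such properties cannot be evaluated directly.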

VII. RELATED WORK

This section describes some of the approaches that are frequently used to detect anomalies in sensor data.

The Intelligent Outlier Detection Algorithm (IODA) [16] is a technique used to perform quality control on time series data. IODA uses statistics, graph theory, image processing, and decision trees to determine whether data are correct. IODA compares incoming data, which are treated as images, to common patterns of failure. SDVe differs from this approach in that it is scientist-centered: the anomaly detection process is based on the expert scientific knowledge captured by the data properties.

Dereszynski & Dietterich [17] use a Dynamic Bayesian Network (DBN) [18] approach to automatic data cleaning for individual air temperature data streams. The DBN combines discrete and conditional linear-Gaussian random variables to model the air temperature as a function of diurnal, seasonal, and local trend effects. The approach uses a general fault model to classify different types of anomalies. SDVe differs from this approach in that scientists do not have to acquire mathematical or logic knowledge to verify their data, as is the case with the DBN.

EdiRe [19] is a software tool for eddy covariance and microclimatological measurement analysis. EdiRe is adaptable to most eddy covariance raw data formats and microclimate data; however, it requires processing routines to be developed and redesigned to address the different areas associated with data analysis, whereas SDVe does not require any software implementation or recompilation.

VIII. SUMMARY

Scientists collecting large amounts of complex environmental sensor data need to be assured in a timely fashion that their data are correct. The SDVe prototype tool adapts software engineering techniques to allow scientists to verify sensor datasets against predefined, formally specified data properties. SDVe automatically verifies datasets and raises flags whenever anomalies occur in the datasets. SDVe has been used to identify anomalies in eddy covariance data.

ACKNOWLEDGMENT

The authors would like to thank Dr. Deanna Pennington, Dr. Craig Tweedie, and Aline Jaimes for their invaluable input towards this work.

REFERENCES

[1] A. Jaimes. Presentation, Topic: “Defining Properties for Eddy Covariance Data,” Cybershare-Center, The University of Texas at El Paso, 2010.

[2] X. Lee, W. Massman, B. Law. Handbook of Micrometeorology: A Guide for Surface Flux Measurements and Analysis. 1st ed., Kluwer Academic Publishers, 2004, pp. 181-208.

[3] M.B. Dwyer, G.S. Avrunin, J.C. Corbett. “A System of Specification Patterns,” in Proc. of the 2nd Workshop on Formal Methods in Software Practice, 1998.

[4] O. Mondragon, A.Q. Gates. “Supporting Elicitation and Specification of Software Properties through Patterns and Composite Propositions,” Intl. Journal Software Engineering and Knowledge Engineering, vol. 14(1), Feb. 2004.

[5] D. Peters. “Automated Testing of Real-Time Systems.” Technical report, Memorial University of Newfoundland, 1999.

[6] N. Delgado, A.Q. Gates, S. Roach. “A Taxonomy and Catalog of Runtime Software-Fault Monitoring Tools,” in IEEE Trans. Softw. Eng. 30, 2004, pp. 859-872.

[7] G. Holzmann. “The Spin Model Checker,” in IEEE Transactions on SE, vol. 23(5), 1997, pp. 279-295.

[8] I. Gallegos, A.Q. Gates, C.E. Tweedie. “Toward Improving Environmental Sensor Data Quality: A Preliminary Property Categorization,” in Proceedings of the International Conference on Information Quality (ICIQ), 2010.

[9] S. Konrad, B.H.C. Cheng. “Facilitating the Construction of Specification Pattern-Based Properties,” in Proc. IEEE Requirements Engineering, 2005, pp. 329-338.

[10] I. Gallegos, A.Q. Gates, C.E. Tweedie. “DaProS: A Data Property Specification Tool to Capture Scientific Sensor Data Properties,” in Proceedings of the Workshop on Domain Engineering DE@ER10, 2010.

[11] J.A. Hourcle. “Data Relationships: Towards a Conceptual Model of Scientific Data Catalogs,” in Eos Trans. AGU, vol. 89(53), 2009.

[12] Campbell Scientific. “Instruction manual: CSAT Three Dimensional Sonic Anemometer.” Logan, Utah, Campbell Scientific Inc.: 70. 2008.

[13] Campbell Scientific. “Open Path Eddy Covariance Training.” Logan, CSI. 2009.

[14] K.M. Havstad, L.F. Huenneke, W.H. Schlesinger. “Structure and Function of a Chihuahuan Desert Ecosystem: The Jornada Basin Long-Term Ecological Research Site,” Oxford University Press, 1st Edition, 2006, pp. 44-80.

[15] Weather Underground. “WeatherUnderground,” Internet: http://www.wunderground.com/. [Feb, 2012].

[16] R.A. Weekley, R.K. Goodrich, L.B. Cornman. “An Algorithm for Classification and Outlier Detection of Time-Series Data,” in Journal of Atmospheric & Oceanic Technology, vol. 27, pp. 94-107, 2010.

[17] E.W. Dereszynski, T.G. Dietterich. “A Probabilistic Model for Anomaly Detection in Remote Sensor Streams.” M.A thesis, Oregon State University, USA, 2007.

[18] T. Dean, K. Kanazawa. “Probabilistic Temporal Reasoning,” in Proc. AAAI, 1988, pp. 524-529.

[19] The University of Edinburgh School of GeoSciences. “EdiRe,” Internet: http://www.geos.ed.ac.uk/abs/research/micromet/EdiRe/ [Feb, 2012].
