
Bio-medical Ontologies Maintenance and Change Management


Extraction of Constraints from Biological Data

family fa≠46459 are selected either as candidate inconsistencies or as information for further investigation by biological experts.
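As a minimal sketch of this selection step, assuming a relation held as a list of Python dictionaries, the code below flags the tuples that violate a rule whose confidence is high but below 1. The function names, the `min_conf` threshold, and the toy attributes `fold` and `cls` are hypothetical illustrations, not the chapter's implementation.

```python
from typing import Dict, List, Tuple

# A tuple of the relation: attribute name -> value (hypothetical representation).
Row = Dict[str, str]
# One side of a rule: (attribute, value).
Item = Tuple[str, str]


def rule_stats(rows: List[Row], lhs: Item, rhs: Item) -> Tuple[int, float]:
    """Number of tuples matching the antecedent and confidence of lhs -> rhs."""
    lhs_attr, lhs_val = lhs
    rhs_attr, rhs_val = rhs
    covered = [r for r in rows if r[lhs_attr] == lhs_val]
    if not covered:
        return 0, 0.0
    hits = sum(1 for r in covered if r[rhs_attr] == rhs_val)
    return len(covered), hits / len(covered)


def candidate_inconsistencies(rows: List[Row], lhs: Item, rhs: Item,
                              min_conf: float = 0.8) -> List[Row]:
    """Tuples violating a rule whose confidence is high but below 1.

    These tuples may be data errors or genuine biological exceptions, so they
    are returned for expert inspection rather than corrected automatically.
    """
    n, conf = rule_stats(rows, lhs, rhs)
    if n == 0 or conf < min_conf or conf == 1.0:
        return []
    lhs_attr, lhs_val = lhs
    rhs_attr, rhs_val = rhs
    return [r for r in rows if r[lhs_attr] == lhs_val and r[rhs_attr] != rhs_val]


if __name__ == "__main__":
    # Hypothetical toy data: nine domains of fold "f1" in class "a", one in class "b".
    data = [{"id": f"d{i}", "fold": "f1", "cls": "a"} for i in range(9)]
    data.append({"id": "d9", "fold": "f1", "cls": "b"})
    for row in candidate_inconsistencies(data, ("fold", "f1"), ("cls", "a")):
        print(row)  # {'id': 'd9', 'fold': 'f1', 'cls': 'b'}
```

Exact rules (confidence 1) are deliberately skipped here, since they have no violating tuples; only near-exact rules produce candidates for review.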

6 Conclusions

In this chapter we present a framework for applying data mining tools to constraint extraction in the biological domain. We focus on detecting tuple constraints and functional dependencies in representative biological databases by means of association rule mining. By analyzing association rules we can deduce not only constraints and dependencies, which provide structural knowledge about a dataset and may be useful for query optimization or dimensionality reduction, but also anomalies in the data, which could be errors or interesting exceptions to be highlighted to domain experts.
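The reasoning above can be illustrated with a minimal Python sketch, assuming a relation stored as a list of dictionaries and only single-attribute rules A=a → B=b. Rules with confidence 1 are reported as candidate tuple constraints, and a functional dependency A → B is checked as the condition that every rule A=a → B=b that applies has confidence 1. The function names, thresholds, and toy attributes (`family`, `superfamily`, `fold`) are hypothetical and not taken from the chapter.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple

# A tuple of the relation: attribute name -> value (hypothetical representation).
Row = Dict[str, str]
# A rule A=a -> B=b, encoded as (A, a, B, b).
Rule = Tuple[str, str, str, str]


def mine_rules(rows: List[Row], min_sup: float = 0.1,
               min_conf: float = 0.8) -> Dict[Rule, Tuple[float, float]]:
    """Return single-antecedent rules A=a -> B=b with their (support, confidence).

    Counts are computed by brute force over all ordered attribute pairs,
    which is enough for a sketch but not for large tables.
    """
    n = len(rows)
    attrs = list(rows[0]) if rows else []
    item_count = Counter((a, r[a]) for r in rows for a in attrs)
    pair_count = Counter((a, r[a], b, r[b])
                         for r in rows for a, b in permutations(attrs, 2))
    rules: Dict[Rule, Tuple[float, float]] = {}
    for (a, va, b, vb), cnt in pair_count.items():
        support = cnt / n
        confidence = cnt / item_count[(a, va)]
        if support >= min_sup and confidence >= min_conf:
            rules[(a, va, b, vb)] = (support, confidence)
    return rules


def exact_rules(rules: Dict[Rule, Tuple[float, float]]) -> List[Rule]:
    """Rules with confidence 1 hold on every tuple: candidate tuple constraints."""
    return [rule for rule, (_, conf) in rules.items() if conf == 1.0]


def holds_fd(rows: List[Row], lhs: str, rhs: str) -> bool:
    """True if the functional dependency lhs -> rhs holds, i.e. every value of
    lhs co-occurs with a single value of rhs (each rule lhs=a -> rhs=b that
    applies has confidence 1, regardless of its support)."""
    seen: Dict[str, str] = {}
    for r in rows:
        if seen.setdefault(r[lhs], r[rhs]) != r[rhs]:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical toy rows loosely shaped like a SCOP-style hierarchy.
    rows = [
        {"family": "F1", "superfamily": "S1", "fold": "X"},
        {"family": "F1", "superfamily": "S1", "fold": "X"},
        {"family": "F2", "superfamily": "S1", "fold": "X"},
        {"family": "F3", "superfamily": "S2", "fold": "Y"},
    ]
    for rule in exact_rules(mine_rules(rows, min_sup=0.25)):
        print(rule)
    print(holds_fd(rows, "family", "superfamily"))  # True
    print(holds_fd(rows, "superfamily", "family"))  # False
```

The dependency is checked directly on the tuples rather than only through the mined rules, so values of A whose rules fall below the support threshold are not silently ignored.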

We have applied our analysis to the SCOP and CATH databases. We plan to extend our approach to different database models, such as XML or collections of relational tables, and to integrate automatic distributed querying of other databases about the detected anomalies, in order to help domain experts distinguish biological anomalies from errors. Further developments of this work include applying our method to heterogeneous data sources, to derive schema information that may be exploited during data integration.

Acknowledgements

We would like to thank Paolo Garza for his help in association rule extraction and for many stimulating discussions.

