28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management


Extraction of Constraints from Biological Data

(erroneous data used for the generation of new data), and staleness (unnoticed changes in data could produce the falsification of other data which depend on it). These problems lead to semantic errors, and the resulting information does not represent the real-world facts correctly. Data dependencies inherent to the data production process and to data usage make genome data particularly prone to propagating errors [21].

Most existing work, such as [11], [23] and [20], focuses on inaccuracy, lexical errors, redundancy problems and the enforcement of integrity constraints, but ignores functional constraint violations. Moreover, given the large amount of data in existing databases, a tool that automatically detects relationships among data can help biological specialists improve their domain knowledge. The objective is to define an algorithm that automatically infers rules from data; these rules can be maintained incrementally as the data are updated.
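The incremental approach is not detailed in this passage, so the following Python sketch is only an illustration of the general idea: if the counts behind each rule are stored, the rule statistics can be refreshed when a row is inserted or deleted, without rescanning the database. The class name `IncrementalRuleStats`, the attribute names `gene` and `chromosome`, and the toy rows are all hypothetical, and support and confidence are used here simply as the usual association-rule measures.

```python
from collections import Counter


class IncrementalRuleStats:
    """Keeps the counts needed to recompute support and confidence
    for rules of the form (lhs = a) -> (rhs = b) after each update.
    Hypothetical helper, not the authors' algorithm."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
        self.n = 0                      # number of rows currently stored
        self.lhs_counts = Counter()     # rows per value of the lhs attribute
        self.pair_counts = Counter()    # rows per (lhs value, rhs value) pair

    def add(self, row):
        self._update(row, +1)

    def remove(self, row):
        self._update(row, -1)

    def _update(self, row, delta):
        a, b = row[self.lhs], row[self.rhs]
        self.n += delta
        self.lhs_counts[a] += delta
        self.pair_counts[(a, b)] += delta

    def support(self, a, b):
        # Fraction of all rows in which lhs = a and rhs = b.
        return self.pair_counts[(a, b)] / self.n if self.n else 0.0

    def confidence(self, a, b):
        # Fraction of the rows with lhs = a that also have rhs = b.
        total = self.lhs_counts[a]
        return self.pair_counts[(a, b)] / total if total else 0.0


# Toy, invented rows: one of them disagrees with the other two.
stats = IncrementalRuleStats("gene", "chromosome")
for chrom in ("chr1", "chr1", "chr2"):
    stats.add({"gene": "g1", "chromosome": chrom})
before = stats.confidence("g1", "chr1")

# The data are updated: the odd row is corrected in place.
stats.remove({"gene": "g1", "chromosome": "chr2"})
stats.add({"gene": "g1", "chromosome": "chr1"})
after = stats.confidence("g1", "chr1")
```

In this toy run the confidence of the rule g1 -> chr1 moves from 2/3 to 1 after the correction, and each update only touches three counters.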

We propose a method to discover tuple constraints and functional dependencies among data by means of association rule mining. Constraints and dependencies show semantic relationships among attributes in a database schema. Knowledge of them can be exploited to improve data quality and integration in database design, and to perform query optimization and dimensionality reduction. Association rules are a well-known data mining tool. They have been applied to biological data cleaning for detecting outliers and duplicates [17], and to Gene Ontology for finding relationships among terms of the three ontology levels (cellular components, molecular functions and biological processes) [18], [6], but not for finding constraints, dependencies or anomalies.

By means of association rules we detect correlation relationships among attribute values. Then, by analyzing the support and confidence of each rule, (probabilistic) tuple constraints and functional dependencies may be detected. These may both reveal the presence of erroneous data and highlight novel semantic information. We present experiments on biological databases and validate our method by verifying its correctness and completeness. Finally, we show how a further analysis of the obtained results may improve domain knowledge and data quality.
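To make the role of support and confidence concrete, the following Python sketch computes both measures for rules between the values of two attributes and then aggregates them into a rough indicator of how closely one attribute determines the other. It is a minimal illustration under stated assumptions, not the algorithm of this chapter: the function names, the attribute names `organism` and `taxon`, the toy rows, the aggregation in `dependency_strength` and the 0.5 confidence cut-off are all hypothetical.

```python
from collections import Counter


def value_rules(rows, lhs, rhs):
    """Map each rule (lhs = a) -> (rhs = b) to (support, confidence)."""
    n = len(rows)
    pair_counts = Counter((r[lhs], r[rhs]) for r in rows)
    lhs_counts = Counter(r[lhs] for r in rows)
    return {
        (a, b): (c / n, c / lhs_counts[a])
        for (a, b), c in pair_counts.items()
    }


def dependency_strength(rows, lhs, rhs):
    """Fraction of rows that agree with the most frequent rhs value
    of their lhs value. A value of 1.0 means lhs -> rhs holds exactly
    on these rows; lower values suggest a probabilistic dependency."""
    pair_counts = Counter((r[lhs], r[rhs]) for r in rows)
    best = Counter()
    for (a, _b), c in pair_counts.items():
        best[a] = max(best[a], c)
    return sum(best.values()) / len(rows)


# Toy, invented rows; the last one breaks the pattern of the others.
rows = [
    {"organism": "human", "taxon": "T1"},
    {"organism": "human", "taxon": "T1"},
    {"organism": "human", "taxon": "T1"},
    {"organism": "mouse", "taxon": "T2"},
    {"organism": "mouse", "taxon": "T2"},
    {"organism": "mouse", "taxon": "T3"},
]

rules = value_rules(rows, "organism", "taxon")
strength = dependency_strength(rows, "organism", "taxon")

# Rows covered only by a low-confidence rule are candidates for review
# (the 0.5 cut-off is an arbitrary choice for this example).
suspects = [
    r for r in rows
    if rules[(r["organism"], r["taxon"])][1] < 0.5
]
```

On these rows the rule human -> T1 has confidence 1, mouse -> T2 has confidence 2/3 and mouse -> T3 has confidence 1/3, so the dependency of `taxon` on `organism` holds for 5 of the 6 rows and the single T3 row is flagged. Whether such a row is an error or a genuine exception is a question for a domain expert.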

2 Application of Constraint Extraction

Recently, new ways to mine patterns and constraints in biological databases have been proposed. In [16] the authors focus on constrained pattern mining on the “transposed” database, thus facing a smaller search space when the number of attributes (genes) is orders of magnitude larger than the number of objects (experiments). They present a theoretical framework for database and constraint transposition, discuss the properties of constraint transposition and look into classical constraints. Our approach does not require the data to be transposed and aims at discovering constraints among attributes instead of constraints associated with patterns.
