28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management

Extraction of Constraints from Biological Data

Rules are usually represented in the form: body → head [support, confidence]. Body and head are two arbitrary sets of data items such that body ∩ head = ∅. Support (s) and confidence (c) are used to measure the quality of an association rule. They are computed as shown in equations (1) and (2).

$$ s = \frac{n_{bh}}{n} \tag{1} $$

$$ c = \frac{n_{bh}}{n_{b}} \tag{2} $$

In Eq. (1), $n_{bh}$ is the number of data instances that contain both body and head, and $n$ is the cardinality of the relation (i.e., the number of data instances in the relation). In Eq. (2), $n_{b}$ is the total number of data instances containing the body [12]. In Eq. (1), $n_{bh}$ is also called absolute support, while the support $s$ is a relative value with respect to the total number of tuples.

For example, the rule ((City=Paris) AND (Age=30)) → (Preferred product=Car) [0.5%, 60%] means that thirty-year-old people living in Paris whose preferred product is a car make up 0.5% of the buyers stored in the database. It also means that a car is the preferred product for 60% of the thirty-year-old people living in Paris who are stored in the buyers database.
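To make Eqs. (1) and (2) concrete, the following is a minimal sketch that counts $n$, $n_{b}$ and $n_{bh}$ over a small in-memory relation. The function name `rule_metrics` and the five toy rows are assumptions introduced only for illustration; they do not reproduce the 0.5% and 60% figures of the example above.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def rule_metrics(rows: List[Row], body: Row, head: Row) -> Tuple[float, float]:
    """Return (support, confidence) of the rule body -> head over `rows`."""

    def matches(row: Row, items: Row) -> bool:
        # A row contains an item set if it agrees on every attribute=value pair.
        return all(row.get(attr) == value for attr, value in items.items())

    n = len(rows)                                        # cardinality of the relation
    n_b = sum(1 for row in rows if matches(row, body))   # rows containing the body
    n_bh = sum(1 for row in rows
               if matches(row, body) and matches(row, head))  # body and head together
    support = n_bh / n if n else 0.0           # Eq. (1)
    confidence = n_bh / n_b if n_b else 0.0    # Eq. (2)
    return support, confidence


# Hypothetical toy relation (not data from the chapter).
buyers = [
    {"City": "Paris", "Age": 30, "Preferred product": "Car"},
    {"City": "Paris", "Age": 30, "Preferred product": "Car"},
    {"City": "Paris", "Age": 30, "Preferred product": "Bike"},
    {"City": "Rome", "Age": 30, "Preferred product": "Car"},
    {"City": "Paris", "Age": 45, "Preferred product": "Car"},
]

s, c = rule_metrics(
    buyers,
    body={"City": "Paris", "Age": 30},
    head={"Preferred product": "Car"},
)
print(f"support={s:.2f}, confidence={c:.2f}")
```

On this toy relation, $n = 5$, $n_{b} = 3$ and $n_{bh} = 2$, so the support is 2/5 and the confidence is 2/3.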

4 Constraint Extraction in Biological Data

In this section we describe how constraints can be extracted from a database using a combination of the previously introduced techniques. Fig. 1 summarizes the phases of the proposed approach, which was exploited in [3] for a different application. The method relies on association rule extraction, performed here with the Apriori algorithm [1]; any other association rule mining algorithm may be substituted as a building block. First, association rules are extracted from the data. Next, tuple constraints and functional dependencies are identified by analyzing the extracted rules.
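The rule extraction phase can be illustrated with a minimal sketch. It is not the Apriori algorithm: it simply enumerates every rule with a single attribute=value item in the body and in the head, which is only practical for very small tables. The function name `extract_rules`, the thresholds `min_support` and `min_confidence`, and the toy attributes `gene` and `chromosome` are all assumptions made for the example.

```python
from collections import Counter
from itertools import permutations


def extract_rules(rows, min_support=0.0, min_confidence=0.0):
    """Enumerate single-item rules (attr=value) -> (attr=value) with their metrics."""
    n = len(rows)
    item_counts = Counter()   # occurrences of each attribute=value item
    pair_counts = Counter()   # co-occurrences of ordered (body, head) item pairs
    for row in rows:
        items = list(row.items())
        item_counts.update(items)
        # Ordered pairs of distinct items of the same row: body and head never overlap.
        pair_counts.update(permutations(items, 2))

    rules = []
    for (body, head), n_bh in pair_counts.items():
        support = n_bh / n
        confidence = n_bh / item_counts[body]
        if support >= min_support and confidence >= min_confidence:
            rules.append((body, head, support, confidence))
    return rules


# Hypothetical toy table (not data from the chapter).
table = [
    {"gene": "G1", "chromosome": "chr1"},
    {"gene": "G1", "chromosome": "chr1"},
    {"gene": "G2", "chromosome": "chr1"},
    {"gene": "G3", "chromosome": "chr2"},
]

# Keep only the rules that hold for every row containing the body.
strong_rules = extract_rules(table, min_confidence=1.0)
for body, head, support, confidence in strong_rules:
    print(f"{body} -> {head} [support={support:.2f}, confidence={confidence:.2f}]")
```

The rules returned by such a step are the input to the subsequent analysis; a real miner such as Apriori would additionally handle multi-item bodies and prune infrequent candidates.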

If the database constraints and dependencies are already known, they can be used as a criterion to evaluate the accuracy of the method. In this case, when a table row does not satisfy a tuple constraint or a functional dependency, its data is not correct.
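As an illustration of this check, the sketch below flags rows that break a known tuple constraint or a known functional dependency. It assumes a tuple constraint is given as a body and a head of attribute=value pairs, and a functional dependency as two lists of attribute names; the function names and the toy table are hypothetical. Since a functional dependency is violated by a group of rows rather than by a single row, the second function returns the whole conflicting group and does not decide which row is wrong.

```python
from collections import defaultdict


def tuple_constraint_violations(rows, body, head):
    """Indices of rows that contain the body values but not the head values."""

    def matches(row, items):
        return all(row.get(attr) == value for attr, value in items.items())

    return [i for i, row in enumerate(rows)
            if matches(row, body) and not matches(row, head)]


def fd_violations(rows, lhs, rhs):
    """Groups of row indices that agree on `lhs` but disagree on `rhs`."""
    rhs_values = defaultdict(set)   # lhs key -> distinct rhs value tuples seen
    members = defaultdict(list)     # lhs key -> indices of rows with that key
    for i, row in enumerate(rows):
        key = tuple(row.get(attr) for attr in lhs)
        rhs_values[key].add(tuple(row.get(attr) for attr in rhs))
        members[key].append(i)
    return {key: members[key]
            for key, values in rhs_values.items() if len(values) > 1}


# Hypothetical toy table (not data from the chapter); row 3 is the odd one out.
records = [
    {"gene": "G1", "chromosome": "chr1"},
    {"gene": "G1", "chromosome": "chr1"},
    {"gene": "G2", "chromosome": "chr1"},
    {"gene": "G1", "chromosome": "chr2"},
]

tuple_violations = tuple_constraint_violations(
    records, body={"gene": "G1"}, head={"chromosome": "chr1"})
fd_conflicts = fd_violations(records, lhs=["gene"], rhs=["chromosome"])
print(tuple_violations)
print(fd_conflicts)
```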

Constraint knowledge may be exploited to improve data quality and integration in database design, and to perform query optimization and dimensionality reduction. We show an application of our method for identifying anomalies with respect to the detected constraints, which can be used to improve domain knowledge.

4.1 Quasi Tuple Constraints and Quasi Functional Dependencies

The concepts of support and confidence of a detected rule are used to determine its frequency and its strength. An association rule with confidence equal to 1 means that there is a tuple constraint between the attribute values in the head and the ones
