
Bio-medical Ontologies Maintenance and Change Management

178 D. Apiletti et al.

Rules shown in Table 2 allow us to infer two quasi tuple constraints, which have a high confidence (e.g., greater than 70%):

1. (Class = Mammalia) → (Reproduction = vivipary) [with confidence of 75.2%]
2. (Class = Aves) → (Reproduction = ovipary) [with confidence of 99.7%]

The first rule means that the majority of mammals (75.2%) stored in the database are viviparous; the second means that almost all birds (99.7%) are oviparous.
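As a minimal sketch, assuming records are held as Python dictionaries and that the confidence of a rule (X = a) → (Y = b) is the fraction of records with X = a that also have Y = b (the reading implied by the 75.2% and 99.7% figures above), the rules and the quasi tuple constraints could be derived as follows. The function names and the toy records are hypothetical and are not the chapter's Animal relation; the 0.7 cut-off mirrors the 70% example threshold.

```python
from collections import Counter

def rule_confidences(rows, lhs, rhs):
    """Confidence of every rule (lhs = a) -> (rhs = b) present in rows:
    count(lhs = a and rhs = b) / count(lhs = a)."""
    lhs_counts = Counter(row[lhs] for row in rows)
    pair_counts = Counter((row[lhs], row[rhs]) for row in rows)
    return {(a, b): n / lhs_counts[a] for (a, b), n in pair_counts.items()}

def quasi_tuple_constraints(confidences, min_conf=0.7):
    """Keep only the rules whose confidence exceeds min_conf."""
    return {rule: c for rule, c in confidences.items() if c > min_conf}

# Toy records invented for illustration (NOT the chapter's data).
animals = [
    {"Class": "Mammalia", "Reproduction": "vivipary"},
    {"Class": "Mammalia", "Reproduction": "vivipary"},
    {"Class": "Mammalia", "Reproduction": "vivipary"},
    {"Class": "Mammalia", "Reproduction": "ovipary"},
    {"Class": "Aves", "Reproduction": "ovipary"},
    {"Class": "Aves", "Reproduction": "ovipary"},
]

conf = rule_confidences(animals, "Class", "Reproduction")
print(quasi_tuple_constraints(conf))
# {('Mammalia', 'vivipary'): 0.75, ('Aves', 'ovipary'): 1.0}
```

The low-confidence rule (Mammalia → ovipary, 0.25 here) is dropped by the filter but stays in `conf`, which is where the anomaly analysis of Section 4.2 would look.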

To compute the dependency degree between Class and Reproduction, we consider all the extracted rules that involve the two attributes (not only those reported in Table 2) and obtain a value of 0.95. If we set a threshold of 0.05, this value is high enough to state that there is a quasi functional dependency between the two attributes, as reported in Table 3.

Table 3. Dependency degree between Class and Reproduction

| Quasi-functional dependency | Dependency degree |
|-----------------------------|-------------------|
| Class → Reproduction        | 0.95              |
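The excerpt does not give the formula for the dependency degree. One plausible definition, used here purely as an assumption, takes, for each value a of the left-hand attribute, the highest confidence of any rule (lhs = a) → (rhs = b) and averages these maxima weighted by the support of a; the result is 1.0 only for an exact functional dependency. A minimal sketch, with a hypothetical function name and toy data:

```python
from collections import Counter

def dependency_degree(rows, lhs, rhs):
    """ASSUMED definition: sum over values a of lhs of
    support(a) * max_b conf(a -> b)."""
    n = len(rows)
    pair_counts = Counter((row[lhs], row[rhs]) for row in rows)
    best = {}  # a -> largest count(lhs = a, rhs = b) over all b
    for (a, _b), count in pair_counts.items():
        best[a] = max(best.get(a, 0), count)
    # support(a) * max conf = (count(a)/n) * (max count(a, b) / count(a))
    #                       = max count(a, b) / n
    return sum(best.values()) / n

# Toy records invented for illustration (NOT the chapter's data).
animals = (
    [{"Class": "Mammalia", "Reproduction": "vivipary"}] * 3
    + [{"Class": "Mammalia", "Reproduction": "ovipary"}]
    + [{"Class": "Aves", "Reproduction": "ovipary"}] * 2
)

print(dependency_degree(animals, "Class", "Reproduction"))  # 5/6 on these rows
```

Because every rule for a given antecedent value contributes through its maximum, this definition, like the computation described in the text, looks at all extracted rules and not only at the high-confidence ones.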

4.2 Violations of Constraints and Dependencies

Constraint extraction has two aims: discovering hidden relationships among data, which may improve domain knowledge and data quality, and investigating anomalies with respect to these relationships. Detecting a quasi tuple constraint or a quasi functional dependency means that the relationship holds in most records but fails in a few cases. These cases are anomalies with respect to the frequent ones, and they can be either errors or exceptions in the data. Analyzing such anomalies can be useful in different application domains, either to perform data cleaning or to improve knowledge of the context.
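To locate the anomalies exposed by a detected constraint, one can scan for records that satisfy the antecedent but not the consequent. The sketch below assumes dictionary records with a hypothetical `Name` field; the rows are invented for illustration, with one deliberately wrong entry, and the function name is likewise hypothetical.

```python
def violations(rows, lhs, a, rhs, b):
    """Indices of rows where lhs == a holds but rhs == b does not,
    i.e. the records that break the rule (lhs = a) -> (rhs = b)."""
    return [i for i, row in enumerate(rows) if row[lhs] == a and row[rhs] != b]

# Toy records invented for illustration (NOT the chapter's data).
rows = [
    {"Name": "Canis lupus", "Class": "Mammalia", "Reproduction": "vivipary"},
    {"Name": "Ornithorhynchus anatinus", "Class": "Mammalia", "Reproduction": "ovipary"},
    {"Name": "Columba livia", "Class": "Aves", "Reproduction": "ovipary"},
    {"Name": "Passer domesticus", "Class": "Aves", "Reproduction": "vivipary"},  # deliberately wrong
]

print(violations(rows, "Class", "Aves", "Reproduction", "ovipary"))  # [3]
```

The scan only finds the anomalous records; deciding whether each one is an error or a legitimate exception needs the confidence analysis that follows.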

To better investigate the nature of the detected anomalies, we can analyze the confidence of the rules that involve the attribute values of the quasi tuple constraint or the quasi functional dependency. If this confidence is very low compared with that of the other rules, the anomaly is most likely an error; otherwise, it is more likely to be a correct exception.

For the Animal relation introduced in the previous section, we examine the rules involving Class and Reproduction that have low confidence (i.e., lower than 30%), and find the following:

1. (Class = Mammalia) → (Reproduction = ovipary) [with confidence of 24.8%]
2. (Class = Aves) → (Reproduction = vivipary) [with confidence of 0.3%]

Both represent interesting cases. The first is a correct, albeit infrequent, relationship, since some mammals lay eggs (e.g., Ornithorhynchus, the platypus). The second is an error, since every bird is oviparous: the 100th row is wrong, because Passer domesticus (the house sparrow) reproduces by ovipary.
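The heuristic of this section can be sketched in a few lines. Rules under the 30% confidence cut-off are treated as anomalies; to decide whether an anomaly is "very low compared with the other rules", the sketch compares it with the best confidence among rules sharing the same antecedent, using a hypothetical ratio `error_ratio = 0.05` that the chapter does not specify. The confidences are those reported in this section; the function name is hypothetical.

```python
def classify_anomalies(rules, low=0.30, error_ratio=0.05):
    """rules maps (antecedent, consequent) -> confidence.

    A rule with confidence below `low` is an anomaly. It is labelled a likely
    error if its confidence is under `error_ratio` times the best confidence
    among rules with the same antecedent, else a likely (correct) exception.
    `error_ratio` is an assumed cut-off, not a value from the chapter.
    """
    best = {}
    for (a, _b), c in rules.items():
        best[a] = max(best.get(a, 0.0), c)
    labels = {}
    for (a, b), c in rules.items():
        if c < low:
            labels[(a, b)] = "likely error" if c < error_ratio * best[a] else "likely exception"
    return labels

# Confidences of the Class -> Reproduction rules reported in this section.
rules = {
    ("Mammalia", "vivipary"): 0.752,
    ("Mammalia", "ovipary"): 0.248,
    ("Aves", "ovipary"): 0.997,
    ("Aves", "vivipary"): 0.003,
}

for rule, kind in classify_anomalies(rules).items():
    print(rule, kind)
# ('Mammalia', 'ovipary') likely exception
# ('Aves', 'vivipary') likely error
```

With these inputs the sketch reproduces the conclusions drawn in the text, but the outcome depends on the assumed `error_ratio`; a different cut-off could reclassify borderline rules.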
