Bio-medical Ontologies Maintenance and Change Management


Extraction of Constraints from Biological Data

Daniele Apiletti, Giulia Bruno, Elisa Ficarra, and Elena Baralis

Department of Control and Computer Engineering (DAUIN),
Politecnico di Torino
C.so Duca degli Abruzzi 24, 10129 Torino, Italy

1 Introduction

Data constraints are used in structured and unstructured databases to capture real-world semantics observed in the modeled application domain. In our context, a constraint can be defined as a conjunction of predicates $P_1 \wedge P_2 \wedge \dots \wedge P_k$. Each predicate has the form $C_1 \, \theta \, C_2$, where $C_1$ is an attribute, $\theta$ is a comparison operator, and $C_2$ is either an attribute or a constant [15]. Constraints are assertions on permissible or consistent database states, and specify certain properties of data that need to be satisfied by valid instances of the database.
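The predicate form above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`Attr`, `Predicate`), the helper functions, the set of comparison operators, and the attributes `start`, `end`, and `score` are all hypothetical and chosen only for illustration. A constraint is modeled as a list of predicates that must all hold for a tuple.

```python
import operator
from dataclasses import dataclass
from typing import Any, List, Mapping, Sequence

# Comparison operators accepted for theta (an illustrative choice).
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class Attr:
    """Marks the right-hand side C2 as an attribute name, not a constant."""
    name: str


@dataclass(frozen=True)
class Predicate:
    """One predicate C1 theta C2."""
    left: str    # C1: always an attribute name
    op: str      # theta: a key of OPS
    right: Any   # C2: Attr(...) for an attribute, anything else is a constant

    def holds(self, row: Mapping[str, Any]) -> bool:
        # Resolve C2 to a value: look it up in the tuple if it is an attribute.
        rhs = row[self.right.name] if isinstance(self.right, Attr) else self.right
        return OPS[self.op](row[self.left], rhs)


def satisfies(row: Mapping[str, Any], constraint: Sequence[Predicate]) -> bool:
    """A tuple satisfies a constraint when every predicate in it holds."""
    return all(p.holds(row) for p in constraint)


def violations(rows: Sequence[Mapping[str, Any]],
               constraint: Sequence[Predicate]) -> List[Mapping[str, Any]]:
    """Return the tuples that break the constraint."""
    return [r for r in rows if not satisfies(r, constraint)]


# Hypothetical relation with made-up attributes and values.
rows = [
    {"start": 100, "end": 250, "score": 0.9},
    {"start": 300, "end": 120, "score": 0.4},
]

# start < end AND score <= 1.0: one attribute-attribute predicate
# and one attribute-constant predicate.
constraint = [
    Predicate("start", "<", Attr("end")),
    Predicate("score", "<=", 1.0),
]

bad_rows = violations(rows, constraint)
```

Under these assumptions, a database instance is valid with respect to the constraint when `violations` returns an empty list. Wrapping attribute references in `Attr` keeps string constants distinguishable from attribute names.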

Constraints show dependencies among data and add semantics to a database schema. Thus, they are useful for studying various problems such as database design, query optimization, and dimensionality reduction. Constraints are usually introduced at design time to describe a priori knowledge. Consequently, the valid instances of the database are those that satisfy all constraints simultaneously. However, collected data can hide interesting and previously unknown information because of unstated constraints. For example, this happens when data is the result of an integration process of several sources or when it represents dynamic aspects. Furthermore, the design process is not always complete and the constraint definition may be omitted from the design. The analysis of heterogeneous data with the aim of detecting implicit information is an important and useful task, which may become complex due to the size of datasets.

Among the numerous constraint types, we focus on table constraints, which refer to a single relation of the database. Examples of such constraints are domain constraints and tuple constraints. A domain constraint restricts the allowed values of a single attribute, i.e., it describes its domain. A tuple constraint limits the allowed values for several (related) attributes of the same tuple. Constraints are properties of the database schema, thus they cannot be directly inferred from data. We will denote this type of constraint as schema constraints. However, if the constraints are not known a priori, they can be hypothesized by analyzing database instances. We cannot directly infer schema constraints from data, but we can infer instance constraints, i.e., constraints which represent the current relationships holding among data. Instance constraints represent a snapshot of the current database state

A.S. Sidhu et al. (Eds.): Biomedical Data and Applications, SCI 224, pp. 169–186.
springerlink.com © Springer-Verlag Berlin Heidelberg 2009
