28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management


Extraction of Constraints from Biological Data

adopted representation for databases. Every concept is represented by a relation, i.e., a table. In the biological domain, well-known examples of relational databases are cuticleDB¹, RCSB Protein Data Bank², Identitag³, AbMiner⁴, PfamRDB⁵ and Reactome⁶.

Because the relational model is more convenient than other data representations, many tools have been developed to parse older databases, which often lack a relational structure, and load their data into a relational schema. For example, the BioWarehouse⁷ toolkit translates the flat-file representation of several databases, such as SwissProt⁸, NCBI⁹, KEGG¹⁰ and GO¹¹, into a relational schema.

A relation consists of many tuples (rows), each of which represents an instance of an entity (i.e., a concept), and many columns, which represent the attributes (i.e., properties) of the entity. For example, in a protein database, the protein is an entity and its structure, function and sequence are its attributes. Each row describing a specific protein, together with all its attribute values, is an instance of the protein entity. The table that contains all the proteins with their attributes is a relation.
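The protein example can be sketched as an in-memory SQLite table using Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the schema of any database cited above: the table name `Protein`, the `ProteinID` key column and the sample rows are hypothetical, and the attribute values are placeholders rather than real biological data.

```python
import sqlite3

# Hypothetical Protein relation: the table is the relation, its columns are
# the entity's attributes, and each row is one instance of the entity.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Protein (
           ProteinID TEXT PRIMARY KEY,  -- assumed identifier, not in the text
           Structure TEXT,
           Function  TEXT,
           Sequence  TEXT
       )"""
)

# Placeholder rows for illustration only (not real database records).
rows = [
    ("P001", "alpha-helix", "kinase", "MKTAYIAKQR"),
    ("P002", "beta-sheet", "transporter", "MSDNELKVLA"),
]
conn.executemany("INSERT INTO Protein VALUES (?, ?, ?, ?)", rows)

# cursor.description lists the columns (attributes) of the result;
# each fetched tuple is one instance of the Protein entity.
cursor = conn.execute("SELECT * FROM Protein ORDER BY ProteinID")
attributes = [column[0] for column in cursor.description]
instances = cursor.fetchall()

print(attributes)      # ['ProteinID', 'Structure', 'Function', 'Sequence']
print(len(instances))  # 2
```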

The relational model has a structured, fixed format, since data values must be homogeneous and must satisfy several constraints. The schema constraints may or may not be known in advance. In both cases, our analysis helps to detect and investigate instance constraints, which may reveal novel information and expose errors.

3.2 Constraints

Constraints are assertions on permissible or consistent database states; they specify data properties that valid instances of the database should satisfy. They are usually expressed as boolean-valued expressions, indicating whether or not the constraint holds.
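The idea of a constraint as a boolean-valued expression over a database state can be sketched in Python by modelling each constraint as a predicate that takes a relation instance and returns `True` or `False`. This is a minimal sketch under assumptions of our own: rows are represented as dictionaries, and the helper names `not_null`, `unique` and `evaluate` as well as the sample state are hypothetical, chosen only to illustrate the definition.

```python
from typing import Callable, Dict, List

# Assumed representation: a relation instance is a list of rows,
# each row a dict mapping attribute names to values.
Row = Dict[str, object]
Constraint = Callable[[List[Row]], bool]


def not_null(attr: str) -> Constraint:
    """Holds when no row has a missing (None) value for `attr`."""
    return lambda rows: all(row.get(attr) is not None for row in rows)


def unique(attr: str) -> Constraint:
    """Holds when no two rows share the same value for `attr`."""
    def check(rows: List[Row]) -> bool:
        values = [row[attr] for row in rows]
        return len(values) == len(set(values))
    return check


def evaluate(rows: List[Row], constraints: Dict[str, Constraint]) -> Dict[str, bool]:
    """Evaluate every constraint on the given state and report whether it holds."""
    return {name: check(rows) for name, check in constraints.items()}


# Hypothetical database state in which one sequence value is missing.
state = [
    {"ProteinID": "P001", "Sequence": "MKTAYIAKQR"},
    {"ProteinID": "P002", "Sequence": None},
]
result = evaluate(state, {
    "ProteinID unique": unique("ProteinID"),
    "Sequence not null": not_null("Sequence"),
})
print(result)  # {'ProteinID unique': True, 'Sequence not null': False}
```

Keeping each constraint as a separate predicate makes it easy to report which assertions a given state violates, rather than returning a single valid/invalid flag for the whole database.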

As an example, consider a relational database containing information about the students of a university. In particular, we consider only the relation Student(StudentID, Name, Age, City, Country, Mail). Each student is described by a set of attributes: a unique identifier (StudentID), his/her name (Name), age (Age), city (City), country (Country) and mail address (Mail).
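Two kinds of constraint that could be checked on the Student relation are sketched below: a key constraint on StudentID and a functional dependency City → Country (each city belongs to a single country). Both constraints, the helper functions `is_key` and `fd_violations`, and the sample tuples are hypothetical illustrations chosen for this sketch; they are not taken from the text and are not the analysis method described in this chapter.

```python
from collections import defaultdict
from typing import List, Tuple

ATTRS = ("StudentID", "Name", "Age", "City", "Country", "Mail")

# Hypothetical instance of Student(StudentID, Name, Age, City, Country, Mail).
students: List[Tuple] = [
    (1, "Anna", 22, "Turin", "Italy", "anna@example.org"),
    (2, "Luca", 24, "Milan", "Italy", "luca@example.org"),
    (3, "Marie", 21, "Paris", "France", "marie@example.org"),
    (4, "Paul", 23, "Turin", "France", "paul@example.org"),  # inconsistent on purpose
]


def is_key(rows: List[Tuple], attr: str) -> bool:
    """True if `attr` takes a distinct value in every row (key constraint)."""
    i = ATTRS.index(attr)
    values = [row[i] for row in rows]
    return len(values) == len(set(values))


def fd_violations(rows: List[Tuple], lhs: str, rhs: str) -> List[object]:
    """Return the `lhs` values associated with more than one `rhs` value,
    i.e. the witnesses that the functional dependency lhs -> rhs does not hold."""
    i, j = ATTRS.index(lhs), ATTRS.index(rhs)
    images = defaultdict(set)
    for row in rows:
        images[row[i]].add(row[j])
    return sorted(value for value, rhs_values in images.items() if len(rhs_values) > 1)


print(is_key(students, "StudentID"))               # True
print(fd_violations(students, "City", "Country"))  # ['Turin']
```

A non-empty result does not by itself prove an error: it may point to a data entry mistake or to a genuine exception (for instance, two different cities sharing the same name), which is why violating tuples are worth investigating rather than correcting automatically.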

1 http://biophysics.biol.uoa.gr/cuticle
2 http://pdbbeta.rcsb.org
3 http://pbil.univ-lyon1.fr/software/identitag
4 http://discover.nci.nih.gov/abminer
5 http://sanger.ac.uk/pub/databases/Pfam
6 http://www.reactome.org
7 http://brg.ai.sri.com/biowarehouse
8 www.ebi.ac.uk/swissprot
9 www.ncbi.nih.gov
10 www.genome.jp/kegg
11 www.geneontology.org
