28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management


A Summary of Genomic Databases: Overview and Discussion

Fig. 3. Database schemas

based on the GenoList schema are, for example, AureoList [2], LegioList [19] and PyloriGene [22].

The Chado Schema [8]: this is a relational schema with an extensible modular structure, used in some model organism databases such as BeetleBase [3] and FlyBase [42].

The Pathway Tools Schema [39]: this is an object schema, used in Pathway/Genome databases. It is based on an ontology defining a large set of classes (about 1350), attributes and relations to model biological data, such as metabolic pathways, enzymatic functions, genes, promoters and gene regulation mechanisms. Databases based on this schema include BioCyc [4], EcoCyc [11] and HumanCyc [18].

Unspecific Schema refers to databases that do not fall into any of the specific groups illustrated above; most of them are based on a relational schema, without any special adaptation to manage biological data.

2.3 Query Types

Figure 4 illustrates the query types supported by the analyzed databases. Simple queries retrieve data satisfying standard search parameters such as gene names, functional categories and others. Batch queries consist of sets of simple queries that are processed simultaneously; the answer to a batch query is a combination of the answers obtained for its constituent simple queries. Analysis queries are more complex and more typical of the biological domain. They retrieve data based on similarities (similarity queries) and patterns (pattern search queries). The former take as input a DNA or protein (sub)sequence and return the sequences in the database that are most similar to the input sequence. The latter take as input a pattern p and a DNA sequence s and return those subsequences of s that turn out to be most strongly related to the input pattern p. As an example, consider the pattern TATA, that is, a pattern denoting a regulatory sequence upstream
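The query types described in this section can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `GENES` dictionary, the function names, the use of `difflib.SequenceMatcher` as a stand-in similarity score, and the restriction of pattern search to exact matches. The databases surveyed here would rely on dedicated sequence-comparison methods and on approximate pattern matching rather than on these simplifications.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy database: gene name -> DNA sequence (illustrative data only).
GENES = {
    "geneA": "GGCTATAAAGGC",
    "geneB": "ATGCCGTTAGCA",
    "geneC": "TTTATATAGGCC",
}


def simple_query(name):
    """Simple query: look up one record by a standard parameter (gene name)."""
    return GENES.get(name)


def batch_query(names):
    """Batch query: run several simple queries and combine their answers."""
    return {name: simple_query(name) for name in names}


def similarity_query(query, top=1):
    """Similarity query: rank stored sequences by similarity to the input.

    SequenceMatcher.ratio() is only a stand-in score for this sketch.
    """
    ranked = sorted(
        GENES.items(),
        key=lambda item: SequenceMatcher(None, query, item[1]).ratio(),
        reverse=True,
    )
    return [name for name, _ in ranked[:top]]


def pattern_query(pattern, s):
    """Pattern search: start positions of exact matches of pattern in s.

    The lookahead lets overlapping occurrences be reported as well.
    Exact matching is a simplification of "most strongly related".
    """
    return [m.start() for m in re.finditer(f"(?={re.escape(pattern)})", s)]


if __name__ == "__main__":
    print(batch_query(["geneA", "geneB"]))
    print(similarity_query("GGCTATAAAGGC"))
    print(pattern_query("TATA", GENES["geneC"]))
```

In this sketch a batch query is simply a dictionary of simple-query answers, which mirrors the description of a batch answer as a combination of the answers to its constituent queries.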
