28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management


A Summary of Genomic Databases: Overview and Discussion

sets of words, possibly combined by logical operators. These keywords are searched for in specific fields and tables of the database, which are not determined a priori. With forms, a search starts by specifying the values to look for in given attributes of the database. Multi-attribute searches can be obtained by combining the sub-queries with logical operators. As an example, ArkDB [1] supports both free-text and form queries.
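
The two query styles described above can be sketched over a handful of in-memory records. This is only an illustration: the record fields, the sample data, and every function name below are assumptions made for the example and do not reflect the interface of ArkDB or of any other database.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


def field_equals(attribute: str, value: str) -> Predicate:
    """Form-style sub-query: a named attribute must equal the given value."""
    return lambda rec: rec.get(attribute, "").lower() == value.lower()


def free_text(keyword: str) -> Predicate:
    """Free-text sub-query: the keyword may occur in any field of the record."""
    return lambda rec: any(keyword.lower() in v.lower() for v in rec.values())


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND of several sub-queries."""
    return lambda rec: all(p(rec) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR of several sub-queries."""
    return lambda rec: any(p(rec) for p in preds)


def search(records: List[Record], query: Predicate) -> List[Record]:
    """Return the records that satisfy the combined query."""
    return [r for r in records if query(r)]


# Hypothetical toy records, invented for this sketch.
RECORDS = [
    {"species": "pig", "chromosome": "7", "description": "growth hormone locus"},
    {"species": "pig", "chromosome": "1", "description": "coat colour marker"},
    {"species": "chicken", "chromosome": "7", "description": "growth rate QTL"},
]

# Form query: species = pig AND chromosome = 7.
form_query = all_of(field_equals("species", "pig"), field_equals("chromosome", "7"))
# Free-text query: "growth" OR "colour" in any field.
text_query = any_of(free_text("growth"), free_text("colour"))

form_hits = search(RECORDS, form_query)
text_hits = search(RECORDS, text_query)
```

The form query names the attribute each value belongs to, whereas the free-text query scans every field, which mirrors the distinction drawn in the text.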

Methods based on graphical interaction are also quite common in genomic database access systems. A large share of genomic information can be visualized through physical and genetic maps, so queries can be formulated by interacting with graphical objects that represent the annotated genomic data. Graphical support tools have been developed for this purpose, such as the Generic Genome Browser (GBrowser) [48], which allows the user to navigate interactively through a genome sequence, select specific regions, and retrieve the corresponding annotations and information. Databases such as CADRE [7] and HCVDB [17], for example, allow the user to query their repositories by graphical interaction.
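
Behind a graphical region selection, the underlying lookup can be thought of as an interval-overlap query. The sketch below assumes 1-based inclusive coordinates and a made-up annotation track. It is not how the Generic Genome Browser or any cited database is implemented, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    name: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


def annotations_in_region(track: List[Annotation], start: int, end: int) -> List[Annotation]:
    """Return the annotations that overlap the selected region [start, end]."""
    return [a for a in track if a.start <= end and a.end >= start]


# Hypothetical annotation track, invented for this sketch.
TRACK = [
    Annotation("geneA", 100, 500),
    Annotation("geneB", 450, 900),
    Annotation("geneC", 1200, 1500),
]

# The user "drags" over positions 480..1000 on the map.
selected = annotations_in_region(TRACK, 480, 1000)
selected_names = [a.name for a in selected]
```

A linear scan is enough for a toy track. A real browser would presumably rely on an index over genomic coordinates.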

Sequence-based queries rely on computing the similarity among genomic sequences and patterns. In this case, the input is a nucleotide or amino-acid sequence, and data mining algorithms, such as the one in [29], are used to process the query, so that alignments and similar information are retrieved and returned.
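
To make the idea of a similarity-driven query concrete, the sketch below ranks a few stored sequences against an input sequence using a plain Smith-Waterman local alignment score. Smith-Waterman is used here only as a simple stand-in: the algorithm of [29] is not described in this section, and the scoring parameters, sequence names, and sequences are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query: str, database: Dict[str, str]) -> List[Tuple[str, int]]:
    """Score every stored sequence against the query, best hit first."""
    scored = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical nucleotide sequences, invented for this sketch.
DATABASE = {
    "seq1": "ACGTACGTGA",
    "seq2": "TTTTTTTTTT",
    "seq3": "GGACGTTC",
}

hits = rank_by_similarity("ACGTAC", DATABASE)
```

Here `seq1` contains the query exactly and ranks first, `seq3` differs by one base, and `seq2` shares only a single nucleotide with the query.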

Query-language-based methods exploit DBMS-native query languages (e.g., SQL-based ones), allowing a “low-level” interaction with the databases. At the moment, few databases (WormBase [28] and GrainGenes [15]) support them.
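
As an illustration of such "low-level" access, the sketch below issues an SQL query against an in-memory SQLite database through Python's standard `sqlite3` module. The `gene` table, its columns, and its rows are invented for the example and do not correspond to the schema of WormBase, GrainGenes, or any other database mentioned here.

```python
import sqlite3

# In-memory database with a hypothetical, minimal schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (name TEXT, chromosome TEXT, start_pos INTEGER, end_pos INTEGER)"
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [
        ("geneA", "II", 200, 900),
        ("geneB", "II", 1500, 2300),
        ("geneC", "II", 5000, 6100),
        ("geneD", "X", 1200, 1800),
    ],
)

# Genes on chromosome II starting at or after position 1000, in map order.
rows = conn.execute(
    "SELECT name FROM gene WHERE chromosome = ? AND start_pos >= ? ORDER BY start_pos",
    ("II", 1000),
).fetchall()
conn.close()
```

Compared with forms, the user has to know the schema, but can in exchange express any condition the query language allows.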

2.5 Result Formats

In genomic databases, several formats are adopted to represent query results. Web interfaces usually provide answers encoded in HTML, but other formats are often available as well. The following discussion refers to the formats adopted for representing query results, not to those used for FTP-downloadable data.

Flat files are semi-structured text files in which each information class is reported on one or more consecutive lines, identified by a code that characterizes the annotated attributes. The most common flat file formats are the GenBank Flat File (GBFF) [41] and the European Molecular Biology Laboratory Data Library Format [44]. These formats represent essentially the same information content, with some syntactic differences. Databases supporting the GenBank Flat File are, for example, CADRE [7] and Ensembl [13], whereas databases supporting the EMBL format are CADRE [7], Ensembl [13], and ToxoDB [37].
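
The line-code structure of flat files lends itself to a very small parser. The sketch below assumes an EMBL-like layout in which each line begins with a two-letter code (such as `ID`, `AC`, or `DE`) and each entry ends with a `//` line. The sample entries are made up, indented sequence lines are simply skipped, and the function name is hypothetical. It is a simplification for illustration, not a complete reader for either format.

```python
from collections import defaultdict
from typing import Dict, List


def parse_flat_file(text: str) -> List[Dict[str, List[str]]]:
    """Group the lines of each entry by their leading line code."""
    entries: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            # End-of-entry marker: store the entry and start a new one.
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
        elif line.strip() and not line[0].isspace():
            # "CODE   value": consecutive lines with the same code accumulate.
            code, _, value = line.partition(" ")
            current[code].append(value.strip())
    if current:
        entries.append(dict(current))
    return entries


# Made-up entries in an EMBL-like layout (not real database records).
SAMPLE = """\
ID   EXAMPLE1; linear; genomic DNA
AC   EXAMPLE1;
DE   Hypothetical example entry, first description line
DE   continued on a second line
//
ID   EXAMPLE2; linear; genomic DNA
AC   EXAMPLE2;
//
"""

records = parse_flat_file(SAMPLE)
```

Each parsed entry maps a line code to the list of values found under it, so a description spread over several `DE` lines is kept as several items under the same key.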

The comma/tab-separated and the attribute-value-separated files are similar to flat files, but feature a somewhat stronger form of structure. The former, supported for example by BioCyc [4], TAIR [46] and HumanCyc [18],
