14.06.2013

Databases and Systems


and returns the translated result to the query execution module for further processing within the original CPL query.

Non-Human Homolog Search

To illustrate how BioKleisli executes queries, we walk through an example: “Find information on the known DNA sequences on human chromosome 22, as well as information on homologous sequences from other organisms.” The strategy is to combine information from relational GDB and ASN.1 GenBank. GDB is queried for the accession numbers of DNA sequences known to lie on chromosome 22. The NA-Homolog-Summary function, available in the Entrez interface to ASN.1 GenBank, is then invoked to retrieve homologous sequences (i.e., sequences with significant similarity to the original). The homologous sequences are then filtered to keep only non-human entries, and the final answer is printed as a nested relation.
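The four steps of this strategy can be outlined as a short Python sketch. The functions `query_gdb_chr22` and `na_homolog_summary` are hypothetical stubs standing in for the GDB and Entrez sources (the real NA-Homolog-Summary call is not modeled), and the locus symbols, accession numbers, and organisms are placeholders rather than real data. The point is only the shape of the computation: one homolog lookup per accession, a filter on organism, and a nested result.

```python
# Hypothetical stand-ins for the two sources; the records below are
# made-up placeholders, not real GDB or GenBank entries.
GDB_CHR22_LOCI = [
    ("LOCUS_A", "ACC0001"),
    ("LOCUS_B", "ACC0002"),
]

ENTREZ_HOMOLOGS = {
    "ACC0001": [
        {"accession": "ACC1001", "organism": "Homo sapiens"},
        {"accession": "ACC1002", "organism": "Mus musculus"},
    ],
    "ACC0002": [
        {"accession": "ACC2001", "organism": "Rattus norvegicus"},
    ],
}


def query_gdb_chr22():
    """Step 1 (stub): (locus_symbol, genbank_ref) pairs for chromosome 22."""
    return list(GDB_CHR22_LOCI)


def na_homolog_summary(accession):
    """Step 2 (stub): homologous sequences for one accession number."""
    return ENTREZ_HOMOLOGS.get(accession, [])


def non_human_homologs():
    """Steps 3-4: drop human homologs and nest the rest under each locus."""
    result = []
    for locus_symbol, genbank_ref in query_gdb_chr22():
        homologs = [
            h for h in na_homolog_summary(genbank_ref)
            if h["organism"] != "Homo sapiens"
        ]
        result.append({
            "locus_symbol": locus_symbol,
            "genbank_ref": genbank_ref,
            "homologs": homologs,  # inner set of records -> nested relation
        })
    return result


if __name__ == "__main__":
    for row in non_human_homologs():
        print(row)
```

Each outer record carries its own set of homologs, which is what makes the answer a nested relation rather than a flat table.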

The GDB Query.

The GDB query joins three tables (locus, object_genbank_eref, and the portion of locus_cyto_location that corresponds to entries on chromosome 22) on the locus identifier, which appears as locus_id in locus, object_id in object_genbank_eref, and locus_cyto_location_id in locus_cyto_location. It then projects over the locus_symbol and genbank_ref fields. Assuming that a function GDB has been registered within BioKleisli to access the contents of a table whose name is specified by the query writer, the query is written in CPL simply as

    define Loci22 =
        setof rcd{locus_symbol: x, genbank_ref: y}
        where rcd{locus_symbol: \x, locus_id: \a, ...} <- GDB("locus"),
              rcd{genbank_ref: \y, object_id: a, object_class_key: 1, ...}
                  <- GDB("object_genbank_eref"),
              rcd{loc_cyto_chrom_num: "22", locus_cyto_location_id: a, ...}
                  <- GDB("locus_cyto_location")
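Read as a comprehension, Loci22 ranges over the rows of three tables and keeps the combinations whose fields agree. A rough Python analogue, assuming a hypothetical `gdb(table)` function over made-up in-memory rows (not GDB's real contents or full schema), might look like this. It is meant to show how each CPL pattern becomes a binding or an equality test, not how BioKleisli actually evaluates the query.

```python
# Hypothetical stand-in for the registered GDB function: gdb(table)
# returns every row of the named table as a dict. Rows are placeholders.
TABLES = {
    "locus": [
        {"locus_symbol": "SYM1", "locus_id": 1},
        {"locus_symbol": "SYM2", "locus_id": 2},
        {"locus_symbol": "SYM3", "locus_id": 3},
    ],
    "object_genbank_eref": [
        {"genbank_ref": "REF1", "object_id": 1, "object_class_key": 1},
        {"genbank_ref": "REF2", "object_id": 2, "object_class_key": 1},
        {"genbank_ref": "REF3", "object_id": 3, "object_class_key": 2},
    ],
    "locus_cyto_location": [
        {"loc_cyto_chrom_num": "22", "locus_cyto_location_id": 1},
        {"loc_cyto_chrom_num": "22", "locus_cyto_location_id": 3},
        {"loc_cyto_chrom_num": "7", "locus_cyto_location_id": 2},
    ],
}


def gdb(table):
    return TABLES[table]


# A field bound with \x or \a in CPL becomes a value read from the current
# row; a constant ("22", 1) or an already-bound variable (a) becomes an
# equality test on the row.
loci22 = {
    (l["locus_symbol"], o["genbank_ref"])
    for l in gdb("locus")
    for o in gdb("object_genbank_eref")
    if o["object_id"] == l["locus_id"] and o["object_class_key"] == 1
    for c in gdb("locus_cyto_location")
    if c["loc_cyto_chrom_num"] == "22"
    and c["locus_cyto_location_id"] == l["locus_id"]
}

print(loci22)
```

With these placeholder rows only the first locus survives: the second is not on chromosome 22, and the third fails the `object_class_key` test.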

Note that we could also have written this query by registering a separate function for each table accessed (locus, object_genbank_eref and locus_cyto_location), and that this would have given query writers an idea of the names available within GDB.
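For a relational source, the cost of evaluating a join like Loci22 depends on where the join runs. The sketch below uses Python's built-in `sqlite3` module as a stand-in for GDB, with hypothetical tables holding only the fields named in the query and placeholder rows, and compares two plans: three whole-table queries joined on the client, and one SQL statement that performs the joins and selections on the server. It illustrates the general idea of pushing a subquery to the data source; it is not BioKleisli's optimizer or the SQL it actually generates.

```python
import sqlite3

# In-memory database standing in for GDB; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locus (locus_symbol TEXT, locus_id INTEGER);
CREATE TABLE object_genbank_eref
    (genbank_ref TEXT, object_id INTEGER, object_class_key INTEGER);
CREATE TABLE locus_cyto_location
    (loc_cyto_chrom_num TEXT, locus_cyto_location_id INTEGER);
INSERT INTO locus VALUES ('SYM1', 1), ('SYM2', 2), ('SYM3', 3);
INSERT INTO object_genbank_eref VALUES ('REF1', 1, 1), ('REF2', 2, 1), ('REF3', 3, 2);
INSERT INTO locus_cyto_location VALUES ('22', 1), ('22', 3), ('7', 2);
""")


def naive_plan(conn):
    # Three separate queries, each shipping a whole table to the client;
    # the join and the selections then run client-side.
    loci = conn.execute("SELECT locus_symbol, locus_id FROM locus").fetchall()
    erefs = conn.execute(
        "SELECT genbank_ref, object_id, object_class_key FROM object_genbank_eref"
    ).fetchall()
    cyto = conn.execute(
        "SELECT loc_cyto_chrom_num, locus_cyto_location_id FROM locus_cyto_location"
    ).fetchall()
    return {
        (sym, ref)
        for sym, lid in loci
        for ref, oid, key in erefs
        if oid == lid and key == 1
        for chrom, cid in cyto
        if chrom == "22" and cid == lid
    }


PUSHED_DOWN_SQL = """
SELECT l.locus_symbol, o.genbank_ref
FROM locus AS l
JOIN object_genbank_eref AS o
  ON o.object_id = l.locus_id AND o.object_class_key = 1
JOIN locus_cyto_location AS c
  ON c.locus_cyto_location_id = l.locus_id AND c.loc_cyto_chrom_num = '22'
"""


def pushed_down_plan(conn):
    # One query: the server performs the joins and selections and returns
    # only the projected (locus_symbol, genbank_ref) pairs.
    return set(conn.execute(PUSHED_DOWN_SQL).fetchall())


print(naive_plan(conn) == pushed_down_plan(conn))
```

Both plans return the same pairs, but the first transfers every row of all three tables to the client, whereas the second returns only the matching rows.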

If executed as written, Loci22 would generate three separate SQL queries to GDB, each of which would extract the full contents of one table. The optimizer, however, improves on this by rewriting the entire function as a single SQL query. More generally, the optimizer can move the largest possible subquery of a CPL query to an external server for execution. This can be done for a wide variety of data sources (5), including relational and ASN.1-Entrez data sources. The
