14.06.2013 Views

Databases and Systems


For the rest of this article, we assume that the system is implemented using an object-relational engine, such as Illustra/Informix Universal Server or Oracle version 8. With a straightforward design approach, each class/entity would be represented as a table, while associations between classes would be represented as "bridge" tables with foreign keys referencing one or more entity tables. Subclasses often need to be derived from a parent class to store additional, specific information.

With highly heterogeneous data, as is typical of NS data, this approach eventually yields a large number of classes as well as a complex class hierarchy. More important, the bridge tables can become unmanageable because, with M classes, there are potentially C(M, 2) = M(M-1)/2 bridge tables for binary relationships alone (ignoring the possibility of recursive relationships). We must therefore seriously consider ways of simplifying the schema.
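To give a feel for how quickly the pairwise count grows, the short Python sketch below evaluates C(M, 2) for a few values of M. The function name `bridge_table_count` and the sample values of M are illustrative assumptions, not figures from the article.

```python
from math import comb


def bridge_table_count(num_classes: int) -> int:
    """Upper bound on bridge tables for binary relationships:
    the number of unordered class pairs, C(M, 2) = M(M-1)/2.
    Recursive (self-referencing) relationships are ignored."""
    return comb(num_classes, 2)


# Sample values of M chosen only for illustration.
for m in (5, 10, 50):
    print(m, bridge_table_count(m))
```

The count grows quadratically, so even a modest class hierarchy can imply hundreds of potential bridge tables.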

The Object Dictionary Approach<br />

A well-known approach, which we term the Object Dictionary (OD) approach, solves the problem partially. (This technique, we believe, was pioneered by Tom Slezak's team in the course of the Lawrence Livermore chromosome 19 mapping project (11). It was subsequently adopted in production systems such as version 5 of the Human Genome Database (12), as well as in DNA Workbench, a package to manage physical mapping data within a chromosomal region (13).) In the OD approach, all classes within the system are children of a parent "Object" class. The Objects table contains information on every "object" (class/subclass instance) within a system, with each row typically containing at least the following information: a machine-generated ID, object name and object class ID. (The last is a foreign key into a Classes table.) The details of a particular object are found in a class-specific table whose structure is specific to the object's class or subclass, and which is related one-to-one to the Objects table.
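The layout just described can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for the object-relational engines named above, and every table name, column name and the `receptors` detail table (with its `detail` column) is an assumption made for the example, not the schema of any of the cited systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE classes (
    class_id   INTEGER PRIMARY KEY,
    class_name TEXT NOT NULL UNIQUE
);
-- One row per object of any class: machine-generated ID, name, class ID.
CREATE TABLE objects (
    object_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    object_name TEXT NOT NULL,
    class_id    INTEGER NOT NULL REFERENCES classes(class_id)
);
-- Hypothetical class-specific table, related one-to-one to objects
-- by sharing its primary key.
CREATE TABLE receptors (
    object_id INTEGER PRIMARY KEY REFERENCES objects(object_id),
    detail    TEXT
);
""")

conn.execute("INSERT INTO classes (class_id, class_name) VALUES (1, 'receptor')")
cur = conn.execute(
    "INSERT INTO objects (object_name, class_id) VALUES (?, ?)",
    ("muscarinic", 1),
)
conn.execute(
    "INSERT INTO receptors (object_id, detail) VALUES (?, ?)",
    (cur.lastrowid, "placeholder detail"),
)

# Resolve an object to its class and its class-specific details.
row = conn.execute("""
    SELECT o.object_id, o.object_name, c.class_name, r.detail
    FROM objects o
    JOIN classes c   ON c.class_id = o.class_id
    JOIN receptors r ON r.object_id = o.object_id
""").fetchone()
print(row)
```

Sharing the primary key between `objects` and the detail table is one common way to enforce the one-to-one relationship; other designs are possible.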

One advantage of the OD approach is that, because all object names and definitions are stored in one place, one can create supporting tables (e.g., synonym/keyword tables) to build search tools that have some semblance of intelligence. (Synonyms occur very commonly in NS data: the terms 5-HT, serotonin and 5-hydroxytryptamine refer to the same neurotransmitter molecule.) It is unreasonable to insist that most users of an NS database specify the class of object along with the object name in a query, when the name is often unique enough. (For example, "muscarinic" can only refer to a receptor class, and "amacrine" refers only to a class of retinal neurons.) Only if the term specified by the user is ambiguous is it necessary for the system to display all likely candidates, and force the user to select one.
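A name lookup of this kind might be sketched as below, again using sqlite3 as a stand-in engine. The `synonyms` table, the `find_candidates` function and the two "demo" objects that share a synonym are hypothetical, introduced only to show the ambiguous case; for brevity the class name is stored inline rather than as a foreign key into a Classes table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    object_id   INTEGER PRIMARY KEY,
    object_name TEXT NOT NULL,
    class_name  TEXT NOT NULL
);
CREATE TABLE synonyms (
    object_id INTEGER NOT NULL REFERENCES objects(object_id),
    synonym   TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", [
    (1, "serotonin", "neurotransmitter"),
    (2, "demo object A", "demo class A"),  # made-up entries that share
    (3, "demo object B", "demo class B"),  # one synonym, for illustration
])
conn.executemany("INSERT INTO synonyms VALUES (?, ?)", [
    (1, "5-HT"),
    (1, "5-hydroxytryptamine"),
    (2, "demo-term"),
    (3, "demo-term"),
])


def find_candidates(conn, term):
    """Return every object whose name or synonym matches the term,
    ignoring case and without requiring the caller to name a class."""
    return conn.execute("""
        SELECT DISTINCT o.object_id, o.object_name, o.class_name
        FROM objects o
        LEFT JOIN synonyms s ON s.object_id = o.object_id
        WHERE lower(o.object_name) = lower(?) OR lower(s.synonym) = lower(?)
        ORDER BY o.object_id
    """, (term, term)).fetchall()


for term in ("5-ht", "demo-term"):
    matches = find_candidates(conn, term)
    if len(matches) == 1:
        print(term, "resolves to", matches[0])
    else:
        print(term, "is ambiguous; the user must choose among", matches)
```

A single match can be used directly, while several matches are handed back to the user interface for selection, as the text describes.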

The OD approach is particularly useful for managing binary associations. Instead of numerous bridge tables for each pair of object classes, we have a single Associations table with at least three columns: Object ID 1, Object ID 2, description of relationship. (In an archival database that gathers information from multiple sources, there is typically a fourth column, a citation/reference.) Only a single
