For the purposes of the rest of this article, we assume that the system is<br />
implemented using an object-relational engine, such as Illustra/Informix Universal<br />
Server or Oracle version 8. With a straightforward design approach, each class/entity<br />
would be represented as a table, while associations between classes would be<br />
represented as "bridge" tables with foreign keys referencing one or more entity<br />
tables. Often subclasses might need to be derived from a parent class, to store<br />
additional, specific information.<br />
With highly heterogeneous data such as is typical of NS data, this approach<br />
eventually yields a significant number of classes as well as a complex class hierarchy.<br />
More important, the bridge tables can become unmanageable because, with M<br />
classes, there are up to C(M, 2) = M(M-1)/2 bridge tables for binary relationships alone<br />
(ignoring the possibility of recursive relationships). We must therefore seriously<br />
consider ways of simplifying the schema.<br />
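To make the growth concrete, the following minimal sketch counts the upper bound on bridge tables for a given number of classes. The function name `max_bridge_tables` is a hypothetical helper introduced here for illustration; it assumes one bridge table per unordered pair of distinct classes and, like the text, ignores recursive relationships.

```python
from math import comb


def max_bridge_tables(num_classes: int) -> int:
    """Upper bound on bridge tables: one per unordered pair of distinct classes."""
    return comb(num_classes, 2)


# The bound grows quadratically with the number of classes.
for m in (5, 20, 100):
    print(m, max_bridge_tables(m))
```

With 100 classes the bound is already 4950 tables, which is why a per-pair design becomes hard to maintain.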
The Object Dictionary Approach<br />
A well-known approach, which we term the Object Dictionary (OD) approach, solves<br />
the problem partially. (This technique, we believe, was pioneered by Tom Slezak's<br />
team in the course of the Lawrence Livermore chromosome 19 mapping project (11).<br />
It was subsequently adopted in production systems such as version 5 of the Human<br />
Genome Database (12), as well as in DNA Workbench, a package to manage<br />
physical mapping data within a chromosomal region (13).) In the OD approach, all<br />
classes within the system are children of a parent "Object" class. The Objects table<br />
contains information on every "object" (class/subclass instance) within a system, with<br />
each row typically containing at least the following information: a machine-generated<br />
ID, object name and object class ID. (The last is a foreign key into a Classes table.)<br />
The details of a particular object are found in a class-specific table whose structure is<br />
specific to the object's class or subclass, and which is related one-to-one to the<br />
Objects table.<br />
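The layout described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the article assumes an object-relational engine, whereas this sketch uses plain relational tables, and every table and column name here (`classes`, `objects`, `receptor_details`, `endogenous_ligand`) is hypothetical rather than taken from the systems cited.

```python
import sqlite3

# Illustrative schema only: all table and column names are assumptions.
SCHEMA = """
CREATE TABLE classes (
    class_id   INTEGER PRIMARY KEY,
    class_name TEXT NOT NULL UNIQUE
);
CREATE TABLE objects (
    object_id   INTEGER PRIMARY KEY,          -- machine-generated ID
    object_name TEXT NOT NULL,
    class_id    INTEGER NOT NULL REFERENCES classes(class_id)
);
-- Hypothetical class-specific table, related one-to-one to objects.
CREATE TABLE receptor_details (
    object_id         INTEGER PRIMARY KEY REFERENCES objects(object_id),
    endogenous_ligand TEXT
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_object(conn: sqlite3.Connection, name: str, class_name: str) -> int:
    """Register an object in the dictionary and return its generated ID."""
    conn.execute(
        "INSERT OR IGNORE INTO classes (class_name) VALUES (?)", (class_name,)
    )
    (class_id,) = conn.execute(
        "SELECT class_id FROM classes WHERE class_name = ?", (class_name,)
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO objects (object_name, class_id) VALUES (?, ?)",
        (name, class_id),
    )
    return cur.lastrowid


conn = open_db()
receptor_id = add_object(conn, "muscarinic", "receptor")
conn.execute(
    "INSERT INTO receptor_details VALUES (?, ?)", (receptor_id, "acetylcholine")
)

# Generic information comes from objects/classes; details from the class table.
row = conn.execute(
    """
    SELECT o.object_name, c.class_name, d.endogenous_ligand
    FROM objects o
    JOIN classes c ON c.class_id = o.class_id
    JOIN receptor_details d ON d.object_id = o.object_id
    WHERE o.object_id = ?
    """,
    (receptor_id,),
).fetchone()
print(row)
```

Using the object ID as the primary key of the class-specific table is one simple way to enforce the one-to-one relationship; other designs are possible.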
One advantage of the OD approach is that, because all object names and<br />
definitions are stored in one place, one can create supporting tables (e.g., synonym /<br />
keyword tables) to build search tools that have some semblance of intelligence.<br />
(Synonyms occur very commonly in NS data: the terms 5-HT, serotonin and 5-hydroxytryptamine<br />
refer to the same neurotransmitter molecule.) It is unreasonable<br />
to insist that most users of an NS database specify the class of object along with the<br />
object name in a query, when the name is often unique enough. (For example,<br />
"muscarinic" can only refer to a receptor class, and "amacrine" refers only to a class of<br />
retinal neurons.) Only if the term specified by the user is ambiguous is it necessary<br />
for the system to display all likely candidates, and force the user to select one.<br />
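A name lookup of this kind might look like the sketch below, which searches object names and a synonym table together and returns every candidate. It is a simplified illustration, not the search tool of any cited system: the schema, the `resolve` function, and the "demo-term" rows (added only to show the ambiguous case) are all hypothetical, and the class is stored as plain text rather than as a foreign key to keep the example short.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory object dictionary with a synonym table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE objects (
            object_id   INTEGER PRIMARY KEY,
            object_name TEXT NOT NULL,
            class_name  TEXT NOT NULL
        );
        CREATE TABLE synonyms (
            term      TEXT NOT NULL,
            object_id INTEGER NOT NULL REFERENCES objects(object_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [
            (1, "serotonin", "neurotransmitter"),
            (2, "muscarinic", "receptor class"),
            (3, "amacrine", "neuron class"),
            # Hypothetical placeholder rows sharing one name, to show ambiguity.
            (4, "demo-term", "hypothetical class A"),
            (5, "demo-term", "hypothetical class B"),
        ],
    )
    conn.executemany(
        "INSERT INTO synonyms VALUES (?, ?)",
        [("5-HT", 1), ("5-hydroxytryptamine", 1)],
    )
    return conn


def resolve(conn: sqlite3.Connection, term: str) -> list:
    """Return all (object_id, name, class) rows matching term by name or synonym."""
    return conn.execute(
        """
        SELECT o.object_id, o.object_name, o.class_name
        FROM objects o
        WHERE o.object_name = ? COLLATE NOCASE
        UNION
        SELECT o.object_id, o.object_name, o.class_name
        FROM synonyms s
        JOIN objects o ON o.object_id = s.object_id
        WHERE s.term = ? COLLATE NOCASE
        ORDER BY 1
        """,
        (term, term),
    ).fetchall()


conn = build_demo_db()
for query in ("5-ht", "muscarinic", "demo-term"):
    candidates = resolve(conn, query)
    if len(candidates) == 1:
        print(query, "->", candidates[0])
    else:
        # Zero or several matches: the caller would report or ask the user to choose.
        print(query, "-> candidates:", candidates)
```

The caller never supplies a class: a unique match is used directly, and only a multi-row result requires the user to pick one.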
The OD approach is particularly useful for managing binary associations. Instead<br />
of numerous bridge tables for each pair of object classes, we have a single<br />
Associations table with at least three columns: Object ID 1, Object ID 2, description<br />
of relationship. (In an archival database that gathers information from multiple<br />
sources, there is typically a fourth column, a citation/reference.) Only a single