methods of integrating maps in the erstwhile Human Genome Database (GDB).
Cooper et al describe their database of human mutations, an early entry in the
increasingly important field of variation databases.
The article by Nadkarni et al provides a link to the developing field of
neuroinformatics, which is concerned with databases of neuroscience data such
as structural (MR or CT) or functional (PET, fMRI) images of the brain, histological
slices, EEG and MEG data, cellular and network models, single-cell recordings, and
so on [15]. This article is included not only as representative of neuroinformatics
work but also because it is one of the few current neuroinformatics efforts that link the
molecular scale of bioinformatics to the neurophysiological scale: it addresses
the physiology of olfaction from the receptor sequences up to cellular and network
physiology.
Eppig et al describe the Mouse Genome Database (MGD) and its companion
system, the mouse Gene Expression Database (GXD). One of the key challenges for
the next generation of databases is to begin to span the levels of organization between
genotype and phenotype, where the processes of development and physiology reside.
Baldock et al describe an anatomical atlas of the mouse suitable for representing
spatiotemporal patterns of gene expression; the Edinburgh (Baldock et al) and
Jackson Laboratory (Eppig et al) projects are collaborating to link the genetic and
spatial databases together. The plant kingdom, which has recently experienced a
rapid acceleration of genomic scrutiny in both the private and public sectors, is
represented in articles on MaizeDB by Polacco and Coe and on the USDA’s
Agricultural Genome Information System by Beckstrom-Sternberg and Jamison.
Gelbart et al describe the rich integration of genomic and phenotypic data on
Drosophila in FlyBase. Mary Berlyn describes the E. coli Genetic Stock Center
Database, which provides query-by-genotype access to the stock center’s extensive
collection of mutant strains.
The Software section contains several articles, each addressing some aspect
of the problem of integrating data from heterogeneous sources. There
are two common ways to achieve such integration: federation, in which the data
continue to reside in separate databases but a software layer makes them act as a
single integrated collection, and physical integration, often called warehousing, in
which the data are combined into a single repository for querying purposes. Both
approaches involve transforming the data into a common format; federation does the
transformation at query time, whereas warehousing does it as a preprocessing step.
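To make the contrast concrete, here is a minimal Python sketch under several assumptions. Two in-memory lists (`source_a`, `source_b`) stand in for separate databases with different native layouts, and `GeneRecord` is an invented common format; all names here are hypothetical and not taken from any system described in this issue. `FederatedView` leaves the data in place and translates rows every time a query runs, while `Warehouse` translates everything once in `refresh()` and answers queries from its local copy. It deliberately omits everything a real system needs, such as query translation across DBMSs, distributed joins, and indexing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical common format that both approaches translate into."""
    symbol: str
    chromosome: str
    source: str


# Two hypothetical source databases, each with its own native layout.
source_a: List[Row] = [{"gene": "brca1", "chr": "chr17"}]
source_b: List[Row] = [{"name": "TP53", "location": "17p13.1"}]


def from_a(row: Row) -> GeneRecord:
    # Source A stores lower-case symbols and "chrN" chromosome labels.
    return GeneRecord(row["gene"].upper(), row["chr"].replace("chr", "", 1), "A")


def from_b(row: Row) -> GeneRecord:
    # Source B stores a cytogenetic band such as "17p13.1"; keep the part
    # before the arm letter ("p" or "q") as the chromosome.
    chrom = row["location"].split("p")[0].split("q")[0]
    return GeneRecord(row["name"].upper(), chrom, "B")


Source = Tuple[List[Row], Callable[[Row], GeneRecord]]


class FederatedView:
    """Data stays in the sources; translation happens on every query."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def query(self, chromosome: str) -> List[GeneRecord]:
        return [
            rec
            for rows, translate in self.sources
            for rec in map(translate, rows)
            if rec.chromosome == chromosome
        ]


class Warehouse:
    """Data is translated once, ahead of time, into one local repository."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources
        self.records: List[GeneRecord] = []
        self.refresh()

    def refresh(self) -> None:
        # The preprocessing step: re-extract and re-translate every source.
        self.records = [translate(row) for rows, translate in self.sources for row in rows]

    def query(self, chromosome: str) -> List[GeneRecord]:
        # Queries touch only the pre-translated local copy.
        return [rec for rec in self.records if rec.chromosome == chromosome]


sources: List[Source] = [(source_a, from_a), (source_b, from_b)]
fed = FederatedView(sources)
wh = Warehouse(sources)
print(len(fed.query("17")), len(wh.query("17")))  # 2 2

# One source is updated after the warehouse was built.
source_a.append({"gene": "nf1", "chr": "chr17"})
print(len(fed.query("17")), len(wh.query("17")))  # 3 2 (warehouse is stale)

wh.refresh()
print(len(wh.query("17")))  # 3
```

In the usage lines, the row appended to `source_a` after the warehouse was built appears immediately in the federated query, but the warehouse keeps answering from its old snapshot until `refresh()` is run again.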
One consequence of this difference is that warehouses are more difficult to keep
current as the underlying databases are updated. The choice of federation vs.
warehousing has performance implications as well, though they are not always easy
to predict. A warehouse maps straightforwardly onto a DBMS product and can
make full use of the tuning and optimization capabilities of that product. Federated
systems must pay the price of translating queries at run time, possibly performing
unoptimized distributed joins of query fragments across multiple databases, and
converting data into the standard form. It is also possible for federated systems to