
Databases and Systems


A common problem in the construction of databases of mapping information is how to optimally align or integrate mapping data from different methods, such as genetic or physical mapping, or from different sources, such as radiation hybrid maps produced by different labs or from different mapping panels. The main motivations for aligning maps are to support database searches of chromosomal regions of interest and to produce better graphic displays. Several approaches to these problems have been explored and implemented within GDB; this article describes and critiques those methods.

Map Querying<br />

An important query for GDB is to find all loci in a region of interest, sometimes with additional restrictions on the type of locus, the existence of polymorphisms, or the functional category. In principle, the region of interest might be specified in any of several ways, such as the region between two specified loci, or the neighborhood extending some number of units in each direction from a specified locus. GDB stores many maps of many different types, and it is desirable to perform such a positional search across all maps of a region simultaneously. Intuitively, we want the database to function as a stack of aligned maps, and a query to cut a thick slice through that stack (see Figure 1). A central concern of this article is how best to align the maps for this purpose.

Figure 1: Query cutting through a stack of aligned maps
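As a rough illustration of the "thick slice" idea, the sketch below assumes that every map in the stack has already been aligned to one common coordinate scale (the alignment itself is the subject of this article and is not shown). Both ways of specifying a region described above reduce to a single query interval, which is then tested against every placement on every map. The `Placement` record, the helpers `region_between`, `neighborhood`, and `slice_query`, and the sample map and locus names are all hypothetical; this is not GDB's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Placement:
    """One locus on one map, as a closed localization interval [lo, hi]."""
    map_name: str
    locus: str
    locus_type: str
    lo: float
    hi: float


def overlaps(p: Placement, q_lo: float, q_hi: float) -> bool:
    # Closed intervals overlap unless one lies wholly to one side of the other.
    return not (p.hi < q_lo or p.lo > q_hi)


def _placements_of(stack: Sequence[Placement], locus: str) -> list:
    hits = [p for p in stack if p.locus == locus]
    if not hits:
        raise KeyError(f"locus {locus!r} not found on any map")
    return hits


def region_between(stack, locus_a, locus_b) -> Tuple[float, float]:
    """Query interval spanning every placement of two named loci."""
    hits = _placements_of(stack, locus_a) + _placements_of(stack, locus_b)
    return min(p.lo for p in hits), max(p.hi for p in hits)


def neighborhood(stack, locus, units) -> Tuple[float, float]:
    """Query interval extending `units` on each side of a named locus."""
    hits = _placements_of(stack, locus)
    return min(p.lo for p in hits) - units, max(p.hi for p in hits) + units


def slice_query(stack, q_lo, q_hi, locus_type: Optional[str] = None):
    """Cut one slice through every map at once, optionally filtering by type."""
    return sorted(
        (p for p in stack
         if overlaps(p, q_lo, q_hi)
         and (locus_type is None or p.locus_type == locus_type)),
        key=lambda p: (p.lo, p.map_name),
    )


# Hypothetical stack of three maps, already on a shared coordinate scale.
stack = [
    Placement("rh_panel", "locA", "marker", 10, 10),
    Placement("rh_panel", "locB", "marker", 25, 25),
    Placement("genetic", "locA", "marker", 9, 12),
    Placement("genetic", "geneX", "gene", 18, 22),
    Placement("cytogenetic", "geneY", "gene", 30, 45),
]
```

For example, `slice_query(stack, *region_between(stack, "locA", "locB"), locus_type="gene")` returns only the `geneX` placement. Further restrictions, such as the existence of polymorphisms or a functional category, would be additional filters of the same kind inside `slice_query`.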

The details of the relational implementation of overlap queries are worth pointing out briefly. Each locus is considered to have a localization interval, consisting of a minimum and a maximum coordinate². The query to retrieve stored intervals I overlapping a query interval Q can be expressed intuitively as the negation of nonoverlap: find intervals I that are not disjoint from Q. I can be disjoint from Q

² These follow naturally from binned maps, where the coordinates are those of the bin boundaries; for cytogenetic maps they are the coordinates of the band boundaries, and in general they are coordinates of backbone markers. Backbone or point-like markers are represented by zero-width intervals. Distance-based linkages can be converted to intervals by using a suitable multiplier of the standard error.
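The negation-of-nonoverlap formulation maps directly onto a SQL predicate. The following sketch uses Python's built-in `sqlite3` module with a hypothetical `locus_interval` table, and also applies the footnote's conversions: point-like markers become zero-width intervals, binned placements take their bin boundaries, and distance-based linkages are widened by a multiple of the standard error. The multiplier `SE_MULTIPLIER = 2.0` is an arbitrary assumption, as is the closed-interval (inclusive) comparison, chosen so that a zero-width interval lying exactly on a query boundary still counts as overlapping. The table and helper names are illustrative, not GDB's schema.

```python
import sqlite3

# Hypothetical multiplier of the standard error; the text only asks for a
# "suitable" one, so 2.0 is an illustrative choice.
SE_MULTIPLIER = 2.0


def point_interval(position):
    """Backbone or point-like marker -> zero-width interval."""
    return position, position


def linkage_interval(position, std_error, k=SE_MULTIPLIER):
    """Distance-based placement -> position +/- k standard errors."""
    return position - k * std_error, position + k * std_error


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locus_interval ("
    "map_name TEXT, locus TEXT, min_coord REAL, max_coord REAL)"
)
conn.executemany(
    "INSERT INTO locus_interval VALUES (?, ?, ?, ?)",
    [
        ("backbone", "locA", *point_interval(10.0)),
        ("binned", "locB", 20.0, 30.0),  # bin (or cytogenetic band) boundaries
        ("linkage", "locC", *linkage_interval(40.0, 1.5)),  # [37.0, 43.0]
    ],
)

# Overlap as the negation of non-overlap: a stored interval I misses Q only
# if I ends before Q starts or I starts after Q ends (closed intervals).
OVERLAP_SQL = """
    SELECT map_name, locus
    FROM locus_interval
    WHERE NOT (max_coord < :q_min OR min_coord > :q_max)
    ORDER BY min_coord
"""


def overlapping(q_min, q_max):
    """Return (map_name, locus) rows whose interval is not disjoint from Q."""
    return conn.execute(OVERLAP_SQL, {"q_min": q_min, "q_max": q_max}).fetchall()
```

By De Morgan's law, the same predicate can equivalently be written as `max_coord >= :q_min AND min_coord <= :q_max`; the `NOT (...)` form simply mirrors the "not disjoint" phrasing used in the text.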
