14.06.2013 Views

Databases and Systems


For 20 years, this information was held in a very organized format by a single curator, Dr. Barbara Bachmann, but in the form of cross-referencing filecard catalogs, notebooks containing pedigree diagrams, genotype descriptions, gene and gene function tables and allele information, and also informal notes and human memory. Because of its role in tracking genes and alleles, the stock center has also taken on the task of registering alleles and, since 1976, of publishing the linkage map for E. coli K-12 (e.g., 1-4). Converting this information to electronic form was a task begun in 1989 as a two-phase development that was functional, in terms of software and essential data entry, in early 1990. A major imperative for this was the need to ensure the continuity of the stock center into the future, and the crucial need to modernize the records as part of this process had been recognized for some time by program and division officers at the supporting agency, the National Science Foundation. The structured and generally consistent nature of the record-keeping that had evolved during those 20 years, the clear mission of the stock center, the observable patterns of usage of the various types of data, and the absence of a pressing deadline for completion facilitated a user-needs and dataflow analysis that led to conceptual and data models. We wanted the robustness of a commercial relational database management system and eventually chose Sybase from among those available, while keeping the model itself 'object-oriented'. Some schema modifications occurred during the implementation phases (with Stan Letovsky the sole software developer for a rapid development and testing process), but the resultant database bore a striking, perhaps surprising, resemblance to the early plans.

Several aspects of the conceptual data model had either not been included in models of other databases or were distinctly different from other treatments. Any segment of the canonical (wildtype) chromosome was modeled as a "Site" (alias locus). This includes genes, control regions, intergenic and intragenic regions, groups of genes (including operons), and segments of the chromosome that were deleted, inserted, or inverted in structural mutations. Every site has a left endpoint and a right endpoint.
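As a rough illustration of why two endpoints per site are useful, a site can be treated as an interval on the chromosome, which makes overlap and inclusion simple comparisons. The sketch below is not the stock center's Sybase schema: the class name, field names, base-pair units, and all coordinates are invented for illustration, and it ignores wrap-around at the origin of the circular chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A segment of the wildtype chromosome (hypothetical field names)."""
    name: str
    left: int   # left endpoint, in illustrative base-pair coordinates
    right: int  # right endpoint; assumed >= left (no wrap around the origin)

    def overlaps(self, other: "Site") -> bool:
        # Two closed intervals share at least one position.
        return self.left <= other.right and other.left <= self.right

    def contains(self, other: "Site") -> bool:
        # The other site lies entirely within this one.
        return self.left <= other.left and other.right <= self.right


# Invented sites and coordinates, for illustration only.
gene_a = Site("geneA", 1000, 2200)
gene_b = Site("geneB", 2190, 3100)              # begins before geneA ends
regulatory = Site("geneB_control", 2150, 2195)  # lies inside geneA
deletion = Site("deletion1", 900, 3200)         # a deleted segment

sites = [gene_a, gene_b, regulatory]

# Names of all sites that fall entirely within the deleted segment.
inside_deletion = [s.name for s in sites if deletion.contains(s)]
```

In a relational system the same two comparisons would presumably be written as conditions on the endpoint columns of a query; the Python form is used here only to keep the example self-contained.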

Thus overlaps between the end of one gene and the beginning of another, or inclusion of a regulatory region within an adjacent gene, could be described and detected in searches. The coordinates for these points can have multiple values, reflecting different map versions, with the current version, of course, setting coordinates according to the completed nucleotide sequence for E. coli K-12 (5). Sites can have "subsites"; e.g., all the genes carried on a deleted or inverted segment are subsites of the segment, and genes within an operon are subsites of the operon, with each subsite also being represented as an independent site. Since there was a single isolate of K-12 that is considered to be THE "wildtype", the structure and sequence of this wildtype chromosome can be used to define the standard chromosome (genome), and deviations from this structure can be described as mutations of the wildtype. Only mutations need be described in presenting the genotype of a strain; this is the convention followed by geneticists since the earliest days of the field. It is important in the database structure, since properties that belong to the gene itself can be
