For 20 years, this information was held in a highly organized form by a single curator,<br />
Dr. Barbara Bachmann, but as cross-referenced file-card catalogs,<br />
notebooks containing pedigree diagrams, genotype descriptions, gene and gene<br />
function tables and allele information, and also informal notes and human memory.<br />
Because of its role in tracking genes and alleles, the stock center has also taken on<br />
the task of registering alleles and of publishing the linkage map for E. coli K-12 since<br />
1976 (e.g., 1-4). Converting this information to electronic form was a task begun in<br />
1989 as a two-phase development that was functional, in terms of software and<br />
essential data entry, in early 1990. A major imperative for this was the need to<br />
ensure the continuity of the stock center into the future, and the crucial need to<br />
modernize the records as part of this process had been recognized by program and<br />
division officers at the supporting agency, the National Science Foundation, for some<br />
time. The structured and generally consistent nature of the record-keeping that had<br />
evolved during those 20 years, the clear mission of the stock center, the observable<br />
patterns of usage of the various types of data, and the absence of a pressing deadline<br />
for completion facilitated a user-needs and dataflow analysis that led to conceptual<br />
and data models. We wanted the robustness of a commercial relational database<br />
management system and eventually chose Sybase from among those available, while<br />
keeping the model itself 'object-oriented'. Some schema modifications occurred<br />
during the implementation phases (with Stan Letovsky as the sole software developer<br />
for a rapid development and testing process), but the resultant database bore a<br />
striking, perhaps surprising, resemblance to the early plans.<br />
Several aspects of the conceptual data model had either not been included in models<br />
of other databases or were distinctly different from other treatments. Any segment of<br />
the canonical (wildtype) chromosome was modeled as a "Site" (alias locus). This<br />
includes genes, control regions, intergenic and intragenic regions, groups of genes,<br />
including operons, and segments of the chromosome that were deleted, inserted, or<br />
inverted in structural mutations. Every site has a left endpoint and a right endpoint.<br />
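As a minimal sketch of the endpoint idea, a site can be represented as a named interval on the wildtype chromosome. The class name `Site` follows the text, but the field names, the methods `overlaps` and `contains`, the example sites, and all coordinates are hypothetical and chosen only for illustration. The sketch also assumes linear coordinates, so a site that spans the origin of the circular map is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A segment of the wildtype chromosome (hypothetical field names)."""
    name: str
    left: float   # left endpoint, in arbitrary map units
    right: float  # right endpoint, same units; must be >= left

    def __post_init__(self):
        if self.left > self.right:
            raise ValueError("left endpoint must not exceed right endpoint")

    def overlaps(self, other: "Site") -> bool:
        """True when the two segments share any stretch of chromosome."""
        return self.left < other.right and other.left < self.right

    def contains(self, other: "Site") -> bool:
        """True when `other` lies entirely within this segment."""
        return self.left <= other.left and other.right <= self.right


# Illustrative coordinates only, not real map positions.
gene_a = Site("geneA", 10.0, 12.0)
gene_b = Site("geneB", 11.8, 14.0)    # begins before geneA ends
regulator = Site("regB", 12.5, 12.7)  # lies inside geneB

print(gene_a.overlaps(gene_b))     # True
print(gene_b.contains(regulator))  # True
print(gene_a.overlaps(regulator))  # False
```

Because every kind of site shares the same two endpoints, one pair of comparisons serves genes, regulatory regions, and rearranged segments alike.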
Thus overlaps between the end of one gene and the beginning of another or inclusion<br />
of a regulatory region within an adjacent gene could be described and detected in<br />
searches. The coordinates for these points can have multiple values, reflecting<br />
different map versions, with the current version, of course, setting coordinates<br />
according to the completed nucleotide sequence for E. coli K-12 (5). Sites can have<br />
"subsites"; e.g., all the genes carried on a deleted or inverted segment are subsites of<br />
the segment, and genes within an operon are subsites of the operon, with each subsite<br />
also being represented as an independent site. Since there was a single isolate of<br />
K-12 that is considered to be THE "wildtype", the structure and sequence of this<br />
wildtype chromosome can be used to define the standard chromosome (genome), and<br />
deviations from this structure can be described as mutations of the wildtype. Only<br />
mutations need be described in presenting the genotype of a strain; this has been the<br />
convention followed by geneticists since the earliest days of the field. It is important<br />
in the database structure, since properties that belong to the gene itself can be