Genes, including links via published sequence and functional similarities to other molecular and community databases.
Gene products: transcripts and proteins, including structural and expression pattern data.
Integrated gene order maps, incorporating recombinational, cytogenetic and molecular information.
Annotated molecular maps: reference sequence gene maps; regional physical maps.
Alleles: wild-type, mutant and engineered.
Chromosomal aberrations.
Engineered and natural transposons and their insertions in the genome.
Fly strains: principally, the publicly-funded stock collections.
Contact information for Drosophila researchers.
FlyBase “literature” is meant broadly, and includes hard copy publications,
sequence databank entries, bulk submitted data, stock lists, textual personal
communications, etc. The principle is that all information in FlyBase is attributed,
and is linked to a hard copy or electronic text. The literature database consists of
records containing a mixture of controlled or structured fields and free text
descriptions to extend the structured information. Extensive internal and external
cross-referencing of related data objects is included. For the controlled fields,
FlyBase has developed extensive controlled vocabularies, e.g., for phenotype,
anatomy, mutagen and function of gene product.
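The record structure described above (an attributed record combining controlled fields with free text) can be sketched as follows. This is a minimal illustration only: the class name `CuratedRecord`, the field names, and the vocabulary terms are hypothetical and are not taken from FlyBase's actual schema or vocabularies.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical controlled vocabularies. The terms are illustrative
# placeholders, not FlyBase's real vocabularies.
CONTROLLED_VOCABULARIES = {
    "phenotype": {"lethal", "sterile", "viable"},
    "anatomy": {"wing", "eye", "leg"},
    "mutagen": {"chemical", "radiation", "transposon"},
}


@dataclass
class CuratedRecord:
    """One curated statement: attributed, partly controlled, partly free text."""

    reference: str                                   # attribution to a source text
    controlled: Dict[str, str] = field(default_factory=dict)
    free_text: str = ""                              # extends the structured fields

    def __post_init__(self):
        # All information must be attributed to some reference.
        if not self.reference:
            raise ValueError("every record must be attributed to a reference")
        # Each controlled field may only hold a term from its vocabulary.
        for name, term in self.controlled.items():
            vocabulary = CONTROLLED_VOCABULARIES.get(name)
            if vocabulary is None:
                raise ValueError(f"unknown controlled field: {name!r}")
            if term not in vocabulary:
                raise ValueError(f"{term!r} is not a valid {name} term")
```

Constructing a record with a term outside the vocabulary raises `ValueError`, while the free-text field accepts any description. Validating at construction time is one possible design choice; a real curation system might instead flag such records for review.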
Data Curation Approaches<br />
The data of the BDGP and EDGP consist almost exclusively of structured outputs of
high throughput genomic analyses. There is considerable curational input at such
levels as gene predictions in genomic DNA sequences, and genetic analyses of P
element insertions.
The literature curation aspect of FlyBase began in 1992. It took as a starting
point the compilation of Lindsley and Zimm [1], which was largely current to the
beginning of 1990. The goal was to capture information from the post-1989 primary
literature, and selectively curate earlier material as deemed necessary. Curation of
genetic information occurs shortly after journal publication for those journals
considered to be the major ones used by the Drosophila research community.
Papers with molecular information (e.g., on gene structure, transcripts, proteins,
expression patterns, transposons or their insertions) receive a second round of
curation, with sets of papers relating to the same gene being curated together;
available GenBank/EMBL/DDBJ records are examined at the same time, since a
better picture of these molecular data classes emerges from simultaneous
consideration of a group of related papers. Curated reference annotated sequence
records have recently been added to molecular curation (further discussed in
Sequence Annotation, below).
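The second-round batching described above, in which papers carrying molecular information are collected per gene so they can be curated together, might be sketched like this. The function name, the dictionary keys, and the paper and gene identifiers are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def group_papers_by_gene(papers):
    """Map each gene to the ids of papers reporting molecular data on it.

    Each paper is assumed (hypothetically) to be a dict with the keys
    "id", "genes" and "has_molecular_data".
    """
    batches = defaultdict(list)
    for paper in papers:
        if not paper["has_molecular_data"]:
            continue  # genetic-only papers receive just the first round
        for gene in paper["genes"]:
            batches[gene].append(paper["id"])
    return dict(batches)


# Placeholder data, not real publications or gene symbols.
papers = [
    {"id": "paper1", "genes": ["geneA", "geneB"], "has_molecular_data": True},
    {"id": "paper2", "genes": ["geneA"], "has_molecular_data": False},
    {"id": "paper3", "genes": ["geneA"], "has_molecular_data": True},
]
batches = group_papers_by_gene(papers)
```

Here `batches["geneA"]` holds `paper1` and `paper3`, so a curator could review both alongside any available sequence databank records for that gene.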