14.06.2013 Views

Databases and Systems



Genes, including links via published sequence and functional similarities to other molecular and community databases.

Gene products: transcripts and proteins, including structural and expression pattern data.

Integrated gene order maps, incorporating recombinational, cytogenetic and molecular information.

Annotated molecular maps: reference sequence gene maps; regional physical maps.

Alleles: wild-type, mutant and engineered.

Chromosomal aberrations.

Engineered and natural transposons and their insertions in the genome.

Fly strains: principally, the publicly-funded stock collections.

Contact information for Drosophila researchers.

FlyBase “literature” is meant broadly, and includes hard copy publications, sequence databank entries, bulk submitted data, stock lists, textual personal communications, etc. The principle is that all information in FlyBase is attributed, and is linked to a hard copy or electronic text. The literature database consists of records containing a mixture of controlled or structured fields and free text descriptions that extend the structured information. Extensive internal and external cross-referencing of related data objects is included. For the controlled fields, FlyBase has developed extensive controlled vocabularies, e.g., for phenotype, anatomy, mutagen, and function of gene product.
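The record structure described above can be pictured with a minimal sketch. It assumes a record holds an attribution, a set of controlled fields, a free text description, and cross-references. The class name `CuratedRecord`, the function `validate`, and the vocabulary terms are all hypothetical and chosen for illustration; they are not FlyBase's actual schema or term lists.

```python
from dataclasses import dataclass, field

# Hypothetical, illustrative vocabularies; NOT FlyBase's actual term lists.
CONTROLLED_VOCABULARIES = {
    "phenotype": {"lethal", "sterile", "viable"},
    "anatomy": {"wing", "eye", "embryo"},
    "mutagen": {"EMS", "X-ray", "P-element"},
}


@dataclass
class CuratedRecord:
    """One literature-derived record (assumed layout, for illustration)."""
    reference: str                                   # attribution to a source text
    controlled: dict = field(default_factory=dict)   # controlled field name -> term
    free_text: str = ""                              # extends the structured fields
    cross_refs: list = field(default_factory=list)   # ids of related data objects


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.reference:
        problems.append("missing reference: all information must be attributed")
    for name, term in record.controlled.items():
        vocabulary = CONTROLLED_VOCABULARIES.get(name)
        if vocabulary is None:
            problems.append(f"unknown controlled field: {name}")
        elif term not in vocabulary:
            problems.append(f"term {term!r} is not in the {name} vocabulary")
    return problems
```

A record that cites a source and uses only listed terms validates cleanly, while an unattributed record or an unlisted term is reported. Keeping the free text outside the validated fields mirrors the split between structured information and its free text extension.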

Data Curation Approaches

The data of the BDGP and EDGP consist almost exclusively of structured outputs of high throughput genomic analyses. There is considerable curational input at such levels as gene predictions in genomic DNA sequences, and genetic analyses of P element insertions.

The literature curation aspect of FlyBase began in 1992. It took as a starting point the compilation of Lindsley and Zimm [1], which was largely current to the beginning of 1990. The goal was to capture information from the post-1989 primary literature, and selectively curate earlier material as deemed necessary. Curation of genetic information occurs shortly after journal publication for those journals considered to be the major ones used by the Drosophila research community. Papers with molecular information (e.g., on gene structure, transcripts, proteins, expression patterns, transposons or their insertions) receive a second round of curation, in which sets of papers relating to the same gene are curated together. Available GenBank/EMBL/DDBJ records are examined at the same time, since a better picture of these molecular data classes emerges from simultaneous consideration of a group of related papers. Curated reference annotated sequence records have recently been added to molecular curation (further discussed in Sequence Annotation, below).
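The second round of curation, where papers on the same gene are handled together, amounts to grouping papers by gene. The sketch below is one way to picture that batching step; the function `batch_by_gene`, the dictionary keys, and the sample identifiers are hypothetical and do not describe FlyBase's actual tooling.

```python
from collections import defaultdict


def batch_by_gene(papers):
    """Group paper ids by the genes they discuss.

    Each paper is a dict with the assumed keys "id", "genes" and
    "has_molecular_data". Papers without molecular data are skipped,
    since only molecular papers receive the second round of curation.
    """
    batches = defaultdict(list)
    for paper in papers:
        if not paper["has_molecular_data"]:
            continue
        for gene in paper["genes"]:
            batches[gene].append(paper["id"])
    return dict(batches)


# Hypothetical sample input; the ids and gene names are placeholders.
sample_papers = [
    {"id": "paper1", "genes": ["geneA"], "has_molecular_data": True},
    {"id": "paper2", "genes": ["geneA", "geneB"], "has_molecular_data": True},
    {"id": "paper3", "genes": ["geneB"], "has_molecular_data": False},
]
gene_batches = batch_by_gene(sample_papers)
```

A curator working from `gene_batches` would see every molecular paper for a gene in one place, which is where the associated sequence databank records could be examined alongside them.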
