14.06.2013 Views

Databases and Systems

Creation of External Database Links

FlyBase receives daily updates of GenBank/EMBL/DDBJ records of the family Drosophilidae, and captures the links to valid FlyBase gene symbols and identifiers. Tables of these links are shared with the sequence databanks. Similar procedures are used for links to other sequence databases, particularly SwissProt. Links from the literature to “homologs” in other species are selective, and are based upon sequence similarities stated in papers. Based on these statements, FlyBase captures the valid gene symbol and identifier for the declared “homolog” in the foreign community database. The BDGP and EDGP use sequence similarities as one aspect of gene prediction, and capture and maintain links to the strongest BLAST similarities in GenBank/EMBL/DDBJ, focusing on the major genetic systems wherever possible.
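
The link capture described above can be pictured as building a small table that pairs each external record with a valid gene symbol and identifier. The following is only a minimal sketch under stated assumptions: the record layout, the `ExternalLink` class, and the `build_link_table` function are hypothetical names invented for illustration, and the accessions and identifiers in the usage note are placeholders, not real data or FlyBase's actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalLink:
    """One row of a hypothetical link table shared with a sequence databank."""
    accession: str   # accession of the external record (placeholder values only)
    source_db: str   # e.g. "GenBank", "EMBL", "DDBJ", "SwissProt"
    symbol: str      # valid gene symbol the record is linked to
    gene_id: str     # identifier that goes with the valid symbol


def build_link_table(records, valid_genes):
    """Pair external records with valid symbols and identifiers.

    records     -- iterable of (accession, source_db, cited_symbol) tuples
    valid_genes -- dict mapping each valid symbol to its identifier

    Returns (links, unresolved). Records whose cited symbol is not a
    valid symbol are set aside rather than guessed at.
    """
    links, unresolved = [], []
    for accession, source_db, cited_symbol in records:
        gene_id = valid_genes.get(cited_symbol)
        if gene_id is None:
            unresolved.append((accession, cited_symbol))
        else:
            links.append(ExternalLink(accession, source_db, cited_symbol, gene_id))
    return links, unresolved
```

Called with, say, `[("ACC0001", "GenBank", "geneA")]` and `{"geneA": "ID-0001"}`, the function returns one link and an empty unresolved list. Keeping unresolved records separate reflects the selective nature of the links: nothing is linked unless a valid symbol is found.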

Data Coordination within FlyBase<br />

It will be obvious from the above that many data objects in FlyBase are common to the genome projects and the literature (the latter being the product of the entire Drosophila research community). In order to bring these data into a single structure, we will need to integrate and homogenize the data in a stepwise process. As a first step, all FlyBase data will move into one of two databases: an integrated Genome Project database or an integrated Literature database. (The integrated Literature database is already in production use; the integrated Genome Project database will be implemented soon.) At this step, considerable data validation and homogenization occur. The next step will be to interrelate and homogenize literature-derived and genome project data of the same class. This has been done successfully as an experiment for selected data classes, such as transposon insertions. Based on this experience, expert annotators will need to examine the data to ensure that identical objects with variant names are recognized as identical, and valid symbols will need to be agreed upon and propagated to all of the relevant working and intermediary databases. Based on FlyBase’s experience with the integrated Literature database, procedures will be established such that new data objects at each site will receive valid symbols and interconnections to other valid objects as they are introduced into the database. The final step will be to map field identities between the integrated Literature and integrated Genome Project databases and thereby permit the data to be housed in one structure.
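
The last two steps, recognizing objects with variant names as identical and mapping field identities between the two databases, can be sketched in a few lines. This is an illustration under stated assumptions, not FlyBase's implementation: the function names, the `"name"` key, and the dictionary-based record layout are all hypothetical. Names are matched exactly (after trimming whitespace), on the assumption that letter case can be significant in gene symbols.

```python
from collections import defaultdict


def build_synonym_index(synonyms):
    """Map every known name (valid or variant) to its agreed valid symbol.

    synonyms -- dict: valid symbol -> list of variant names
    Raises ValueError if one name is claimed by two different valid symbols,
    since such a conflict needs a decision by an annotator.
    """
    index = {}
    for valid, variants in synonyms.items():
        for name in (valid, *variants):
            key = name.strip()
            if index.setdefault(key, valid) != valid:
                raise ValueError(
                    f"{key!r} is claimed by both {index[key]!r} and {valid!r}"
                )
    return index


def integrate(literature, genome, index, field_map):
    """Merge records from both sources under valid symbols.

    literature, genome -- lists of dicts, each with a "name" key
    field_map          -- genome-project field name -> literature field name
    Returns (merged, needs_review); unrecognized names are queued for
    expert annotators instead of being merged.
    """
    merged = defaultdict(dict)
    needs_review = []
    sources = (("literature", literature, {}), ("genome", genome, field_map))
    for source, records, fmap in sources:
        for record in records:
            valid = index.get(record["name"].strip())
            if valid is None:
                needs_review.append((source, record["name"]))
                continue
            for field, value in record.items():
                if field != "name":
                    # Rename the field if mapped, and collect distinct values.
                    merged[valid].setdefault(fmap.get(field, field), set()).add(value)
    return dict(merged), needs_review
```

Values are collected as sets so that the two sources agreeing on a value leaves a single entry, while a disagreement stays visible as two entries for an annotator to resolve.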
