Bioinformatics, Volume I Data, Sequence Analysis and Evolution


Chapter 1

Managing Sequence Data

Ilene Karsch Mizrachi

Abstract

Nucleotide and protein sequences are the foundation for all bioinformatics tools and resources. Researchers can analyze these sequences to discover genes or predict the function of their products. The INSD (International Nucleotide Sequence Database: DDBJ/EMBL/GenBank) is an international, centralized primary sequence resource that is freely available on the internet. This database contains all publicly available nucleotide and derived protein sequences. This chapter summarizes the nucleotide sequence database resources, provides information on how to submit sequences to the databases, and explains how to access the sequence data.

Key words: DNA sequence database, GenBank, EMBL, DDBJ, INSD.<br />

1. Introduction

The International Nucleotide Sequence Database (INSD) is a centralized public sequence resource. As of August 2007, it contained over 101 million DNA sequences comprising over 181 billion nucleotides, numbers that continue to increase exponentially. Scientists generate and submit their primary sequence data to INSD as part of the publication process. The database is archival and represents the results of scientists’ experiments. The annotation on each sequence, represented by features such as coding regions, genes, and structural RNAs, is based on the submitter’s observations and conclusions rather than those of the database curators.

Jonathan M. Keith (ed.), Bioinformatics, Volume I: Data, Sequence Analysis, and Evolution, vol. 452

© 2008 Humana Press, a part of Springer Science + Business Media, Totowa, NJ

Book doi: 10.1007/978-1-60327-159-2 Springerprotocols.com
