
Appendix 2 – Data Submission Checklist


Appendix 2 – Submission to public databases and compliance with appropriate data standards

1. Submission of biological data to public repositories
2. Primary Sequence Data
2.1 EMBL
2.2 GenBank
2.3 Expressed Sequence Tag (EST) Sequences
3. Transcriptomics Data
3.1 Microarray Experiments
4. Proteomic Data
4.1 Protein Sequence Data
5. References

1. Submission of biological data to public repositories

This section gives specific guidance on handling and submitting common biological data types to public repositories, including any standards or formats that must be followed. Web links to submission sites and programs are given where applicable. The list is based on the data types currently being generated by EG Awardees and will be extended as necessary.

Most public databases allow the submitter to specify the date on which the data will be publicly released, and will issue accession numbers. Please send us the public release date along with the accession numbers of your submission so that they can be included in the data catalogue.
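As an illustration of the details to send us, the sketch below models one catalogue entry as a small Python record that checks the shape of the accession number before formatting it. The class name `CatalogueEntry` and the tab-separated summary line are hypothetical, not an EGTDC format, and the accession patterns are simplified assumptions (for example, newer and whole-genome-shotgun INSDC formats are not covered); check each repository's current documentation before relying on them.

```python
import re
from dataclasses import dataclass
from datetime import date

# Simplified, ASSUMED accession shapes for illustration only.
ACCESSION_PATTERNS = {
    # EMBL/GenBank/DDBJ nucleotide: 1 letter + 5 digits or 2 letters + 6 digits,
    # optionally followed by ".version". Newer and WGS formats are not covered.
    "INSDC": re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?"),
    # ArrayExpress experiment, e.g. E-MEXP-123 (assumed four-letter code).
    "ArrayExpress": re.compile(r"E-[A-Z]{4}-\d+"),
    # UniProtKB accession pattern as published by UniProt.
    "UniProt": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
}


@dataclass
class CatalogueEntry:
    """Hypothetical record of the details to send for one submission."""

    repository: str       # one of the keys of ACCESSION_PATTERNS
    accession: str        # accession number issued by the repository
    public_release: date  # public release date agreed with the repository

    def validate(self) -> None:
        """Raise ValueError if the accession does not match the assumed pattern."""
        pattern = ACCESSION_PATTERNS.get(self.repository)
        if pattern is None:
            raise ValueError(f"unknown repository: {self.repository}")
        if not pattern.fullmatch(self.accession):
            raise ValueError(
                f"{self.accession!r} does not look like a {self.repository} accession"
            )

    def summary(self) -> str:
        """Return a tab-separated line: repository, accession, ISO release date."""
        self.validate()
        return f"{self.repository}\t{self.accession}\t{self.public_release.isoformat()}"
```

For example, `CatalogueEntry("ArrayExpress", "E-MEXP-123", date(2005, 6, 1)).summary()` returns the repository, accession and `2005-06-01` separated by tabs, while a mistyped accession raises `ValueError` before anything is sent.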

EGTDC staff are available to help with data submissions. If you have further queries, please contact helpdesk@envgen.nox.ac.uk.

2. Primary Sequence Data

Primary nucleotide sequence data is generally submitted directly to one of the three international repositories: EMBL, GenBank or DDBJ. Submission of EST sequences to the dbEST database is discussed in Section 2.3.

2.1 EMBL

Information for submitters to EMBL:
http://www.ebi.ac.uk/embl/Documentation/information_for_submitters.html

Webin (http://www.ebi.ac.uk/embl/Submission/webin.html) is the preferred system for submitting nucleotide sequences and biological annotation to EMBL (Kanz et al., 2005). Single sequences, multiple sequences or large batches can be submitted through this interface.

Please note that EMBL stopped accepting email submissions on January 1, 2003.

If you expect to produce a large volume of genome sequence over an extended period, please contact the EMBL database administrators at datasubs@ebi.ac.uk.

2.2 GenBank

Submissions to GenBank can be made using the BankIt web submission tool (http://www.ncbi.nlm.nih.gov/BankIt/) or the Sequin tool (http://www.ncbi.nlm.nih.gov/Sequin/index.html). BankIt is recommended for simple submissions (Benson et al., 2005). Sequin is available on Bio-Linux.
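Sequin can import nucleotide sequences from FASTA files. As a minimal sketch, assuming you prepare such a file yourself, the Python function below writes one FASTA record with wrapped sequence lines and rejects characters outside the IUPAC DNA codes. The function name `to_fasta`, the 60-character line width and the header layout are illustrative choices, not Sequin requirements.

```python
# IUPAC one-letter DNA codes, including ambiguity codes (R, Y, S, W, K, M, B, D, H, V) and N.
IUPAC_DNA = frozenset("ACGTRYSWKMBDHVN")


def to_fasta(identifier: str, description: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with the sequence wrapped at `width` characters.

    Whitespace inside `sequence` is removed and letters are upper-cased; a
    ValueError is raised for an empty sequence or any character outside IUPAC_DNA.
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError(f"{identifier}: empty sequence")
    unexpected = sorted(set(seq) - IUPAC_DNA)
    if unexpected:
        raise ValueError(f"{identifier}: unexpected characters {unexpected}")
    # Header line: ">identifier description" (trailing space dropped if no description).
    header = f">{identifier} {description}".rstrip()
    # Split the cleaned sequence into lines of at most `width` characters.
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, *body]) + "\n"
```

Several records can be concatenated into one file, for example by writing `to_fasta(...)` for each clone to a hypothetical `submission.fasta`; checking the alphabet locally means a typing slip is caught before the file reaches the submission tool.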

2.3 Expressed Sequence Tag (EST) Sequences

EST sequences can be submitted to the public EST repository dbEST (http://www.ncbi.nlm.nih.gov/dbEST/). The trace2dbEST software, developed by the EGTDC and available on Bio-Linux, can be used to process ESTs and submit them directly to dbEST (http://envgen.nox.ac.uk/est.html).

3. Transcriptomics Data

3.1 Microarray Experiments

Microarray experiment descriptions and results should be annotated to the MIAME (Minimum Information About a Microarray Experiment) standard and submitted to a public repository such as ArrayExpress (http://www.ebi.ac.uk/arrayexpress/). Further details on the MIAME standard are available at http://envgen.nox.ac.uk/miame/index.html, and details of the MIAME/Env data standard at http://envgen.nox.ac.uk/miame/miame_env.html.
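Before annotating in detail, it can help to confirm that every part of the MIAME checklist has been addressed. The sketch below assumes the six section headings of the original MIAME checklist (experimental design, array design, samples, hybridisations, measurements and normalisation controls) and a hypothetical dictionary of free-text notes per section; it is a reminder list only and does not replace a full MIAME-compliant annotation.

```python
# Section headings of the original MIAME checklist, used here as a reminder list.
MIAME_SECTIONS = (
    "experimental design",
    "array design",
    "samples",
    "hybridisations",
    "measurements",
    "normalisation controls",
)


def missing_miame_sections(notes: dict[str, str]) -> list[str]:
    """Return the MIAME sections that have no non-blank entry in `notes`.

    `notes` is a hypothetical mapping of section name -> free-text annotation;
    section names are matched case-insensitively after trimming whitespace.
    """
    covered = {
        name.strip().lower()
        for name, text in notes.items()
        if text and text.strip()  # blank entries do not count as covered
    }
    return [section for section in MIAME_SECTIONS if section not in covered]
```

An empty result means each section has at least some annotation; the detail required within each section is set by the MIAME standard itself.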

We recommend the maxdLoad2 software, developed by the EGTDC and installed on Bio-Linux, for annotating experiments and preparing a MAGE-ML file suitable for submission to ArrayExpress.
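A quick local check that an exported MAGE-ML file is at least well-formed XML can catch truncated or corrupted exports before they are sent. The sketch below uses Python's standard `xml.etree.ElementTree`; it assumes the root element is named `MAGE-ML`, and it does not validate against the MAGE-ML DTD, so a file that passes may still be rejected on submission. The function name `check_mageml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_mageml(source) -> str:
    """Parse a MAGE-ML export (a path or an open file) and return its root tag.

    Assumption for this sketch: the root element is named "MAGE-ML".
    Only well-formedness is checked; the DTD is not applied.
    """
    try:
        root = ET.parse(source).getroot()
    except ET.ParseError as exc:
        # Truncated or malformed exports fail here.
        raise ValueError(f"not well-formed XML: {exc}") from exc
    if root.tag != "MAGE-ML":
        raise ValueError(f"unexpected root element {root.tag!r}")
    return root.tag
```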

The EGTDC works closely with ArrayExpress. As of March 2005, the EGTDC recommends that microarray data be submitted via the EGTDC, so that a copy of the annotated data is held at the data centre as well as in ArrayExpress.

Reasons for this include:

• Functions for retrieving and searching data across datasets held in ArrayExpress are still under development; holding the data locally enables us to provide access and functionality not currently supported by the public repository.
• Partial datasets can be submitted and held.
• Data can potentially be searched alongside datasets of other types held in compatible databases being developed by the EGTDC.

The recommended process for submitting microarray expression data is therefore as follows:

1. Use maxdLoad2 to export your experiment as maxdML.
2. Submit the maxdML file to the EGTDC.
3. The EGTDC stores the data in a central maxd database and arranges submission to ArrayExpress; all communications between the EBI and the EGTDC are open to the researcher who provided the dataset.

You will be issued an ArrayExpress accession number for your dataset. By default, ArrayExpress withholds your data from public release until you have published it.

If you have made other arrangements to store your data and submit it to the EBI, please send us the accession number so that we can complete your data catalogue entry.

For further information, please see the EGTDC MIAME-Compliance Guide (http://envgen.nox.ac.uk/envgen/software/archives/000527.html).

4. Proteomic Data

4.1 Protein Sequence Data

Directly sequenced protein and peptide sequences can be submitted to UniProt (the Universal Protein Resource). The recommended method for direct submission to UniProt is the Spin website (http://www.ebi.ac.uk/swissprot/Submissions/submissions.html).
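Before pasting a sequence into a web form, a simple alphabet check can catch stray digits, stop symbols or formatting characters. The sketch below assumes the one-letter codes for the 20 standard amino acids plus B, Z, X, U and O; whether a given submission route accepts these extra codes is an assumption to confirm against the UniProt submission guidelines. The function name `check_peptide` is hypothetical.

```python
# One-letter codes for the 20 standard amino acids.
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")
# Extra codes ASSUMED acceptable here: B (D or N), Z (E or Q), X (unknown),
# U (selenocysteine) and O (pyrrolysine).
EXTRA_RESIDUES = frozenset("BZXUO")


def check_peptide(sequence: str, allow_extra: bool = True) -> str:
    """Return the sequence with whitespace removed and letters upper-cased.

    Raises ValueError if the result is empty or contains any character outside
    the allowed codes, such as digits or a '*' stop symbol.
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    allowed = (STANDARD_RESIDUES | EXTRA_RESIDUES) if allow_extra else STANDARD_RESIDUES
    unexpected = sorted(set(seq) - allowed)
    if unexpected:
        raise ValueError(f"unexpected residue codes: {unexpected}")
    return seq
```

Setting `allow_extra=False` gives a stricter check for sequences that should contain only the 20 standard residues.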

The EGTDC is currently evaluating solutions for proteomics data. If you have data to submit, please contact the EGTDC (helpdesk@envgen.nox.ac.uk).

5. References

Kanz, C., Aldebert, P., Althorpe, N., et al. (2005). The EMBL Nucleotide Sequence Database. Nucleic Acids Research, 33 (Database issue), D29. Full text at http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D29

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. and Wheeler, D. L. (2005). GenBank. Nucleic Acids Research, 33 (Database issue), D34. Full text at http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D34
