
Layout 1 - EMBL Grenoble


EMBL-EBI

PANDA proteins and the Apweiler research group

Previous and current research

The PANDA (Protein and Nucleotide Data) group was created in June 2007 by merging the former Ensembl (Birney) and Sequence Database (Apweiler) groups.

The activities of the PANDA group focus on the production of protein sequence, protein family and nucleotide sequence databases at EMBL-EBI. We maintain and host the EMBL Nucleotide Sequence Database, the Ensembl genome browser, the UniProt protein resource and a range of other biomolecular databases. These efforts fall into three major areas: nucleotides; proteins; and chemoinformatics and metabolism. In addition to its PANDA activities, the Apweiler group has a complementary research component.

The activities of the PANDA proteins teams are centred on the mission of providing public access to all known protein sequences and to functional information about these proteins. The UniProt resource is the centrepiece of these activities. Most UniProt sequence data is derived from the translation of nucleotide sequences provided by the European Nucleotide Archive and Ensembl. All UniProt data is classified by InterPro (see the report from Sarah Hunter, page 78). In addition, wherever possible we add information extracted from the scientific literature and from curator-evaluated computational analysis. The combination of InterPro classification and literature-based annotation forms the basis for automatic annotation approaches, which annotate all sequence data that lacks experimental functional data. Protein interaction and identification data are, or will be, provided to UniProt by the IntAct protein–protein interaction database and by the PRIDE (PRoteomics IDEntifications) database.
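Automatic annotation is described here only in outline. The following is a minimal sketch of the general idea of rule-based annotation transfer. It assumes that a rule is simply a set of required signature matches (for example, InterPro classifications) paired with an annotation to propagate. The `Rule` and `Protein` types, the `apply_rules` function, the signature labels and the annotation strings are all hypothetical. This is not the actual UniProt automatic annotation pipeline. Proteins with experimental functional data are skipped, in line with the text's point that automatic annotation targets sequences without such data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical annotation rule: a protein matching every signature
    in `required` inherits `annotation`."""
    required: frozenset[str]
    annotation: str


@dataclass
class Protein:
    """Hypothetical protein record with its signature matches."""
    accession: str
    signatures: set[str]                 # e.g. signature matches from classification
    has_experimental_data: bool = False  # True if curated, experimentally based annotation exists
    annotations: list[str] = field(default_factory=list)


def apply_rules(proteins: list[Protein], rules: list[Rule]) -> None:
    """Propagate rule annotations to proteins lacking experimental functional data."""
    for protein in proteins:
        if protein.has_experimental_data:
            continue  # leave experimentally characterised entries untouched
        for rule in rules:
            # Subset test: every required signature must be present in the protein.
            if rule.required <= protein.signatures:
                protein.annotations.append(rule.annotation)


if __name__ == "__main__":
    # Illustrative placeholder data only; none of these identifiers are real.
    rules = [
        Rule(frozenset({"SIG_A", "SIG_B"}), "hypothetical function 1"),
        Rule(frozenset({"SIG_C"}), "hypothetical function 2"),
    ]
    proteins = [
        Protein("P_EXAMPLE1", {"SIG_A", "SIG_B", "SIG_D"}),
        Protein("P_EXAMPLE2", {"SIG_A"}),
        Protein("P_EXAMPLE3", {"SIG_C"}, has_experimental_data=True),
    ]
    apply_rules(proteins, rules)
    for p in proteins:
        print(p.accession, p.annotations)
```

Requiring every signature in a rule to match, rather than any one of them, keeps the transfer conservative. A real system would also weigh taxonomy, conflicting rules and evidence codes, which this sketch ignores.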

Ongoing research activities in the group include the development of methods for searching large biological datasets more effectively, approaches to improve protein identification from mass spectrometry data, algorithms for genome-wide sequence comparison, and tools for the automatic annotation of proteins.

Rolf Apweiler
PhD 199, University of Heidelberg, Germany.
Team leader at EMBL-EBI since 1997.

Future projects and goals

We intend to improve the integration and synchronisation of all PANDA resources. Despite the abundance of data from large-scale, genome-wide experimentation, such as expression profiling, protein–protein interaction screens and protein localisation studies, the systematic and integrated use of this information for high-throughput protein annotation remains largely unexplored. We therefore intend to build on ongoing research activities at EMBL-EBI to develop and assess new protocols that integrate and analyse functional genomics datasets for the high-throughput annotation of uncharacterised proteins. This will include assessing the suitability of different data types for the approach; developing data structures that allow efficient integration and mining of data of different types and quality; benchmarking the results; and applying the new methods to the annotation of UniProtKB/TrEMBL records.

