<strong>EMBL</strong>-EBI<br />
PANDA proteins and the Apweiler research<br />
group<br />
Previous and current research<br />
The PANDA (Protein and Nucleotide Data) group was created in June 2007 by merging the former<br />
Ensembl (Birney) and Sequence Database (Apweiler) groups.<br />
The activities of the PANDA group are focussed on the production of protein sequence, protein<br />
family and nucleotide sequence databases at <strong>EMBL</strong>-EBI. We maintain and host the <strong>EMBL</strong> Nucleotide<br />
Sequence Database, the Ensembl genome browser, the UniProt protein resource, and a<br />
range of other biomolecular databases. These efforts can be divided into three major groups: nucleotides,<br />
proteins, and chemoinformatics and metabolism. In addition to PANDA activities, the<br />
Apweiler group has a complementary research component.<br />
The activities of the PANDA proteins teams are centred on the mission of providing public access<br />
to all known protein sequences and functional information about these proteins. The UniProt resource<br />
provides the centrepiece for these activities. Most of the UniProt sequence data is derived<br />
from translation of nucleotide sequences provided by the European Nucleotide Archive and Ensembl. All UniProt data undergoes classification<br />
provided by InterPro (see the report from Sarah Hunter, page 78). In addition, we add information extracted from the scientific literature<br />
and curator-evaluated computational analysis whenever possible. The combined InterPro and literature annotation forms the basis for<br />
automatic approaches to annotating all sequence data that lacks experimental functional data. Protein interaction and identification<br />
data is or will be provided to UniProt by the IntAct protein–protein interaction database and by the PRoteomics IDEntifications (PRIDE) database.<br />
Ongoing research activities in the group include the development of methods to improve searching of large biological datasets, approaches<br />
to improve protein identification from mass spectrometry data, algorithms for genome-wide sequence comparison and the development of<br />
tools for the automatic annotation of proteins.<br />
Rolf Apweiler<br />
PhD 199, University of<br />
Heidelberg, Germany.<br />
Team leader at <strong>EMBL</strong>-EBI<br />
since 1997.<br />
Future projects and goals<br />
It is our intention to work on improved integration and synchronisation of all PANDA resources. Despite the abundance of data from large-scale<br />
experimentation on a genome-wide level, such as expression profiling, protein–protein interaction screens or protein localisation, the<br />
systematic and integrated use of this type of information for high-throughput annotation of proteins remains largely unexplored. We therefore<br />
intend to build on ongoing research activities at <strong>EMBL</strong>-EBI to develop and assess new protocols to integrate and analyse functional genomics<br />
datasets for the purpose of high-throughput annotation of uncharacterised proteins. This will include the analysis of different data<br />
types regarding their suitability for the approach, development of data structures that allow the efficient integration and mining of data of different<br />
types and quality, as well as benchmarking of the obtained results and the application of new methodologies to the annotation of UniProtKB/TrEMBL<br />
records.<br />
Selected references<br />
Klie, S. et al. (2008). Analyzing large-scale proteomics projects with<br />
latent semantic indexing. J. Proteome Res., 7, 182-191<br />
Mueller, M. et al. (2008). Analysis of the experimental detection of<br />
central nervous system-related genes in human brain and<br />
cerebrospinal fluid datasets. Proteomics, 8, 1138-118<br />
The UniProt Consortium (2008). The Universal Protein Resource<br />
(UniProt). Nucleic Acids Res., 36, D190-195<br />
Mueller, M. et al. (2007). Annotating the human proteome: Beyond<br />
establishing a parts list. Biochimica et Biophysica Acta, 177, 175-191<br />