Annual Scientific Report 2015
EMBL_EBI_ASR_2015_DigitalEdition
Pfam
Pfam is a database of protein sequence families. Each
Pfam family is represented by a statistical model (a
profile hidden Markov model), trained using a curated
alignment of representative sequences. These models
can be searched against all protein sequences to find
occurrences of Pfam families, thereby aiding the
identification of evolutionarily related sequences. As
homologous proteins are more likely to share structural
and functional features, Pfam families can aid in the
annotation of uncharacterised sequences and guide
experimental work.
http://pfam.xfam.org
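The idea of training a model on an alignment and then scanning sequences with it can be illustrated with a deliberately simplified sketch. It is not how Pfam builds its models: a real profile hidden Markov model also has insert and delete states, whereas this toy version keeps only per-column match scores, assumes an ungapped alignment and uniform background frequencies, and uses hypothetical function names (`build_profile`, `best_hit`).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Turn an ungapped alignment into per-column log-odds scores.

    Assumptions: match states only, uniform background frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen residues from scoring minus infinity.
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile


def best_hit(profile, sequence):
    """Slide the profile along a sequence; return (best score, start index)."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids score as neutral (0.0).
        score = sum(profile[i].get(aa, 0.0) for i, aa in enumerate(window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDE", "ACDQ"]  # made-up sequences
    profile = build_profile(toy_alignment)
    print(best_hit(profile, "GGGACDEGGG"))
```

Windows that resemble the training alignment receive positive log-odds scores, while unrelated windows score below zero; a full profile HMM extends this by also scoring insertions and deletions.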
HMMER
HMMER is a sequence-analysis package that can be
used with both protein and nucleotide sequences. At
the core of the software is an algorithm that enables the
searching of one or more probabilistic models (profile
hidden Markov models, HMMs) against either a single
sequence or a database of sequences. The HMMER
website has implemented this software as a set of fast
web services, with both a programmatic interface
and graphical user interfaces. Profile HMMs are
sensitive models that allow users to detect distant
evolutionary relationships.
www.ebi.ac.uk/Tools/Hmmer
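As a rough illustration of searching a profile HMM against a sequence database, the sketch below uses the locally installed `hmmsearch` command-line program from the HMMER package rather than the web services, whose programmatic interface is not described here. The helper names, file paths and the sample result line are hypothetical; the parser assumes the whitespace-delimited per-target table written by `hmmsearch --tblout`.

```python
def hmmsearch_command(hmm_path, fasta_path, tblout_path, evalue=1e-5):
    """Build the argument list for a local hmmsearch run.

    Pass the result to subprocess.run(...) on a machine with HMMER installed.
    """
    return ["hmmsearch", "--tblout", tblout_path, "-E", str(evalue),
            hmm_path, fasta_path]


def parse_tblout(text):
    """Extract target, query, full-sequence E-value and score per hit."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 fixed columns, then a free-text description as the 19th field.
        fields = line.split(maxsplit=18)
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),
            "score": float(fields[5]),
        })
    return hits


if __name__ == "__main__":
    # Made-up example line in the tblout layout, for illustration only.
    sample = (
        "# target name  accession  query name  accession  E-value  score ...\n"
        "toy_seq1 - toy_family - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 "
        "1.0 1 0 0 1 1 1 1 hypothetical example protein\n"
    )
    print(hmmsearch_command("toy_family.hmm", "proteins.fasta", "hits.tbl"))
    print(parse_tblout(sample))
```

Hits can then be filtered or sorted by E-value; lower E-values indicate matches that are less likely to have arisen by chance.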
Protein Function Development<br />
Maria Martin<br />
• Re-launched the Enzyme portal and developed new<br />
interfaces and tools for UniProt and QuickGO, with a focus<br />
on optimising user interaction with these websites;<br />
• Implemented a method for identifying highly
redundant proteomes and removing them from UniProtKB;
• Extended the provision of variants with consequences at the
protein level, incorporating variation data from ExAC and
the Exome Sequencing Project (ESP);
• Released experimental peptides mapped to UniProt proteins<br />
from mass-spectrometry studies in collaboration with<br />
PeptideAtlas and MaxQB;<br />
• Extended the scope of the annotation tool Protein2GO and
the GO browser QuickGO, and implemented a PSICQUIC
server for visualising protein-protein interaction
annotations in Cytoscape;
• Implemented the automatic annotation of domains, signal
peptides, transmembrane and coiled-coil regions for millions of
protein sequences in UniProtKB/TrEMBL.
Protein Function Content<br />
Claire O’Donovan<br />
• In the context of the Consensus Coding Sequence (CCDS)
project, ensured complete, curated synchronisation with
the HGNC, which has assigned unique gene symbols and
names to 39 000 human loci (19 001 of which are listed as
coding for proteins);
• Helped establish minimum standards for genome<br />
annotation to enable scientists to exploit complete genome<br />
and proteome datasets to their full potential;<br />
• Improved UniProt Automatic Annotation by significantly<br />
increasing the number of UniRules, with an emphasis on<br />
enzymes across the taxonomic space;<br />
• Secured funding to continue our contribution to the<br />
validation of the computational approaches submitted to the<br />
Critical Assessment of Function Annotation experiment.<br />
Sequence Families<br />
Rob Finn<br />
• Refactored Pfam to utilise UniProt reference proteomes as<br />
the underlying sequence database, streamlining curation<br />
and production processes while minimising impact on<br />
sensitivity;<br />
• Optimised Pfam quality control to permit minor overlaps
between Pfam entries, enabling better modelling of protein
families;
• Streamlined production and delivered monthly updates of<br />
InterPro data to UniProt for their automatic annotation<br />
procedures;<br />
• Integrated a net gain of over 2000 new member database<br />
signatures within InterPro, resulting in over 1800 new<br />
entries;<br />
• Provided GO terms to UniProt, with the latest release<br />
assigning ~110 million terms to approximately 35 million<br />
proteins in UniProt release 2016_01;<br />
• Migrated the HMMER web services from Janelia Research<br />
Campus;<br />
• Expanded HMMER services to include PIRSF HMM<br />
searches and support for UniProt reference proteomes, now<br />
the default sequence database;<br />
• Issued two releases of Pfam and six releases of InterPro.<br />
2015 EMBL-EBI Annual Scientific Report