22.08.2016 Views

Annual Scientific Report 2015

EMBL_EBI_ASR_2015_DigitalEdition


Pfam

Pfam is a database of protein sequence families. Each Pfam family is represented by a statistical model (a profile hidden Markov model), trained using a curated alignment of representative sequences. These models can be searched against all protein sequences to find occurrences of Pfam families, thereby aiding the identification of evolutionarily related sequences. As homologous proteins are more likely to share structural and functional features, Pfam families can aid in the annotation of uncharacterised sequences and guide experimental work.

http://pfam.xfam.org

HMMER

HMMER is a sequence-analysis package that can be used with both protein and nucleotide sequences. At the core of the software is an algorithm that searches one or more probabilistic models (profile hidden Markov models, HMMs) against either a single sequence or a database of sequences. The HMMER website implements this software as a set of fast web services, with both a programmatic interface and graphical user interfaces. Profile HMMs are powerful tools that allow users to detect distant evolutionary relationships.

www.ebi.ac.uk/Tools/Hmmer

Protein Function Development

Maria Martin

• Re-launched the Enzyme portal and developed new interfaces and tools for UniProt and QuickGO, with a focus on optimising user interaction with these websites;

• Implemented a method for identifying highly redundant proteomes and removing them from UniProtKB;

• Extended the provision of variants with consequences at the protein level, incorporating variation data from ExAC and the Exome Sequencing Project (ESP);

• Released experimental peptides mapped to UniProt proteins from mass-spectrometry studies in collaboration with PeptideAtlas and MaxQB;

• Extended the scope of the annotation tool Protein2GO and the GO browser QuickGO, and implemented a PSICQUIC server for visualising protein-protein interaction annotations in Cytoscape;

• Implemented the automatic annotation of domains, signal peptides, transmembrane and coiled-coil regions for millions of protein sequences in UniProtKB/TrEMBL.

Protein Function Content

Claire O’Donovan

• In the context of the Consensus Coding Sequence (CCDS) project, ensured complete, curated synchronisation with the HGNC, which has assigned unique gene symbols and names to 39 000 human loci (19 001 of which are listed as coding for proteins);

• Helped establish minimum standards for genome annotation to enable scientists to exploit complete genome and proteome datasets to their full potential;

• Improved UniProt Automatic Annotation by significantly increasing the number of UniRules, with an emphasis on enzymes across the taxonomic space;

• Secured funding to continue our contribution to the validation of the computational approaches submitted to the Critical Assessment of Function Annotation experiment.

Sequence Families

Rob Finn

• Refactored Pfam to utilise UniProt reference proteomes as the underlying sequence database, streamlining curation and production processes while minimising impact on sensitivity;

• Optimised Pfam quality control to permit minor overlaps between Pfam entries, allowing better modelling of protein families;

• Streamlined production and delivered monthly updates of InterPro data to UniProt for their automatic annotation procedures;

• Integrated a net gain of over 2000 new member database signatures within InterPro, resulting in over 1800 new entries;

• Provided GO terms to UniProt, with the latest release assigning approximately 110 million terms to approximately 35 million proteins in UniProt release 2016_01;

• Migrated the HMMER web services from Janelia Research Campus;

• Expanded HMMER services to include PIRSF HMM searches and support for UniProt reference proteomes, now the default sequence database;

• Issued two releases of Pfam and six releases of InterPro.

2015 EMBL-EBI Annual Scientific Report, p. 19
