Annual Scientific Report 2015
Maria-Jesus Martin<br />
Protein Function Development<br />
BSc in Veterinary Medicine, University<br />
Autonoma in Madrid. PhD in Molecular Biology<br />
(Bioinformatics), 2003.<br />
At EMBL-EBI since 1996.<br />
Team Leader since 2009.<br />
millions of proteins in UniProt. Informed by specialist<br />
biocurators, the automated system adds as much useful<br />
information as possible to imported sequences; these<br />
annotations now include domains and signal, transmembrane<br />
and coil regions. We extended UniRule and the Statistical<br />
Automatic Annotation System (SAAS), two systems<br />
for the automatic annotation of large volumes of<br />
uncharacterised proteins. These are now available<br />
through newly implemented interactive web pages,<br />
allowing our users to browse annotation rules. We also<br />
started work on a service to download and/or use these<br />
rules as a system for genome annotation. We extended<br />
our collaborations with external automatic annotation<br />
communities including the Biofunction Prediction and<br />
Critical Assessment of Function Annotation initiatives,<br />
which will expand our knowledge and use of functional<br />
prediction methods.<br />
In 2015 we further extended the scope of GO<br />
annotation to support annotations to RNA, identified by<br />
RNAcentral identifiers. We made significant changes<br />
to our database and Protein2GO, the web-based GO<br />
curation tool used by UniProt and GO Consortium<br />
curators to contribute annotations to the GOA project,<br />
in order to support a number of changes to annotation<br />
format and rules agreed by the GO Consortium. We<br />
re-engineered our pipeline that verifies the taxonomic<br />
correctness of GO annotations using a much-extended<br />
set of taxonomic constraints that originate from both<br />
GO and other ontologies, principally UBERON.<br />
To make GO protein–protein interaction annotations<br />
available for visualisation in tools such as Cytoscape,<br />
we implemented a PSICQUIC (Proteomics Standards<br />
Initiative Common Query Interface) server, available<br />
through EMBL-EBI’s PSICQUIC portal.<br />
Our team maintains the Enzyme Portal, a resource<br />
that integrates enzyme-related data from all relevant<br />
EMBL-EBI resources and the underlying functional and<br />
genomic data. We re-launched the service, which now<br />
features improved interfaces and functionalities and<br />
provides a one-stop shop for all information available<br />
on enzymes. To further improve the discoverability of<br />
enzyme data, we collaborated with the Web Production<br />
team to refine the enzyme search within the EBI-Search<br />
and EBI Blast sequence search tools.<br />
Future plans<br />
In 2016 we plan to release a protein-sequence feature<br />
viewer that summarises functional sites on the<br />
UniProt web site. We will continue to engage with user<br />
communities working in functional prediction, and<br />
explore methods and data-exchange mechanisms to<br />
improve accuracy and coverage of protein annotations.<br />
We will maintain our focus on usability and engage<br />
with our users to ensure we maintain a global genome/<br />
proteome- and gene-product-centric view of the<br />
sequence space. We aim to expand our collaboration<br />
with the ProteomeXchange resources in the integration<br />
of post-translational modifications in UniProtKB, and in<br />
the provision of experimental, unique peptide mappings<br />
for reference species. We will continue to co-operate<br />
with variation projects such as ExAC to integrate<br />
relevant genome and proteome information.<br />
Restructuring GO electronic annotation pipelines,<br />
principally those based on orthology supplied by<br />
Ensembl, will help us improve the quality of the<br />
projected annotations. We will continue the work<br />
undertaken on behalf of the GO Consortium in 2015<br />
to transition to using UniProt cross-references,<br />
rather than MOD-supplied mapping files, to map from<br />
“foreign” identifiers to UniProtKB accessions. We also<br />
plan to revise the set of annotation files that we publish<br />
and submit to the GO Consortium. We plan to release<br />
a new QuickGO with re-designed interfaces and new<br />
features to improve the overall user experience. We<br />
will also continue to develop Protein2GO to keep it in<br />
line with changes in annotation strategy agreed by the<br />
GO Consortium, and to introduce additional functionality<br />
to enhance curators’ workflow. We will make use of<br />
the enhanced set of Web Services provided by the new<br />
QuickGO to provide improved searching capabilities,<br />
and the ability to use any available ECO evidence code.<br />
Following the successful relaunch of the Enzyme Portal<br />
in 2015, we will expand its functionalities in response to<br />
user needs and create new training activities.<br />
Selected publications<br />
Alpi E, Griss J, et al. (2015) Analysis of the tryptic search<br />
space in UniProt databases. Proteomics 15:48-57<br />
Huntley RP, et al. (2015) The GOA database: Gene<br />
Ontology annotation updates for 2015. Nucleic Acids Res.<br />
43:D1057-D1063<br />
Pundir S, Magrane M, Martin MJ, O’Donovan C, UniProt<br />
Consortium (2015) Searching and navigating UniProt<br />
databases. Curr. Protoc. Bioinform. 50:1.27.1-1.27.10<br />
UniProt Consortium (2015) UniProt: a hub for protein<br />
information. Nucleic Acids Res. 43:D204-D212<br />
2015 EMBL-EBI Annual Scientific Report