Annual Scientific Report 2015

Protein Function Content

One of the central activities of the Protein Function Content team is the
biocuration of our databases, interpreting and integrating information relevant
to biology. The primary goals of biocuration are accurate and comprehensive
representation of biological knowledge, as well as facilitating easy access to this
data for working scientists and providing a basis for computational analysis.

The curation methods we apply to UniProtKB/Swiss-Prot include manual
extraction and structuring of experimental information from the literature,
manual verification of results from computational analyses, quality assessment,
integration of large-scale datasets and continuous updating as new information
becomes available.

UniProt has two complementary approaches to annotating protein sequences
automatically with a high degree of accuracy. UniRule is a collection of manually
curated annotation rules, which define annotations that can be propagated when
specific conditions are met. The Statistical Automatic Annotation System (SAAS)
is an automatic, decision-tree-based, rule-generating system. Both approaches
centre on rules derived from two sources: the manually curated data in
UniProtKB/Swiss-Prot, drawn from the experimental literature, and InterPro
classification.
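
The idea of propagating annotations when specific conditions are met can be illustrated with a minimal sketch. It is not the UniRule implementation: the class names, the two condition types (required InterPro signatures and a required taxon in the lineage) and all identifiers such as `RULE_DEMO_1` and `IPR_DEMO_1` are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str                       # hypothetical accession
    taxon_lineage: tuple                 # e.g. ("Bacteria", "Proteobacteria")
    interpro_signatures: frozenset       # signature IDs matched by the sequence


@dataclass(frozen=True)
class AnnotationRule:
    rule_id: str
    required_signatures: frozenset       # all of these must be present
    required_taxon: str                  # must appear in the protein's lineage
    annotations: tuple                   # annotations to propagate on a match

    def matches(self, protein: Protein) -> bool:
        # A rule fires only if every condition holds (assumed semantics).
        return (
            self.required_signatures <= protein.interpro_signatures
            and self.required_taxon in protein.taxon_lineage
        )


def propagate(rules, protein):
    """Return (rule_id, annotation) pairs for every rule the protein satisfies."""
    result = []
    for rule in rules:
        if rule.matches(protein):
            result.extend((rule.rule_id, text) for text in rule.annotations)
    return result


# Placeholder data, not taken from any real database entry.
rule = AnnotationRule(
    rule_id="RULE_DEMO_1",
    required_signatures=frozenset({"IPR_DEMO_1"}),
    required_taxon="Bacteria",
    annotations=("Function: demo enzyme activity",),
)
hit = Protein("DEMO_1", ("Bacteria", "Proteobacteria"),
              frozenset({"IPR_DEMO_1", "IPR_DEMO_2"}))
miss = Protein("DEMO_2", ("Eukaryota",), frozenset({"IPR_DEMO_1"}))

print(propagate([rule], hit))
print(propagate([rule], miss))
```

Recording the rule identifier alongside each propagated annotation keeps the provenance of automatic annotations traceable, which is one plausible reason to model rules as explicit objects.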

The UniProt GO annotation (GOA) program aims to add high-quality Gene
Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB).
We supplement UniProt manual and electronic GO annotations with manual
annotations supplied by external collaborating GO Consortium groups. This
ensures that users have a comprehensive GO annotation dataset. UniProt is a
member of the GO Consortium.
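
Combining annotations from several sources into one dataset can be sketched as follows. This is a simplified illustration, not the GOA pipeline: the `GoAnnotation` record, the source labels and the rule that the first source listed wins when the same protein and GO term pair appears more than once are all assumptions, and the accessions and GO identifiers are placeholders.

```python
from typing import NamedTuple


class GoAnnotation(NamedTuple):
    accession: str   # protein accession (placeholder values below)
    go_id: str       # GO term identifier (placeholder values below)
    source: str      # e.g. "uniprot_manual", "uniprot_electronic", "external_manual"


def merge_annotations(*batches):
    """Merge batches in priority order, keeping one record per (protein, term)."""
    seen = set()
    merged = []
    for batch in batches:
        for annotation in batch:
            key = (annotation.accession, annotation.go_id)
            if key not in seen:          # earlier batches take precedence
                seen.add(key)
                merged.append(annotation)
    return merged


uniprot_manual = [GoAnnotation("DEMO_1", "GO:DEMO_A", "uniprot_manual")]
external_manual = [
    GoAnnotation("DEMO_1", "GO:DEMO_A", "external_manual"),  # duplicate pair
    GoAnnotation("DEMO_1", "GO:DEMO_B", "external_manual"),
]
uniprot_electronic = [GoAnnotation("DEMO_2", "GO:DEMO_A", "uniprot_electronic")]

combined = merge_annotations(uniprot_manual, external_manual, uniprot_electronic)
for record in combined:
    print(record)
```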

Major achievements

As a core contributor to the Consensus CDS project, UniProt is creating an
authoritative complete proteome set for Homo sapiens in close collaboration
with the RefSeq annotation group at the National Center for Biotechnology
Information (NCBI) and the Ensembl and HAVANA teams at EMBL-EBI and the
Wellcome Trust Sanger Institute. A component of this effort involves ensuring
a curated and complete synchronisation with the HUGO Gene Nomenclature
Committee (HGNC), which has assigned unique gene symbols and names to
39 000 human loci (19 003 of which are listed as coding for proteins).
Information on the reviewed set of 20 199 entries is available on the
UniProt website.

We play a major role in establishing minimum standards for genome annotation
across the taxonomic range, largely thanks to collaborations arising from the
annual NCBI Genome Annotation Workshops, which are attended by researchers
from life science organisations worldwide. These standards have contributed
significantly to the annotation of complete genomes and proteomes and are
helping scientists exploit these data to their full potential.

The UniProt Automatic Annotation effort made great strides in 2015. We
significantly increased the number of UniRules, with an emphasis on enzymes
across the taxonomic space, so that we can respond to the need for annotation
of uncharacterised genomes. We began establishing relationships with
sequencing and annotation centres such as Genoscope to share these rules and
to expand into new approaches.

The UniProt GO annotation program provides high-quality GO annotations to
proteins in UniProtKB. The assignment of GO terms to UniProt records is an
integral part of UniProt biocuration. UniProt manual and electronic GO
annotations are supplemented with manual annotations supplied by external
collaborating GO Consortium groups, to ensure that a comprehensive GO
annotation dataset is supplied to users. Our curators are key members of the
GO Consortium Reference Genomes Initiative for the human proteome and provide
high-quality annotations for human proteins. In 2014, we provided a manually
curated set of human proteins for the validation of the computational
approaches submitted to the Critical Assessment of Function Annotation (CAFA)
experiment. We also presented a guide on how best to use and interpret Gene
Ontology data at the Automated Function Prediction SIG at the International
Conference on Intelligent Systems for Molecular Biology (ISMB).

2015 EMBL-EBI Annual Scientific Report