
Annual Scientific Report 2015


Protein Function Content

One of the central activities of the Protein Function Content team is the biocuration of our databases: interpreting and integrating information relevant to biology. The primary goals of biocuration are to represent biological knowledge accurately and comprehensively, to give working scientists easy access to these data and to provide a basis for computational analysis.

The curation methods we apply to UniProtKB/Swiss-Prot include manual extraction and structuring of experimental information from the literature, manual verification of results from computational analyses, quality assessment, integration of large-scale datasets and continuous updating as new information becomes available.

UniProt has two complementary approaches to automatic annotation of protein sequences with a high degree of accuracy. UniRule is a collection of manually curated annotation rules, which define annotations that can be propagated based on specific conditions. The Statistical Automatic Annotation System (SAAS) is an automatic, decision-tree-based, rule-generating system. The central components of these approaches are rules based on the manually curated data in UniProtKB/Swiss-Prot from the experimental literature and InterPro classification.
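
The idea of a rule that propagates annotations when specific conditions hold can be illustrated with a minimal sketch. This is not the UniRule or SAAS implementation: the `Rule` and `Protein` classes, the `apply_rules` function, the signature labels and the annotation text are all hypothetical, and the only conditions modelled are a required set of signatures and a taxonomic scope.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical annotation rule: conditions plus annotations to propagate."""
    required_signatures: frozenset  # placeholder signature labels, not real identifiers
    taxon: str                      # rule applies only if this taxon is in the lineage
    annotations: tuple              # annotation strings propagated on a match


@dataclass
class Protein:
    """Hypothetical protein record reduced to the fields the rules inspect."""
    accession: str
    signatures: set
    lineage: tuple
    annotations: list = field(default_factory=list)


def apply_rules(protein, rules):
    """Return the annotations of every rule whose conditions the protein meets."""
    propagated = []
    for rule in rules:
        has_signatures = rule.required_signatures <= protein.signatures
        in_scope = rule.taxon in protein.lineage
        if has_signatures and in_scope:
            propagated.extend(rule.annotations)
    return propagated


# Illustrative data only; none of these labels refer to real database entries.
rules = [
    Rule(frozenset({"SIG_A"}), "Bacteria", ("function: example enzyme",)),
    Rule(frozenset({"SIG_A", "SIG_B"}), "Eukaryota", ("function: other example",)),
]
protein = Protein("EXAMPLE_1", {"SIG_A", "SIG_C"}, ("Bacteria", "Proteobacteria"))
result = apply_rules(protein, rules)
```

In this sketch only the first rule matches, because the second requires a signature the protein lacks and a different taxonomic scope. A decision-tree-based system such as SAAS would generate conditions of this kind from curated data rather than have them written by hand.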

The UniProt GO annotation (GOA) program aims to add high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB). We supplement UniProt manual and electronic GO annotations with manual annotations supplied by external collaborating GO Consortium groups. This ensures that users have a comprehensive GO annotation dataset. UniProt is a member of the GO Consortium.

Major achievements

As a core contributor to the Consensus CDS project, UniProt is creating an authoritative complete proteome set for Homo sapiens in close collaboration with the RefSeq annotation group at the National Center for Biotechnology Information (NCBI) and the Ensembl and HAVANA teams at EMBL-EBI and the Wellcome Trust Sanger Institute. A component of this effort involves ensuring a curated and complete synchronisation with the HUGO Gene Nomenclature Committee (HGNC), which has assigned unique gene symbols and names to 39 000 human loci (19 003 of which are listed as coding for proteins). Information on the reviewed set of 20 199 entries is available on the UniProt website.

We play a major role in establishing minimum standards for genome annotation across the taxonomic range, largely thanks to collaborations arising from the annual NCBI Genome Annotation Workshops, which are attended by researchers from life science organisations worldwide. These standards have contributed significantly to the annotation of complete genomes and proteomes and are helping scientists exploit these data to their full potential.

The UniProt Automatic Annotation effort made great strides in 2015. We increased the number of UniRules significantly, with an emphasis on enzymes across the taxonomic space to enable us to respond to the need for annotation of uncharacterised genomes. We began establishing relationships with sequencing and annotation centres such as Genoscope to share these rules and to expand into new approaches.

The UniProt GO annotation program provides high-quality GO annotations to proteins in UniProtKB. The assignment of GO terms to UniProt records is an integral part of UniProt biocuration. UniProt manual and electronic GO annotations are supplemented with manual annotations supplied by external collaborating GO Consortium groups, to ensure a comprehensive GO annotation dataset is supplied to users. Our curators are key members of the GO Consortium Reference Genomes Initiative for the human proteome and provide high-quality annotations for human proteins. In 2014, we provided a manually curated set of human proteins for the validation of the computational approaches submitted to the Critical Assessment of Function Annotation (CAFA) experiment and presented a guide to how best to use and interpret Gene Ontology data at the Automated Function Prediction SIG at the International Conference on Intelligent Systems for Molecular Biology (ISMB).


2015 EMBL-EBI Annual Scientific Report
