22.08.2016 Views

Annual Scientific Report 2015

EMBL_EBI_ASR_2015_DigitalEdition


Non-vertebrate Genomics

High-throughput sequencing is transforming both understanding and application of the biology of many organisms. Our team integrates, analyses and disseminates these data for scientists working in domains as diverse as agriculture, pathogen-mediated disease and the study of model organisms.

We run services for bacterial, protist, fungal, plant and invertebrate metazoan genomes, mostly using the power of the Ensembl software suite, and usually in partnership with interested communities. In such collaborations we contribute to the development of many resources, including VectorBase (Giraldo-Calderon et al., 2015) for invertebrate vectors of human disease, WormBase (Howe et al., 2016) for nematode biology, PomBase (McDowall et al., 2015) for fission yeast Schizosaccharomyces pombe, and PhytoPath (Pedro et al., 2016) for plant pathogens. In the plant domain, we collaborate closely with Gramene in the US and with a range of European groups in the transPLANT and ELIXIR-EXCELERATE projects.

By collaborating with EMBL-EBI and re-using our established toolset, small communities with little informatics infrastructure can perform and interpret highly complex and data-generative experiments, the type of work once the sole domain of large, internationally co-ordinated sequencing projects. We also work on large, complex genomes like hexaploid bread wheat, establishing informatics frameworks for the analysis of species for which genomic data is only now gaining traction as technologies improve.

Our major activities include genome annotation, broad-range comparative genomics and the visualisation and interpretation of genomic variation, which is studied increasingly in species throughout the taxonomy.

Major achievements<br />

In 2015 we issued six public releases of Ensembl Genomes. Ensembl Bacteria now includes almost 30 000 genomes from over 5000 distinct species, while the number of fungal and protist genomes included has increased approximately 10-fold and 5-fold, respectively, in one year. In 2016 we are likely to extend the automated approach currently used to incorporate microorganism genomes to the genomes of multicellular species.

With each release we have updated cross-references and comparative genomics, introduced improved assemblies and annotations, and sourced additional data sets, mapping them onto the relevant genomes and incorporating them into the resource.

We contributed to the regular data releases of PomBase, VectorBase, WormBase and PhytoPath. As part of VectorBase, we contributed to the publication of the genome of Anopheles stephensi, the primary mosquito vector of malaria in urban India.

In WormBase, we made substantial progress towards the implementation of a new database framework that should allow for improved performance and more rapid updates to the public site. In both WormBase and PhytoPath, we released new data-mining solutions. In each project there are specific challenges, but by re-using infrastructural components in different contexts we have gained efficiencies of scale.

Community curation is a good way of capturing high-value data from experts. We now run community curation portals for 30 insect vector species using the Web Apollo framework, allowing scientists to modify gene models directly for subsequent incorporation into VectorBase and Ensembl.

In PomBase, we collect functional annotations using the Canto tool. In 2015 we extended our use of Web Apollo to plant pathogens for the first time, working with the community to improve the annotation of the necrotrophic fungus Botrytis cinerea, and prepared to deploy Canto for these phytopathogenic species.

In December, we released a new “pre-site” offering access to a new genomic assembly for bread wheat. Bread wheat has a large, complex genome and we have been working as part of a BBSRC-funded project to develop and disseminate a new assembly through a collaboration with The Genome Analysis Centre, The John Innes Centre, and Rothamsted Research. The new assembly is the most complete, contiguous assembly yet released for this species and we will be working to annotate it fully over the course of 2016.

In the context of the transPLANT project, we continued to work with the plant science community to develop standards for phenotypic data, and set out our findings in a publication (Krajewski et al., 2015).


2015 EMBL-EBI Annual Scientific Report
