Annual Scientific Report 2015
EMBL_EBI_ASR_2015_DigitalEdition
Non-vertebrate Genomics
High-throughput sequencing is transforming both understanding and
application of the biology of many organisms. Our team integrates, analyses
and disseminates these data for scientists working in domains as diverse as
agriculture, pathogen-mediated disease and the study of model organisms.
We run services for bacterial, protist, fungal, plant<br />
and invertebrate metazoan genomes, mostly using<br />
the power of the Ensembl software suite, and usually<br />
in partnership with interested communities. In such<br />
collaborations we contribute to the development<br />
of many resources, including VectorBase (Giraldo-Calderon
et al., 2015) for invertebrate vectors of human
disease, WormBase (Howe et al., 2016) for nematode
biology, PomBase (McDowall et al., 2015) for fission
yeast Schizosaccharomyces pombe, and PhytoPath
(Pedro et al., 2016) for plant pathogens. In the plant
domain, we collaborate closely with Gramene in the US<br />
and with a range of European groups in the transPLANT<br />
and ELIXIR-EXCELERATE projects.<br />
By collaborating with EMBL-EBI and re-using our<br />
established toolset, small communities with little<br />
informatics infrastructure can perform and interpret<br />
highly complex and data-generative experiments—<br />
the type of work once the sole domain of large,<br />
internationally co-ordinated sequencing projects. We<br />
also work on large, complex genomes like hexaploid<br />
bread wheat, establishing informatics frameworks for<br />
the analysis of species for which genomic data is only<br />
now gaining traction as technologies improve.<br />
Our major activities include genome annotation,<br />
broad-range comparative genomics and the<br />
visualisation and interpretation of genomic variation,<br />
which is studied increasingly in species throughout<br />
the taxonomy.<br />
Major achievements<br />
In 2015 we issued six public releases of Ensembl
Genomes. Ensembl Bacteria now includes almost
30 000 genomes from over 5000 distinct species, while
the numbers of fungal and protist genomes included
have increased approximately 10-fold and 5-fold,
respectively, in one year. In 2016 we are likely to deploy,
for the genomes of multicellular species, an automated
approach similar to the one currently used to
incorporate microorganism genomes.
With each release we have updated cross-references<br />
and comparative genomics, introduced improved<br />
assemblies and annotations, and sourced additional<br />
data sets, mapping them onto the relevant genomes and<br />
incorporating them into the resource.<br />
We contributed to the regular data releases of
PomBase, VectorBase, WormBase and PhytoPath. As
part of VectorBase, we contributed to the publication
of the genome of Anopheles stephensi, the primary
mosquito vector of malaria in urban India.
In WormBase, we made substantial progress towards<br />
the implementation of a new database framework<br />
that should allow for improved performance and more<br />
rapid updates to the public site. In both WormBase and<br />
PhytoPath, we released new data-mining solutions.<br />
In each project there are specific challenges, but by<br />
re-using infrastructural components in different<br />
contexts we have gained efficiencies of scale.<br />
Community curation is a good way of capturing<br />
high-value data from the experts. We are also now<br />
running community curation portals for 30 insect vector<br />
species using the Web Apollo framework, allowing<br />
scientists to modify gene models directly for subsequent<br />
incorporation into VectorBase and Ensembl.<br />
In PomBase, we collect functional annotations using<br />
the Canto tool. In 2015 we extended our use of Web
Apollo to plant pathogens for the first time, working<br />
with the community to improve the annotation of the<br />
necrotrophic fungus Botrytis cinerea, and prepared to<br />
deploy Canto for these phytopathogenic species.<br />
In December, we released a new “pre-site” offering<br />
access to a new genomic assembly for bread wheat.<br />
Bread wheat has a large, complex genome and we have<br />
been working as part of a BBSRC-funded project to<br />
develop and disseminate a new assembly through a<br />
collaboration with The Genome Analysis Centre, The<br />
John Innes Centre, and Rothamsted Research. The new<br />
assembly is the most complete, contiguous assembly<br />
yet released for this species and we will be working to<br />
annotate it fully over the course of 2016.<br />
In the context of the transPLANT project, we continued<br />
to work with the plant science community to develop<br />
standards for phenotypic data, and set out our findings<br />
in a publication (Krajewski et al., 2015).
2015 EMBL-EBI Annual Scientific Report