Annual Scientific Report 2015

Vertebrate Annotation

The Vertebrate Annotation team aims to create comprehensive, up-to-date gene annotation and comparative genomics resources that further our understanding of biology, evolution and the mechanisms of disease. The data from these resources are distributed by the Ensembl project, and provide a foundation for clinical and research communities.

While the reference genome assembly is an increasingly important tool for research, most scientists working in this area need to link genomic sequence to biological function. This is made possible by gene annotations, which identify the location, structure and expression of genes. Valuable insights can be gained by comparing the annotated sequences of individuals of the same species, or across a wide range of species.

Our team produces reference gene annotation that is used by clinical, agricultural and research communities as well as by other data services at EMBL-EBI. Our gene annotation service provides high-quality gene sets for human, mouse and almost 100 other vertebrate species, including key model organisms and farmed animals. These are the primary annotations used in the initial genomic analyses of many international genome projects. In collaboration with GENCODE, we produce the gold-standard gene annotation for human and mouse.

For every genome assembly in Ensembl, we also produce comparative genomics resources that link diverse species at the DNA and gene level. These data are used for investigating gene function and evolution, adaptive traits and conservation biology.

We are responsible for TreeFam, which clusters similar gene sequences into homologous families and indicates gene history events such as duplication and speciation. Our comparative annotations include gene families, gene orthologues, whole-genome multi-species alignments and conserved genomic regions across many species.

We develop alignment and annotation methods that integrate diverse data from the public archives. We collaborate closely with other service teams at EMBL-EBI, including the ENA, UniProt and Expression Atlas, to annotate new assemblies and to update annotation on existing assemblies as new data become available.

Major achievements

We are proud to maintain the reference human and mouse gene sets through our GENCODE collaboration. In 2015, we improved and updated these gene resources by annotating assembly updates provided by the Genome Reference Consortium (GRC), incorporating new manual annotation, identifying gene alleles across the various haplotypes, and contributing to the CCDS project.

We released major updates to the rat and zebrafish resources. We made new assemblies available for both species, and produced genome-wide annotation for them using our evidence-based methods to identify protein-coding genes, noncoding RNA genes, and pseudogenes. We produced tissue- and sample-specific transcript sets from RNA-seq data in the public archives. We updated the multi-species whole-genome alignments, gene trees and orthologues in Ensembl to include these new assemblies.

We extended our gene-annotation methods to include annotation of lincRNA genes. We applied this method to human, mouse, rat and sheep, and will be producing lincRNA annotation for more species in 2016.

TreeFam produces phylogenetic trees and orthology predictions for all Ensembl eukaryotes. The number of publicly available genomes is increasing rapidly, providing an opportunity for new insights via comparative genomics. To achieve scalability, we designed a novel workflow that will classify protein sequences from thousands of genomes into gene families quickly and robustly. This workflow is now partially in production and uses our new library of Hidden Markov Model (HMM) profiles.

Our comprehensive genome annotations are the foundation for myriad downstream analysis tools and research, including the Ensembl Variant Effect Predictor (VEP). Access to consistent gene annotation for a wide range of vertebrates is important for evolutionary studies. Members of our team collaborated with others in studying gene families in the vervet (African green monkey) lineage, using freely available data from ten of our annotation projects.

2015 EMBL-EBI Annual Scientific Report
