Annual Scientific Report 2015

Vertebrate Annotation

The Vertebrate Annotation team aims to create comprehensive, up-to-date gene annotation and comparative genomics resources that further our understanding of biology, evolution and the mechanisms of disease. The data from these resources are distributed by the Ensembl project, and provide a foundation for clinical and research communities.

While the reference genome assembly is an increasingly important tool for research, most scientists working in this area need to link genomic sequence to biological function. This is made possible by gene annotations, which identify the location, structure and expression of genes. Valuable insights can be gained by comparing the annotated sequences of individuals of the same species, or across a wide range of species. Our team produces reference gene annotation that is used by clinical, agricultural and research communities as well as by other data services at EMBL-EBI.

Our gene annotation service provides high-quality gene sets for human, mouse and almost 100 other vertebrate species, including key model organisms and farmed animals. These are the primary annotations used in the initial genomic analyses of many international genome projects. In collaboration with GENCODE, we produce the gold-standard gene annotation for human and mouse.

For every genome assembly in Ensembl, we also produce comparative genomics resources that link diverse species at the DNA and gene level. These data are used for investigating gene function and evolution, adaptive traits and conservation biology. We are responsible for TreeFam, which clusters similar gene sequences into homologous families and indicates gene history events such as duplication and speciation. Our comparative annotations include gene families, gene orthologues, whole genome multi-species alignments and conserved genomic regions across many species. We develop alignment and annotation methods that integrate diverse data from the public archives.
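To illustrate what labelling "gene history events such as duplication and speciation" on a gene tree can look like, here is a minimal sketch using the widely known species-overlap heuristic: an internal node is called a duplication if two of its child subtrees share a species, and a speciation otherwise. This is an illustration only and is not presented as the TreeFam or Ensembl implementation; the `GeneTreeNode` class and `label_events` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class GeneTreeNode:
    """A node in a rooted gene tree (hypothetical structure for this sketch)."""
    species: Optional[str] = None            # set on leaves only
    children: List["GeneTreeNode"] = field(default_factory=list)
    event: Optional[str] = None              # filled in for internal nodes


def label_events(node: GeneTreeNode) -> Set[str]:
    """Label internal nodes and return the set of species below `node`.

    Species-overlap heuristic (an assumption of this sketch): if any two
    child subtrees share a species, the node is a duplication; otherwise
    it is a speciation.
    """
    if not node.children:
        return {node.species}
    seen: Set[str] = set()
    overlap = False
    for child in node.children:
        child_species = label_events(child)
        if seen & child_species:
            overlap = True
        seen |= child_species
    node.event = "duplication" if overlap else "speciation"
    return seen


def leaf(species: str) -> GeneTreeNode:
    return GeneTreeNode(species=species)


# Toy tree: two subtrees that both contain a human gene.
left = GeneTreeNode(children=[leaf("human"), leaf("mouse")])
right = GeneTreeNode(children=[leaf("human"), leaf("rat")])
root = GeneTreeNode(children=[left, right])
label_events(root)
print(root.event, left.event, right.event)
# prints: duplication speciation speciation
```

In this toy tree the root is a duplication because human genes appear on both sides, while each subtree contains distinct species and is therefore labelled a speciation. Production methods typically reconcile the gene tree against a species tree, which this sketch does not attempt.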
We collaborate closely with other service teams at EMBL-EBI, including the ENA, UniProt and Expression Atlas, to annotate new assemblies and to update annotation on existing assemblies as new data become available.

Major achievements

We are proud to maintain the reference human and mouse gene sets through our GENCODE collaboration. In 2015, we improved and updated these gene resources by annotating assembly updates provided by the Genome Reference Consortium (GRC), incorporating new manual annotation, identifying gene alleles across the various haplotypes, and contributing to the CCDS project.

We released major updates to the rat and zebrafish resources. We made new assemblies available for both species, and produced genome-wide annotation for them using our evidence-based methods to identify protein-coding genes, noncoding RNA genes, and pseudogenes. We produced tissue- and sample-specific transcript sets from RNA-seq data in the public archives. We updated the multi-species whole-genome alignments, gene trees and orthologues in Ensembl to include these new assemblies.

We extended our gene-annotation methods to include annotation of lincRNA genes. We applied this method to human, mouse, rat and sheep and will be producing lincRNA annotation for more species in 2016.

TreeFam produces phylogenetic trees and orthology predictions for all Ensembl eukaryotes. The number of publicly available genomes is increasing rapidly, providing an opportunity for new insights via comparative genomics. To achieve scalability, we designed a novel workflow that will classify protein sequences from thousands of genomes into gene families in a quick and robust manner. This workflow is now partially in production and uses our new library of Hidden Markov Model (HMM) profiles.

Our comprehensive genome annotations are the foundation for myriad downstream analysis tools and research, including the Ensembl Variant Effect Predictor (VEP).
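The idea of classifying protein sequences into gene families with a library of HMM profiles can be sketched as a best-hit assignment: each sequence is scored against the profiles, and it joins the family whose profile scores highest, provided the score clears a cut-off. The sketch below assumes the scores have already been computed by some profile search tool; the `assign_families` function, the `(sequence, family, score)` input layout and the cut-off of 25.0 are all hypothetical choices for this example and do not describe the workflow actually used in production.

```python
from typing import Dict, Iterable, Tuple

# One precomputed hit: (sequence_id, family_id, score). Hypothetical layout.
Hit = Tuple[str, str, float]


def assign_families(hits: Iterable[Hit], min_score: float = 25.0) -> Dict[str, str]:
    """Assign each sequence to the family of its best-scoring profile hit.

    Hits scoring below `min_score` (an assumed cut-off) are ignored, so a
    sequence with no sufficiently strong hit is left unassigned.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for seq_id, family_id, score in hits:
        if score < min_score:
            continue
        if seq_id not in best or score > best[seq_id][1]:
            best[seq_id] = (family_id, score)
    return {seq_id: family for seq_id, (family, _) in best.items()}


example_hits = [
    ("protein_1", "family_A", 80.0),
    ("protein_1", "family_B", 30.0),
    ("protein_2", "family_B", 10.0),   # below the cut-off, stays unassigned
]
print(assign_families(example_hits))
# prints: {'protein_1': 'family_A'}
```

Because each sequence is handled independently of all others, an assignment of this kind can be spread across many genomes in parallel, which is one reason profile-based classification is attractive when scalability matters.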
Access to consistent gene annotation for a wide range of vertebrates is important for evolutionary studies. Members of our team collaborated with others in studying gene families in the vervet (African green monkey) lineage, using freely available data from ten of our annotation projects.

87 2015 EMBL-EBI Annual Scientific Report
Future plans

We will continue to develop methods for producing high-quality genome annotation, and to produce world-leading reference gene sets and comparative resources including TreeFam gene trees, orthologues, whole-genome multi-species alignments, and conserved regions.

In 2016 we plan to release the first genome assemblies annotated using our new large-scale annotation pipeline. Our goal is to improve scalability so that we can produce gene annotation in a fraction of the time it currently takes. This will accommodate the increased number of genomes being sequenced (Genome 10K Community of Scientists, 2009), which require consistent, efficient, highly automated annotation solutions that enable intra- and inter-species genome comparisons.

For human and mouse, we will update the gold-standard annotation regularly, including producing gene annotation on new alternate sequences from the GRC as they arrive. The GRC has expanded the definition of the reference human genome to include genomic sequence for additional haplotypes and gene alleles, and releases new alternate sequences on a regular basis. We will provide access to the most up-to-date gene annotation, and identify genes and alleles on the new alternate sequences that could not otherwise be represented.

New technologies are giving rise to an increasing amount of genome-wide information on how transcript isoforms are expressed in various tissues, cells or developmental stages. Our transcript-reconstruction method builds transcript models using only genomic sequence and transcriptome reads as input, allowing us to identify novel genes. We will refine this method and use it to annotate incoming genome assemblies. We will also develop methods for long-read transcriptome data such as PacBio. This will allow us to better annotate full-length transcript isoforms, mapping them directly to the genome.

We will further develop our scalable TreeFam workflow to release gene families for all Ensembl eukaryotes, and update our gene trees, orthologues, and other comparative resources. We will explore future applications for our TreeFam HMM resource, including analysis and annotation of incoming genome assemblies. As Ensembl aims to incorporate externally annotated gene sets, our scalability enhancements to TreeFam will allow us to link a broad range of eukaryotic species.

As more species are added to a whole-genome alignment, scalability of storage and accessibility have become issues. Colleagues at the University of California Santa Cruz (UCSC) are developing a new aligner (Cactus) and a new file format (HAL) to address them. Together with these colleagues, we are committed to creating a shared alignment process that will scale well and ensure consistent whole-genome alignment data between the UCSC browser and Ensembl.

As we develop our methods and workflows, we will continue to distribute our software to all groups who wish to run it on their species of choice, and to make the process of deploying these pipelines progressively easier.

Bronwen Aken
Ensembl Vertebrate Annotation
BSc in Molecular and Cell Biology, University of Cape Town, South Africa. MSc in Bioinformatics and Computational Biology, Rhodes University, South Africa. Ensembl team member since 2005; at EMBL-EBI since 2014. Team leader since 2015.

Selected publications

Warren WC, et al. (2015) The genome of the vervet (Chlorocebus aethiops sabaeus). Genome Res. 25:1921-33.
Eöry L, et al. (2015) Avianbase: a community resource for bird genomics. Genome Biol. 16:21.
Church DM, et al. (2015) Extending reference assembly models. Genome Biol. 16:13.
Boeckmann B, et al. (2015) Quest for orthologs entails quest for tree of life: In Search of the Gene Stream. Genome Biol Evol. 7:1988-99.
Tan G, Muffato M, et al. (2015) Current methods for automated filtering of Multiple Sequence Alignments frequently worsen single-gene phylogenetic inference. Syst Biol. 64:778-91.
Cunningham F, Amode MR, Barrell D, et al. (2015) Ensembl 2015. Nucleic Acids Res. 43:D662-9.

Figure 1: We released a new, interactive gene gain/loss widget in the Ensembl browser. The coloured nodes can be clicked to reveal the time of divergence between taxa.