Functional genomics

The Functional Genomics team provides bioinformatics services and conducts research in gene expression and high-throughput sequencing applications.

We participate in software development related to biomedical informatics and systems microscopy and are responsible for the Expression Atlas, which is growing to include proteomics and metabolomics data. Together with our Production and Development teams, we develop ArrayExpress, the archive of functional genomics data, and the EBI BioSamples Database. We also contribute substantially to training in transcriptomics and the use of EMBL-EBI bioinformatics tools.

Our research efforts centre on developing new methods and algorithms for analysing gene expression data and integrating different types of data across multiple platforms. We are particularly interested in cancer genomics and transcript isoform usage, and collaborate closely with the Marioni group and others throughout EMBL.

Major achievements

The co-ordination of Expression Atlas development was taken over by Robert Petryszak, following the departure of Misha Kapushesky in early 2012. The major focus was on planning and prototype development for the baseline Expression Atlas, which utilises high-throughput-sequencing-based expression data to report absolute (rather than relative) gene expression levels. We also started working on the close integration of gene-expression, protein-expression and metabolite-measurement data.

Major developments in the ArrayExpress Archive and the EBI BioSamples Database are described by Helen Parkinson and Ugis Sarkans (see also Rustici et al., 2013 and Gostev et al., 2012).

In 2012 we organised and participated in over 25 training events, including Bioinformatics Roadshows and on-site courses. These included the EMBO practical course on the analysis of high-throughput sequencing data, which was the most popular and oversubscribed training event in 2012.

Our team developed a prototype database for systems microscopy data and loaded seven datasets from project partners in the Systems Microscopy Network of Excellence, which was funded under the EU's Seventh Framework Programme (FP7).

As part of our participation in the GEUVADIS project (funded under FP7), we analysed mRNA and small RNA from lymphoblastoid cell lines of 465 individuals who participated in the 1000 Genomes Project. Our group led the analysis of transcript isoform use and fusion gene discovery. By integrating RNA and DNA sequencing data, we were able to link gene expression and genetic variation, and to characterise mRNA and miRNA variation in several human populations. All of the data generated in the project are available through ArrayExpress.

The human transcriptome contains in excess of 100 000 different transcripts. We analysed transcript composition in 16 human tissues and five cell lines to show that, in a given condition, most protein-coding genes have one major transcript expressed at a significantly higher level than the others, and that in human tissues the major transcripts contribute almost 85% of the total mRNA. We also found that the same major transcript is often expressed in many tissues. These observations can help prioritise candidate targets in proteomics research and help predict the functional impact of the detected changes in variation studies. Our findings, submitted for publication in 2012, point towards a lower degree of transcriptome complexity than recently estimated.

Other research includes an exploration of the utility of gene expression data in the public domain (Rung and Brazma, 2012).

Angela Goncalves, a PhD student in the Functional Genomics group, gained her doctorate in 2012 and published some of her major findings in Nature Genetics and Genome Research (Goncalves et al., 2012).

2012 EMBL-EBI Annual Scientific Report
Alvis Brazma
PhD in Computer Science, Moscow State University, 1987. Postdoctoral research at New Mexico State University, US. At EMBL-EBI since 1997.

Future plans

In 2012 we began the process of integrating Expression Atlas data with PRIDE proteomics data, and this process will continue in 2013. We will also undertake to integrate the metabolomics data in MetaboLights with the Expression Atlas. Our research will continue to focus on large-scale data integration and systems biology. We will develop methods for RNA-seq data analysis and processing, and apply these to address important biological questions, such as the role of alternative splicing and splicing mechanisms. Together with our colleagues at the International Cancer Genome Consortium, we will investigate the impact of cancer genomes on functional changes in cancer development and explore fusion genes and their role in cancer development.

References cited

Rustici, G., et al. (2013) ArrayExpress update--trends in database growth and links to data analysis tools. Nucleic Acids Res 41(D1), D987-D990.

Selected publications

Rung, J. and Brazma, A. (2012) Reuse of public genome-wide gene expression data. Nat Rev Genet doi: 10.1038/nrg3394.

Fonseca, N.A., et al. (2012) Tools for mapping high-throughput sequencing data. Bioinformatics 28, 3169-3177.

Goncalves, A., et al. (2012) Extensive compensatory cis-trans regulation in the evolution of mouse gene expression. Genome Res 22, 2376-2384.

Gostev, M., et al. (2012) The BioSample Database (BioSD) at the European Bioinformatics Institute. Nucleic Acids Res 40(D1), D64-D70. doi: 10.1093/nar/gkr937.

Kapushesky, M., et al. (2012) Gene Expression Atlas update--a value-added database of microarray and sequencing-based functional genomics experiments. Nucleic Acids Res 40(D1), D1077-D1081.

Kutter, C., et al. (2011) Pol III binding in six mammals shows conservation among amino acid isotypes despite divergence among tRNA genes. Nat Genet 43, 948-955.

Figure. Prototype of the baseline Expression Atlas.