EMBL-EBI Annual Scientific Report 2012

InterPro
Our team co-ordinates the InterPro and Metagenomics projects at EMBL-EBI. InterPro integrates protein data from 11 major sources, classifying proteins into families and predicting the presence of domains and functionally important sites. InterPro has a number of important applications, including the automatic annotation of proteins for UniProtKB/TrEMBL and genome annotation projects. InterPro is used by Ensembl and in the GOA project to provide large-scale mapping of proteins to GO terms.

Metagenomics is the study of the sum of genetic material found in an environmental sample or host species, typically using next-generation sequencing (NGS) technology. The Metagenomics Portal, a resource established at EMBL-EBI in 2011, enables metagenomics researchers to submit sequence data and associated descriptive metadata to the public nucleotide archives. Deposited data are subsequently analysed functionally using an InterPro-based pipeline, and the results are visualised via a web interface.

Major achievements
We redesigned and re-launched the InterPro website in late 2012, and played a key role in the EMBL-EBI website redesign process. We also built a new InterPro search facility that uses the central EBI search engine. Search results are now much easier to interpret and browse: the engine behaves in a Google-like manner, allowing users to enter wildcards (e.g., * and ?), combine terms with Boolean operators (AND or NOT), search with single words or phrases, and quickly select subsets of the results using faceted filtering. InterPro results are now paginated and highlight the query terms in context.

The new EMBL-EBI website, which will launch in early 2013, features improved discoverability of InterPro and other resources.
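The search behaviour described above (wildcards, AND/NOT, faceted filtering, pagination) can be illustrated with a toy, in-memory model. This Python sketch is not the EBI search engine or its API: the record layout (id, text, type) and the function names are hypothetical, wildcard matching uses the standard-library fnmatch module, and phrase search and relevance ranking are deliberately omitted.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Toy records with a hypothetical layout; real search indexes hold far more fields.
RECORDS = [
    {"id": "E1", "text": "kinase domain", "type": "domain"},
    {"id": "E2", "text": "protein kinase family", "type": "family"},
    {"id": "E3", "text": "kinesin motor domain", "type": "domain"},
    {"id": "E4", "text": "receptor tyrosine kinase family", "type": "family"},
]


def term_matches(pattern, text):
    """True if any word in `text` matches `pattern` (* = any run, ? = one char)."""
    pattern = pattern.lower()
    return any(fnmatchcase(word, pattern) for word in text.lower().split())


def search(records, must, must_not=(), facet=None, page=1, page_size=2):
    """Return (hits on this page, facet counts, total hits after facet filter)."""
    hits = [
        r for r in records
        if all(term_matches(p, r["text"]) for p in must)           # AND
        and not any(term_matches(p, r["text"]) for p in must_not)  # NOT
    ]
    # Facet counts come from the unfiltered hits so every option stays visible.
    counts = Counter(r["type"] for r in hits)
    if facet is not None:
        hits = [r for r in hits if r["type"] == facet]
    start = (page - 1) * page_size
    return hits[start:start + page_size], counts, len(hits)


if __name__ == "__main__":
    page, counts, total = search(RECORDS, must=["kin*"], must_not=["tyrosine"])
    print([r["id"] for r in page], dict(counts), total)
    # ['E1', 'E2'] {'domain': 2, 'family': 1} 3
```

Computing facet counts before applying the selected facet mirrors the usual faceted-search design, where users can still see how many hits each alternative category would give.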
Global EBI search results are shown in categories on local search pages to encourage users to explore the data in different ways.

In 2012 we moved the InterPro DAS and BioMart services to the London Data Centres; the main InterPro website will join them there shortly.

The InterPro database continues to benefit from improved coverage of UniProtKB proteins, increasing to 80.8% in the latest release (v. 40.0). This is partly due to significant data curation and integration efforts, which led to an additional 2355 signatures being incorporated into the database in 2012.

Focussed curation of InterPro2GO term associations led to 334 additional entries being assigned GO terms; 44% of entries now have at least one term associated. The total number of GO mappings has increased by 838, despite a concerted effort to remove terms that are too general (and therefore uninformative) or erroneously mapped. In 2012 we published the first paper describing how this highly utilised annotation resource is created and maintained.

InterProScan5 is poised to take over as the main InterPro scanning software in 2013. Multiple release candidates were made publicly available in 2012, each containing new features and improved implementation.

InterProScan5: release candidate 4 features
• Search all 11 member databases, plus four additional algorithms: Phobius, TMHMM, Coils and SignalP v4;
• Predict potential membership of a protein in a pathway based on InterPro results;
• Use a BerkeleyDB-based protein match look-up service that reduces calculation overheads by only searching sequences not already found in UniProtKB (install this locally or query the EBI-hosted service);
• Use multiple output formats: HTML, GFF3, XML, TSV and SVG;
• Run it ‘out of the box’ on any Linux machine with minimal configuration, and utilise cluster-queuing technologies;
• Handle both protein and nucleotide sequences, with results mapped back to the original sequence.

Sarah Hunter
MSc University of Manchester, 1998.
Pharmaceutical and Biotech Industry (Sweden), 1999–2005.
At EMBL-EBI since 2005. Team Leader since 2007.

EBI Metagenomics reached 20 public metagenomics projects in 2012, comprising 131 separate samples, in addition to a significant number of privately held studies. In collaboration with the European Nucleotide Archive, we developed a system for the submission of sequence files and minimum-standards-compliant metadata. We expanded the initial analysis pipeline, which comprised quality control, clustering, CDS prediction and functional classification steps, to include an rRNA prediction step (using rRNAselector) and taxonomic diversity estimation using the Qiime software. We are investigating Taverna for structuring and managing the complex workflows used in the analysis pipeline (see Figure), and in 2012 we developed a utility to integrate Taverna processes with the LSF queue system.

Our work on the organisation and display of data on the website has made it easier for users to access analysis results. In addition, we developed a metagenomics ‘GO slim’ (a subset of GO terms particularly useful to metagenomics) to assist users in interpreting function prediction results. The data can be downloaded in a variety of formats, including sequences that are functionally classified by the resource and those that remain of unknown function.

Future plans
To facilitate the move of the InterPro website to the London Data Centres in early 2013, we have rewritten the InterPro relational database into a data warehouse structure. This simplifies the web application code written to access the data, and greatly reduces the amount of downtime experienced by our curation team during release.
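The report does not describe the warehouse schema itself. As a rough illustration of the general idea, the Python sketch below uses the standard-library sqlite3 module to pre-join a hypothetical, heavily simplified normalised schema into one flat table at release time, so that each web page needs only a single-table lookup. All table, column and function names (entry, entry2method, entry2go, entry_page, build_warehouse) are invented for this example and are not InterPro's actual schema or code.

```python
import sqlite3

# Hypothetical, heavily simplified normalised schema (NOT InterPro's real one).
NORMALISED_SCHEMA = """
CREATE TABLE entry        (entry_ac TEXT PRIMARY KEY, name TEXT);
CREATE TABLE entry2method (entry_ac TEXT, method_ac TEXT);
CREATE TABLE entry2go     (entry_ac TEXT, go_id TEXT);
"""

# Release-time step: pre-join what a web page needs into one flat table.
WAREHOUSE_BUILD = """
DROP TABLE IF EXISTS entry_page;
CREATE TABLE entry_page AS
SELECT e.entry_ac AS entry_ac,
       e.name     AS name,
       (SELECT group_concat(m.method_ac, ',') FROM entry2method m
         WHERE m.entry_ac = e.entry_ac) AS signatures,
       (SELECT group_concat(g.go_id, ',') FROM entry2go g
         WHERE g.entry_ac = e.entry_ac) AS go_terms
  FROM entry e;
"""


def build_warehouse(conn):
    """Rebuild the denormalised table; run once per data release."""
    conn.executescript(WAREHOUSE_BUILD)


def entry_page(conn, entry_ac):
    """Fetch everything for one entry page with a single-table lookup."""
    row = conn.execute(
        "SELECT name, signatures, go_terms FROM entry_page WHERE entry_ac = ?",
        (entry_ac,),
    ).fetchone()
    if row is None:
        raise KeyError(entry_ac)
    name, sigs, gos = row
    # group_concat order is unspecified in SQLite, so sort for stable output.
    return {
        "name": name,
        "signatures": sorted(sigs.split(",")) if sigs else [],
        "go_terms": sorted(gos.split(",")) if gos else [],
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(NORMALISED_SCHEMA)
    conn.executemany("INSERT INTO entry VALUES (?, ?)",
                     [("E1", "toy family"), ("E2", "toy domain")])
    conn.executemany("INSERT INTO entry2method VALUES (?, ?)",
                     [("E1", "SIG_B"), ("E1", "SIG_A")])
    conn.executemany("INSERT INTO entry2go VALUES (?, ?)", [("E1", "GO_1")])
    build_warehouse(conn)
    print(entry_page(conn, "E1"))
```

Rebuilding the flat table once per release trades extra storage and build time for simpler, join-free reads; a production warehouse would also add indexes and many more attributes.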
Together with the official release of InterProScan5, we expect these developments to simplify our data-production processes. InterProScan5 will be used by the EBI-hosted installation, completing the five-year effort to re-architect the InterPro resource.

We are designing and testing new EBI Metagenomics web pages that will help users visualise taxonomic prediction data from a variety of experiment types (i.e., shotgun metagenomics, amplicon-based marker gene analysis, metatranscriptomics). We believe these changes, to be implemented in 2013, will provide a more complete suite of analysis tools, bringing us in line with competing resources. We will transition our pipeline fully into the Taverna software, simplifying maintenance and offering multiple workflows, depending on the environment that has been sequenced. Finally, we will encourage data submission to the repository to increase the coverage of the experiments carried out by the metagenomics community.

Figure. The analysis workflow for a shotgun metagenomics experiment, as processed by EBI Metagenomics.

Selected publications
Burge, S., et al. (2012) Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation. Database (Oxford) 2012, bar068.
Lewis, T.E., et al. (2012) Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucleic Acids Res 41 (D1), D499-507.
Salazar, G.A., et al. (2012) MyDas, an Extensible Java DAS Server. PLoS One 7, e44180.
Hunter, C., et al. (2012) Metagenomic analysis: the challenge of the data bonanza. Brief Bioinform 13, 743-746.