Computational tools and Interoperability in Comparative ... - CBS


Web Services and Interoperability in Genomics

4.4 ENCODE pipeline: applying Web Services

ENCODE (the Encyclopedia Of DNA Elements) was launched in September 2003 by the National Human Genome Research Institute. The goal was to identify all functional elements in the human genome sequence. In the pilot phase, 1 percent (30 Mb) of the human genome, drawn from 44 selected regions, was analysed by ENCODE consortium researchers (Birney et al., 2007). GENCODE is a sub-project of ENCODE that seeks to identify all protein-coding genes in the ENCODE selected regions. For each protein-coding gene this means delineating a complete mRNA sequence for at least one splice isoform, and often for a number of additional alternative splice forms.

The contributions from the BioSapiens partners focus on information from a protein annotation perspective. Special attention is given to alternative splicing and its putative effect on the functional diversification of genes. In the pilot phase of the BioSapiens project, the partners separately analysed the properties of the coding sequences in the 44 regions. The results from the individual groups were collected and the main findings were published (Tress et al., 2007). Furthermore, the entire collection of annotations created by all partners was made available as supplementary material for the publication.

In the current phase of the BioSapiens project, the goal is to scale up the annotation approach applied to the pilot ENCODE sequences to cover 100% of the human genome, including all isoforms. For the scale-up, the ENCODE Pipeline (EPipe) was constructed (this BioSapiens deliverable). EPipe is a WWW service that allows researchers to compare functional annotations for all splice variants of a given gene automatically, or alternatively to analyse mutated sequence variants containing SNPs. The author of this thesis has been responsible for developing the main parts of the EPipe software and for implementing a large part of the modules (feature predictors). The EPipe project is an ongoing effort that has involved a number of people during its development.

4.4.1 Collecting Web Services clients in EPipe

EPipe uses a number of local and remote resources for protein feature prediction. The ability of EPipe to connect to remote resources via Web Services is built into the individual modules. This gives a great deal of flexibility as to which resources to support (e.g. BioMoby, SOAP etc.). The pipeline is shown in figure 4.3. EPipe itself is offered both as a SOAP web service (http://www.cbs.dtu.dk/ws/EPipe) and as a traditional web interface (http://www.cbs.dtu.dk/services/EPipe). A schematic overview of the workflow in EPipe is shown in figure 4.4.

4.4.2 Mapping Pfam annotations to protein structure: mecA

In Staphylococcus aureus the mecA gene encodes a penicillin-binding protein (PBP2a), resulting in methicillin resistance (Ender et al., 2009). The EPipe software can be used to map a range of relevant features onto the protein structure, in order to visualize differences between homologs of this protein. In this example, however, a single MecR1 protein from Staphylococcus aureus strain A5937, GenBank accession no. EEV85461, is processed. Figure 4.5 shows the structure browser of EPipe, which lets the user browse the predicted features by showing their mapping onto the protein structure. Here, the three Pfam domains Transpeptidase, MecA_N, and PBP_dimer appear as significant hits.
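Mapping a predicted feature, such as a Pfam domain, from a query protein onto a structure amounts to carrying residue coordinates across a pairwise alignment. The following is a minimal sketch of that idea in Python, not EPipe's own code: the function names `column_index` and `remap_feature`, the gap character, and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def column_index(aligned: str, gap: str = "-") -> Dict[int, int]:
    """Map 1-based residue positions of a gapped sequence to 0-based alignment columns."""
    mapping: Dict[int, int] = {}
    position = 0
    for column, char in enumerate(aligned):
        if char != gap:
            position += 1
            mapping[position] = column
    return mapping


def remap_feature(start: int, end: int, query_aln: str, target_aln: str,
                  gap: str = "-") -> Optional[Tuple[int, int]]:
    """Project a 1-based inclusive feature [start, end] from the query onto the target.

    Returns the covered target range, or None if every feature residue
    is aligned to a gap in the target.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    query_cols = column_index(query_aln, gap)
    # Invert the target mapping: alignment column -> target residue position.
    target_pos = {col: pos for pos, col in column_index(target_aln, gap).items()}
    hits = [
        target_pos[query_cols[p]]
        for p in range(start, end + 1)
        if p in query_cols and query_cols[p] in target_pos
    ]
    if not hits:
        return None
    return min(hits), max(hits)


# Hypothetical toy alignment of a query against a structure sequence.
query = "MK-TAYIA"
structure = "-KLTA-IA"
domain_on_structure = remap_feature(2, 5, query, structure)
```

In this sketch a feature that falls entirely in a region missing from the structure yields `None`, so the caller can decide whether to skip it or report it as unmapped.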
Figure 4.3: Schematic layout of the ENCODE pipeline, EPipe. [Diagram labels: input sequences; cache filters; BLAST against PDB individually; alignment; modules I-IV and X; positional, non-positional and alignment-dependent features; map feature coordinates to alignment; map features onto best structure; XML of all results; render images in parallel and present to output pages; table of non-positional features; conclusion table; plot of alignment and positions having different feature configuration; plot of alignment and features with remapped coordinates; similarity in feature space.] The main program ensures that as much as possible is dispatched in parallel. Modules may either be alignment dependent or not. If the alignment is required to predict the protein features, the module is not launched until the alignment algorithm has finished. Modules may either return global features of the entire protein (e.g. cellular localization) or return positional features (e.g. phosphorylation sites).
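The dispatch rule in the caption (run everything in parallel, but hold alignment-dependent modules until the alignment has finished) can be sketched with Python's standard `concurrent.futures` module. This is an illustrative sketch under stated assumptions, not the EPipe implementation: `run_pipeline` and the toy modules `toy_align`, `sequence_lengths`, `serine_positions` and `variable_columns` are hypothetical stand-ins for the real alignment step and feature predictors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def toy_align(seqs: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for the alignment step: right-pad sequences with gaps."""
    width = max(len(s) for s in seqs.values())
    return {name: s.ljust(width, "-") for name, s in seqs.items()}


def sequence_lengths(seqs: Dict[str, str]) -> Dict[str, int]:
    """Toy non-positional (global) feature: one value per protein."""
    return {name: len(s) for name, s in seqs.items()}


def serine_positions(seqs: Dict[str, str]) -> Dict[str, List[int]]:
    """Toy positional feature: 1-based positions of serine residues."""
    return {name: [i + 1 for i, c in enumerate(s) if c == "S"]
            for name, s in seqs.items()}


def variable_columns(seqs: Dict[str, str], aln: Dict[str, str]) -> List[int]:
    """Toy alignment-dependent feature: 0-based columns that differ."""
    return [i for i, col in enumerate(zip(*aln.values())) if len(set(col)) > 1]


def run_pipeline(seqs: Dict[str, str],
                 independent: Dict[str, Callable],
                 dependent: Dict[str, Callable],
                 align: Callable = toy_align,
                 max_workers: int = 4):
    """Dispatch modules in parallel; dependent modules wait for the alignment."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        align_future = pool.submit(align, seqs)
        # Alignment-independent modules start immediately.
        futures = {name: pool.submit(fn, seqs) for name, fn in independent.items()}
        # Only the coordinating thread blocks here; independent modules keep running.
        alignment = align_future.result()
        for name, fn in dependent.items():
            futures[name] = pool.submit(fn, seqs, alignment)
        results = {name: future.result() for name, future in futures.items()}
    return alignment, results


isoforms = {"iso1": "MSTS", "iso2": "MS"}
alignment, results = run_pipeline(
    isoforms,
    independent={"length": sequence_lengths, "serines": serine_positions},
    dependent={"variable": variable_columns},
)
```

A thread pool is used here on the assumption that most modules spend their time waiting on remote services; CPU-bound predictors would be better served by separate processes.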