Computational tools and Interoperability in Comparative ... - CBS
Web Services and Interoperability in Genomics
4.4 ENCODE pipeline: applying Web Services
ENCODE (the Encyclopedia of DNA Elements) was launched in September 2003 by
the National Human Genome Research Institute. The goal was to identify all functional
elements in the human genome sequence. In the pilot phase, 1 percent (30 Mb) of the
human genome, spread over 44 selected regions, was analysed by ENCODE consortium
researchers (Birney et al., 2007).
GENCODE is a sub-project of ENCODE that seeks to identify all protein-coding
genes in the ENCODE selected regions. For each protein-coding gene this means
delineating a complete mRNA sequence for at least one splice isoform, and often for
a number of additional alternative splice forms. The contributions from the BioSapiens
partners focus on information from a protein annotation perspective. Special attention
is given to alternative splicing and its putative effect on the functional diversification
of genes.
In the pilot phase of the BioSapiens project, the properties of the coding sequences
for the 44 regions were analysed separately by the BioSapiens partners. The results
from the individual groups were collected and the main findings were published (Tress et al., 2007).
Furthermore, the entire collection of annotations created by all partners was made available
as supplementary material for the publication.
In the current phase of the BioSapiens project, the goal is to scale up the annotation
approach applied to the pilot ENCODE sequences so that it covers 100% of the human
genome, including all isoforms. For the scale-up, the ENCODE Pipeline (EPipe) was
constructed as a BioSapiens deliverable. EPipe is a WWW service that allows researchers
to compare functional annotations for all splice variants of a given gene automatically,
or alternatively to analyse mutated sequence variants containing SNPs. The author of
this thesis has been responsible for developing the main parts of the EPipe software,
as well as for implementing a large part of the modules (feature predictors). EPipe is
an ongoing effort which has involved a number of people during its development.
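The automatic comparison of annotations across splice variants can be illustrated with a minimal sketch. It assumes that each module's output has already been reduced to a set of feature labels per isoform; the function name, the labels and the isoform IDs are hypothetical and not taken from EPipe itself. For every feature that is not shared by all isoforms, the sketch reports which isoforms lack it.

```python
from collections import defaultdict

# Hypothetical input format: isoform ID -> set of predicted feature labels.
# This is an illustrative assumption, not EPipe's actual data model.
IsoformFeatures = dict[str, set[str]]


def compare_isoform_features(features: IsoformFeatures) -> dict[str, list[str]]:
    """Map each differing feature to the sorted list of isoforms lacking it.

    Features predicted for every isoform are left out, so the result only
    contains annotations that differ between splice variants.
    """
    all_ids = sorted(features)
    present_in: dict[str, set[str]] = defaultdict(set)
    for isoform_id, labels in features.items():
        for label in labels:
            present_in[label].add(isoform_id)

    differing: dict[str, list[str]] = {}
    for label in sorted(present_in):
        missing = [i for i in all_ids if i not in present_in[label]]
        if missing:
            differing[label] = missing
    return differing


if __name__ == "__main__":
    # Invented example data for three splice variants of one gene.
    example = {
        "isoform-1": {"signal_peptide", "kinase_domain", "TM_helix"},
        "isoform-2": {"kinase_domain", "TM_helix"},
        "isoform-3": {"signal_peptide", "kinase_domain"},
    }
    print(compare_isoform_features(example))
    # prints {'TM_helix': ['isoform-3'], 'signal_peptide': ['isoform-2']}
```

Reducing each prediction to a label discards positions; a fuller comparison would also compare feature coordinates projected onto a common reference, but the set-based view is enough to flag features gained or lost through alternative splicing.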
4.4.1 Collecting Web Services clients in EPipe
EPipe uses a number of local and remote resources for protein feature prediction. The
ability of EPipe to connect to remote resources via Web Services is incorporated within
the individual modules, so each module can use whatever protocol its resource requires.
This gives a great deal of flexibility as to which types of resources to support
(e.g. BioMoby, SOAP, etc.). The pipeline is shown in figure 4.3.
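Keeping remote-access code inside the individual modules can be sketched as a small plugin interface. This is a hypothetical illustration, not EPipe's code: the class names, the `predict` signature and the feature tuple format are assumptions. The local module is a simple pattern match for the N-glycosylation sequon N-X-S/T (X not proline), and the remote module receives its transport as a callable, so that a SOAP or BioMoby client could be plugged in without changing the pipeline core.

```python
from abc import ABC, abstractmethod
from typing import Callable

# Hypothetical feature format: (start, end, label), 1-based and inclusive.
Feature = tuple[int, int, str]


class FeaturePredictor(ABC):
    """Interface every (hypothetical) pipeline module implements."""

    name: str

    @abstractmethod
    def predict(self, sequence: str) -> list[Feature]:
        """Return the features predicted for one protein sequence."""


class GlycosylationSequonModule(FeaturePredictor):
    """Local module: finds N-X-S/T sequons where X is not proline."""

    name = "n_glyc_sequon"

    def predict(self, sequence: str) -> list[Feature]:
        hits = []
        for i in range(len(sequence) - 2):
            if sequence[i] == "N" and sequence[i + 1] != "P" and sequence[i + 2] in "ST":
                hits.append((i + 1, i + 3, "N-glyc_sequon"))
        return hits


class RemoteModule(FeaturePredictor):
    """Remote module: the transport (e.g. a SOAP client) is injected.

    The pipeline core never sees which protocol is used; it only calls
    predict(), exactly as for a local module.
    """

    def __init__(self, name: str, call_service: Callable[[str], list[Feature]]):
        self.name = name
        self._call_service = call_service

    def predict(self, sequence: str) -> list[Feature]:
        return self._call_service(sequence)


def run_pipeline(sequence: str, modules: list[FeaturePredictor]) -> dict[str, list[Feature]]:
    """Run every module on the sequence and collect results by module name."""
    return {module.name: module.predict(sequence) for module in modules}
```

In testing, the remote transport can be replaced by a stub such as `RemoteModule("stub", lambda seq: [(1, len(seq), "whole_chain")])`, which is one practical benefit of isolating protocol details inside each module.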
EPipe itself is offered both as a SOAP web service (http://www.cbs.dtu.dk/ws/EPipe)
and as a traditional web interface (http://www.cbs.dtu.dk/services/EPipe). A
schematic overview of the workflow in EPipe is shown in figure 4.4.
4.4.2 Mapping Pfam annotations to protein structure: mecA
In Staphylococcus aureus, the mecA gene encodes a penicillin-binding protein (PBP2a)
that confers methicillin resistance (Ender et al., 2009). EPipe can be used to map a
range of relevant features onto the protein structure in order to visualize differences
between homologs of this protein. In this example, however, a single MecR1 protein from
Staphylococcus aureus strain A5937 (GenBank accession no. EEV85461) is processed.
Figure 4.5 shows the EPipe structure browser, which lets the user browse the predicted
features by showing their mapping onto the protein structure. Here, the three Pfam
domains Transpeptidase, MecA_N and PBP_dimer appear as significant hits.
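Mapping a sequence-level Pfam hit onto a structure can be sketched as follows, assuming the query protein has already been aligned pairwise to the sequence of the structure and that the structure's residues are numbered consecutively from a known start. Both assumptions are simplifications (PDB residue numbering can contain gaps and insertion codes), and the function names are hypothetical rather than part of EPipe.

```python
def residue_map(aligned_query: str, aligned_struct: str, struct_start: int = 1) -> dict[int, int]:
    """Map 1-based query positions to structure residue numbers.

    Both strings are rows of the same pairwise alignment, with '-' marking
    gaps. Structure numbering is assumed contiguous from struct_start.
    """
    if len(aligned_query) != len(aligned_struct):
        raise ValueError("alignment rows must have equal length")
    mapping: dict[int, int] = {}
    q = 0                      # current query position (1-based after increment)
    s = struct_start - 1       # current structure residue number
    for a, b in zip(aligned_query, aligned_struct):
        if a != "-":
            q += 1
        if b != "-":
            s += 1
        # Only columns where both sequences have a residue can be mapped.
        if a != "-" and b != "-":
            mapping[q] = s
    return mapping


def map_domain(start: int, end: int, mapping: dict[int, int]) -> list[int]:
    """Return structure residues covered by a query domain [start, end].

    Query positions that fall in alignment gaps have no structural
    counterpart and are silently skipped.
    """
    return [mapping[i] for i in range(start, end + 1) if i in mapping]
```

With the rows `MKT-LLV` and `MKTALL-` and numbering starting at 10, query positions 1 to 5 map to residues 10, 11, 12, 14 and 15, so a domain covering query positions 2 to 6 is drawn on residues 11, 12, 14 and 15; position 6 falls in a gap and is dropped.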