Computational tools and Interoperability in Comparative ... - CBS

Web Services and Interoperability in Genomics

4.4 ENCODE pipeline: applying Web Services

ENCODE (the Encyclopedia Of DNA Elements) was launched in September 2003 by the National Human Genome Research Institute, with the goal of identifying all functional elements in the human genome sequence. In the pilot phase, ENCODE consortium researchers analysed 1 percent (30 Mb) of the human genome, spread over 44 selected regions (Birney et al., 2007).

GENCODE is a sub-project of ENCODE that seeks to identify all protein-coding genes in the ENCODE selected regions. For each protein-coding gene this means delineating a complete mRNA sequence for at least one splice isoform, and often for a number of additional alternative splice forms. The contributions from the BioSapiens partners focus on information from a protein annotation perspective. Special attention is given to alternative splicing and its putative effect on the functional diversification of genes.

In the pilot phase of the BioSapiens project, the BioSapiens partners separately analysed the properties of the coding sequences in the 44 regions. The results from the individual groups were collected and the main findings were published (Tress et al., 2007). Furthermore, the entire collection of annotations created by all partners was made available as supplementary material for the publication.

In the current phase of the BioSapiens project, the goal is to scale up the annotation approach applied to the pilot ENCODE sequences so that it covers the entire human genome, including all isoforms. For the scale-up, the ENCODE Pipeline (EPipe) was constructed as a BioSapiens deliverable. EPipe is a WWW service that allows researchers to compare functional annotations for all splice variants of a given gene automatically, or alternatively to analyse mutated sequence variants containing SNPs. The author of this thesis has been responsible for developing the main parts of the EPipe software as well as for implementing a large part of the modules (feature predictors). EPipe is an ongoing effort which has involved a number of people during its development.
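Comparing annotations across splice variants comes down to asking which predicted features are shared by all isoforms of a gene and which are gained or lost in individual isoforms. The Python sketch below illustrates that comparison under a simplifying assumption: each isoform's predictions have already been reduced to a set of feature labels, ignoring positions. The function names, isoform names and labels are hypothetical and are not taken from EPipe.

```python
from typing import Dict, Set


def shared_features(annotations: Dict[str, Set[str]]) -> Set[str]:
    """Features predicted for every isoform of the gene."""
    feature_sets = list(annotations.values())
    return set.intersection(*feature_sets) if feature_sets else set()


def missing_features(annotations: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each isoform, the features predicted for at least one other
    isoform of the same gene but absent from this one."""
    union: Set[str] = set().union(*annotations.values())
    return {isoform: union - feats for isoform, feats in annotations.items()}


if __name__ == "__main__":
    # Made-up labels for two hypothetical isoforms of one gene.
    gene = {
        "isoform-1": {"signal_peptide", "domain_A", "TM_helix"},
        "isoform-2": {"domain_A"},
    }
    print(shared_features(gene))                        # {'domain_A'}
    print(sorted(missing_features(gene)["isoform-2"]))  # ['TM_helix', 'signal_peptide']
```

Working with label sets keeps the comparison independent of coordinate shifts caused by skipped exons; a position-aware comparison would first need to project every isoform onto common genomic coordinates.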

4.4.1 Collecting Web Services clients in EPipe

EPipe uses a number of local and remote resources for protein feature prediction. The ability to connect to remote resources via Web Services is incorporated within the individual modules. This gives a great deal of flexibility as to which kinds of remote resources can be supported (e.g. BioMoby or SOAP services). The pipeline is shown in figure 4.3.
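Because remote access lives inside each module, the pipeline core only needs a common module interface. The following Python sketch shows one way to express that design; `FeatureModule`, `SequonScanner`, `RemoteModule` and `run_pipeline` are hypothetical names, not EPipe's actual classes. The local module is a simple scan for N-X-S/T glycosylation sequons, and the remote module takes an injected transport callable that, in a real setup, would wrap a SOAP or BioMoby client.

```python
import re
from typing import Callable, Dict, List, Protocol


class FeatureModule(Protocol):
    """Interface assumed for every pipeline module (hypothetical)."""

    name: str

    def predict(self, sequence: str) -> List[str]: ...


class SequonScanner:
    """Local module: reports N-glycosylation sequons N-X-S/T with X != P."""

    name = "sequon_scan"
    # Lookahead so that overlapping sequons are all reported.
    _pattern = re.compile(r"(?=N[^P][ST])")

    def predict(self, sequence: str) -> List[str]:
        # m.start() is 0-based; report 1-based residue positions.
        return [f"N-glyc sequon at {m.start() + 1}" for m in self._pattern.finditer(sequence)]


class RemoteModule:
    """Remote module: delegates to an injected transport callable, which in
    practice would wrap a SOAP or BioMoby client (hypothetical here)."""

    def __init__(self, name: str, transport: Callable[[str], List[str]]):
        self.name = name
        self._transport = transport

    def predict(self, sequence: str) -> List[str]:
        return self._transport(sequence)


def run_pipeline(sequence: str, modules: List[FeatureModule]) -> Dict[str, List[str]]:
    """Run every module on one sequence; a failing module is recorded, not fatal."""
    results: Dict[str, List[str]] = {}
    for module in modules:
        try:
            results[module.name] = module.predict(sequence)
        except Exception as exc:  # e.g. a remote service that is unreachable
            results[module.name] = [f"error: {exc}"]
    return results
```

Injecting the transport keeps the choice of protocol out of the pipeline loop, so a BioMoby-backed predictor could sit next to a SOAP-backed one without changes to `run_pipeline`.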

EPipe itself is offered both as a SOAP web service (http://www.cbs.dtu.dk/ws/EPipe) and as a traditional web interface (http://www.cbs.dtu.dk/services/EPipe). A schematic overview of the workflow in EPipe is shown in figure 4.4.
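A SOAP client talks to a service of this kind by POSTing an XML envelope over HTTP. The Python sketch below builds a SOAP 1.1 envelope with the standard library and shows how it could be sent. The operation name `runService`, the namespace `urn:example:epipe` and any endpoint passed to `post_soap` are placeholders; the actual operations and message formats of the EPipe service would have to be taken from its WSDL, which is not reproduced here.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(operation: str, service_ns: str, params: dict) -> bytes:
    """Build a SOAP 1.1 envelope calling `operation` with simple text parameters."""
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for key, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{key}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def post_soap(endpoint: str, payload: bytes, soap_action: str = "") -> bytes:
    """POST the envelope with SOAP 1.1 HTTP headers and return the raw response."""
    request = urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{soap_action}"',
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical operation and namespace; the real ones come from the service's WSDL.
    envelope = build_soap_request("runService", "urn:example:epipe",
                                  {"sequence": "MKTAYIAKQR"})
    print(envelope.decode("utf-8"))
```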

4.4.2 Mapping Pfam annotations to protein structure: mecA

In Staphylococcus aureus the mecA gene encodes a penicillin-binding protein (PBP2a), resulting in methicillin resistance (Ender et al., 2009). The EPipe software can be used to map a range of relevant features onto the protein structure in order to visualize differences between homologs of this protein. In this example, however, a single MecR1 protein from Staphylococcus aureus strain A5937 (GenBank accession no. EEV85461) is processed. Figure 4.5 shows the EPipe structure browser, which lets the user browse the different predicted features by showing how they map onto the protein structure. Here, the three Pfam domains Transpeptidase, MecA_N and PBP_dimer appear as significant hits.
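Displaying Pfam hits in a structure browser requires translating domain boundaries from query-sequence coordinates into the residue numbering of the structure. A common way to do this is through a pairwise alignment between the query and the structure's sequence. The sketch below assumes that approach and assumes contiguous residue numbering in the structure (no insertion codes). It uses toy sequences and hypothetical domain names rather than the mecA data, and it is not EPipe's implementation.

```python
from typing import Dict, List, Tuple

Domain = Tuple[str, int, int]  # (name, start, end), 1-based inclusive query positions


def alignment_map(query_aln: str, struct_aln: str, struct_start: int = 1) -> Dict[int, int]:
    """Map 1-based query positions to structure residue numbers using a gapped
    pairwise alignment ('-' marks gaps). struct_start is the residue number of
    the first structure residue in the alignment."""
    if len(query_aln) != len(struct_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    q, s = 0, struct_start - 1
    for a, b in zip(query_aln, struct_aln):
        if a != "-":
            q += 1
        if b != "-":
            s += 1
        if a != "-" and b != "-":  # only aligned residue pairs get a mapping
            mapping[q] = s
    return mapping


def map_domains(domains: List[Domain], mapping: Dict[int, int]) -> List[Tuple[str, List[int]]]:
    """Translate query-coordinate domains into the structure residues they cover;
    query positions with no structural counterpart are skipped."""
    return [(name, [mapping[i] for i in range(start, end + 1) if i in mapping])
            for name, start, end in domains]


if __name__ == "__main__":
    mapping = alignment_map("MKTAY-IA", "MK-AYGIA", struct_start=10)
    print(map_domains([("toy_domain", 2, 5)], mapping))  # [('toy_domain', [11, 12, 13])]
```

A structure viewer could then colour each listed residue by its domain label; residue 3 of the toy query is dropped because it is aligned to a gap in the structure.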
