
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015


P41. RIGAPOLLO, A HMM-SVM BASED APPROACH TO SEQUENCE ALIGNMENT

Gabriele Orlando 1,2,3,4, Wim Vranken 1,2,3 & Tom Lenaerts 1,4,5.

1 Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, CP 263; 2 Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2; 3 Structural Biology Research Center, VIB, 1050 Brussels, Belgium; 4 Machine Learning group, Université Libre de Bruxelles, Brussels, 1050, Belgium; 5 Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium.

INTRODUCTION

Reliable protein alignment is central to many bioinformatics tasks, such as homology modelling. Over the years, many different algorithms have been developed, and different kinds of information have been used to align very divergent sequences [1]. Here we present a pairwise alignment tool, called Rigapollo, based on a pairwise HMM-SVM, which includes backbone dynamics predictions [2] in the alignment process: recent work suggests that protein backbone dynamics is often evolutionarily conserved and contains information orthogonal to amino acid conservation.

METHODS

Rigapollo uses a pairwise HMM-SVM alignment approach to infer the optimal alignment between two proteins, taking into consideration both sequence and dynamics information. The model (described in Figure 1) is composed of three states: M (match), G1 (gap in the first sequence) and G2 (gap in the second sequence). The transition probabilities are defined in the same way as in a standard pairwise HMM. The remaining components of the tool are defined as follows:

Defining the N-dimensional feature vectors:

Each amino acid in the sequences is described by an N-dimensional feature vector. That vector can be defined using any kind of information, ranging from evolutionary information (e.g. a PSSM calculated with HHblits [3]) to dynamics predictions (using the DynaMine predictor [2]). While standard pairwise HMMs require the definition of a finite and discrete alphabet of observable states, our model works directly on these feature vectors (which may or may not be orthonormal), evaluating the emission probability with a support vector machine (SVM).
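As a rough illustration of the per-residue feature vectors described above, the following sketch concatenates one PSSM row with one backbone dynamics score. The function names, the 20-column PSSM layout, the choice of a single dynamics value per residue, and the toy numbers are all assumptions made for illustration; they are not taken from Rigapollo, HHblits, or DynaMine.

```python
from typing import List, Sequence


def residue_features(pssm_row: Sequence[float], dynamics: float) -> List[float]:
    """Build one N-dimensional feature vector for a single residue.

    Assumption: the vector is the PSSM row followed by one dynamics score.
    """
    return [float(v) for v in pssm_row] + [float(dynamics)]


def sequence_features(pssm: Sequence[Sequence[float]],
                      dynamics: Sequence[float]) -> List[List[float]]:
    """Build the feature vectors for every residue of one sequence."""
    if len(pssm) != len(dynamics):
        raise ValueError("PSSM and dynamics profile must have the same length")
    return [residue_features(row, d) for row, d in zip(pssm, dynamics)]


# Toy input: 3 residues, a 20-column PSSM and one dynamics score per residue
# (all values are hypothetical placeholders).
toy_pssm = [[0.05] * 20, [0.10] * 20, [0.00] * 20]
toy_dynamics = [0.81, 0.64, 0.92]
features = sequence_features(toy_pssm, toy_dynamics)
```

Here N is 21, but any other per-residue descriptors could be appended in the same way, since the model does not require a discrete alphabet.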

Definition of the emission probability:

We define the emission probability using an SVM trained to discriminate matches from mismatches. Matches are all the positions in the reference pairwise alignments that do not contain gaps, each described by the concatenation of the two previously defined feature vectors; they are treated as positive examples. Mismatches are built with the same procedure, but by coupling positions that, in the reference alignment, are shifted by a number of amino acids varying between 5 and 10. After training, the predicted emission probability for the M state, given the concatenation of two feature vectors, is a function of the distance D from the decision hyperplane of the SVM, called f(D). The corresponding emission probabilities for the states G1 and G2 are modeled as 1 - f(D).
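The construction of the training examples and the two emission terms can be sketched as follows. This is a minimal sketch under stated assumptions: the reference alignment is given as a list of aligned residue index pairs, the shift is applied to the second sequence in a random direction, and f(D) is taken to be a logistic function, which the text above does not specify. All function names are hypothetical, and the SVM training step itself is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (concatenated pair vector, label)


def build_training_pairs(feats_a: Sequence[Vector],
                         feats_b: Sequence[Vector],
                         aligned: Sequence[Tuple[int, int]],
                         rng: random.Random) -> List[Example]:
    """Create match (label 1) and mismatch (label 0) examples.

    `aligned` lists the (i, j) residue indices of the gap-free columns of a
    reference alignment. A mismatch pairs residue i with a residue of the
    second sequence shifted by 5 to 10 positions (direction is an assumption).
    """
    examples: List[Example] = []
    for i, j in aligned:
        examples.append((feats_a[i] + feats_b[j], 1))
        k = j + rng.randint(5, 10) * rng.choice((-1, 1))
        if 0 <= k < len(feats_b):  # skip shifts that fall off the sequence
            examples.append((feats_a[i] + feats_b[k], 0))
    return examples


def f(distance: float) -> float:
    """Map the signed SVM distance D into (0, 1); logistic form is an assumption."""
    return 1.0 / (1.0 + math.exp(-distance))


def emission_match(distance: float) -> float:
    """Emission term for state M."""
    return f(distance)


def emission_gap(distance: float) -> float:
    """Emission term for states G1 and G2, modeled as 1 - f(D)."""
    return 1.0 - f(distance)


# Toy data: two 30-residue sequences with 1-dimensional feature vectors
# and a gap-free reference alignment (hypothetical values).
feats_a = [[float(i)] for i in range(30)]
feats_b = [[float(i) + 0.5] for i in range(30)]
aligned = [(i, i) for i in range(30)]
pairs = build_training_pairs(feats_a, feats_b, aligned, random.Random(0))
```

The resulting `pairs` list could be passed to any SVM implementation; the signed distance it returns for a new pair vector would then play the role of D.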

RESULTS & DISCUSSION

To evaluate the performance of Rigapollo, we adopted two publicly available subsets of the BAliBASE and SABmark alignment datasets, already used to evaluate other pairwise alignment tools [1]. From the MSAs, all-pair pairwise alignments were extracted, and all those that shared a percentage of sequence identity equal to the median of the full database were put in the subset. The datasets consist of 38 and 123 manually curated, structure-based pairwise alignments, respectively, and they share very low sequence identity. For the evaluation we performed a 10-fold randomized cross-validation. Rigapollo increases the quality of low sequence identity pairwise alignments by 5 to 10% with respect to state-of-the-art methods, and it appears that the increase in performance is more marked in very divergent sequences, such as those in the SABmark dataset, where the dynamics information seems to significantly increase the quality of the alignment. This is probably because dynamics are often well conserved in functional patterns, even when the sequence is not preserved [2].

Figure 1: Structure of the pairwise HMM-SVM model
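The 10-fold randomized cross-validation mentioned above could be organized along the following lines. This is only a sketch of the generic splitting technique: the function name, the seed, and the way folds are balanced are assumptions, and the actual protocol used for Rigapollo may differ.

```python
import random
from typing import List, Tuple


def randomized_k_folds(n_items: int, k: int = 10,
                       seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) splits over shuffled indices."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list gives folds of near-equal size.
    folds = [indices[start::k] for start in range(k)]
    splits = []
    for fold_id in range(k):
        test = folds[fold_id]
        train = [idx for other_id, fold in enumerate(folds)
                 if other_id != fold_id for idx in fold]
        splits.append((train, test))
    return splits


# 38 pairwise alignments, the size of the smaller dataset described above.
splits = randomized_k_folds(38, k=10)
```

Each alignment appears in exactly one test fold, so every reference alignment is predicted once by a model that did not see it during training.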

REFERENCES

[1] Do, Chuong B., et al. Research in Computational Molecular Biology. Springer Berlin Heidelberg, 2006.

[2] Cilia, Elisa, et al. Nucleic Acids Research 42.W1 (2014): W264-W270.

[3] Remmert, Michael, et al. Nature Methods 9.2 (2012): 173-175.
