10th BeNeLux Bioinformatics Conference – Antwerp, December 7-8 <strong>2015</strong><br />
P41. RIGAPOLLO, A HMM-SVM BASED APPROACH TO SEQUENCE<br />
ALIGNMENT<br />
Gabriele Orlando 1,2,3,4, Wim Vranken 1,2,3 & Tom Lenaerts 1,4,5.<br />
1 Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, CP 263; 2 Structural<br />
Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2; 3 Structural Biology Research Center, VIB, 1050 Brussels,<br />
Belgium; 4 Machine Learning group, Université Libre de Bruxelles, Brussels, 1050, Belgium; 5 Artificial Intelligence<br />
lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium.<br />
INTRODUCTION<br />
Reliable protein alignments are a central problem for<br />
many bioinformatics tools, such as homology modelling.<br />
Over the years many different algorithms have been<br />
developed and different kinds of information have been<br />
used to align very divergent sequences [1]. Here we<br />
present a pairwise alignment tool, called Rigapollo, based<br />
on pairwise HMM-SVM, which includes backbone<br />
dynamics predictions [2] in the alignment process: recent<br />
work suggests that protein backbone dynamics is often<br />
evolutionarily conserved and contains information<br />
orthogonal to amino acid conservation.<br />
METHODS<br />
Rigapollo uses a pairwise HMM-SVM alignment<br />
approach to infer the optimal alignment between two<br />
proteins, taking into consideration both sequence and<br />
dynamics information. The model (described in Figure 1) is<br />
composed of three states: M (match), G1 (gap in the first<br />
sequence) and G2 (gap in the second sequence). The<br />
transition probabilities are defined as in a<br />
standard HMM. The remaining components of the tool are<br />
designed as follows:<br />
Defining the N-dimensional feature vectors:<br />
Each amino acid in the sequences is described by an N-<br />
dimensional feature vector. That vector can be defined<br />
using any kind of information, ranging from evolutionary<br />
information (e.g. a PSSM calculated with HHblits [3]) to<br />
dynamics predictions (using the DynaMine predictor [2]).<br />
While standard pairwise HMMs require the definition of a<br />
finite and discrete alphabet of observable states, our model<br />
works directly on these feature vectors (which may or<br />
may not be orthonormal), evaluating the emission<br />
probability with a support vector machine (SVM).<br />
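As a minimal sketch of what decoding such a model could look like, the Python code below runs the Viterbi algorithm on a three-state pair HMM (M, G1, G2) whose emissions come from an arbitrary scoring function over feature vectors instead of a discrete alphabet. It is an illustration under stated assumptions, not the Rigapollo implementation: the names `viterbi_pair_hmm` and `toy_log_emit`, the toy emission values and the start and transition probabilities are all hypothetical. In the approach described here, the emission function would be backed by the trained SVM.

```python
import math

NEG_INF = float("-inf")
# Residues consumed from the (first, second) sequence by each state.
STEPS = {"M": (1, 1), "G1": (0, 1), "G2": (1, 0)}

def viterbi_pair_hmm(xs, ys, log_emit, log_trans, log_start):
    """Return (state path, log score) of the best alignment of xs and ys.

    xs, ys    : lists of per-residue feature vectors
    log_emit  : callable (state, x, y) -> log emission; x or y is None in gap states
    log_trans : log_trans[a][b] is the log probability of moving from a to b
    log_start : log probability of starting in each state
    """
    n, m = len(xs), len(ys)
    if n == 0 and m == 0:
        return [], 0.0
    score = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STEPS}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STEPS}
    for i in range(n + 1):
        for j in range(m + 1):
            for s, (di, dj) in STEPS.items():
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue  # this state cannot end in cell (i, j)
                if pi == 0 and pj == 0:
                    best, prev = log_start[s], None  # first step of the path
                else:
                    prev = max(STEPS, key=lambda p: score[p][pi][pj] + log_trans[p][s])
                    best = score[prev][pi][pj] + log_trans[prev][s]
                if best == NEG_INF:
                    continue
                x = xs[i - 1] if di else None
                y = ys[j - 1] if dj else None
                score[s][i][j] = best + log_emit(s, x, y)
                back[s][i][j] = prev
    # Trace back from the best final state to the start of both sequences.
    state = max(STEPS, key=lambda s: score[s][n][m])
    best_score = score[state][n][m]
    path, i, j = [], n, m
    while state is not None:
        path.append(state)
        di, dj = STEPS[state]
        state, i, j = back[state][i][j], i - di, j - dj
    return path[::-1], best_score

def toy_log_emit(state, x, y):
    """Hypothetical emission: identical vectors match well, gaps are flat."""
    if state == "M":
        return math.log(0.9 if x == y else 0.1)
    return math.log(0.2)

# Hypothetical start and transition probabilities, for illustration only.
LOG_START = {"M": math.log(0.8), "G1": math.log(0.1), "G2": math.log(0.1)}
LOG_TRANS = {
    "M": {"M": math.log(0.8), "G1": math.log(0.1), "G2": math.log(0.1)},
    "G1": {"M": math.log(0.6), "G1": math.log(0.3), "G2": math.log(0.1)},
    "G2": {"M": math.log(0.6), "G1": math.log(0.1), "G2": math.log(0.3)},
}

path, log_score = viterbi_pair_hmm(
    [[1.0], [2.0], [3.0]], [[1.0], [3.0]], toy_log_emit, LOG_TRANS, LOG_START
)
print(path, log_score)
```

Because the emission is passed in as a callable, the same decoder works for any feature representation, which is the property that removes the need for a finite alphabet of observable states.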
Definition of the emission probability:<br />
We define the emission probability using an SVM trained<br />
to discriminate matches from mismatches. We define as<br />
matches all the positions in the reference pairwise<br />
alignments that do not contain gaps and we use the<br />
concatenation of the previously defined feature vectors to<br />
describe them. These matches are considered positive hits.<br />
For the mismatches, we follow the same<br />
procedure, but couple positions that, in the reference<br />
alignment, are shifted by a number of amino acids varying<br />
between 5 and 10. After the training, the predicted<br />
emission probabilities for the M state, given the<br />
concatenation of two feature vectors, will be a function of<br />
the distance from the decision hyperplane of the SVM<br />
(called f(D)). The corresponding emission probabilities for<br />
the states G1 and G2 will be modeled as 1-f(D).<br />
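The construction of the SVM training set and the mapping from SVM distance to emission probability can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes that the reference alignment is given as two gapped strings with "-" as the gap character, that per-residue feature vectors are Python lists, that the shift direction of a mismatch is chosen at random, and that f(D) is a logistic function with a hypothetical `slope` parameter. The text above only states that f is a function of the distance from the decision hyperplane, so the logistic form is an assumption. All function names are hypothetical.

```python
import math
import random

def aligned_index_pairs(row_a, row_b, gap="-"):
    """Residue index pairs (i, j) for the gap-free columns of a pairwise alignment."""
    pairs, i, j = [], 0, 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.append((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs

def training_examples(row_a, row_b, feats_a, feats_b, rng, min_shift=5, max_shift=10):
    """Build (concatenated feature vector, label) examples from one reference alignment.

    Positives (label 1) are the gap-free columns. Negatives (label 0) couple the
    same residue of the first sequence with a residue of the second sequence
    shifted by 5 to 10 positions; the random sign of the shift is an assumption.
    """
    examples = []
    for i, j in aligned_index_pairs(row_a, row_b):
        examples.append((feats_a[i] + feats_b[j], 1))
        k = j + rng.randint(min_shift, max_shift) * rng.choice((-1, 1))
        if 0 <= k < len(feats_b):  # skip shifts that fall off the sequence
            examples.append((feats_a[i] + feats_b[k], 0))
    return examples

def match_probability(distance, slope=1.0):
    """Assumed logistic form of f(D): signed SVM distance mapped into (0, 1)."""
    z = slope * distance
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative distances
    return e / (1.0 + e)

def gap_probability(distance, slope=1.0):
    """Emission for G1 and G2, modeled as 1 - f(D)."""
    return 1.0 - match_probability(distance, slope)
```

The examples returned by `training_examples` could then be passed to any SVM implementation; the signed distance it reports for a new pair of residues would be the `distance` argument of `match_probability`.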
RESULTS & DISCUSSION<br />
To evaluate the performance of Rigapollo, we<br />
adopted two publicly available subsets of the BAliBASE and<br />
SABmark alignment datasets, already used to evaluate<br />
other pairwise alignment tools [1]. From the MSAs, all<br />
pairwise alignments were extracted, and those<br />
whose percentage of sequence identity was equal to the median<br />
of that of the full database were put in the subset.<br />
The datasets consist of 38 and 123 manually<br />
curated, structure-based pairwise alignments, respectively, and<br />
share very low sequence identity. For the evaluation<br />
we performed a randomized 10-fold cross-validation.<br />
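A randomized 10-fold cross-validation over a set of reference alignments can be sketched as below. This is a generic illustration using only the Python standard library; the function name `randomized_k_fold` and the `seed` parameter are hypothetical, and the exact splitting procedure used in the evaluation is not specified in the text.

```python
import random

def randomized_k_fold(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [idx for g, fold in enumerate(folds) if g != f for idx in fold]
        yield train, test

# Example: split a dataset of 38 pairwise alignments into 10 folds.
for train, test in randomized_k_fold(38, k=10, seed=1):
    print(len(train), len(test))
```

Each alignment is used for testing exactly once, and the SVM is retrained on the remaining folds before each test.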
Rigapollo increases the quality of low sequence<br />
identity pairwise alignments by 5 to 10% with respect to<br />
state-of-the-art methods, and it appears that the<br />
increase in performance is more marked for very<br />
divergent sequences, such as those in the<br />
SABmark dataset, where the dynamics information seems<br />
to significantly increase the quality of the alignment. This<br />
is probably because dynamics are often well<br />
conserved in functional patterns, even when the sequence<br />
is not conserved [2].<br />
Figure 1: Structure of the pairwise HMM-SVM model<br />
REFERENCES<br />
[1] Do, Chuong B., et al. Research in Computational Molecular Biology.<br />
Springer Berlin Heidelberg, 2006.<br />
[2] Cilia, Elisa, et al. Nucleic Acids Research 42.W1 (2014): W264-W270.<br />
[3] Remmert, Michael, et al. Nature Methods 9.2 (2012): 173-175.<br />