
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015<br />

Abstract ID: P<br />

Poster<br />

10th Benelux Bioinformatics Conference bbc 2015<br />

P46. ANALYSIS OF BIAS AND ASYMMETRY IN THE PROTEIN STABILITY<br />

PREDICTION<br />

Fabrizio Pucci 1,* , Katrien Bernaerts 1,2 , Fabian Teheux 1 , Dimitri Gilis 1 & Marianne Rooman 1 .<br />

Department of BioModeling, BioInformatics & BioProcesses 1 , Université Libre de Bruxelles, 1050 Brussels, Belgium;<br />

BioBased Materials, Faculty of Humanities and Sciences 2 , Maastricht University, 6200 Maastricht, The Netherlands.<br />

* fapucci@ulb.ac.be<br />

In many bioinformatics analyses, avoiding biases towards the training dataset is one of the most intricate issues. Here we<br />

focus on the specific case of predicting protein thermodynamic stability changes upon point mutations (ΔΔG). First,<br />

we measure the bias towards destabilizing mutations of some widely used ΔΔG-prediction algorithms<br />

described in the literature. Then we show how important it is to exploit the symmetry of the model to avoid bias. Finally,<br />

we briefly discuss the distribution of the ΔΔG values for all possible point mutations in a series of proteins, with<br />

the aim of understanding whether the distribution is universal and how much it is biased towards the training dataset.<br />

INTRODUCTION<br />

The accurate prediction of stability changes on a large<br />

scale is still a challenge in protein science. Despite the<br />

large amount of work done in recent years, the results<br />

frequently suffer from hidden biases towards the training<br />

dataset, which makes evaluating the real<br />

performance a difficult task.<br />

Here we study the “bias problem” in the case of the<br />

prediction of protein thermodynamic stability changes<br />

upon point mutations, and more precisely of its best<br />

descriptor ΔΔG, the change of folding free energy<br />

upon mutation from the wild-type protein W to the mutant<br />

M. In principle, the predicted ΔΔG value of the inverse<br />

mutation (M to W) has to be exactly equal to minus the<br />

ΔΔG of the direct mutation (W to M), since the free energy<br />

is a state function.<br />
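The antisymmetry of the free-energy change upon mutation (ΔΔG) follows directly from its definition. As a short derivation, writing ΔG(W) and ΔG(M) for the folding free energies of the wild type and the mutant (a notation assumed here, not introduced in the abstract):

```latex
\begin{align}
\Delta\Delta G(W \to M) &= \Delta G(M) - \Delta G(W), \\
\Delta\Delta G(M \to W) &= \Delta G(W) - \Delta G(M) = -\,\Delta\Delta G(W \to M).
\end{align}
```

Because each ΔG depends only on the state of the protein and not on the path between states, any predictor that respects the thermodynamics must reproduce this sign flip.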

Unfortunately, the asymmetry of the training dataset<br />

towards destabilizing mutations (reflecting the<br />

evolutionary optimization of protein stability) makes the<br />

prediction of inverse mutations less accurate than<br />

that of direct ones. This introduces a series of distortions in<br />

the prediction model, which we analyze here.<br />

METHODS<br />

We computed the ΔΔG value for a set of almost 200<br />

mutations for which the structures of both the wild-type<br />

protein and the mutant are known, using a series of prediction<br />

tools, i.e. PoPMuSiC [1], I-Mutant, FoldX, Duet,<br />

AutoMute, CupSat, Eris and ProSMS. We then computed<br />

the Ratio (RID) of the standard deviation between the<br />

predicted and the experimental values of ΔΔG for the<br />

Inverse mutations to that for the Direct mutations (which<br />

should be one in the case of a perfectly symmetric<br />

prediction) and compared the results of the different<br />

programs.<br />
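The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that "standard deviation between predicted and experimental values" means the root-mean-square deviation of the prediction errors, and that the experimental value of each inverse mutation is minus that of the direct one. The function names and the toy numbers are hypothetical.

```python
import math


def rms_deviation(predicted, experimental):
    """Root-mean-square deviation between predicted and experimental values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def rid(pred_direct, pred_inverse, exp_direct):
    """Ratio of the inverse-mutation deviation to the direct-mutation deviation.

    Assumption: the experimental value of an inverse mutation is taken as
    minus the experimental value of the corresponding direct mutation.
    """
    exp_inverse = [-x for x in exp_direct]
    sigma_direct = rms_deviation(pred_direct, exp_direct)
    sigma_inverse = rms_deviation(pred_inverse, exp_inverse)
    return sigma_inverse / sigma_direct


# Toy data (made up for illustration only, in kcal/mol).
exp_direct = [1.0, 2.0, 0.5]
pred_direct = [1.2, 1.8, 0.9]

# A perfectly antisymmetric predictor returns the negated direct predictions.
symmetric_rid = rid(pred_direct, [-p for p in pred_direct], exp_direct)

# A predictor biased towards destabilizing values does worse on inverse mutations.
biased_rid = rid(pred_direct, [0.2, -0.8, 0.1], exp_direct)
```

With these toy numbers the antisymmetric predictor gives a ratio of one, while the biased predictor gives a ratio above one, which is the signature of asymmetry that the comparison between programs looks for.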

If the functional structure of the model is known, as in the<br />

case of the artificial neural network of PoPMuSiC, one<br />

can further understand which terms contribute most<br />

to the deviation of the RID from unity, and thus propose new<br />

model structures in which the biases are avoided<br />

[2].<br />

In more black-box machine learning approaches (such as<br />

methods based on Random Forest or Support Vector<br />

Machine), in which the functional form is not explicitly<br />

known, the asymmetry correction is less obvious.<br />
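One generic workaround for a predictor whose functional form is hidden, sketched here purely as an illustration and not taken from the abstract, is to antisymmetrize its output: evaluate the direct mutation on the wild-type structure and the inverse mutation on the mutant structure, then take half the difference. The `predict` callable and the toy values below are hypothetical, and the approach assumes that both structures are available.

```python
from typing import Callable, Hashable

# Hypothetical signature: predict(start_state, end_state) -> predicted change.
Predictor = Callable[[Hashable, Hashable], float]


def antisymmetrize(predict: Predictor) -> Predictor:
    """Wrap a predictor so that swapping the two states flips the sign."""

    def symmetric(start: Hashable, end: Hashable) -> float:
        # Half the difference between the direct and the inverse prediction.
        return 0.5 * (predict(start, end) - predict(end, start))

    return symmetric


# Toy example: a predictor with a constant +0.8 offset towards destabilization.
true_values = {("W", "M"): 1.5, ("M", "W"): -1.5}


def biased_predict(start, end):
    return true_values[(start, end)] + 0.8


corrected = antisymmetrize(biased_predict)
direct = corrected("W", "M")
inverse = corrected("M", "W")
```

The wrapper guarantees that the inverse prediction is minus the direct one and cancels a constant offset, but it does not by itself make the underlying predictions more accurate.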

In a second part, we investigated how the symmetry of the<br />

ΔΔG value distribution in the training dataset influences<br />

the prediction of the ΔΔG distribution for all possible<br />

mutations in a series of proteins with known structures.<br />

RESULTS & DISCUSSION<br />

The asymmetry estimated for a<br />

series of available prediction methods gives RID<br />

values between 1 for bias-corrected methods and<br />

about 3 for the most biased programs. These<br />

results show that the correct use of<br />

symmetry in setting up the model structure helps to<br />

avoid unwanted biases towards destabilizing<br />

mutations.<br />

Furthermore, the distribution of the ΔΔG values for all<br />

point mutations in some proteins was analyzed<br />

and showed a dependence on the ΔΔG distribution<br />

of the training dataset when the RID deviates<br />

significantly from one. Understanding the<br />

relation between the two distributions is an<br />

important step towards comprehending the universality of the<br />

distribution [3] and how much proteins are<br />

optimized to minimize the impact of single-site<br />

amino acid substitutions.<br />

REFERENCES<br />

[1] Y. Dehouck, J. M. Kwasigroch, D. Gilis, M. Rooman (2011),<br />

PoPMuSiC 2.1: a web server for the estimation of protein<br />

stability changes upon mutation and sequence optimality. BMC<br />

Bioinformatics 12, 151.<br />

[2] F. Pucci, K. Bernaerts, F. Teheux, D. Gilis, M. Rooman (2015),<br />

Symmetry Principles in Optimization Problems: an application to Protein<br />

Stability Prediction. IFAC-PapersOnLine 48-1, 458-463.<br />

[3] N. Tokuriki, F. Stricher, J. Schymkowitz, L. Serrano, D. S. Tawfik (2007),<br />

The stability effects of protein mutations appear to be universally<br />

distributed. J Mol Biol 356, 1318-1332.<br />
