
BeNeLux Bioinformatics Conference, Antwerp, December 7-8, 2015

Abstract ID: O2

10th Benelux Bioinformatics Conference, oral presentation

bbc 2015

O2. PREDICTING OLIGOGENIC EFFECTS USING DIGENIC DISEASE DATA

Andrea M. Gazzo 1,2,3*, Dorien Daneels 1,3, Maryse Bonduelle 3, Sonia Van Dooren 1,3, Guillaume Smits 1,4 & Tom Lenaerts 1,2,5.

Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium 1; MLG, Département d'Informatique, Université Libre de Bruxelles, Brussels, Belgium 2; Center for Medical Genetics, Reproduction and Genetics, Reproduction Genetics and Regenerative Medicine, Vrije Universiteit Brussel, UZ Brussel, Brussel, Belgium 3; Genetics, Hôpital Universitaire des Enfants Reine Fabiola, Université Libre de Bruxelles, Brussels, Belgium 4; Computerwetenschappen, Vrije Universiteit Brussel, Brussel, Belgium 5. * Andrea.Gazzo@ulb.ac.be

Recent research has shown that disorders may be better described by more complex inheritance mechanisms, suggesting that some diseases considered monogenic may in fact be oligogenic. Understanding how the combined interplay and weight of variants lead to disease may provide new insights into diseases classically considered to be monogenic. Here we present a novel classification method that separates two types of digenic diseases: those that require variants in both genes to induce the disease, and those where variants in one gene are causative and variants in the second gene increase the severity. Our results show that a clear separation can be made between the two classes using gene- and variant-level features extracted from DIDA.

INTRODUCTION

DIDA is a novel database that, for the first time, provides detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance [1]. The database is accessible via http://dida.ibsquare.be and currently includes 213 digenic combinations involved in 44 different digenic diseases [2]. These combinations are composed of 364 distinct variants, distributed over 136 distinct genes. Creating this new repository was essential, as current databases do not allow one to retrieve detailed records of digenic combinations. Genes, variants, diseases and digenic combinations in DIDA are annotated with manually curated information and with information mined from other online resources. Each digenic combination was assigned to one of two effect classes. In the "on/off" class, variant combinations in both genes are required to develop the disease. In the "severity" class, variants in one gene are sufficient to develop the disease, and carrying variant combinations in both genes increases its severity or affects its age of onset. In this work we present a predictor capable of distinguishing between these two digenic effect classes. We analyse the predictor's results in relation to specific features collected for the digenic combinations in DIDA, such as the haploinsufficiency of the genes, their zygosity and the relationship between them. This analysis provides insight into the biological meaning of the results.
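The record structure described above can be sketched as a small data model. Each combination pairs variants in two genes and is labelled with one of two effect classes. This is a minimal illustration, not DIDA's actual schema. The class and field names, the toy gene and variant identifiers, and the HGVS-style variant strings are all hypothetical. The sketch also shows why distinct variants and genes are counted separately from combinations: the same variant or gene can occur in several combinations.

```python
from dataclasses import dataclass
from enum import Enum


class EffectClass(Enum):
    ON_OFF = "on/off"      # variants in both genes are required for disease
    SEVERITY = "severity"  # one gene suffices; the second modifies severity or onset


@dataclass(frozen=True)
class Variant:
    gene: str
    change: str  # hypothetical variant identifier, e.g. an HGVS-style cDNA change


@dataclass(frozen=True)
class DigenicCombination:
    disease: str
    gene_a: str
    gene_b: str
    variants: tuple[Variant, ...]  # variants carried in gene_a and gene_b
    effect: EffectClass


def summarize(combinations: list[DigenicCombination]) -> dict[str, int]:
    """Count combinations plus the distinct diseases, variants and genes they involve."""
    variants = {v for c in combinations for v in c.variants}
    genes = {g for c in combinations for g in (c.gene_a, c.gene_b)}
    return {
        "combinations": len(combinations),
        "diseases": len({c.disease for c in combinations}),
        "variants": len(variants),
        "genes": len(genes),
    }


# Toy, made-up records: GENE1 and its variant are shared by both combinations.
v1 = Variant("GENE1", "c.100A>G")
v2 = Variant("GENE2", "c.200C>T")
v3 = Variant("GENE3", "c.300G>A")
TOY = [
    DigenicCombination("Disease X", "GENE1", "GENE2", (v1, v2), EffectClass.ON_OFF),
    DigenicCombination("Disease X", "GENE1", "GENE3", (v1, v3), EffectClass.SEVERITY),
]
print(summarize(TOY))  # {'combinations': 2, 'diseases': 1, 'variants': 3, 'genes': 3}
```

Because the shared variant `v1` is counted once, two combinations yield only three distinct variants and three distinct genes. The same effect explains why DIDA's 213 combinations map onto 364 variants and 136 genes rather than twice as many.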

METHODS

We used a machine learning approach to predict the class of a digenic combination, i.e. "severity" or "on/off". Starting with feature selection, we chose the most informative features for classifying a digenic combination into one of the two classes. For each of the two genes involved in a digenic combination, the predictor uses the following features:

- zygosity (heterozygous, homozygous, etc.)
- recessiveness probability
- haploinsufficiency score
- known recessiveness information
- whether the gene is essential, based on mouse knockout experimental data

At the variant level, we used the pathogenicity predictions of the SIFT and PolyPhen-2 tools as features. Finally, we also encode the relationship between the two genes, defining the relations "similar function", "directly interacting" and "pathway membership". After testing several approaches, we chose a random forest algorithm, as it gave the best results.
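To make the feature set concrete, the sketch below encodes one digenic combination as a fixed-length numeric vector of the kind a random forest consumes. It is a hedged illustration that rests on several assumptions not stated in the abstract. The zygosity vocabulary is hypothetical. So is the choice to summarise the SIFT and PolyPhen-2 predictions as one score per gene, since the abstract does not say how multiple variants per gene were combined. All field names and the toy values are hypothetical too.

```python
from dataclasses import dataclass

# Hypothetical zygosity vocabulary; the abstract only lists "Heterozygote, Homozygote, etc."
ZYGOSITY_LEVELS = ("heterozygous", "homozygous", "hemizygous", "compound_heterozygous")


@dataclass
class GeneFeatures:
    zygosity: str
    recessiveness_prob: float
    haploinsufficiency: float
    known_recessive: bool
    essential: bool      # e.g. derived from mouse knockout data
    sift: float          # assumed: one SIFT score summarising this gene's variants
    polyphen2: float     # assumed: one PolyPhen-2 score summarising this gene's variants


@dataclass
class PairRelation:
    similar_function: bool
    directly_interacting: bool
    pathway_membership: bool


def one_hot(value: str, levels: tuple[str, ...]) -> list[float]:
    """Encode a categorical value as a 0/1 indicator vector over the given levels."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels]


def encode_gene(g: GeneFeatures) -> list[float]:
    """Gene-level block: one-hot zygosity followed by numeric and boolean features."""
    return one_hot(g.zygosity, ZYGOSITY_LEVELS) + [
        g.recessiveness_prob,
        g.haploinsufficiency,
        float(g.known_recessive),
        float(g.essential),
        g.sift,
        g.polyphen2,
    ]


def encode_combination(a: GeneFeatures, b: GeneFeatures, rel: PairRelation) -> list[float]:
    """Concatenate gene A, gene B and the three pairwise relation flags."""
    return encode_gene(a) + encode_gene(b) + [
        float(rel.similar_function),
        float(rel.directly_interacting),
        float(rel.pathway_membership),
    ]


# Made-up values, for illustration only.
gene_a = GeneFeatures("heterozygous", 0.2, 0.8, False, True, 0.01, 0.95)
gene_b = GeneFeatures("homozygous", 0.9, 0.1, True, False, 0.30, 0.40)
x = encode_combination(gene_a, gene_b, PairRelation(False, True, True))
print(len(x))  # 23
```

Gene A and gene B are simply concatenated here. A real pipeline would therefore need a consistent rule for which gene comes first, or symmetric pair-level features. The resulting vectors could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`.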

RESULTS & DISCUSSION

Using 10-fold cross-validation we obtained promising performance, with a Matthews correlation coefficient (MCC) of 0.67 and an area under the ROC curve (AUROC) of 0.92.
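Both reported metrics can be computed from a set of predictions as below. This is a plain-Python sketch that assumes the two classes are coded as 0 and 1 (which class counts as positive is an arbitrary choice here). For AUROC, it also assumes the classifier supplies a score for class 1. MCC uses the standard confusion-matrix formula. AUROC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, so an uninformative scorer gives about 0.5.

```python
import math


def mcc(y_true: list[int], y_pred: list[int]) -> float:
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any confusion-matrix margin is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auroc(y_true: list[int], scores: list[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
print(mcc(y_true, [1, 0, 0, 0]))            # ≈ 0.577 (= 1/sqrt(3))
print(auroc(y_true, [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

MCC ranges from -1 to 1, and 0 corresponds to chance-level agreement. This makes it a useful companion to AUROC on small, possibly imbalanced datasets such as the 213 combinations in DIDA.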

Unfortunately, this performance is an overestimation. The gene-based features are the most important ones, and many examples with mutations mapped onto the same gene pair belong to the same oligogenic effect class. As a result, a model can score well by recognising gene pairs it has already seen during training rather than by generalising. A stratification ensuring that the same gene pair never appears in both the training and the testing set was therefore required. We manually created 5 subsets, in which all instances with the same gene pair belong to the same subset. Re-assessing performance after this procedure, we obtained an MCC of 0.36 and an AUROC of 0.78. To verify the significance of these results, we retrained the random forest on a randomized version of the data. This randomized version was obtained by shuffling all the features for each instance while keeping the class labels unchanged. As expected, this reshuffling resulted in an MCC close to zero and an AUROC near 0.5. This additional test confirms the significance of the stratified results.
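The two safeguards described above can be sketched as follows: keeping each gene pair within a single subset, and retraining on shuffled features. The authors built their 5 subsets manually. The greedy balancing rule used here is an assumption: it places the largest gene-pair groups first, each into the currently smallest fold, similar in spirit to scikit-learn's `GroupKFold`. The sketch also assumes a reading of "shuffling all the features for each instance": each feature column is permuted independently across instances while the labels stay fixed.

```python
import random
from collections import defaultdict


def grouped_folds(gene_pairs: list[frozenset[str]], k: int = 5) -> list[int]:
    """Assign each instance to one of k folds; instances sharing a gene pair share a fold."""
    groups: dict[frozenset[str], list[int]] = defaultdict(list)
    for idx, pair in enumerate(gene_pairs):
        groups[pair].append(idx)
    fold_sizes = [0] * k
    fold_of = [0] * len(gene_pairs)
    # Largest gene-pair groups first, each placed in the currently smallest fold.
    for members in sorted(groups.values(), key=len, reverse=True):
        target = min(range(k), key=fold_sizes.__getitem__)
        for idx in members:
            fold_of[idx] = target
        fold_sizes[target] += len(members)
    return fold_of


def shuffle_features(X: list[list[float]], seed: int = 0) -> list[list[float]]:
    """Permute each feature column independently across instances; labels are not touched."""
    rng = random.Random(seed)
    n_cols = len(X[0]) if X else 0
    shuffled = [row[:] for row in X]  # copy so the original matrix is unchanged
    for j in range(n_cols):
        column = [row[j] for row in X]
        rng.shuffle(column)
        for i, value in enumerate(column):
            shuffled[i][j] = value
    return shuffled


# Toy gene pairs (hypothetical names); frozenset makes (A, B) and (B, A) the same pair.
pairs = [frozenset({"A", "B"})] * 3 + [frozenset({"C", "D"})] * 2 + [frozenset({"B", "A"})]
print(grouped_folds(pairs, k=2))  # [0, 0, 0, 1, 1, 0]
```

The gene-pair-stratified estimate comes from a loop over folds: for each fold `f`, train on all other folds and test on fold `f`. Running the same loop on `shuffle_features(X)` with the original labels gives the randomized control, where MCC near 0 and AUROC near 0.5 are expected.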

As a next step, we are analysing the relationship between the oligogenic effect and the features used, particularly in terms of biological and molecular interpretation. In the longer term, the potential clinical benefit is considerable. One goal of medical genetics is to assign predictive value to the genotype, so that it can assist in diagnosis and disease management. If we can infer from the genotype what the digenic or oligogenic effect will be, we can potentially anticipate treatment.

REFERENCES

[1] Gazzo, A. et al. (2016) DIDA: a curated and annotated digenic diseases database. Under review for the NAR Database Issue.

[2] Schäffer, A. A. (2013) Digenic inheritance in medical genetics. J. Med. Genet., 50, 641–652.
