BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: O2
10th Benelux Bioinformatics Conference, oral presentation
O2. PREDICTING OLIGOGENIC EFFECTS USING DIGENIC DISEASE DATA
Andrea M. Gazzo (1,2,3,*), Dorien Daneels (1,3), Maryse Bonduelle (3), Sonia Van Dooren (1,3),
Guillaume Smits (1,4) & Tom Lenaerts (1,2,5)
(1) Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
(2) MLG, Departement d'Informatique, Universite Libre de Bruxelles, Brussels, Belgium
(3) Center for Medical Genetics, Reproduction and Genetics, Reproduction Genetics and
Regenerative Medicine, Vrije Universiteit Brussel, UZ Brussel, Brussel, Belgium
(4) Genetics, Hopital Universitaire des Enfants Reine Fabiola, Universite Libre de Bruxelles,
Brussels, Belgium
(5) Computerwetenschappen, Vrije Universiteit Brussel, Brussel, Belgium
(*) Andrea.Gazzo@ulb.ac.be
Recent research has shown that disorders may be better described by more complex inheritance
mechanisms, suggesting that some diseases classified as monogenic may in fact be oligogenic.
Understanding how the combined interplay and weight of variants lead to disease may provide
improved and novel insights into diseases classically considered monogenic. Here we present a
classification method that separates two types of digenic diseases: those that require variants
in both genes to induce the disease, and those where one gene is causative and the second
increases the severity. Our results show that a clear separation can be made between the two
classes using gene-level and variant-level features extracted from DIDA.
INTRODUCTION
DIDA is a novel database that provides, for the first time, detailed information on genes and
associated genetic variants involved in digenic diseases, the simplest form of oligogenic
inheritance [1]. The database is accessible via http://dida.ibsquare.be and currently includes
213 digenic combinations involved in 44 different digenic diseases [2]. These combinations are
composed of 364 distinct variants, distributed over 136 distinct genes. Creating this new
repository was essential, as existing databases do not allow one to retrieve detailed records
on digenic combinations. Genes, variants, diseases and digenic combinations in DIDA are
annotated with manually curated information and with information mined from other online
resources. Each digenic combination was categorized into one of two effect classes: either
"on/off", in which variant combinations in both genes are required to develop the disease, or
"severity", in which variants in one gene are enough to develop the disease and carrying
variant combinations in both genes increases its severity or affects its age of onset. In this
work we present a predictor capable of distinguishing between the two digenic effect classes.
We analyse the results of this predictor in relation to specific features collected for the
digenic combinations in DIDA, for instance the haploinsufficiency of the genes, their zygosity
and the relationship between them, providing insight into the biological meaning of the
results.
METHODS
We used a machine learning approach to predict the class, i.e. "severity" or "on/off", of a
digenic combination. Starting with feature selection, we chose the most informative features
for classifying a digenic combination into one of the two classes. For each of the two genes
involved in a digenic combination, the predictor uses the following features: zygosity
(heterozygous, homozygous, etc.), recessiveness probability, haploinsufficiency score, known
recessive information, and whether the gene is essential (based on mouse knockout experimental
data). At the variant level, we used the pathogenicity predictions from the SIFT and PolyPhen-2
tools as features. Finally, we also encoded the relationship between the two genes, defined by
the relations "Similar function", "Directly interacting" and "Pathway membership". After
testing different approaches we chose a random forest algorithm, as it gave the best results.
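
The feature set described above can be turned into a fixed-length numeric vector before it is passed to a classifier. The following is a minimal sketch of one possible encoding, not the encoding actually used for the predictor. The class names, field names, the three zygosity categories and the choice of a single SIFT and PolyPhen-2 score per gene are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumed zygosity categories; the abstract only lists "Heterozygote, Homozygote, etc."
ZYGOSITY_LEVELS = ["heterozygous", "homozygous", "other"]


@dataclass
class GeneFeatures:
    """Features for one gene of a digenic combination (hypothetical layout)."""
    zygosity: str               # one of ZYGOSITY_LEVELS
    recessiveness_prob: float   # recessiveness probability
    haploinsufficiency: float   # haploinsufficiency score
    known_recessive: bool       # known recessive information
    essential: bool             # essential gene, e.g. from mouse knockout data
    sift_score: float           # variant-level prediction (one variant per gene assumed)
    polyphen2_score: float      # variant-level prediction (one variant per gene assumed)


@dataclass
class DigenicCombination:
    gene_a: GeneFeatures
    gene_b: GeneFeatures
    similar_function: bool
    directly_interacting: bool
    same_pathway: bool


def one_hot(value: str, levels: List[str]) -> List[float]:
    """Encode a categorical value as a 0/1 indicator per level."""
    return [1.0 if value == level else 0.0 for level in levels]


def encode_gene(g: GeneFeatures) -> List[float]:
    """Flatten the features of one gene into a list of floats."""
    return one_hot(g.zygosity, ZYGOSITY_LEVELS) + [
        g.recessiveness_prob,
        g.haploinsufficiency,
        float(g.known_recessive),
        float(g.essential),
        g.sift_score,
        g.polyphen2_score,
    ]


def encode_combination(c: DigenicCombination) -> List[float]:
    """Concatenate both gene encodings and the three gene-pair relation flags."""
    return encode_gene(c.gene_a) + encode_gene(c.gene_b) + [
        float(c.similar_function),
        float(c.directly_interacting),
        float(c.same_pathway),
    ]


# Made-up values, for illustration only.
example = DigenicCombination(
    gene_a=GeneFeatures("heterozygous", 0.2, 0.8, False, True, 0.01, 0.98),
    gene_b=GeneFeatures("homozygous", 0.7, 0.1, True, False, 0.20, 0.45),
    similar_function=True,
    directly_interacting=False,
    same_pathway=True,
)
vector = encode_combination(example)
```

Under these assumptions each combination yields 21 values (9 per gene plus 3 relation flags), which could then be given to any random forest implementation.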
RESULTS & DISCUSSION
With 10-fold cross-validation we obtained promising performance, with an MCC of 0.67 and an
AUROC of 0.92. Unfortunately, this performance is an overestimate: the gene-based features are
the most important ones, and many examples with mutations mapped to the same gene pair lead to
the same oligogenic effect class, so a gene pair seen during training is easy to classify at
test time. A stratification that ensures that the same pair of genes never appears in both the
training set and the test set was therefore required. We manually created 5 subsets in which
all instances with the same gene pair belong to the same subset. After this procedure we
assessed the performance again, obtaining an MCC of 0.36 and an AUROC of 0.78. To verify the
significance of this performance we retrained the random forest on a randomization of the
data, obtained by shuffling all the features of each instance while leaving the class
unchanged. This reshuffling resulted in an MCC close to zero and an AUROC close to 0.5, as
expected. This additional test confirms the significance of the stratified results.
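
The two validation steps above, keeping every gene pair inside a single subset and retraining on randomized data, can be sketched as follows. This is an illustrative sketch only: the abstract states that the 5 subsets were created manually, so the greedy assignment below is a swapped-in automatic alternative, and the function names are hypothetical. The shuffle shown is one possible reading of the randomization (whole feature vectors are permuted across instances while the labels stay in place). Shuffling each feature column independently would be another.

```python
import random
from collections import defaultdict


def gene_pair_key(gene_a, gene_b):
    """Order-independent key, so (A, B) and (B, A) count as the same gene pair."""
    return tuple(sorted((gene_a, gene_b)))


def grouped_folds(instances, n_folds=5):
    """Split instance indices into folds so that no gene pair spans two folds.

    instances: list of (gene_a, gene_b) tuples, one per digenic combination.
    Greedy heuristic (an assumption, not the authors' manual procedure):
    place the largest groups first, each into the currently smallest fold.
    """
    groups = defaultdict(list)
    for idx, (gene_a, gene_b) in enumerate(instances):
        groups[gene_pair_key(gene_a, gene_b)].append(idx)

    folds = [[] for _ in range(n_folds)]
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        smallest = min(folds, key=len)
        smallest.extend(groups[key])
    return folds


def shuffle_features(features, seed=0):
    """Permute feature vectors across instances; the label order is left untouched."""
    rng = random.Random(seed)
    shuffled = list(features)
    rng.shuffle(shuffled)
    return shuffled


# Placeholder gene names, for illustration only.
instances = [("G1", "G2"), ("G2", "G1"), ("G3", "G4"), ("G1", "G2"),
             ("G5", "G6"), ("G3", "G4"), ("G7", "G8")]
folds = grouped_folds(instances, n_folds=3)
```

Each fold serves once as the test set while the remaining folds form the training set, so a gene pair seen in training never reappears in testing.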
As a next step we are analysing the relationship between the oligogenic effect and the
features used, particularly in terms of biological and molecular interpretation. As a future
perspective, the potential clinical benefit is promising: one goal of medical genetics is to
assign predictive value to the genotype so that it can assist in diagnosis and disease
management. If we can infer from the genotype what the digenic/oligogenic effect will be, we
can potentially anticipate the treatment.
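
For reference, the two metrics reported in this section, the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUROC), can be computed as in the sketch below. It uses the standard definitions and assumes a binary encoding in which 1 stands for one effect class and 0 for the other. The function names and the encoding are illustrative assumptions, not taken from the work itself.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as one half. Quadratic in the number of instances, which is
    acceptable for a data set of a few hundred combinations.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An MCC of 0 and an AUROC of 0.5 correspond to chance-level prediction, which is why the randomized control is expected to land near those values.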
REFERENCES
[1] Gazzo, A. et al. DIDA: a curated and annotated digenic diseases database. Under review,
NAR Database Issue (2016).
[2] Schäffer, A. A. (2013) Digenic inheritance in medical genetics. J. Med. Genet., 50,
641–652.