BeNeLux Bioinformatics Conference – Antwerp, December 7-8, 2015<br />
Abstract ID: O6<br />
Oral presentation<br />
10th Benelux Bioinformatics Conference (BBC 2015)<br />
O6. COMBINING TREE-BASED AND DYNAMICAL SYSTEMS<br />
FOR THE INFERENCE OF GENE REGULATORY NETWORKS<br />
Vân Anh Huynh-Thu (1,*) & Guido Sanguinetti (2,3)<br />
(1) GIGA-R & Department of Electrical Engineering and Computer Science, University of Liège; (2) School of Informatics,<br />
University of Edinburgh; (3) SynthSys – Systems and Synthetic Biology, University of Edinburgh. (*) vahuynh@ulg.ac.be<br />
INTRODUCTION<br />
Reconstructing the topology of gene regulatory networks<br />
(GRNs) from time series of gene expression data remains<br />
an important open problem in computational systems<br />
biology. Current approaches can be broadly divided into<br />
model-based and model-free methods, each with its own<br />
limitation. Model-free methods are scalable but lack<br />
interpretability and cannot, in general, be used for<br />
out-of-sample predictions. Model-based methods, on the<br />
other hand, identify a dynamical model of the system; they<br />
are interpretable and can be used for predictions, but they<br />
rely on strong assumptions and are typically<br />
computationally very demanding. Here, we aim to bridge the gap between<br />
model-based and model-free methods by proposing a<br />
hybrid approach to the GRN inference problem, called<br />
Jump3 (Huynh-Thu & Sanguinetti, 2015). Our approach<br />
combines formal dynamical modelling with the efficiency<br />
of a nonparametric, tree-based method, allowing the<br />
reconstruction of GRNs of hundreds of genes.<br />
METHODS<br />
Gene expression model. At the heart of the Jump3<br />
framework, we use the on/off model of gene expression<br />
(Ptashne & Gann, 2002), where the rate of transcription of<br />
a gene can vary between two levels depending on the<br />
activity state μ of the promoter of the gene. The expression<br />
x of a gene is modelled through the following stochastic<br />
differential equation:<br />
$$dx_i = \left(A_i \mu_i(t) + b_i - \lambda_i x_i\right) dt + \sigma\, d\omega(t),$$<br />
where subscript i refers to the i-th target gene. Here, the<br />
promoter state $\mu_i(t)$ is a binary variable (the promoter is<br />
either active or inactive) that depends on the expression<br />
levels of the transcription factors (TFs) that bind to the<br />
promoter. $A_i$, $b_i$ and $\lambda_i$ are kinetic parameters, and the term<br />
$\sigma\, d\omega(t)$ represents a white-noise driving process with<br />
variance $\sigma^2$.<br />
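To make the on/off model concrete, the following is a minimal sketch that simulates the stochastic differential equation with a simple Euler-Maruyama scheme. It is an illustration only and not the Jump3 implementation: the promoter trajectory is assumed to be given, and the function name `simulate_on_off` and all parameter values are hypothetical.

```python
import math
import random


def simulate_on_off(mu, A, b, lam, sigma, x0, dt, seed=0):
    """Euler-Maruyama simulation of dx = (A*mu(t) + b - lam*x) dt + sigma dw.

    `mu` is a list of 0/1 promoter states, one per time step. Assumption:
    the promoter trajectory is known here, whereas in the abstract it is
    a latent variable that has to be inferred.
    """
    rng = random.Random(seed)
    x = [x0]
    for state in mu:
        drift = A * state + b - lam * x[-1]
        # Brownian increment over a step of length dt has std sqrt(dt).
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x.append(x[-1] + drift * dt + noise)
    return x


# Hypothetical example: promoter off for 500 steps, then on for 500 steps.
mu = [0] * 500 + [1] * 500
trajectory = simulate_on_off(mu, A=2.0, b=0.5, lam=1.0, sigma=0.1, x0=0.5, dt=0.01)
```

With the noise switched off (`sigma=0.0`) and a constant promoter state, the simulated expression relaxes towards the steady state (A·μ + b)/λ, which is the "two levels" behaviour described above.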
Network reconstruction with jump trees. Recovering<br />
the regulatory links pointing to gene i amounts to finding<br />
the genes whose expression is predictive of the promoter<br />
state $\mu_i$. To achieve this goal, we propose a procedure that<br />
learns, for each target gene i, an ensemble of decision trees<br />
predicting the promoter state $\mu_i$ at any time t from the<br />
expression levels of the candidate regulators at the same<br />
time t. However, standard tree-based methods cannot be<br />
applied here since the output $\mu_i(t)$ is a latent variable. We<br />
therefore propose a new decision tree algorithm called<br />
“jump tree”, which splits the observations by maximising<br />
the marginal likelihood of the dynamical on/off model.<br />
The learned tree-based model is then used to derive an<br />
importance score for each candidate regulator, computed<br />
as the sum of the likelihood gains that are obtained at all<br />
the tree nodes where this regulator was selected to split the<br />
observations. The importance of a candidate regulator j is<br />
used as the weight of the putative regulatory link<br />
directed from gene j to gene i.<br />
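The importance score described above can be sketched as follows. This is a hedged illustration, not the authors' code: each tree is represented by a flat list of (regulator, likelihood gain) pairs, one per split node, which is a hypothetical data structure chosen for brevity. Summing the gains over all trees of the ensemble is also an assumption, since the abstract does not state how the trees are aggregated.

```python
from collections import defaultdict


def importance_scores(ensemble):
    """Sum the likelihood gains per regulator over all split nodes.

    `ensemble` is a list of trees; each tree is a list of
    (regulator_name, likelihood_gain) pairs (hypothetical representation).
    """
    scores = defaultdict(float)
    for tree in ensemble:
        for regulator, gain in tree:
            scores[regulator] += gain
    return dict(scores)


def ranked_links(target, ensemble):
    """Return (regulator, target, weight) links sorted by decreasing weight."""
    scores = importance_scores(ensemble)
    links = [(regulator, target, weight) for regulator, weight in scores.items()]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical ensemble of two trees learned for target gene "geneX".
ensemble = [
    [("geneA", 3.0), ("geneB", 1.0)],
    [("geneA", 2.0), ("geneC", 0.5)],
]
links = ranked_links("geneX", ensemble)
```

Repeating this for every target gene and pooling the resulting links gives a ranking of all candidate edges of the network.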
RESULTS & DISCUSSION<br />
We evaluated Jump3 on the networks of the DREAM4 In<br />
Silico Network challenge (Prill et al., 2010). For each<br />
network topology, two types of simulated expression data<br />
were used: data simulated using the on/off model (toy<br />
data) and the time series data that was provided in the<br />
context of the DREAM4 challenge. We compared Jump3<br />
to five other GRN inference methods: two model-free methods,<br />
which are time-lagged variants of GENIE3 (Huynh-Thu et<br />
al., 2010) and CLR (Faith et al., 2007), respectively; two<br />
model-based methods, namely Inferelator (Greenfield et<br />
al., 2010) and TSNI (Bansal et al., 2006); and G1DBN<br />
(Lèbre, 2009), a method based on dynamic Bayesian<br />
networks. Areas under the precision-recall curve<br />
(AUPRs) obtained for size-100 networks are shown in<br />
Table 1. Jump3 yields the highest AUPR in the case of the<br />
toy data. As expected, its performance decreases when the<br />
networks are inferred from the DREAM4 data, due to the<br />
mismatch between the on/off model and the one used to<br />
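simulate the data. However, Jump3 still achieves the highest<br />
mean AUPR (0.187, against 0.176 for GENIE3-lag), although the<br />
margin lies within one standard deviation.<br />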
Method        Toy data         DREAM4 data<br />
Jump3         0.272 ± 0.060    0.187 ± 0.058<br />
GENIE3-lag    0.114 ± 0.010    0.176 ± 0.056<br />
CLR-lag       0.088 ± 0.008    0.169 ± 0.047<br />
Inferelator   0.069 ± 0.006    0.144 ± 0.036<br />
TSNI          0.020 ± 0.003    0.042 ± 0.010<br />
G1DBN         0.104 ± 0.024    0.114 ± 0.043<br />
TABLE 1. Comparison of network inference methods on size-100 networks (mean AUPR ±<br />
standard deviation).<br />
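For readers unfamiliar with the metric, the sketch below shows one common way to compute an AUPR from a ranked list of predicted links and a gold-standard network. It uses the step-wise (average precision) convention, which is an assumption: the DREAM4 evaluation may use a different interpolation of the curve. The function name `aupr` and the toy edges are hypothetical.

```python
def aupr(scores, true_edges):
    """Area under the precision-recall curve, step-wise (average precision).

    scores: dict mapping (regulator, target) -> predicted weight
    true_edges: set of (regulator, target) pairs in the gold standard
    True edges that are absent from `scores` count as never recovered.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = 0
    area = 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            area += hits / rank  # precision at each newly recovered true edge
    return area / len(true_edges) if true_edges else 0.0


# Hypothetical predictions for a three-gene network.
scores = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.3, ("c", "a"): 0.1}
true_edges = {("a", "b"), ("b", "c")}
value = aupr(scores, true_edges)
```

In this toy case the two true edges are recovered at ranks 1 and 3, so the value is (1/1 + 2/3) / 2.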
We also applied Jump3 to gene expression data from<br />
murine bone marrow-derived macrophages treated with<br />
interferon gamma (Blanc et al., 2011). Several of the hub<br />
TFs in the predicted network have biologically relevant<br />
annotations. They include interferon genes, one gene<br />
associated with cytomegalovirus infection, and<br />
cancer-associated genes, showing the potential of Jump3 for<br />
biologically meaningful hypothesis generation.<br />
REFERENCES<br />
Bansal M et al. Bioinformatics 22, 815-822 (2006).<br />
Blanc M et al. PLoS Biol 9, e1000598 (2011).<br />
Faith JJ et al. PLoS Biol 5, e8 (2007).<br />
Greenfield A et al. PLoS ONE 5, e13397 (2010).<br />
Huynh-Thu VA & Sanguinetti G. Bioinformatics 31, 1614-1622 (2015).<br />
Huynh-Thu VA et al. PLoS ONE 5, e12776 (2010).<br />
Lèbre S. Stat Appl Genet Mol Biol 8, Article 9 (2009).<br />
Prill RJ et al. PLoS ONE 5, e9202 (2010).<br />
Ptashne M & Gann A. Genes and Signals. Cold Spring Harbor<br />
Laboratory Press (2002).<br />