BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: P
Poster
10th Benelux Bioinformatics Conference bbc 2015
P38. MINING ACROSS “OMICS” DATA FOR DRUG PRIORITIZATION
Stefan Naulaerts 1,2*, Pieter Meysman 1,2, Bart Goethals 1, Wim Vanden Berghe ,3 & Kris Laukens 1,2.
Advanced Database Research and Modeling (ADReM), University of Antwerp 1; Biomedical informatics research center Antwerpen (biomina) 2;
Department for Biomedical Sciences, University of Antwerp 3. * stefan.naulaerts@uantwerpen.be
Drug resistance and response have traditionally been investigated through case-by-case studies, and profiling drug
compounds is time- and resource-intensive. Today, large-scale data on gene expression, protein abundance, protein
interactions, and functional and pathway annotations are available, as are freely accessible repositories of drug
targets. Structural data for selected drug compounds are also publicly available. These data offer an enormous
opportunity for data integration and pattern mining across each of these levels. Here, we apply frequent itemset mining
to identify structurally similar compounds and to detect patterns within the biological effect profiles of these
chemical compound families. Next, we explore how both types of patterns can be linked to meta-information (such as drug
interactions) to identify promising compounds and speed up drug discovery through candidate prioritization.
INTRODUCTION<br />
In recent decades, several widely used databases have
emerged. They range from gene expression data and
mass-spectrometric protein identifications to resources
covering interaction graphs and functional annotations of
proteins and chemicals.
These resources offer opportunities to gain deeper insight
into drug modes of action and, by intelligently prioritizing
promising compounds, to relieve important bottlenecks that
limit the speed of novel drug discovery and drug
repurposing.
METHODS<br />
To integrate these kinds of data, we use a family of pattern
mining methods collectively known as “frequent itemset
mining”. These techniques efficiently find sets of items
that occur together in at least a minimum number of
records (the minimum support threshold); algorithms such
as Apriori prune the search space by exploiting the fact
that every subset of a frequent itemset must itself be
frequent. In this work, we distinguished several pattern
types by their source:
- Expression itemsets
- Metadata itemsets
- Graph patterns (protein-protein, protein-drug and
  chemical structures)
For subgraph mining, we used GASTON [1]. All other data
sources were analysed with Apriori [2].
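The level-wise search that Apriori performs can be sketched as follows. This is a minimal, unoptimized illustration of the technique, not the implementation used in this work; the function name `apriori`, the gene-level "items" and the `min_support` value in the usage lines are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    absolute support count is at least min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data: count the records containing each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical expression "transactions": one set of discretized calls per sample.
samples = [
    {"GENE_A_up", "GENE_B_up", "GENE_C_down"},
    {"GENE_A_up", "GENE_B_up"},
    {"GENE_A_up", "GENE_C_down"},
    {"GENE_B_up", "GENE_C_down"},
]
patterns = apriori(samples, min_support=2)
```

Here every single item and every pair reaches the threshold, while the full triple occurs in only one sample and is discarded, so `patterns` holds six itemsets.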
To cope with the very large number of patterns that
mining such data produces, we applied a filter that
combines objective data mining quality measures (e.g.
lift) with more biologically inspired criteria (e.g.
functional coherence in the Gene Ontology [3] hierarchy).
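Lift, one of the objective measures named above, compares how often two itemsets co-occur with how often they would co-occur if they were independent: lift(X → Y) = P(X ∪ Y) / (P(X) · P(Y)). The sketch below is an assumption-laden illustration, not the filter used in this work: it derives rules from support counts and keeps those above a hypothetical `min_lift` cut-off, and it does not cover the Gene Ontology criterion.

```python
from itertools import combinations


def rule_lift(counts, n_transactions, antecedent, consequent):
    """Lift of antecedent -> consequent from absolute support counts.

    Values above 1 mean the two sides co-occur more often than expected
    under independence.
    """
    x, y = frozenset(antecedent), frozenset(consequent)
    p_xy = counts[x | y] / n_transactions
    p_x = counts[x] / n_transactions
    p_y = counts[y] / n_transactions
    return p_xy / (p_x * p_y)


def filter_rules_by_lift(counts, n_transactions, min_lift):
    """Split every frequent itemset into rules X -> Y (both sides non-empty)
    and keep the rules whose lift reaches min_lift.

    Assumes counts also holds every subset of each itemset, which is true
    for the complete output of a frequent itemset miner.
    """
    kept = []
    for itemset in counts:
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                y = itemset - x
                lift = rule_lift(counts, n_transactions, x, y)
                if lift >= min_lift:
                    kept.append((x, y, lift))
    return kept


# Hypothetical support counts from 10 records.
counts = {
    frozenset({"a"}): 5,
    frozenset({"b"}): 4,
    frozenset({"c"}): 8,
    frozenset({"a", "b"}): 4,
    frozenset({"a", "c"}): 4,
}
rules = filter_rules_by_lift(counts, 10, min_lift=1.5)
```

The pair {a, b} has lift 2.0 and survives in both directions, whereas {a, c} co-occurs exactly as often as chance predicts (lift 1.0) and is filtered out.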
Simple pattern-based classification was performed with
CBA (Classification Based on Associations) [4].
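CBA builds a classifier from class association rules. The sketch below shows only its prediction step under simplifying assumptions: rules are ranked by confidence, then support (ties keep their generation order), and an instance receives the label of the first rule whose antecedent it contains, or a default class otherwise. Rule generation and CBA's pruning step are omitted, and the `ClassRule` type, item names and labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassRule:
    antecedent: frozenset  # items that must all be present in an instance
    label: str             # class predicted by the rule
    confidence: float
    support: float


def rank_rules(rules):
    """Order rules by precedence: higher confidence first, then higher
    support; the stable sort keeps generation order for remaining ties."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support))


def classify(instance, ranked_rules, default_label):
    """Return the label of the first matching rule, else the default class."""
    items = set(instance)
    for rule in ranked_rules:
        if rule.antecedent <= items:
            return rule.label
    return default_label


rules = rank_rules([
    ClassRule(frozenset({"ring_X"}), "responder", 0.70, 0.20),
    ClassRule(frozenset({"ring_X", "gene_Y_up"}), "non_responder", 0.90, 0.10),
])
print(classify({"ring_X", "gene_Y_up"}, rules, "unknown"))  # non_responder
print(classify({"ring_X"}, rules, "unknown"))               # responder
print(classify({"gene_Z_up"}, rules, "unknown"))            # unknown
```

Because the more specific, higher-confidence rule is ranked first, it overrides the general rule whenever both apply.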
RESULTS & DISCUSSION<br />
We identified several backbone patterns within the
chemical structures studied and used them to define
“chemical compound families”. We then used this
classification as a starting point to group experimental
evidence (bioassays, interactions and metadata). After
applying cut-offs based on the quality measures, all
remaining patterns were significant and biologically
plausible.
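The abstract does not spell out how families are formed from backbone patterns. A minimal sketch, assuming each compound is represented by the IDs of the frequent subgraphs it contains and that a family is simply the set of compounds sharing a chosen backbone subgraph, could look like this; all identifiers are hypothetical, and a compound may fall into several families.

```python
from collections import defaultdict


def families_from_backbones(compound_patterns, backbones):
    """Map each backbone subgraph ID to the compounds that contain it.

    compound_patterns: {compound_id: set of frequent-subgraph IDs}
    backbones: subgraph IDs chosen as family-defining backbones
    """
    families = defaultdict(set)
    for compound, patterns in compound_patterns.items():
        for backbone in backbones:
            if backbone in patterns:
                families[backbone].add(compound)
    return dict(families)


def group_evidence(families, evidence):
    """Pool evidence records (assays, interactions, metadata) per family.

    evidence: {compound_id: list of evidence records}
    """
    return {
        backbone: [e for c in sorted(members) for e in evidence.get(c, [])]
        for backbone, members in families.items()
    }


compound_patterns = {"cmpd1": {"g1", "g7"}, "cmpd2": {"g1"}, "cmpd3": {"g4"}}
families = families_from_backbones(compound_patterns, ["g1", "g4"])
evidence = {"cmpd1": ["assay_12"], "cmpd2": ["ppi_P1"], "cmpd3": []}
grouped = group_evidence(families, evidence)
```

Grouping evidence per family rather than per compound is what lets patterns in the pooled biological profiles be compared across structural families.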
Unsurprisingly, structurally similar compound families
show significant pattern overlaps in drug-drug interactions,
gene expression, term co-occurrence and conserved
protein-protein interactions. We found that specific
patterns in the biological profile often correlate with
specific discriminative structural patterns. Moreover, these
collections of frequent structural subgraphs appeared
highly relevant to the way a compound connects to the
“core” proteome. This central proteome performs
essential cellular functions (e.g. energy metabolism) and
is known to be conserved across cell types. Structurally
distinct compound families converge on the same “core
proteins” much later (if at all) than structurally similar
chemicals do. This observation is consistent with current
pathway knowledge and tissue biology.
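The abstract does not define how "converging later" on core proteins is measured. One plausible operationalization, assumed here purely for illustration, is the number of interaction steps from a compound's drug targets to each core protein in a protein-protein interaction network, computed with a multi-source breadth-first search. Protein names and the network are hypothetical; core proteins that cannot be reached are simply absent from the result.

```python
from collections import deque


def hops_to_core(ppi, targets, core):
    """Shortest number of interaction steps from the nearest drug target to
    every reachable core protein (multi-source breadth-first search).

    ppi: {protein: set of interacting proteins}, listed in both directions
    targets: proteins directly targeted by the compound
    core: set of "core proteome" proteins
    """
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for nbr in ppi.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return {p: d for p, d in dist.items() if p in core}


ppi = {
    "T1": {"CORE1"},
    "CORE1": {"T1", "P2"},
    "T2": {"P1"},
    "P1": {"T2", "P2"},
    "P2": {"P1", "CORE1"},
}
core = {"CORE1", "CORE2"}
print(hops_to_core(ppi, {"T1"}, core))  # {'CORE1': 1}
print(hops_to_core(ppi, {"T2"}, core))  # {'CORE1': 3}
```

Under this reading, a compound targeting T2 reaches CORE1 two steps later than one targeting T1, and neither reaches CORE2 at all.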
We were further able to associate previously unseen
compounds with chemicals present in the database on the
basis of the subgraph collection and, by extension, with
the biological profile patterns. A manual literature survey
indicated that several compounds not covered by our
database have recently been approved, or are being tested,
as alternatives to the database compounds we had
predicted them to be substantially similar to.
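The abstract does not name the similarity measure used to link unseen compounds to database chemicals. A minimal sketch, assuming each compound is encoded as the set of mined subgraph IDs it contains and swapping in the Tanimoto (Jaccard) coefficient, a standard cheminformatics similarity, could rank database compounds as follows; all compound and subgraph IDs are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of substructure IDs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query_features, database, top_k=3, min_similarity=0.0):
    """Rank database compounds by similarity to an unseen compound.

    query_features: set of frequent-subgraph IDs found in the new compound
    database: {compound_id: set of frequent-subgraph IDs}
    Returns up to top_k (compound_id, similarity) pairs, best first.
    """
    scored = [(cid, tanimoto(query_features, feats)) for cid, feats in database.items()]
    scored = [(cid, s) for cid, s in scored if s >= min_similarity]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


database = {"cmpd_A": {"g1", "g2", "g3"}, "cmpd_B": {"g1", "g9"}, "cmpd_C": {"g5"}}
hits = most_similar({"g1", "g2", "g4"}, database, min_similarity=0.2)
print(hits)  # [('cmpd_A', 0.5), ('cmpd_B', 0.25)]
```

The best-ranked database compounds are the ones whose biological profile patterns would then be transferred, as hypotheses, to the unseen compound.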
FIGURE 1. Visualizing the dexamethasone environment. Both predictions<br />
and experimental evidence (drug-target and protein-protein interactions)<br />
are shown.<br />
REFERENCES<br />
1. Nijssen S & Kok J. ENTCS 127, 77-87 (2005).<br />
2. Agrawal R & Srikant R. Proc 20th Int Conf on Very Large Data Bases (1994).
3. Ashburner M et al. Nat Genet 25, 25-29 (2000).<br />
4. Liu B et al. KDD (1998).<br />