
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P38. MINING ACROSS “OMICS” DATA FOR DRUG PRIORITIZATION

Stefan Naulaerts 1,2*, Pieter Meysman 1,2, Bart Goethals 1, Wim Vanden Berghe 3 & Kris Laukens 1,2.

Advanced Database Research and Modeling (ADReM), University of Antwerp 1; Biomedical informatics research center Antwerpen (biomina) 2; Department for Biomedical Sciences, University of Antwerp 3. * stefan.naulaerts@uantwerpen.be

Drug resistance and response have traditionally been investigated through case-by-case studies, and profiling drug compounds is time- and resource-intensive. Large-scale information on gene expression, protein abundance, protein interactions, and functional and pathway annotations is now available, as are freely accessible repositories of drug targets. Structural evidence for selected drug compounds is also publicly available. These data offer an enormous opportunity for data integration and pattern mining across each of these levels. Here, we apply frequent itemset mining to identify structurally similar compounds and to detect patterns within the biological effect profiles of these chemical compound families. Next, we explore how both types of patterns can be linked to meta-information (such as drug interactions) in a bid to identify promising compounds and speed up the drug discovery process by means of candidate prioritization.

INTRODUCTION

In recent decades, several widely used databases have emerged. These range from gene expression data and mass-spectrometric protein identifications to resources covering interaction graphs or functional annotations of proteins and chemicals.

These resources offer interesting opportunities to gain deeper insight into drug mode of action and to help reduce important bottlenecks in the speed of novel drug discovery and drug repurposing by intelligently prioritizing potentially interesting compounds.

METHODS<br />

To integrate these kinds of data, we use pattern mining methods that are collectively known as “frequent itemset mining”. These techniques use efficient search strategies to find sets of items that occur together at least as often as a minimal support threshold. In this work, we distinguished several pattern types based on their source:

- Expression itemsets
- Metadata itemsets
- Graph patterns (protein-protein, protein-drug and chemical structures)

For subgraph mining, we used GASTON [1]. All other data sources were analysed with Apriori [2].
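The level-wise search behind Apriori can be illustrated with a minimal sketch. The `apriori` helper, the item names and the toy transactions below are assumptions made for illustration; this is not the implementation or the data used in this work.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    `min_support` is an absolute count. Illustrative sketch only; the
    function name and the toy data are assumptions, not the authors' code.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count in how many transactions each candidate itemset occurs,
        # and keep only the candidates that reach the support threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(frequent)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Toy example: each transaction lists hypothetical annotations of one compound.
toy = [
    {"geneA_up", "geneB_down", "target_P1"},
    {"geneA_up", "geneB_down"},
    {"geneA_up", "target_P1"},
    {"geneB_down", "target_P1"},
]
frequent_itemsets = apriori(toy, min_support=2)
```

With a minimum support of 2, the three single items and the three pairs are frequent, while the full triple occurs only once and is discarded.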

To deal with the extreme number of patterns that results from mining this kind of data, we used a filter that incorporates several quality measures, based both on objective data mining measures (e.g. lift) and on more biologically inspired methods (e.g. functional coherence in the Gene Ontology [3] tree).
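As an illustration of one such objective measure, lift compares the observed co-occurrence of two itemsets with the co-occurrence expected if they were independent: lift(A, B) = supp(A ∪ B) / (supp(A) · supp(B)). The sketch below assumes relative supports and a hypothetical `lift` helper with toy data; the actual filter and cut-offs used here are not specified in this abstract.

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent using relative supports.

    Values above 1 indicate that the two itemsets co-occur more often than
    expected under independence. Hypothetical helper for illustration.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    a, b = frozenset(antecedent), frozenset(consequent)
    supp_a = sum(1 for t in transactions if a <= t) / n
    supp_b = sum(1 for t in transactions if b <= t) / n
    supp_ab = sum(1 for t in transactions if (a | b) <= t) / n
    if supp_a == 0 or supp_b == 0:
        raise ValueError("lift is undefined when an itemset never occurs")
    return supp_ab / (supp_a * supp_b)


# Toy data with placeholder item names.
toy = [{"x", "y"}, {"x", "y"}, {"x"}, {"z"}]
lift_xy = lift(toy, {"x"}, {"y"})  # supp(x)=0.75, supp(y)=0.5, supp(xy)=0.5
lift_xz = lift(toy, {"x"}, {"z"})  # x and z never co-occur
```

A filter of this kind would typically keep only patterns whose lift exceeds a chosen cut-off; the value of that cut-off is an assumption left open here.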

Simple classification based on the patterns was performed with CBA [4].
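The core idea of classification based on association rules can be sketched in a heavily simplified form: class association rules are ranked by confidence (then support), and the first rule whose antecedent is contained in a sample decides its label. The rule format, labels and values below are hypothetical, and the sketch omits the rule generation and pruning steps of the full CBA method.

```python
def classify(rules, default_label, items):
    """Label a sample with the best-ranked rule whose antecedent it contains.

    Each rule is a dict with keys 'antecedent' (frozenset), 'label',
    'confidence' and 'support'. Simplified illustration, not full CBA.
    """
    items = frozenset(items)
    # Rank rules: higher confidence first, ties broken by higher support.
    ranked = sorted(rules, key=lambda r: (-r["confidence"], -r["support"]))
    for rule in ranked:
        if rule["antecedent"] <= items:
            return rule["label"]
    # No rule matched: fall back to the default class.
    return default_label


# Hypothetical rules linking pattern occurrences to compound families.
rules = [
    {"antecedent": frozenset({"backbone_1"}),
     "label": "family_1", "confidence": 0.90, "support": 0.30},
    {"antecedent": frozenset({"backbone_1", "geneA_up"}),
     "label": "family_2", "confidence": 0.95, "support": 0.10},
]
label_a = classify(rules, "unassigned", {"backbone_1", "geneA_up", "other"})
label_b = classify(rules, "unassigned", {"backbone_1"})
label_c = classify(rules, "unassigned", {"other"})
```

Here the more specific rule wins for the first sample because of its higher confidence, and the third sample falls back to the default label.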

RESULTS & DISCUSSION<br />

We identified several backbone patterns within the chemical structures studied and used these to define “chemical compound families”. Next, we used this classification as a starting point to group experimental evidence (bio-assays, interactions and metadata). After applying cut-offs based on the quality measures, all remaining patterns were significant and made sense biologically.

Unsurprisingly, structurally similar compound families show significant pattern overlaps in drug-drug interactions, gene expression, term co-occurrence and conserved protein-protein interactions. We found that specific patterns in the biological profile often correlate with specific discriminative structural patterns. Moreover, these collections of frequent structural subgraphs seemed highly relevant for the way in which a compound connects to the “core” proteome. This central proteome performs essential functions of the cell (e.g. energy metabolism) and is known to be conserved across cell types. Structurally distinct compound families converge on the same “core proteins” much later (if at all) than structurally similar chemicals do. This observation is consistent with current pathway knowledge and tissue biology.

We were further able to associate previously unseen compounds with chemicals present in the database, based on the subgraph collection and, by extension, on the biological profile patterns. A manual literature survey indicated that several compounds not covered by our database have recently been approved, or are being tested, as alternatives to the drugs we hypothesized to be substantially similar.
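One simple way to associate an unseen compound with database compounds is to compare the sets of frequent subgraphs each of them contains. The sketch below assumes Jaccard similarity over subgraph identifiers; the abstract does not state which similarity measure was used, and the compound and subgraph names are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of subgraph identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_patterns, database):
    """Return the database compound whose subgraph set best matches the query.

    `database` maps a compound name to the set of frequent subgraphs it
    contains. Hypothetical helper; Jaccard similarity is an assumption.
    """
    return max(database, key=lambda name: jaccard(query_patterns, database[name]))


# Placeholder compounds described by the frequent subgraphs they contain.
database = {
    "compound_A": {"sg1", "sg2", "sg3"},
    "compound_B": {"sg4", "sg5"},
}
unseen = {"sg1", "sg2", "sg6"}
closest = most_similar(unseen, database)
score = jaccard(unseen, database[closest])
```

The biological profile patterns of the closest database compound could then serve as a first hypothesis for the unseen compound.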

FIGURE 1. Visualizing the dexamethasone environment. Both predictions and experimental evidence (drug-target and protein-protein interactions) are shown.

REFERENCES<br />

1. Nijssen S & Kok J. ENTCS 127, 77-87 (2005).<br />

2. Agrawal R & Srikant R. Proc 20th Int Conf on Very Large Databases (1994).

3. Ashburner M et al. Nat Genet 25, 25-29 (2000).<br />

4. Liu B et al. KDD (1998).<br />
