
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P34. FUNCTIONAL SUBGRAPH ENRICHMENTS FOR NODE SETS IN REGULATORY NETWORKS

Pieter Meysman 1,2*, Yvan Saeys 3,4, Ehsan Sabaghian 5,6, Wout Bittremieux 1,2, Yves van de Peer 5,6, Bart Goethals 1 & Kris Laukens 1,2.

Advanced Database Research and Modeling (ADReM), University of Antwerp 1; Biomedical informatics research center Antwerpen (biomina) 2; VIB Inflammation Research Center 3; Department of Respiratory Medicine, Ghent University 4; Department of Plant Biotechnology and Bioinformatics, Ghent University 5; Department of Plant Systems Biology, VIB/Ghent University 6. * pieter.meysman@uantwerpen.be

We have developed a subgroup discovery algorithm that finds subgraphs in a single graph that are associated with a given set of nodes. The association between a subgraph pattern and a set of vertices is defined by significant enrichment, assessed with a Bonferroni-corrected hypergeometric p-value; the method can therefore be considered a network-focused extension of traditional gene ontology enrichment analysis. We demonstrate the algorithm by applying it to two transcriptional regulatory networks and show that it finds relevant functional subgraphs enriched for the selected nodes.

INTRODUCTION

Frequent subgraph mining (FSM) is a common but complex problem in data mining that has gained importance as more graph data have become available. However, traditional FSM finds all frequent subgraphs in a graph dataset, whereas a more interesting query is often to find the subgraphs most strongly associated with a specific set of nodes. Nodes of interest might be those associated with a specific disease, or those that are differentially expressed in an omics experiment.

METHODS

To address this issue, we developed a novel subgraph mining algorithm that efficiently constructs candidate subgraphs, matches them against the given graph and tests them for enrichment within a specific set of nodes (Meysman et al. 2015). To allow enrichment testing, each candidate subgraph is built around a ‘source’ node. A subgraph match in which the source node corresponds to a node of interest is counted as a ‘hit’; if the source node is not a node of interest, the match is counted as a background hit. Enrichment can then be tested with a hypergeometric test. Furthermore, we show that this definition of enrichment allows us to drastically prune the search space that the algorithm must traverse to find all enriched subgraphs.

An implementation of the algorithm is available at http://adrem.ua.ac.be/sigsubgraph.
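The hit counting and enrichment test described in this section can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation available at the URL given here: the function names (`hypergeom_tail`, `enrichment`, `best_case_p_value`), the set-based representation of source-node matches and the toy node labels are all assumptions made for the example. The bound in `best_case_p_value` is one plausible reading of how the enrichment definition could limit the search; it assumes that extending a candidate subgraph can only remove source-node matches, never add them.

```python
from math import comb


def hypergeom_tail(population, interest, draws, hits):
    """P(X >= hits) when `draws` nodes are drawn without replacement from
    `population` nodes, of which `interest` are nodes of interest."""
    upper = min(draws, interest)
    favourable = sum(
        comb(interest, i) * comb(population - interest, draws - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, draws)


def enrichment(source_matches, nodes_of_interest, all_nodes, n_tests=1):
    """Count hits and background hits for one candidate subgraph and return
    (hits, background_hits, Bonferroni-corrected p-value).

    `source_matches` is the (hypothetical) set of graph nodes onto which the
    candidate's source node can be mapped in at least one match.
    """
    hits = len(source_matches & nodes_of_interest)
    background = len(source_matches - nodes_of_interest)
    p = hypergeom_tail(
        len(all_nodes), len(nodes_of_interest), hits + background, hits
    )
    # Bonferroni correction: multiply by the number of tests, capped at 1.
    return hits, background, min(1.0, p * n_tests)


def best_case_p_value(population, interest, hits):
    """Assumed pruning bound: the smallest uncorrected p-value any extension
    could reach if it kept every hit and lost every background hit."""
    return hypergeom_tail(population, interest, hits, hits)


# Toy example: 10 nodes, 3 of which are nodes of interest.
all_nodes = set("abcdefghij")
nodes_of_interest = {"a", "b", "c"}
source_matches = {"a", "b", "d"}  # two hits, one background hit

print(enrichment(source_matches, nodes_of_interest, all_nodes))
print(best_case_p_value(len(all_nodes), len(nodes_of_interest), 2))
```

Under that assumption, a candidate whose best-case p-value already exceeds the corrected significance threshold could be discarded together with all of its extensions, since none of them could become significant.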

RESULTS & DISCUSSION

The first data set concerned the yeast genes that have remained in duplicate since the most recent whole-genome duplication. Within the yeast transcriptional network, these duplicate genes were enriched for self-regulating motifs (e.g. feedback loops and self-edges), which is consistent with their duplicated nature (Figure 1).

FIGURE 1. Enriched subgraphs for yeast duplicated genes

The second data set concerned mining the subgraphs associated with homologs of the PhoR transcription factor across seven different bacterial regulatory networks inferred from COLOMBOS expression data (Meysman et al. 2014). These PhoR homologs were found to be significantly associated with several complex regulatory motifs.

REFERENCES

Meysman P et al. Discovery of Significantly Enriched Subgraphs Associated with Selected Vertices in a Single Graph. Proceedings of the 14th International Workshop on Data Mining in Bioinformatics (2015).

Meysman P et al. COLOMBOS v2.0: an ever expanding collection of bacterial expression compendia. Nucleic Acids Research 42 (D1), D649-D653 (2014).
