
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 <strong>2015</strong><br />

Abstract ID: O4<br />

Oral presentation<br />

10th Benelux Bioinformatics Conference <strong>bbc</strong> <strong>2015</strong><br />

O4. LATEBICLUSTERING: EFFICIENT DISCOVERY OF TEMPORAL LOCAL<br />

PATTERNS WITH POTENTIAL DELAYS<br />

Joana P. Gonçalves 1,2* & Sara C. Madeira 3,4 .<br />

Pattern Recognition and Bioinformatics Group, Department of Intelligent Systems, Delft University of Technology 1 ;<br />

Division of Molecular Carcinogenesis, The Netherlands Cancer Institute 2 ; Department of Computer Science and<br />

Engineering, Instituto Superior Técnico, Universidade de Lisboa 3 ; INESC-ID 4 . * research@joanagoncalves.org<br />

Temporal transcriptomes can provide valuable insight into the dynamics of transcriptional response and gene regulation.<br />

In particular, many studies seek to uncover functional biological units by identifying and grouping genes with common<br />

expression patterns. Nevertheless, most analytical tools available for this purpose fall short in their ability to consider<br />

biologically reasonable models and adequately incorporate the temporal dimension. Each biological task is likely to<br />

occur within a time period that does not necessarily span the whole time course of the experiment, and genes involved in<br />

such a task are expected to coordinate only while the task is ongoing. LateBiclustering is an efficient algorithm to<br />

identify this type of coordinated activity, while allowing genes to participate in distinct biological tasks with multiple<br />

partners over time. Additionally, LateBiclustering is able to capture temporal delays suggestive of transcriptional<br />

cascades: one of the hallmarks of gene expression and regulation.<br />

INTRODUCTION<br />

The discovery of patterns in temporal transcriptomes<br />

exposes gene expression dynamics and contributes to<br />

understanding the machinery involved in its modulation.<br />

Various analytical tools are employed in this regard.<br />

Differential expression summarizes an entire time course<br />

into one feature, thus lacking detail. Clustering<br />

respects the chronological order, but focuses on global<br />

similarities and tends to identify rather broad patterns,<br />

associated with unspecific functions. Biclustering offers<br />

increased granularity by additionally searching for local<br />

patterns, but allows for arbitrary jumps in time, potentially<br />

leading to patterns that are incoherent from a temporal<br />

perspective.<br />

METHODS<br />

LateBiclustering is an efficient algorithm for the<br />

identification of transcriptional modules, here termed<br />

LateBiclusters. Each LateBicluster is a group of genes<br />

showing a similar expression pattern with potential delays,<br />

within a particular time frame that does not necessarily<br />

span the whole time course of the transcriptome.<br />

LateBiclustering only reports maximal LateBiclusters, that<br />

is, those that cannot be extended and are not fully<br />

contained in any other LateBicluster.<br />

LateBiclustering takes as input a gene-time expression<br />

matrix of real values. Each gene expression profile is first<br />

normalized to zero mean and unit standard deviation. A<br />

discretization is further applied to discern variations<br />

between consecutive time points into three levels: down-trend,<br />

no-change and up-trend. Upon discretization, each<br />

gene profile can be seen as a string.<br />
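
The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `discretize_profile`, the symbols `D`/`N`/`U`, and the `threshold` parameter (how large a change between consecutive time points must be to count as a trend) are assumptions, since the abstract does not state the discretization cut-off.

```python
import statistics


def discretize_profile(values, threshold=1.0):
    """Turn one real-valued expression profile into a string over {D, N, U}.

    values:    expression levels of a single gene, ordered by time point.
    threshold: hypothetical cut-off on the change between consecutive
               normalized values (an assumption, not taken from the abstract).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Normalize to zero mean and unit standard deviation; a flat profile
    # has zero deviation and is mapped to all zeros.
    if std == 0:
        z = [0.0] * len(values)
    else:
        z = [(v - mean) / std for v in values]

    symbols = []
    for previous, current in zip(z, z[1:]):
        delta = current - previous
        if delta >= threshold:
            symbols.append("U")  # up-trend
        elif delta <= -threshold:
            symbols.append("D")  # down-trend
        else:
            symbols.append("N")  # no-change
    return "".join(symbols)
```

A profile with T time points yields a string of length T - 1, because each symbol describes a transition between two consecutive time points. For example, `discretize_profile([1.0, 2.0, 3.0, 2.9, 1.0], threshold=0.5)` returns `"UUND"`.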

A generalized suffix tree is built to find common<br />

patterns in the gene profiles. Internal nodes<br />

satisfying certain properties are marked for their<br />

potential to denote LateBiclusters.<br />
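
To make the idea of common patterns with delays concrete, the sketch below uses a brute-force substring index in place of a generalized suffix tree. It enumerates the same substrings that the internal nodes of such a tree would represent, but needs quadratic space per profile, so it only illustrates the concept and says nothing about the efficiency of the actual algorithm. The names `index_patterns` and `shared_patterns` and the parameters `min_genes` and `min_length` are hypothetical.

```python
from collections import defaultdict


def index_patterns(profiles, min_length=2):
    """Map every substring of every discretized profile to its occurrences.

    profiles: dict of gene name -> discretized string (e.g. "UUDN").
    Returns a dict of pattern -> list of (gene, start position).
    """
    occurrences = defaultdict(list)
    for gene, profile in profiles.items():
        for start in range(len(profile)):
            for end in range(start + min_length, len(profile) + 1):
                occurrences[profile[start:end]].append((gene, start))
    return occurrences


def shared_patterns(profiles, min_genes=2, min_length=2):
    """Keep only the patterns that occur in at least min_genes distinct genes."""
    shared = {}
    for pattern, occs in index_patterns(profiles, min_length).items():
        genes = {gene for gene, _ in occs}
        if len(genes) >= min_genes:
            shared[pattern] = sorted(occs)
    return shared


profiles = {"g1": "UUDN", "g2": "NUUD", "g3": "DDNN"}
shared = shared_patterns(profiles)
```

In this toy input, the pattern `UUD` starts at position 0 in `g1` and at position 1 in `g2`, which is the kind of delayed co-occurrence the method is designed to capture.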

When an internal node does not satisfy the basic<br />

conditions for LateBicluster maximality, a<br />

procedure is applied to remove occurrences<br />

leading to non-maximal LateBiclusters. For this<br />

purpose, LateBiclustering uses a bit array<br />

representing the occurrences underlying each<br />

internal node. During the maximality update<br />

procedure, the bit array of the inspected node is<br />

compared against those of internal children nodes<br />

(right-max) and nodes from which the inspected<br />

node receives suffix links (left-max).<br />
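
The bit-array comparison can be illustrated with a simplified check. The sketch below assumes one bit per gene rather than one bit per occurrence, so it only tests whether a node is right- and left-maximal and does not reproduce the occurrence-removal step described above. The functions `gene_bits` and `is_maximal` are hypothetical, and Python integers stand in for bit arrays.

```python
def gene_bits(genes, gene_index):
    """Encode a set of genes as an integer bit array (one bit per gene)."""
    bits = 0
    for gene in genes:
        bits |= 1 << gene_index[gene]
    return bits


def is_maximal(node_bits, child_bits, suffix_link_source_bits):
    """Simplified maximality test on gene-level bit arrays.

    child_bits:              bit arrays of the internal child nodes, i.e.
                             the pattern extended by one symbol to the right.
    suffix_link_source_bits: bit arrays of the nodes whose suffix links point
                             to this node, i.e. the pattern extended by one
                             symbol to the left.
    A node is treated as non-maximal when some extension covers exactly the
    same genes, since the longer pattern then describes the same group.
    """
    right_maximal = all(node_bits != bits for bits in child_bits)
    left_maximal = all(node_bits != bits for bits in suffix_link_source_bits)
    return right_maximal and left_maximal


gene_index = {"g1": 0, "g2": 1, "g3": 2}
uu = gene_bits({"g1", "g2"}, gene_index)    # pattern "UU"
uud = gene_bits({"g1", "g2"}, gene_index)   # pattern "UUD"
uudn = gene_bits({"g1"}, gene_index)        # pattern "UUDN"
nuud = gene_bits({"g2"}, gene_index)        # pattern "NUUD"
```

With these toy values, `UU` is not right-maximal because its extension `UUD` covers the same two genes, whereas `UUD` passes both checks because each of its extensions loses a gene.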

Finally, LateBiclustering comes with different<br />

heuristics to report a single pattern occurrence per<br />

gene in each maximal LateBicluster. A heuristic<br />

is necessary because there may be multiple<br />

occurrences of a pattern in the profile of a given<br />

gene, which is a direct consequence of allowing<br />

the discovery of delayed patterns.<br />
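
The abstract does not say which heuristics are provided. As one hypothetical example, the sketch below keeps a single start position per gene using a pluggable selection rule, with the earliest occurrence as the default. The function name `pick_one_occurrence_per_gene` and the `choose` parameter are assumptions made for illustration.

```python
def pick_one_occurrence_per_gene(occurrences, choose=min):
    """Reduce a list of (gene, start) occurrences to one start per gene.

    choose: function that selects one start position from a list; the
            default (min) keeps the earliest occurrence. This rule is an
            illustrative assumption, not one named in the abstract.
    """
    starts_by_gene = {}
    for gene, start in occurrences:
        starts_by_gene.setdefault(gene, []).append(start)
    return {gene: choose(starts) for gene, starts in starts_by_gene.items()}


occurrences = [("g1", 0), ("g1", 3), ("g2", 1)]
earliest = pick_one_occurrence_per_gene(occurrences)
latest = pick_one_occurrence_per_gene(occurrences, choose=max)
```

Passing a different `choose` function swaps the selection rule without changing the grouping logic.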

RESULTS & DISCUSSION<br />

LateBiclustering is the first efficient algorithm suitable for<br />

the discovery of biclusters with temporal delays. It runs in<br />

polynomial time, while previous methods yielded<br />

exponential time complexity. LateBiclustering was able to<br />

find planted biclusters in synthetic data. It also identified<br />

biologically relevant LateBiclusters associated with<br />

Saccharomyces cerevisiae’s response to heat stress, and<br />

interesting time-lagged responses.<br />

FIGURE 1. Schematic of the LateBiclustering algorithm.<br />

REFERENCES<br />

Gonçalves JP & Madeira SC. IEEE/ACM Transactions on<br />

Computational Biology and Bioinformatics, 11(5), 801–813<br />

(2014).<br />
