BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: O4
Oral presentation
10th Benelux Bioinformatics Conference bbc 2015
O4. LATEBICLUSTERING: EFFICIENT DISCOVERY OF TEMPORAL LOCAL PATTERNS WITH POTENTIAL DELAYS
Joana P. Gonçalves (1,2,*) & Sara C. Madeira (3,4)
(1) Pattern Recognition and Bioinformatics Group, Department of Intelligent Systems, Delft University of Technology; (2) Division of Molecular Carcinogenesis, The Netherlands Cancer Institute; (3) Department of Computer Science and Engineering, Instituto Superior Técnico, Universidade de Lisboa; (4) INESC-ID. (*) research@joanagoncalves.org
Temporal transcriptomes can provide valuable insight into the dynamics of transcriptional response and gene regulation.
In particular, many studies seek to uncover functional biological units by identifying and grouping genes with common
expression patterns. Nevertheless, most analytical tools available for this purpose fall short in their ability to consider
biologically reasonable models and to adequately incorporate the temporal dimension. Each biological task is likely to
occur within a time period that does not necessarily span the whole time course of the experiment, and genes involved in
such a task are expected to coordinate only while the task is ongoing. LateBiclustering is an efficient algorithm that
identifies this type of coordinated activity while allowing genes to participate in distinct biological tasks with multiple
partners over time. Additionally, LateBiclustering is able to capture temporal delays suggestive of transcriptional
cascades, one of the hallmarks of gene expression and regulation.
INTRODUCTION<br />
The discovery of patterns in temporal transcriptomes
exposes gene expression dynamics and contributes to
understanding the machinery involved in its modulation.
Various analytical tools are employed in this regard.
Differential expression summarizes an entire time course
into a single feature, thus lacking detail. Clustering
respects the chronological order, but focuses on global
similarities and tends to identify rather broad patterns
associated with unspecific functions. Biclustering offers
increased granularity by additionally searching for local
patterns, but allows for arbitrary jumps in time, potentially
leading to patterns that are incoherent from a temporal
perspective.
METHODS<br />
LateBiclustering is an efficient algorithm for the<br />
identification of transcriptional modules, here termed<br />
LateBiclusters. Each LateBicluster is a group of genes<br />
showing a similar expression pattern with potential delays,<br />
within a particular time frame that does not necessarily<br />
span the whole time course of the transcriptome.
LateBiclustering only reports maximal LateBiclusters, that<br />
is, those that cannot be extended and are not fully<br />
contained in any other LateBicluster.<br />
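To make the maximality criterion concrete, the following minimal sketch represents a LateBicluster as a set of (gene, start, length) occurrences, one per gene, and discards any LateBicluster whose cells are a proper subset of another's. The representation and the helper names `cells` and `keep_maximal` are hypothetical, and this brute-force pairwise check only illustrates the definition; it is not how LateBiclustering itself enforces maximality.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: one (gene, start, length) occurrence per gene,
# with start and length measured in positions of the discretized profile.
Occurrence = Tuple[str, int, int]
Bicluster = FrozenSet[Occurrence]


def cells(bicluster: Bicluster) -> Set[Tuple[str, int]]:
    """Return the (gene, position) cells covered by a bicluster."""
    return {(gene, start + k)
            for gene, start, length in bicluster
            for k in range(length)}


def keep_maximal(biclusters: List[Bicluster]) -> List[Bicluster]:
    """Brute force: drop any bicluster whose cells are a proper subset of another's."""
    covered = [cells(b) for b in biclusters]
    return [b for i, b in enumerate(biclusters)
            if not any(covered[i] < covered[j]
                       for j in range(len(biclusters)) if j != i)]


small = frozenset({("g1", 0, 2), ("g2", 1, 2)})
large = frozenset({("g1", 0, 3), ("g2", 1, 3)})  # same genes, one position longer
other = frozenset({("g3", 4, 2), ("g4", 4, 2)})
maximal = keep_maximal([small, large, other])   # [large, other]: small lies inside large
```

Note that `small` already contains a delay (g2 starts one position after g1), so containment is checked on cells rather than on a shared set of time points.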
LateBiclustering takes as input a gene-time expression
matrix of real values. Each gene expression profile is first
normalized to zero mean and unit standard deviation. A
discretization is then applied to classify the variation
between consecutive time points into three levels: down-trend,
no-change and up-trend. After discretization, each
gene profile can be seen as a string over a three-symbol alphabet.
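A minimal sketch of these two preprocessing steps, assuming the population standard deviation and a hypothetical symmetric `threshold` on the difference between consecutive normalized values (the abstract does not state how the three levels are separated). Down-trend, no-change and up-trend are written as D, N and U; a profile with T time points becomes a string of length T - 1.

```python
from statistics import mean, pstdev
from typing import List


def zscore(profile: List[float]) -> List[float]:
    """Normalize a gene profile to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(profile), pstdev(profile)
    if sigma == 0:  # flat profile: nothing to scale
        return [0.0 for _ in profile]
    return [(x - mu) / sigma for x in profile]


def discretize(profile: List[float], threshold: float) -> str:
    """Encode each change between consecutive time points as D, N or U.

    `threshold` is an assumed cut-off, not a parameter taken from the abstract.
    """
    z = zscore(profile)
    symbols = []
    for prev, curr in zip(z, z[1:]):
        delta = curr - prev
        if delta > threshold:
            symbols.append("U")  # up-trend
        elif delta < -threshold:
            symbols.append("D")  # down-trend
        else:
            symbols.append("N")  # no-change
    return "".join(symbols)


example = discretize([1.0, 3.0, 3.0, 1.0], threshold=0.5)  # "UND"
```

Normalizing before discretizing means the same threshold applies to every gene regardless of its absolute expression level.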
<br />
<br />
A generalized suffix tree is built over the discretized
gene profiles to find their common patterns. Internal nodes
satisfying certain properties are marked for their
potential to denote LateBiclusters.
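A full generalized suffix tree (for example, built with Ukkonen's algorithm) does not fit in a short example, so the sketch below swaps in a naive, uncompressed suffix trie over all gene strings. Each trie node stands for a pattern and records every (gene, start) occurrence; because a pattern may start at different positions in different genes, these shifted starts are what make delayed patterns visible. The marking rule (`min_genes`, `min_length`) is a hypothetical stand-in for the properties used by LateBiclustering, which the abstract does not spell out, and the naive trie costs time and space quadratic in string length, unlike a suffix tree.

```python
from typing import Dict, List, Set, Tuple


class TrieNode:
    """Node of an uncompressed suffix trie; its path label is a pattern."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.occurrences: Set[Tuple[str, int]] = set()  # (gene, start position)


def build_suffix_trie(profiles: Dict[str, str]) -> TrieNode:
    """Insert every suffix of every discretized gene profile into one trie."""
    root = TrieNode()
    for gene, string in profiles.items():
        for start in range(len(string)):
            node = root
            for symbol in string[start:]:
                node = node.children.setdefault(symbol, TrieNode())
                node.occurrences.add((gene, start))
    return root


def candidate_patterns(root: TrieNode, min_genes: int = 2,
                       min_length: int = 2) -> List[Tuple[str, Set[Tuple[str, int]]]]:
    """List patterns shared by at least `min_genes` genes (hypothetical rule)."""
    found = []
    stack = [("", root)]
    while stack:
        pattern, node = stack.pop()
        genes = {gene for gene, _ in node.occurrences}
        if len(pattern) >= min_length and len(genes) >= min_genes:
            found.append((pattern, node.occurrences))
        for symbol, child in node.children.items():
            stack.append((pattern + symbol, child))
    return found


root = build_suffix_trie({"g1": "UUDN", "g2": "NUUD", "g3": "DDNN"})
shared = dict(candidate_patterns(root))
# "UUD" occurs in g1 at position 0 and in g2 at position 1:
# the same up-up-down pattern, delayed by one time step in g2.
```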
When an internal node does not satisfy the basic<br />
conditions for LateBicluster maximality, a<br />
procedure is applied to remove occurrences<br />
leading to non-maximal LateBiclusters. For this<br />
purpose, LateBiclustering uses a bit array<br />
representing the occurrences underlying each<br />
internal node. During the maximality update<br />
procedure, the bit array of the inspected node is<br />
compared against those of its internal child nodes
(right-max) and nodes from which the inspected<br />
node receives suffix links (left-max).<br />
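The bit-array comparison can be illustrated with Python integers used as bit sets. This sketch assumes a hypothetical layout in which occurrence (gene g, start j) sets bit g * L + j, where L is the length of each discretized profile. A child node (pattern P extended to the right, Pa) shares start positions with P and can be compared directly. A node sending a suffix link has pattern aP, and an occurrence of aP at j implies P at j + 1, so its bits are shifted by one before comparison. Testing whether an extension accounts for all of a node's occurrences is only one ingredient of a maximality check; the full update procedure of LateBiclustering is not reproduced here, and all function names are hypothetical. The example uses the toy profiles g1 = UUDN and g2 = NUUD (genes indexed 0 and 1).

```python
from typing import Iterable, Tuple


def to_bits(occurrences: Iterable[Tuple[int, int]], length: int) -> int:
    """Pack (gene index, start position) occurrences into an int bit array.

    Hypothetical layout: occurrence (g, j) sets bit g * length + j.
    """
    bits = 0
    for gene, start in occurrences:
        bits |= 1 << (gene * length + start)
    return bits


def uncovered(node_bits: int, extension_bits: int) -> int:
    """Occurrences of the inspected node that the extension does not share."""
    return node_bits & ~extension_bits


def right_extension_covers_all(node_bits: int, child_bits: int) -> bool:
    """True if the child pattern P a occurs wherever P does (P not right-maximal)."""
    return uncovered(node_bits, child_bits) == 0


def left_extension_covers_all(node_bits: int, linked_bits: int) -> bool:
    """True if a P (the node sending the suffix link) occurs wherever P does.

    An occurrence of a P at j implies P at j + 1, hence the shift by one.
    """
    return uncovered(node_bits, linked_bits << 1) == 0


L = 4                                # length of each discretized profile
uu = to_bits([(0, 0), (1, 1)], L)    # occurrences of "UU"
uud = to_bits([(0, 0), (1, 1)], L)   # occurrences of "UUD" (child of "UU")
ud = to_bits([(0, 1), (1, 2)], L)    # occurrences of "UD" ("UUD" links to it)
right_extension_covers_all(uu, uud)  # True: "UU" is always followed by "D"
left_extension_covers_all(ud, uud)   # True: "UD" is always preceded by "U"
```

Bitwise AND and NOT over whole arrays make each comparison a handful of word operations, which is the practical appeal of storing occurrences as bits.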
Finally, LateBiclustering offers several
heuristics to report a single pattern occurrence per<br />
gene in each maximal LateBicluster. A heuristic<br />
is necessary because there may be multiple<br />
occurrences of a pattern in the profile of a given<br />
gene, which is a direct consequence of allowing<br />
the discovery of delayed patterns.<br />
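A toy case shows why several occurrences per gene arise and how one could be chosen. Because delays are allowed, the pattern "UN" can start at more than one position in the same gene. The abstract does not name the heuristics LateBiclustering provides, so the two rules below (keep each gene's earliest occurrence, or keep the occurrence closest to the median of all candidate start positions) are hypothetical examples, as are the helper names.

```python
from statistics import median
from typing import Dict, List


def find_starts(string: str, pattern: str) -> List[int]:
    """All (possibly overlapping) start positions of pattern in string."""
    return [i for i in range(len(string) - len(pattern) + 1)
            if string[i:i + len(pattern)] == pattern]


def earliest(starts_by_gene: Dict[str, List[int]]) -> Dict[str, int]:
    """Hypothetical heuristic: keep each gene's first occurrence."""
    return {gene: min(starts) for gene, starts in starts_by_gene.items()}


def closest_to_median(starts_by_gene: Dict[str, List[int]]) -> Dict[str, int]:
    """Hypothetical heuristic: keep the occurrence nearest the median start.

    The median is taken over all candidate starts; ties go to the earlier start.
    """
    centre = median(s for starts in starts_by_gene.values() for s in starts)
    return {gene: min(starts, key=lambda s: (abs(s - centre), s))
            for gene, starts in starts_by_gene.items()}


profiles = {"g1": "UNUNN", "g2": "NNUNN", "g3": "DDDUN"}
starts = {gene: find_starts(s, "UN") for gene, s in profiles.items()}
# starts == {"g1": [0, 2], "g2": [2], "g3": [3]}
first = earliest(starts)             # {"g1": 0, "g2": 2, "g3": 3}
central = closest_to_median(starts)  # {"g1": 2, "g2": 2, "g3": 3}
```

The two rules disagree only for g1, which illustrates why the choice of heuristic matters: the median-based rule keeps the delays across genes small, while the earliest-occurrence rule favours early onset.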
RESULTS & DISCUSSION<br />
LateBiclustering is the first efficient algorithm suitable for
the discovery of biclusters with temporal delays. It runs in
polynomial time, whereas previous methods had
exponential time complexity. LateBiclustering was able to
recover biclusters planted in synthetic data. It also identified
biologically relevant LateBiclusters associated with
the response of Saccharomyces cerevisiae to heat stress, and
revealed interesting time-lagged responses.
FIGURE 1. Schematic of the LateBiclustering algorithm.