An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
CHAPTER 3. A C++ FRAMEWORK FOR HIGH-THROUGHPUT DNA SEQUENCING
Supported procedures include false discovery rate (FDR) calculation following Benjamini and
Hochberg [143] as well as Benjamini and Yekutieli [147], and family-wise error rate
(FWER) control via the Bonferroni [148], Hochberg [149], Holm [150], and Šidák [151] correction
algorithms.
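As an illustration of the first of these procedures, the following minimal sketch computes Benjamini-Hochberg adjusted p-values with the standard step-up rule. The function name `bh_adjust` and its interface are assumptions made for this example and do not reproduce the framework's actual API.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a libshore function): Benjamini-Hochberg
// adjusted p-values, returned in the order of the input.
std::vector<double> bh_adjust(const std::vector<double>& p) {
    const std::size_t m = p.size();

    // Indices of the p-values, sorted by ascending p-value.
    std::vector<std::size_t> order(m);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&p](std::size_t a, std::size_t b) { return p[a] < p[b]; });

    // Walk from the largest to the smallest p-value, scaling each by
    // m / rank and enforcing monotonicity with a running minimum.
    std::vector<double> adjusted(m);
    double running_min = 1.0;
    for (std::size_t k = m; k-- > 0;) {
        const double rank = static_cast<double>(k + 1);
        const double scaled = p[order[k]] * static_cast<double>(m) / rank;
        running_min = std::min(running_min, scaled);
        adjusted[order[k]] = running_min;
    }
    return adjusted;
}
```

A hypothesis is then rejected at FDR level q whenever its adjusted p-value does not exceed q.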
3.2 A Modular Signal-Slot Processing Framework<br />
3.2.1 Reader and Writer Concepts
The requirement to support a large number of different input file formats emphasizes the advantage
of consistent interfaces for access to the data. The libshore framework therefore
defines the concept of a Reader as a class providing three methods named has_data, current
and next, which allow linear iteration over the data set's elements.
The has_data method returns a boolean value indicating whether further data set elements
are available to the respective Reader object. The method current provides access to the element
of the data set at the current position of the iteration, and next indicates that the current element
has been processed and may be discarded. Initialization of the following element for retrieval by
current may be handled by either the next or the has_data method.
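A minimal sketch of a class modelling the Reader concept is given below. The class `VectorReader`, the helper `sum_all` and the use of `int` elements are assumptions made for illustration and are not part of libshore; only the method names has_data, current and next are taken from the concept definition.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical Reader over an in-memory vector (not a libshore class).
class VectorReader {
public:
    explicit VectorReader(std::vector<int> data) : data_(std::move(data)) {}

    // True while there are elements left to visit.
    bool has_data() const { return pos_ < data_.size(); }

    // Element at the current iteration position; requires has_data().
    const int& current() const { return data_[pos_]; }

    // Marks the current element as processed and advances the iteration.
    void next() { ++pos_; }

private:
    std::vector<int> data_;
    std::size_t pos_ = 0;
};

// Linear iteration over any type that models the Reader concept.
template <typename Reader>
long sum_all(Reader& reader) {
    long total = 0;
    while (reader.has_data()) {
        total += reader.current();
        reader.next();
    }
    return total;
}
```

Because `sum_all` relies only on the three methods, it works unchanged with any other class that provides them, which is the benefit of the consistent interface.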
Conversely, we define the concept of a Writer as a class supporting the methods append and
flush. The method append is used to pass the object a data set element for processing, whereas
flush serves as a notification that no more data are available for processing, triggering finalizing
actions such as the addition of footer sequences for compressed data sets.
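The Writer concept can be sketched in the same manner. The class `LineWriter`, the helper `write_lines` and the footer format are hypothetical choices for this example; the footer merely stands in for finalizing actions such as those required by compressed formats.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical Writer emitting one element per line (not a libshore class).
class LineWriter {
public:
    explicit LineWriter(std::ostream& out) : out_(out) {}

    // Accepts one data set element for processing.
    void append(const std::string& element) {
        out_ << element << '\n';
        ++count_;
    }

    // Signals the end of input; the footer is written exactly once.
    void flush() {
        if (flushed_) return;
        out_ << "# " << count_ << " elements\n";
        out_.flush();
        flushed_ = true;
    }

private:
    std::ostream& out_;
    std::size_t count_ = 0;
    bool flushed_ = false;
};

// Convenience helper: passes all elements to a LineWriter and returns
// the text that was produced, including the footer.
std::string write_lines(const std::vector<std::string>& elements) {
    std::ostringstream out;
    LineWriter writer(out);
    for (const auto& element : elements) writer.append(element);
    writer.flush();
    return out.str();
}
```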
Intermediary processing and filtering modules can thus be realized as classes conforming to
both the Writer and Reader concepts. The method flush in this case adopts the role of finishing data
processing and propagating all remaining data to the object's Reader interface. For example, in a
sliding window analysis, output of data will be delayed until enough input data have become available
to cover the entire width of the window. On receiving a flush request, however, processing of the remaining
elements must be completed without waiting for further data associated with the sliding
window.
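The sliding window behaviour described above can be sketched with a module that models both concepts. The class `WindowSumFilter` is a hypothetical example, not a libshore module: for each input element it emits the sum of that element and the following width-1 elements, so output is delayed until the window is full, and flush emits the remaining elements with truncated windows.

```cpp
#include <cstddef>
#include <deque>
#include <numeric>

// Hypothetical module modelling both Writer (append, flush) and Reader
// (has_data, current, next). Assumes width >= 1.
class WindowSumFilter {
public:
    explicit WindowSumFilter(std::size_t width) : width_(width) {}

    // Writer interface: buffer the element; emit once the window is full.
    void append(int value) {
        window_.push_back(value);
        if (window_.size() == width_) emit_front();
    }

    // Writer interface: no more input will arrive, so the remaining
    // elements are emitted with truncated windows.
    void flush() {
        while (!window_.empty()) emit_front();
    }

    // Reader interface over the results produced so far.
    bool has_data() const { return !output_.empty(); }
    int current() const { return output_.front(); }
    void next() { output_.pop_front(); }

private:
    // Emits the window sum for the oldest pending element and drops it.
    void emit_front() {
        output_.push_back(std::accumulate(window_.begin(), window_.end(), 0));
        window_.pop_front();
    }

    std::size_t width_;
    std::deque<int> window_;  // pending input elements
    std::deque<int> output_;  // results awaiting retrieval
};
```

With a width of 3, appending 1 and 2 yields no output; appending 3 makes the first result available, and a final flush releases the results for the elements that never saw a complete window.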
3.2.2 Definition of Processing Network Topology
With the Reader and Writer concepts, propagation of data through a network of processing
modules might be accomplished through a series of nested processing loops (listing 3.1). The
example illustrates propagation of data read from a single source of input through two cascaded
filtering operations. Processing results are to be written to the application's standard output
stream, whereas intermediate results following the first step of filtering are additionally recorded
in a separate output file. The first set of loops in lines 8–27 constitutes the actual propagation of
input data. Read mapping data provided by the reader object are appended to the first filtering
module (line 10). As a result, the filter object may or may not produce an arbitrary amount of
data (e.g. due to sliding window based removal of PCR duplicated sequences), which are passed
to the intermediate file writer as well as the second filter object (lines 12–24). Data produced by
the second filter object must be transmitted to the final output accordingly through a further
nested loop (lines 17–21). Additional sets of nested loops correspond to propagation of remaining
data following appropriately stacked flush requests to each of the modules (lines 30–56).
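The structure of such nested loops can be sketched in reduced form as follows. This is not listing 3.1, and the line numbers cited in the text do not refer to it; the types `Source`, `Sink` and `Dedup` as well as the function `propagate` are hypothetical stand-ins with `int` elements, chosen only to make the control flow self-contained.

```cpp
#include <deque>
#include <vector>

// Hypothetical stand-ins (not libshore classes).
struct Source {  // Reader
    std::deque<int> items;
    bool has_data() const { return !items.empty(); }
    int current() const { return items.front(); }
    void next() { items.pop_front(); }
};

struct Sink {  // Writer
    std::vector<int> items;
    void append(int v) { items.push_back(v); }
    void flush() {}
};

// Writer + Reader: drops adjacent duplicates; the most recent element is
// held back until a different element arrives or flush is called.
struct Dedup {
    bool pending = false;
    int held = 0;
    std::deque<int> out;
    void append(int v) {
        if (pending && v == held) return;  // adjacent duplicate
        if (pending) out.push_back(held);
        held = v;
        pending = true;
    }
    void flush() {
        if (pending) out.push_back(held);
        pending = false;
    }
    bool has_data() const { return !out.empty(); }
    int current() const { return out.front(); }
    void next() { out.pop_front(); }
};

template <class R, class F1, class W1, class F2, class W2>
void propagate(R& reader, F1& filter1, W1& intermediate, F2& filter2, W2& output) {
    // Propagation of input data through both filters.
    while (reader.has_data()) {
        filter1.append(reader.current());
        while (filter1.has_data()) {
            intermediate.append(filter1.current());
            filter2.append(filter1.current());
            while (filter2.has_data()) {
                output.append(filter2.current());
                filter2.next();
            }
            filter1.next();
        }
        reader.next();
    }
    // Flush of the first filter; remaining data still pass through filter2.
    filter1.flush();
    while (filter1.has_data()) {
        intermediate.append(filter1.current());
        filter2.append(filter1.current());
        while (filter2.has_data()) {
            output.append(filter2.current());
            filter2.next();
        }
        filter1.next();
    }
    intermediate.flush();
    // Flush of the second filter.
    filter2.flush();
    while (filter2.has_data()) {
        output.append(filter2.current());
        filter2.next();
    }
    output.flush();
}
```

Even in this reduced form, the inner loops have to be repeated for every flush stage, and the order of the flush requests must match the topology of the network.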
The demonstrated mode of explicit data propagation is verbose and error-prone. Our programming
framework therefore implements a set of class templates providing a signal-slot pipeline
architecture to allow more concise and comprehensible representations. A signal constitutes a
function object that may be connected to an arbitrary number of function objects supporting