
An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib


64 CHAPTER 3. A C++ FRAMEWORK FOR HIGH-THROUGHPUT DNA SEQUENCING

Supported procedures include false discovery rate (FDR) calculation following Benjamini and Hochberg [143] as well as Benjamini and Yekutieli [147], and further family-wise error rate (FWER) control via the Bonferroni [148], Hochberg [149], Holm [150] and Sidak [151] correction algorithms.
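As an illustration of what such procedures compute, the following sketch derives adjusted p-values for Bonferroni and Holm (FWER) and Benjamini-Hochberg (FDR) correction. It is not libshore's implementation: the function names are hypothetical, and the remaining procedures (Benjamini-Yekutieli, Hochberg, Sidak) are omitted for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Bonferroni: multiply each p-value by the number of tests m, capped at 1.
std::vector<double> bonferroni(const std::vector<double>& p) {
    const double m = static_cast<double>(p.size());
    std::vector<double> adj(p.size());
    for (std::size_t i = 0; i < p.size(); ++i)
        adj[i] = std::min(1.0, p[i] * m);
    return adj;
}

// Indices that sort p in ascending order (ties keep their input order).
std::vector<std::size_t> ascending_order(const std::vector<double>& p) {
    std::vector<std::size_t> idx(p.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return p[a] < p[b]; });
    return idx;
}

// Holm step-down: adj_(k) = max_{j <= k} min(1, (m - j + 1) * p_(j)), ranks 1..m.
std::vector<double> holm(const std::vector<double>& p) {
    const std::size_t m = p.size();
    const auto idx = ascending_order(p);
    std::vector<double> adj(m);
    double running_max = 0.0;
    for (std::size_t k = 0; k < m; ++k) {  // k is the 0-based rank
        const double v = std::min(1.0, static_cast<double>(m - k) * p[idx[k]]);
        running_max = std::max(running_max, v);
        adj[idx[k]] = running_max;
    }
    return adj;
}

// Benjamini-Hochberg step-up: adj_(k) = min_{j >= k} min(1, m * p_(j) / j).
std::vector<double> benjamini_hochberg(const std::vector<double>& p) {
    const std::size_t m = p.size();
    const auto idx = ascending_order(p);
    std::vector<double> adj(m);
    double running_min = 1.0;  // starting at 1 also caps the result at 1
    for (std::size_t k = m; k-- > 0;) {  // walk from the largest p-value down
        const double v = static_cast<double>(m) * p[idx[k]] /
                         static_cast<double>(k + 1);
        running_min = std::min(running_min, v);
        adj[idx[k]] = running_min;
    }
    return adj;
}
```

For p = {0.01, 0.04, 0.03, 0.005} this yields {0.04, 0.16, 0.12, 0.02} (Bonferroni), {0.03, 0.06, 0.06, 0.02} (Holm) and {0.02, 0.04, 0.04, 0.02} (Benjamini-Hochberg).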

3.2 A Modular Signal-Slot Processing Framework

3.2.1 Reader and Writer Concepts

The requirement to support a large number of different input file formats emphasizes the advantage of consistent interfaces for accessing the data. The libshore framework therefore defines the concept of a Reader as a class providing three methods, named has_data, current and next, which allow linear iteration over the elements of a data set.

The has_data method returns a boolean value indicating whether further data set elements are available from the respective Reader object. The method current provides access to the data set element at the current position of the iteration, and next indicates that the current element has been processed and may be discarded. Initialization of the following element for retrieval by current may be handled by either the next or the has_data method.
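A minimal sketch of a class conforming to the Reader concept might look as follows. The class name LineReader and the exact signatures (current returning a const reference, has_data being non-const) are assumptions for illustration, not libshore's actual declarations. This variant loads the following element lazily inside has_data, one of the two options described above.

```cpp
#include <istream>
#include <string>

// Hypothetical Reader yielding the lines of a text stream one at a time.
class LineReader {
public:
    explicit LineReader(std::istream& in) : in_(in) {}

    // Fetches the next line on demand; repeated calls without next()
    // do not skip elements.
    bool has_data() {
        if (!loaded_) loaded_ = static_cast<bool>(std::getline(in_, line_));
        return loaded_;
    }

    // Element at the current iteration position (valid after has_data()).
    const std::string& current() const { return line_; }

    // The current element has been processed and may be discarded.
    void next() { loaded_ = false; }

private:
    std::istream& in_;
    std::string line_;
    bool loaded_ = false;
};
```

A consumer then iterates with `while (r.has_data()) { use(r.current()); r.next(); }`, independent of the underlying file format.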

Conversely, we define the concept of a Writer as a class supporting the methods append and flush. The method append passes a data set element to the object for processing, whereas flush serves as a notification that no more data are available, triggering finalizing actions such as the addition of footer sequences for compressed data sets.
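Similarly, a hypothetical Writer might emit one record per line and finalize its output on flush. The class name LineWriter and the `#records=` trailer, which stands in for the footer of a compressed format, are invented for illustration.

```cpp
#include <cstddef>
#include <ostream>
#include <string>

// Hypothetical Writer: one record per line, plus a trailer written on flush().
class LineWriter {
public:
    explicit LineWriter(std::ostream& out) : out_(out) {}

    // Accept one data set element for processing.
    void append(const std::string& record) {
        out_ << record << '\n';
        ++count_;
    }

    // No more data will arrive: finalize the output exactly once.
    void flush() {
        if (flushed_) return;
        out_ << "#records=" << count_ << '\n';
        out_.flush();
        flushed_ = true;
    }

private:
    std::ostream& out_;
    std::size_t count_ = 0;
    bool flushed_ = false;
};
```

Guarding against repeated flush calls is a design choice of this sketch; it keeps the trailer unique if several upstream modules forward the same flush request.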

Intermediary processing and filtering modules can thus be realized as classes conforming to both the Writer and the Reader concepts. In this case, the flush method takes on the role of finishing data processing and propagating all remaining data to the object's Reader interface. For example, in a sliding window analysis, output is delayed until enough input data have become available to cover the entire width of the window. On receiving a flush request, however, processing of the remaining elements must be completed without waiting for further data to fill the window.
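The sliding window case can be sketched as a single class conforming to both concepts. The hypothetical CenteredMeanFilter below replaces each value by the mean over a centered window of half-width h; output for an element is withheld until h further elements have arrived, and flush completes the trailing elements using truncated windows. The class and its semantics are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical filter: Writer interface (append, flush) on the input side,
// Reader interface (has_data, current, next) on the output side.
class CenteredMeanFilter {
public:
    explicit CenteredMeanFilter(std::size_t half_width) : h_(half_width) {}

    // Writer interface
    void append(double x) {
        window_.push_back(x);
        ++received_;
        emit(false);
    }
    void flush() { emit(true); }  // finish trailing elements, truncated windows

    // Reader interface
    bool has_data() const { return !out_.empty(); }
    double current() const { return out_.front(); }
    void next() { out_.pop_front(); }

private:
    void emit(bool flushing) {
        // Element next_ can be emitted once input next_ + h_ has arrived,
        // or unconditionally when flushing.
        while (next_ < received_ && (flushing || next_ + h_ < received_)) {
            const std::size_t lo = next_ >= h_ ? next_ - h_ : 0;
            const std::size_t hi = std::min(next_ + h_, received_ - 1);
            double sum = 0.0;
            for (std::size_t k = lo; k <= hi; ++k) sum += window_[k - first_];
            out_.push_back(sum / static_cast<double>(hi - lo + 1));
            ++next_;
            // Drop inputs that no pending window needs any more.
            while (first_ + h_ < next_) {
                window_.pop_front();
                ++first_;
            }
        }
    }

    std::size_t h_;
    std::deque<double> window_;  // buffered inputs; front has absolute index first_
    std::deque<double> out_;     // finished outputs awaiting next()
    std::size_t first_ = 0;      // absolute index of window_.front()
    std::size_t received_ = 0;   // number of inputs appended so far
    std::size_t next_ = 0;       // absolute index of the next element to emit
};
```

With h = 1 and inputs 1, 2, 3, 4, the outputs 1.5, 2 and 3 become available during append, while the last value, 3.5, is only produced after flush.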

3.2.2 Definition of Processing Network Topology

With the Reader and Writer concepts, propagation of data through a network of processing modules might be accomplished through a series of nested processing loops (listing 3.1). The example illustrates propagation of data read from a single source of input through two cascaded filtering operations. Processing results are to be written to the application's standard output stream, whereas intermediate results following the first filtering step are additionally recorded in a separate output file. The first set of loops (lines 8–27) constitutes the actual propagation of input data. Read mapping data provided by the reader object are appended to the first filtering module (line 10). In response, the filter object may produce any number of elements, possibly none (e.g. due to sliding window based removal of PCR duplicated sequences), which are passed to the intermediate file writer as well as to the second filter object (lines 12–24). Data produced by the second filter object must in turn be transmitted to the final output through a further nested loop (lines 17–21). Additional sets of nested loops propagate the remaining data following appropriately stacked flush requests to each of the modules (lines 30–56).
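The following sketch, which is not listing 3.1 itself, spells out the same topology generically. The function name propagate is hypothetical; its template parameters stand for arbitrary classes conforming to the Reader and Writer concepts, with both filters acting as Writer and Reader. Even for this small network, the draining loops for both filters have to be repeated in the flush phase, which illustrates the verbosity of this approach.

```cpp
// Hypothetical explicit propagation for the topology
//   reader -> filter1 -> {intermediate, filter2},  filter2 -> output
template <typename Reader, typename Filter1, typename Filter2,
          typename Writer1, typename Writer2>
void propagate(Reader& reader, Filter1& filter1, Filter2& filter2,
               Writer1& intermediate, Writer2& output) {
    // Phase 1: propagate input data.
    while (reader.has_data()) {
        filter1.append(reader.current());
        reader.next();
        while (filter1.has_data()) {  // filter1 may yield zero or more elements
            intermediate.append(filter1.current());
            filter2.append(filter1.current());
            filter1.next();
            while (filter2.has_data()) {
                output.append(filter2.current());
                filter2.next();
            }
        }
    }
    // Phase 2: stacked flush requests, upstream modules first.
    filter1.flush();
    while (filter1.has_data()) {
        intermediate.append(filter1.current());
        filter2.append(filter1.current());
        filter1.next();
        while (filter2.has_data()) {
            output.append(filter2.current());
            filter2.next();
        }
    }
    intermediate.flush();
    filter2.flush();
    while (filter2.has_data()) {
        output.append(filter2.current());
        filter2.next();
    }
    output.flush();
}
```

A filter that withholds elements until flush, such as a sliding window, only delivers its last elements in the second phase, which is why the flush requests must be issued from upstream to downstream.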

The demonstrated mode of explicit data propagation is verbose and error-prone. Our programming framework therefore implements a set of class templates providing a signal-slot pipeline architecture that allows more concise and comprehensible representations. A signal constitutes a function object that may be connected to an arbitrary number of function objects supporting
