An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
CHAPTER 1. INTRODUCTION<br />
Our application programming framework libshore was implemented as a common foundation<br />
for all SHORE processing modules. In addition to elementary functionality such as automatic<br />
input and output decoding and encoding, algorithms for indexing sequencing-related data and<br />
alignment of biological sequences are provided. Furthermore, we developed a generic processing<br />
framework with the aim of facilitating breakdown and implementation of rather complex<br />
sequencing data analysis algorithms as collections of manageable and self-contained processing<br />
modules. We further extended this framework for straightforward parallelization of basic<br />
sequencing data processing.<br />
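
The idea of breaking an analysis into self-contained processing modules can be illustrated with a minimal sketch. It is not libshore's API: the module names (`uppercase`, `length_filter`) and the `compose` helper are hypothetical, and reads are modelled as plain strings for brevity.

```python
from functools import reduce


def uppercase():
    """Hypothetical module: normalize read sequences to upper case."""
    def module(reads):
        for read in reads:
            yield read.upper()
    return module


def length_filter(min_len):
    """Hypothetical module: drop reads shorter than min_len."""
    def module(reads):
        for read in reads:
            if len(read) >= min_len:
                yield read
    return module


def compose(*modules):
    """Chain modules so that each one consumes the stream produced by the previous one."""
    def pipeline(stream):
        return reduce(lambda s, m: m(s), modules, stream)
    return pipeline


pipeline = compose(uppercase(), length_filter(4))
result = list(pipeline(["acgt", "ac", "ttgca"]))
```

Each module only sees an input stream and yields an output stream, so modules can be tested in isolation and recombined freely. Because every read is handled independently here, such a chain could also be run on separate chunks of the input in parallel.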
The fundamental data analysis workflow introduced with the initial version of SHORE consists<br />
of the successive application of read filtering, read mapping and application-specific analysis<br />
modules. While this basic approach has been maintained, it has been extended in many areas<br />
to simplify a variety of routine tasks. An overview of the modified workflow is therefore<br />
given in section 2.1.<br />
Significant modifications to core pipeline modules concern raw sequencing read import and<br />
filtering as well as read mapping. Data import was modified to enable recovery of raw data<br />
or iterative data set re-filtering (section 2.3), and complemented with extended functionality to<br />
address, e.g., the increasingly widespread use of a variety of library multiplexing and further<br />
adapter ligation protocols. For this purpose, flexible sequence-specific read partitioning and<br />
clipping facilities have been added (sections 2.4, 2.5). SHORE's read mapping module initially<br />
constituted a basic parallelization wrapper providing a unified interface to different short read<br />
alignment tools. To take advantage of an increasing diversity of read mapping algorithms, features<br />
have been added to enable integration of the results obtained using different alignment<br />
algorithms or parameters (section 2.6).<br />
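
To make the notions of sequence-specific read partitioning and clipping concrete, the following sketch assigns reads to samples by a 5' barcode and removes a 3' adapter. It is an illustration under simplifying assumptions, not SHORE's implementation: barcodes are compared by Hamming distance, adapter matching is exact, and the function names, the `unassigned` bin and the parameters `max_mismatches` and `min_overlap` are all hypothetical.

```python
from collections import defaultdict


def partition_by_barcode(reads, barcodes, max_mismatches=1):
    """Bin reads by the barcode that best matches their 5' prefix (Hamming distance)."""
    bins = defaultdict(list)
    for read in reads:
        best_sample, best_dist, tie = None, max_mismatches + 1, False
        for sample, barcode in barcodes.items():
            prefix = read[:len(barcode)]
            if len(prefix) < len(barcode):
                continue  # read too short to carry this barcode
            dist = sum(a != b for a, b in zip(prefix, barcode))
            if dist < best_dist:
                best_sample, best_dist, tie = sample, dist, False
            elif dist == best_dist and dist <= max_mismatches:
                tie = True  # ambiguous: two barcodes match equally well
        if best_sample is None or tie:
            bins["unassigned"].append(read)
        else:
            # Strip the barcode so that only the insert sequence remains.
            bins[best_sample].append(read[len(barcodes[best_sample]):])
    return dict(bins)


def clip_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter: either a full occurrence or a partial one at the read end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Look for the longest adapter prefix that coincides with the read suffix.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


barcodes = {"s1": "ACGT", "s2": "TGCA"}
reads = ["ACGTTTTTGG", "TGCAAAAACC", "ACGATTTT", "GGGGGGGG"]
bins = partition_by_barcode(reads, barcodes)
clipped = clip_adapter("TTTTGGAGATCGG", "AGATCGGAAG")
```

In this example the third read carries one barcode mismatch and is still assigned to `s1`, while the last read matches neither barcode within the tolerance and ends up in the `unassigned` bin. Production tools typically also tolerate mismatches and indels within the adapter, which this sketch omits.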
Application-specific data analysis modules initially implemented in SHORE mainly addressed<br />
the assessment of genomic variation. This functionality has largely been maintained, with only<br />
slight modifications to take advantage of the libshore infrastructure. A focus of this work, however,<br />
was on quantitative application of high-throughput sequencing, and primarily ChIP-Seq<br />
data analysis. Within the SHORE analysis environment, we have implemented an enrichment<br />
detection module targeted at the analysis of transcription factor immunoprecipitation data (section<br />
2.7). In comparison to other approaches, our method emphasizes robustness towards read<br />
mapping artifacts and simplifies handling of replicate experiments. Our ChIP-Seq enrichment<br />
detection module is complemented by configurable auxiliary utilities allowing composition of custom<br />
workflows applicable to various expression profiling or enrichment detection applications.<br />
In conclusion, with this work the SHORE software has been developed from an analysis pipeline<br />
consisting of a set of largely independent submodules targeted primarily at genomic variation<br />
assessment into a tightly integrated, general-purpose sequencing data analysis suite supporting<br />
a set of coordinated, interdependent features. The SHORE sequencing data analysis suite is open<br />
source software freely available 1 under the GNU General Public License (GPL) version 3.<br />
1 http://shore.sf.net