BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015
Abstract ID: O16
Oral presentation
10th Benelux Bioinformatics Conference bbc 2015
O16. BIOINFORMATICS TOOLS FOR ACCURATE ANALYSIS OF AMPLICON
SEQUENCING DATA FOR BIODIVERSITY ANALYSIS
Mohamed Mysara 1-3, Yvan Saeys 4,5, Natalie Leys 1, Jeroen Raes 2,6 & Pieter Monsieurs 1*.
Unit of Microbiology, Belgian Nuclear Research Centre SCK•CEN, Mol, Belgium 1; Department of Bioscience
Engineering, Vrije Universiteit Brussel VUB, Brussels, Belgium 2; Department of Structural Biology, Vlaams Instituut
voor Biotechnologie VIB, Brussels, Belgium 3; Data Mining and Modeling Group, VIB Inflammation Research Center,
Ghent, Belgium 4; Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium 5; Department of
Microbiology and Immunology, REGA institute, KU Leuven, Belgium 6. * pmonsieu@sckcen.be
High-throughput sequencing technologies have created a wide range of new applications, including in the field of microbial
ecology. Yet when used in 16S rRNA biodiversity studies, they suffer from two important problems: the presence of PCR
artefacts (called chimeras) and sequencing errors introduced by the sequencing technologies themselves. In this work,
three artificial intelligence-based algorithms, CATCh, NoDe and IPED, are proposed to handle these two problems. A
benchmarking study compared CATCh/NoDe (for 454 pyrosequencing) and CATCh/IPED (for Illumina
MiSeq sequencing) with other state-of-the-art tools, showing a clear improvement in chimera detection and in the reduction of
sequencing errors, respectively, and in general leading to more accurate clustering of the sequencing reads into Operational
Taxonomic Units (OTUs). All algorithms are available via http://science.sckcen.be/en/Institutes/EHS/MCB/MIC/Bioinformatics/.
INTRODUCTION<br />
The revolution in sequencing technologies has led to
an explosion of possible applications, including new
opportunities for microbial ecological studies via
16S rDNA amplicon sequencing. However,
within such studies, all sequencing technologies suffer
from the presence of erroneous sequences, i.e. (i) chimeras,
introduced by incorrect target amplification during PCR, and (ii)
sequencing errors originating from different factors during
the sequencing process. As such, there is a need for
effective algorithms that remove those erroneous sequences
so that microbial diversity can be assessed accurately.
METHODS<br />
First, a new algorithm called CATCh (Combining
Algorithms to Track Chimeras) was developed by
integrating the output of existing chimera detection tools
into a new, more powerful method. Second, NoDe (Noise
Detector) was introduced, an algorithm that identifies and
corrects erroneous positions in 454 pyrosequencing reads.
Third, the IPED (Illumina Paired End Denoiser) algorithm
was developed as the first tool in the field to handle error
correction in Illumina MiSeq sequencing data. After
identifying the positions likely to contain an error, the
affected sequencing reads are clustered with correct
reads, resulting in error-free consensus reads. The three
algorithms were benchmarked against state-of-the-art tools.
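The idea of integrating the output of several chimera detection tools can be illustrated with a minimal sketch. The abstract does not state how CATCh combines the individual outputs, so the weighted average below is only a stand-in for whatever trained classifier is actually used; the tool names, scores, weights and threshold are all hypothetical.

```python
from typing import Dict


def ensemble_chimera_call(tool_scores: Dict[str, float],
                          weights: Dict[str, float],
                          threshold: float = 0.5) -> bool:
    """Combine per-tool chimera scores (0 = not chimeric, 1 = chimeric).

    Assumption: a simple weighted average stands in for the real
    combination rule, which the abstract does not describe.
    """
    total_weight = sum(weights[name] for name in tool_scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    combined = sum(weights[name] * score
                   for name, score in tool_scores.items()) / total_weight
    return combined >= threshold


# Hypothetical scores from three unnamed detection tools for one read.
scores = {"tool_a": 0.9, "tool_b": 0.2, "tool_c": 0.7}
equal_weights = {"tool_a": 1.0, "tool_b": 1.0, "tool_c": 1.0}
print(ensemble_chimera_call(scores, equal_weights))
```

With equal weights the read is flagged because two of the three tools score it highly; giving more weight to the dissenting tool can reverse the call, which is why a combination rule of this kind would normally be fitted on reads of known status.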
RESULTS & DISCUSSION<br />
In a comparative study, CATCh outperformed all other
chimera detection tools tested,
increasing sensitivity by up to 14% (see
Figure 1).
FIGURE 1. Plot indicating the effect of applying 5% indels (shown on the<br />
left) and 5% mismatches (shown on the right), on the performance of<br />
different chimera detection tools. CATCh was found to outperform other<br />
existing tools.<br />
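For readers less familiar with the metric, sensitivity in chimera detection is the fraction of true chimeras that a tool flags. The sketch below uses invented counts, not data from the benchmark, and reports both the absolute and the relative gain because the text does not say which of the two the 14% figure refers to.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real chimeras that were detected: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no chimeric reads in the reference set")
    return true_positives / total


# Hypothetical counts on a reference set containing 1000 known chimeras.
baseline = sensitivity(true_positives=850, false_negatives=150)
improved = sensitivity(true_positives=900, false_negatives=100)

absolute_gain = improved - baseline              # in sensitivity units
relative_gain = (improved - baseline) / baseline  # relative to the baseline
print(f"baseline={baseline:.2f} improved={improved:.2f}")
print(f"absolute gain={absolute_gain:.3f} relative gain={relative_gain:.3f}")
```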
Similarly, NoDe and IPED were benchmarked against
other denoising algorithms, showing a significant
reduction of the error rate of up to 55% and
75%, respectively (see Figure 2). The combined effect of
our algorithms for chimera removal and error correction
also had a positive effect on the clustering of reads into
operational taxonomic units (OTUs), with an almost
perfect correlation between the number of OTUs and the
number of species present in the mock communities.
Indeed, when applying our improved pipeline containing
CATCh and NoDe to a 454 pyrosequencing mock dataset,
the number of OTUs was reduced to 28 (i.e.
close to 18, the correct number of species). In contrast,
running the straightforward pipeline without our
algorithms inflated the number of OTUs to
98. Similarly, when tested on Illumina MiSeq sequencing
data obtained for a mock community, a pipeline
integrating CATCh and IPED returned
33 OTUs (i.e. close to the real number of 21
species), while 86 OTUs were obtained using the default
mothur pipeline.
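The OTU counts reported above can be put on a common scale by dividing each count by the known number of species in the corresponding mock community. The short calculation below uses only the figures quoted in the text; the function name and the dictionary labels are illustrative.

```python
def otu_inflation(observed_otus: int, true_species: int) -> float:
    """Ratio of OTUs returned to species actually present (1.0 is ideal)."""
    return observed_otus / true_species


# (observed OTUs, species in the mock community), as reported in the text.
results = {
    "454, CATCh + NoDe": (28, 18),
    "454, pipeline without CATCh/NoDe": (98, 18),
    "MiSeq, CATCh + IPED": (33, 21),
    "MiSeq, default mothur pipeline": (86, 21),
}

for label, (otus, species) in results.items():
    ratio = otu_inflation(otus, species)
    print(f"{label}: {otus} OTUs / {species} species = {ratio:.2f}x")
# 454, CATCh + NoDe: 28 OTUs / 18 species = 1.56x
# 454, pipeline without CATCh/NoDe: 98 OTUs / 18 species = 5.44x
# MiSeq, CATCh + IPED: 33 OTUs / 21 species = 1.57x
# MiSeq, default mothur pipeline: 86 OTUs / 21 species = 4.10x
```

On both platforms the pipelines that include the new algorithms overestimate diversity by roughly 1.6-fold, compared with more than 4-fold for the pipelines that do not.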
REFERENCES<br />
Mysara M., Leys N., Raes J., Monsieurs P.- NoDe: a fast error-correction
algorithm for pyrosequencing amplicon reads.- In: BMC
Bioinformatics, 16:88 (2015), p. 1-15.- ISSN 1471-2105
Mysara M., Saeys Y., Leys N., Raes J., Monsieurs P.- CATCh, an
Ensemble Classifier for Chimera Detection in 16S rRNA Sequencing
Studies.- In: Applied and Environmental Microbiology, 81:5 (2015),
p. 1573-1584.- ISSN 0099-2240