
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: O16

Oral presentation

10th Benelux Bioinformatics Conference bbc 2015

O16. BIOINFORMATICS TOOLS FOR ACCURATE ANALYSIS OF AMPLICON SEQUENCING DATA FOR BIODIVERSITY ANALYSIS

Mohamed Mysara 1-3, Yvan Saeys 4,5, Natalie Leys 1, Jeroen Raes 2,6 & Pieter Monsieurs 1*.

Unit of Microbiology, Belgian Nuclear Research Centre SCK•CEN, Mol, Belgium 1; Department of Bioscience Engineering, Vrije Universiteit Brussel VUB, Brussels, Belgium 2; Department of Structural Biology, Vlaams Instituut voor Biotechnologie VIB, Brussels, Belgium 3; Data Mining and Modeling Group, VIB Inflammation Research Center, Ghent, Belgium 4; Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium 5; Department of Microbiology and Immunology, REGA institute, KU Leuven, Belgium 6. * pmonsieu@sckcen.be

High-throughput sequencing technologies have created a wide range of new applications, including in the field of microbial ecology. Yet when used in 16S rRNA biodiversity studies, they suffer from two important problems: the presence of PCR artefacts (called chimeras) and sequencing errors introduced by the sequencing technologies themselves. In this work, three artificial intelligence-based algorithms, CATCh, NoDe and IPED, are proposed to handle these two problems. A benchmarking study compared CATCh/NoDe (for 454 pyrosequencing) and CATCh/IPED (for Illumina MiSeq sequencing) with other state-of-the-art tools, showing a clear improvement in chimera detection and in reduction of sequencing errors respectively, and in general leading to more accurate clustering of the sequencing reads into Operational Taxonomic Units (OTUs). All algorithms are available via http://science.sckcen.be/en/Institutes/EHS/MCB/MIC/Bioinformatics/.

INTRODUCTION

The revolution in sequencing technologies has led to an explosion of possible applications, including new opportunities for microbial ecological studies via 16S rDNA amplicon sequencing. However, within such studies, all sequencing technologies suffer from the presence of erroneous sequences, i.e. (i) chimeras, introduced by incorrect target amplification during PCR, and (ii) sequencing errors originating from different factors during the sequencing process. As such, there is a need for effective algorithms that remove those erroneous sequences so that microbial diversity can be assessed accurately.

METHODS

First, a new algorithm called CATCh (Combining Algorithms to Track Chimeras) was developed by integrating the output of existing chimera detection tools into a new, more powerful method. Second, NoDe (Noise Detector) was introduced, an algorithm that identifies and corrects erroneous positions in 454 pyrosequencing reads. Third, the IPED (Illumina Paired End Denoiser) algorithm was developed as the first tool in the field to handle error correction in Illumina MiSeq sequencing data. After the positions likely to contain an error are identified, the affected sequencing reads are clustered with correct reads, resulting in error-free consensus reads. The three algorithms were benchmarked against state-of-the-art tools.
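The idea of deriving a consensus read from a cluster can be illustrated with a minimal sketch. It assumes the reads in a cluster are already aligned to the same length and simply takes a per-position majority vote. The function name `consensus_read` and the toy reads are hypothetical, and this is not the NoDe or IPED implementation, which first identify the positions likely to contain an error.

```python
from collections import Counter


def consensus_read(aligned_reads):
    """Return the per-position majority base of equally long, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # Pick the most frequent base observed in this alignment column
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical cluster: two correct reads plus one read with a single mismatch
cluster = ["ACGTAC", "ACGTAC", "ACGAAC"]
corrected = consensus_read(cluster)
```

In this toy cluster the single deviating base is outvoted by the two correct reads, so the consensus equals the correct sequence.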

RESULTS & DISCUSSION

In a comparative study with other chimera detection tools, CATCh was shown to outperform all other tools, increasing the sensitivity by up to 14% (see Figure 1).

FIGURE 1. Plot indicating the effect of applying 5% indels (shown on the left) and 5% mismatches (shown on the right) on the performance of different chimera detection tools. CATCh was found to outperform other existing tools.
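As a rough illustration of how combining several detectors can raise sensitivity, the sketch below takes a majority vote over the binary calls of three hypothetical tools and computes sensitivity as TP / (TP + FN). The tool names, calls, and labels are invented for illustration, and a plain majority vote is an assumption used here for simplicity; it is not CATCh's actual classifier.

```python
def majority_vote(calls_per_tool):
    """Flag a read as chimeric when more than half of the tools flag it."""
    n_tools = len(calls_per_tool)
    return [sum(votes) * 2 > n_tools for votes in zip(*calls_per_tool)]


def sensitivity(predicted, truth):
    """Fraction of truly chimeric reads that were flagged: TP / (TP + FN)."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    return tp / (tp + fn)


# Hypothetical ground truth for six reads (True = chimeric)
truth = [True, True, True, True, False, False]

# Hypothetical binary calls from three detectors, each missing a different chimera
tool_a = [True, True, False, True, False, False]
tool_b = [True, False, True, True, False, False]
tool_c = [False, True, True, True, False, True]

ensemble = majority_vote([tool_a, tool_b, tool_c])
```

In this toy data each individual tool reaches a sensitivity of 0.75 while the vote reaches 1.0; real gains depend on how much the errors of the individual tools overlap.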

Similarly, NoDe and IPED were benchmarked against other denoising algorithms, showing a significant improvement in the reduction of the error rate of up to 55% and 75% respectively (see Figure 2). The combined effect of our algorithms for chimera removal and error correction also had a positive effect on the clustering of reads into operational taxonomic units (OTUs), with an almost perfect correlation between the number of OTUs and the number of species present in the mock communities. Indeed, when our improved pipeline containing CATCh and NoDe was applied to a 454 pyrosequencing mock dataset, it reduced the number of OTUs to 28 (i.e. close to 18, the correct number of species). In contrast, running the straightforward pipeline without our algorithms inflated the number of OTUs to 98. Similarly, when tested on Illumina MiSeq sequencing data obtained for a mock community, a pipeline integrating CATCh and IPED returned 33 OTUs (i.e. close to the real number of 21 species), while 86 OTUs were obtained using the default mothur pipeline.
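The OTU counts reported above can be put side by side as a ratio of observed OTUs to the known number of species in each mock community. The short calculation below uses only the figures quoted in this abstract; the helper name `otu_inflation` and the pipeline labels are hypothetical.

```python
def otu_inflation(n_otus, n_species):
    """Ratio of observed OTUs to the true number of species (1.0 is ideal)."""
    return n_otus / n_species


# (platform, pipeline) -> (OTUs reported, species in the mock community)
results = {
    ("454", "CATCh + NoDe"): (28, 18),
    ("454", "without CATCh/NoDe"): (98, 18),
    ("MiSeq", "CATCh + IPED"): (33, 21),
    ("MiSeq", "default mothur"): (86, 21),
}

for (platform, pipeline), (n_otus, n_species) in results.items():
    ratio = otu_inflation(n_otus, n_species)
    print(f"{platform:6s} {pipeline:20s} {n_otus:3d} OTUs / {n_species} species = {ratio:.2f}x")
```

Read this way, the pipelines including the proposed tools overestimate species richness by a factor of roughly 1.6 on both platforms, compared with roughly 5.4 (454) and 4.1 (MiSeq) for the pipelines without them.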

REFERENCES

Mysara M., Leys N., Raes J., Monsieurs P. NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads. BMC Bioinformatics, 16:88 (2015), p. 1-15. ISSN 1471-2105

Mysara M., Saeys Y., Leys N., Raes J., Monsieurs P. CATCh, an Ensemble Classifier for Chimera Detection in 16S rRNA Sequencing Studies. Applied and Environmental Microbiology, 81:5 (2015), p. 1573-1584. ISSN 0099-2240
