BeNeLux Bioinformatics Conference, Antwerp, December 7-8, 2015
Abstract ID: P
Poster
10th Benelux Bioinformatics Conference bbc 2015
P14. NOVOPLASTY: IN SILICO ASSEMBLY OF PLASTID GENOMES FROM WHOLE GENOME NGS DATA
Nicolas Dierckxsens 1,2*, Olivier Hardy 2, Ludwig Triest 3, Patrick Mardulyn 2 & Guillaume Smits 1,4.
1 Interuniversity Institute of Bioinformatics Brussels (IB2), ULB-VUB, Triomflaan CP 263, 1050 Brussels, Belgium;
2 Evolutionary Biology and Ecology Unit, CP 160/12, Faculté des Sciences, Université Libre de Bruxelles, Av. F. D. Roosevelt 50, B-1050 Brussels, Belgium;
3 Plant Biology and Nature Management, Vrije Universiteit Brussel, Brussels, Belgium;
4 Department of Paediatrics, Hôpital Universitaire des Enfants Reine Fabiola (HUDERF), Université Libre de Bruxelles (ULB), Brussels, Belgium.
* nicolasdierckxsens@hotmail.com
Thanks to advances in next-generation sequencing (NGS) technology, whole genome data can be readily obtained from a variety of samples. Many algorithms are available to assemble these reads, but few of them focus on assembling plastid genomes. We therefore developed a new algorithm that assembles only the plastid genome from whole genome data, starting from a single seed. The algorithm takes full advantage of very high coverage, which enables it to assemble through problematic (AT-rich) regions. It has been tested on several whole genome Illumina datasets, where it outperformed other assemblers in runtime and specificity. Every assembly yielded a single contig for each chloroplast or mitochondrial genome, always within 30 minutes.
INTRODUCTION
Chloroplasts and mitochondria are both responsible for generating metabolic energy within eukaryotic cells. Both organelles are usually maternally inherited and have a conserved gene organization, which makes them ideal for phylogenetic studies or as barcodes in plant and food identification (Brozynska et al., 2014). However, assembling these organelle genomes is not always straightforward with the currently available tools. We therefore developed a new algorithm specifically for the assembly of plastid genomes from whole genome data.
METHODS<br />
The algorithm is written in Perl. All assemblies were run on an Intel Xeon machine with 24 cores at 2.93 GHz and 96.8 GB of RAM. All non-human samples were sequenced on the Illumina HiSeq platform (101 bp paired-end reads). The human mitochondrial samples (PCR-free) were sequenced on the Illumina HiSeq X platform (150 bp paired-end reads). The Gonioctena intermedia sample was also sequenced on the PacBio platform.
RESULTS & DISCUSSION<br />
Algorithm. The algorithm is similar to string overlap algorithms such as SSAKE (Warren et al., 2007) and VCAKE (Jeck et al., 2007). It starts by reading the sequences into a hash table, which allows fast lookup. The assembly has to be initiated by a seed that is extended bidirectionally in iterations. The seed input is quite flexible: it can be a single sequence read, a conserved gene or even a complete mitochondrial genome from a distant species. Every base extension is determined by a consensus between the overlapping reads. Unlike most assemblers, NOVOPlasty does not try to assemble every read, but extends the given seed until the circular plastid genome is formed.
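The seed-and-extend loop described above can be illustrated with a short Python sketch. It is not NOVOPlasty's Perl code, and several choices are assumptions made for illustration: the hash table is keyed on k-mers from every read and its reverse complement, the consensus is a simple majority vote with a hypothetical `min_support` threshold, and extension runs in one direction only, stopping when the newest k-mer matches the first k-mer of the contig (that is, when the circle has closed). The names `build_kmer_index`, `next_base` and `extend_seed`, and the parameters `k` and `max_len`, are likewise hypothetical.

```python
# Illustrative sketch of seed-and-extend assembly; not NOVOPlasty's Perl code.
# Assumptions: k-mer hash table, majority-vote consensus, rightward extension only.
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_index(reads, k):
    """Hash table mapping each k-mer to the (sequence, offset) pairs containing it.

    Each read is indexed on both strands, since reads come from either strand.
    """
    index = defaultdict(list)
    for read in reads:
        for seq in (read, revcomp(read)):
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].append((seq, i))
    return index


def next_base(index, contig, k, min_support):
    """Majority vote on the base that follows the contig's last k-mer."""
    votes = Counter()
    for seq, i in index.get(contig[-k:], ()):
        if i + k < len(seq):  # this read continues past the k-mer
            votes[seq[i + k]] += 1
    if not votes:
        return None
    base, count = votes.most_common(1)[0]
    return base if count >= min_support else None


def extend_seed(reads, seed, k=15, min_support=2, max_len=200_000):
    """Extend `seed` to the right, one consensus base per iteration.

    Returns (sequence, is_circular). The circle is considered closed when the
    newest k-mer equals the first k-mer of the contig.
    """
    if len(seed) < k:
        raise ValueError("seed must be at least k bases long")
    index = build_kmer_index(reads, k)
    contig = seed
    while len(contig) < max_len:
        base = next_base(index, contig, k, min_support)
        if base is None:
            return contig, False  # no consensus: stop with a linear contig
        contig += base
        if contig[-k:] == contig[:k]:
            return contig[:-k], True  # drop the k bases that wrapped around
    return contig, False
```

Extending in one direction is enough to close a circle; a bidirectional version could alternate the same step on the contig and on its reverse complement. Repeats longer than k make the vote ambiguous, and this sketch resolves them only by majority, which is one reason real assemblers add further checks.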
Assemblies. NOVOPlasty has so far been tested on the assembly of 8 chloroplast and 6 mitochondrial genomes. Since chloroplasts contain an inverted repeat, two versions of the assembly are generated. They differ only in the orientation of the region between the two repeats; the correct one has to be resolved manually. With the exception of the mitochondrion of the leaf beetle Gonioctena intermedia, all assemblies resulted in a complete circular genome. A comparative study of four assemblers on the mitochondrial genome of G. intermedia clearly shows the speed and specificity of NOVOPlasty (Table 1).
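Why two chloroplast assemblies are produced can be shown with a toy example. Reads much shorter than the inverted repeat cannot reveal which repeat copy a junction read came from, so the region between the copies is equally consistent with either orientation. The Python sketch below assumes the coordinates of both repeat copies are already known; the function name `two_orientations` and the toy sequences are hypothetical. It returns both versions by reverse-complementing the segment between the copies.

```python
# Illustrative sketch: the two chloroplast assembly versions differ only in the
# orientation of the segment between the inverted-repeat copies.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def two_orientations(genome: str, ir_a: tuple, ir_b: tuple) -> tuple:
    """Return both assembly versions.

    ir_a and ir_b are assumed half-open (start, end) coordinates of the two
    repeat copies, with ir_a lying before ir_b in `genome`.
    """
    (a_start, a_end), (b_start, b_end) = ir_a, ir_b
    if not 0 <= a_start < a_end <= b_start < b_end <= len(genome):
        raise ValueError("repeat coordinates out of order or out of range")
    if genome[b_start:b_end] != revcomp(genome[a_start:a_end]):
        raise ValueError("the two copies are not inverted repeats")
    between = genome[a_end:b_start]
    flipped = genome[:a_end] + revcomp(between) + genome[b_start:]
    return genome, flipped


# Toy layout: LSC + IR + SSC + reverse-complemented IR.
lsc, ir, ssc = "AAAACCCC", "GATTACA", "GGGTTT"
genome = lsc + ir + ssc + revcomp(ir)
version_1, version_2 = two_orientations(genome, (8, 15), (21, 28))
print(version_2)  # AAAACCCCGATTACAAAACCCTGTAATC
```

Both versions contain exactly the same sequence content and junction k-mers within each repeat copy, which is why choosing between them needs information beyond short-read overlaps.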
                          NOVOPlasty   MIRA   MITObim    ARC
Duration (min)                    12    536     4777*    586
Memory (GB)                       15   57.6      63.4    1.9
Storage (GB)                       0    144       418     12
Total contigs                      1   3434      2221   2502
Mitochondrial contigs              1      1         4     48
Coverage (%)                      98     94        94     84
Mismatches                        10     25        26      2
Unidentified nucleotides          43    194       197      0
TABLE 1. Benchmarking results for four assemblies of the mitochondrial genome of Gonioctena intermedia, constructed with MITObim (Hahn et al., 2013), MIRA (Chevreux et al., 1999), ARC (Hunter et al., 2015) and NOVOPlasty. *Manually terminated.
Discussion. Despite the many available assemblers, researchers still struggle to find a good assembler for plastid genomes. NOVOPlasty is an assembler designed specifically for plastids that delivers the complete genome within 30 minutes. The algorithm will be tested on more datasets, and a comparative study with other assemblers is in progress.
REFERENCES<br />
Brozynska et al. PLoS One 9 (2014).
Chevreux et al. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) (1999).
Hahn et al. Nucleic Acids Research, 1-9 (2013).
Hunter et al. http://dx.doi.org/10.1101/014662 (2015).
Jeck et al. Bioinformatics 23, 2942-2944 (2007).
Warren et al. Bioinformatics 23, 500-501 (2007).