BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P14. NOVOPLASTY: IN SILICO ASSEMBLY OF PLASTID GENOMES FROM WHOLE GENOME NGS DATA

Nicolas Dierckxsens 1,2,*, Olivier Hardy 2, Ludwig Triest 3, Patrick Mardulyn 2 & Guillaume Smits 1,4.

1 Interuniversity Institute of Bioinformatics Brussels (IB2), ULB-VUB, Triomflaan CP 263, 1050 Brussels, Belgium
2 Evolutionary Biology and Ecology Unit, CP 160/12, Faculté des Sciences, Université Libre de Bruxelles, Av. F. D. Roosevelt 50, B-1050 Brussels, Belgium
3 Plant Biology and Nature Management, Vrije Universiteit Brussel, Brussels, Belgium
4 Department of Paediatrics, Hôpital Universitaire des Enfants Reine Fabiola (HUDERF), Université Libre de Bruxelles (ULB), Brussels, Belgium
* nicolasdierckxsens@hotmail.com

Thanks to advances in next-generation sequencing (NGS) technology, whole genome data can be readily obtained from a variety of samples. Many algorithms are available to assemble these reads, but few of them focus on assembling plastid genomes. We therefore developed a new algorithm that assembles only the plastid genomes from whole genome data, starting from a single seed. The algorithm takes full advantage of very high coverage, which allows it to assemble even through problematic (AT-rich) regions. It has been tested on several whole genome Illumina datasets, where it outperformed other assemblers in runtime and specificity. Every assembly resulted in a single contig for each chloroplast or mitochondrial genome, always within 30 minutes.

INTRODUCTION

Chloroplasts and mitochondria are both responsible for generating metabolic energy within eukaryotic cells. Both organelles are maternally inherited and have a conserved gene organization, which makes them ideal for phylogenetic studies or as a barcode in plant and food identification (Brozynska et al., 2014). However, assembling these plastid genomes is not always straightforward with the currently available tools. We therefore developed a new algorithm specifically for the assembly of plastid genomes from whole genome data.

METHODS

The algorithm is written in Perl. All assemblies were executed on an Intel Xeon machine with 24 cores at 2.93 GHz and a total of 96.8 GB of RAM. All non-human samples were sequenced on the Illumina HiSeq platform (101 bp paired-end reads). The human mitochondrial samples (PCR-free) were sequenced on the Illumina HiSeq X platform (150 bp paired-end reads). The Gonioctena intermedia sample was also sequenced on the PacBio platform.

RESULTS & DISCUSSION

Algorithm. The algorithm is similar to string overlap algorithms such as SSAKE (Warren et al., 2007) and VCAKE (Jeck et al., 2007). It starts by reading the sequences into a hash table, which allows fast lookup. The assembly is initiated by a seed that is extended bidirectionally in iterations. The seed input is quite flexible: it can be a single sequence read, a conserved gene or even a complete mitochondrial genome from a distant species. Every base extension is determined by a consensus between the overlapping reads. Unlike most assemblers, NOVOPlasty does not try to assemble every read, but extends the given seed until the circular plastid genome is formed.
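The seed-and-extend idea described above can be illustrated with a small Python sketch. This is not NOVOPlasty's Perl implementation: the function names (`build_index`, `extend_right`, `assemble`) and the parameters `k` and `min_ratio` are hypothetical, and the sketch makes strong simplifying assumptions (error-free reads, exact k-mer matches, no paired-end information). It only shows how a hash table of reads, a per-base consensus vote and a circularity check can fit together.

```python
import random
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def build_index(reads, k):
    """Hash table: k-mer -> counts of the base that follows it in any read (both strands)."""
    followers = defaultdict(Counter)
    for read in reads:
        for strand in (read, revcomp(read)):
            for i in range(len(strand) - k):
                followers[strand[i:i + k]][strand[i + k]] += 1
    return followers

def extend_right(contig, followers, k, max_len, min_ratio=0.8):
    """Extend one base at a time by majority vote; report whether the contig closed into a circle."""
    while len(contig) < max_len:
        votes = followers.get(contig[-k:])
        if not votes:
            break  # no read overlaps the current contig end
        base, count = votes.most_common(1)[0]
        if count / sum(votes.values()) < min_ratio:
            break  # no clear consensus, stop rather than guess
        contig += base
        # The end of the contig has reached its own start again: the circle is closed.
        if len(contig) > 2 * k and contig[-k:] == contig[:k]:
            return contig[:-k], True
    return contig, False

def assemble(seed, reads, k=16, max_len=100_000):
    """Extend the seed to the right, then to the left via the reverse complement."""
    followers = build_index(reads, k)
    contig, circular = extend_right(seed, followers, k, max_len)
    if not circular:
        left, circular = extend_right(revcomp(contig), followers, k, max_len)
        contig = revcomp(left)
    return contig, circular

# Toy data (made up for illustration): a random 100 bp circular "genome",
# error-free 30 bp reads starting at every position, and one read-sized seed.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(100))
reads = [(genome + genome)[i:i + 30] for i in range(len(genome))]
contig, circular = assemble(genome[10:40], reads)
```

Only reads that overlap the growing contig are ever consulted, which reflects the point made above that the remaining reads in the dataset are never assembled.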

Assemblies. NOVOPlasty has so far been tested on the assembly of 8 chloroplast and 6 mitochondrial genomes. Since chloroplast genomes contain an inverted repeat, two versions of the assembly are generated. They differ only in the orientation of the region between the two repeats; the correct one has to be resolved manually. Except for the mitochondrion of the leaf beetle Gonioctena intermedia, all assemblies resulted in a complete circular genome. A comparison of four assemblers on the mitochondrial genome of G. intermedia clearly shows the speed and specificity of NOVOPlasty (Table 1).

                          NOVOPlasty   MIRA   MITObim    ARC
Duration (min)                    12    536     4777*    586
Memory (GB)                       15   57.6      63.4    1.9
Storage (GB)                       0    144       418     12
Total contigs                      1   3434      2221   2502
Mitochondrial contigs              1      1         4     48
Coverage (%)                      98     94        94     84
Mismatches                        10     25        26      2
Unidentified nucleotides          43    194       197      0

TABLE 1. Benchmarking results for four assemblies of the mitochondrial genome of Gonioctena intermedia. The assemblies were constructed with MITObim (Hahn et al., 2013), MIRA (Chevreux et al., 1999), ARC (Hunter et al., 2015) and NOVOPlasty. *Manually terminated.
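The two chloroplast versions mentioned under Assemblies can be made concrete with a short Python sketch. It assumes the usual chloroplast layout of two single-copy regions separated by two inverted repeat copies; the names `both_orientations`, `lsc`, `ir` and `ssc` and the toy sequences are made up for illustration and are not taken from NOVOPlasty.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def both_orientations(lsc, ir, ssc):
    """Return the two candidate genomes, written linearly from the start of the first single-copy region."""
    ir_b = revcomp(ir)  # the second repeat copy is the reverse complement of the first
    version_1 = lsc + ir + ssc + ir_b
    version_2 = lsc + ir + revcomp(ssc) + ir_b  # same genome, region between the repeats flipped
    return version_1, version_2

# Toy sequences, far shorter than any real chloroplast region.
v1, v2 = both_orientations("AAACCCGT", "GATTACA", "TTGCAG")
```

Reads that are shorter than the repeat cannot tell the two versions apart, since both contain exactly the same repeat and single-copy sequences; this is why the choice is left to manual inspection.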

Discussion. Despite the many available assemblers, many researchers still struggle to find a good assembler for plastid genomes. NOVOPlasty offers an assembler specifically designed for plastids that delivers the complete genome within 30 minutes. The algorithm will be tested on more datasets, and a comparative study with other assemblers is in progress.

REFERENCES

Brozynska et al. PLoS One 9 (2014).
Chevreux et al. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) (1999).
Hahn et al. Nucleic Acids Research, 1-9 (2013).
Hunter et al. http://dx.doi.org/10.1101/014662 (2015).
Jeck et al. Bioinformatics 23, 2942-2944 (2007).
Warren et al. Bioinformatics 23, 500-501 (2007).
