
BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015

Abstract ID: P

Poster

10th Benelux Bioinformatics Conference bbc 2015

P8. IDENTIFICATION OF NUMTS THROUGH NGS DATA

Vincent Branders 1,2*, Chedly Kastally 2 & Patrick Mardulyn 2.

Machine Learning Group, Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Université catholique de Louvain 1; Evolutionary Biology and Ecology, Université libre de Bruxelles 2. * vincent.branders@uclouvain.be

Numts are copies of mitochondrial DNA sequences that have been transferred into the nuclear genome. Because of their similarity to mitochondrial DNA sequences, numts have led to many misinterpretations, from overestimation of diversity to a spurious association between cystic fibrosis and mitochondrial genome variation. To avoid such numt-induced bias, these sequences have to be identified. Current methodologies compare existing nuclear and mitochondrial sequences and search for similarities. The Pacific Biosciences (PacBio) technology generates sequencing reads that span thousands of base pairs, which makes it possible to identify numts by looking for reads that contain a region similar to mitochondrial sequences, surrounded by regions highly dissimilar to them. This should allow the systematic identification of numts without a complete known nuclear reference.

INTRODUCTION

The transfer of DNA from mitochondria to the nucleus generates nuclear copies of mitochondrial DNA (numts). Numts have been found in many species, including yeasts, rodents and plants. Because of their similarity to mitochondrial DNA, numts are responsible for many misinterpretations, both in mitochondrial disease studies and in phylogenetic reconstructions (Hazkani-Covo et al., 2010). Numt variation has commonly been misreported as mitochondrial mutations in patients (Yao et al., 2008). Moreover, DNA barcoding was found to overestimate the number of species when numts are coamplified (Song et al., 2008). Current methods identify such sequences by aligning mitochondrial sequences against the nuclear genome and identifying similar regions (Figure 1, left). The PacBio technology allows the sequencing of DNA fragments spanning thousands of base pairs. This read length should allow the identification of numts without the need for a complete nuclear reference, even in species that lack one, such as the insect Gonioctena intermedia. Indeed, it should be possible to use a mitochondrial assembly to identify PacBio reads with a central region similar to the mitochondrial sequence, enclosed by nuclear regions that are dissimilar to it (Figure 1, right).

FIGURE 1. Identification of numts – Existing methods (left) and proposed method (right). Comparison of mitochondrial sequence to nuclear sequence (left) or long reads (right).

METHODS

The proposed approach aligns PacBio reads to a mitochondrial genome (here, de novo assemblies of PacBio reads and of Illumina HiSeq 2000 reads are used). Within these long reads, a numt appears as a region similar to the mitochondrial genome surrounded by regions that are not similar to it. We introduce different criteria to distinguish reads that presumably contain numts from reads of mitochondrial origin (Figure 2). The DNA sequences come from an insect (Gonioctena intermedia) for which no reference genome is available.

FIGURE 2. Mitochondrial reads and numts with nuclear borders.
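The abstract does not state the criteria used to separate the two kinds of reads, so the following Python sketch is only one plausible reading of the idea: a read whose mitochondrial-like segment is enclosed by two long unaligned borders is flagged as a numt candidate, while a read that aligns almost end to end is treated as mitochondrial. The `ReadAlignment` record, the `classify_read` function and both thresholds are hypothetical and are not taken from the authors' work. The sketch also assumes a single local alignment per read, already computed by some aligner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadAlignment:
    """One local alignment of a long read against the mitochondrial assembly."""
    read_id: str
    read_length: int  # total length of the read, in bases
    read_start: int   # 0-based start of the aligned segment on the read
    read_end: int     # end of the aligned segment on the read (exclusive)


# Hypothetical thresholds, chosen for illustration only.
MIN_FLANK = 500    # minimum unaligned bases required on each side
MIN_ALIGNED = 200  # minimum length of the mitochondrial-like segment


def classify_read(aln, min_flank=MIN_FLANK, min_aligned=MIN_ALIGNED):
    """Label a read as 'mitochondrial', 'numt_candidate' or 'ambiguous'."""
    aligned = aln.read_end - aln.read_start
    left_flank = aln.read_start
    right_flank = aln.read_length - aln.read_end
    if aligned < min_aligned:
        # Too little similarity to say anything about the read.
        return "ambiguous"
    if left_flank >= min_flank and right_flank >= min_flank:
        # Mitochondrial-like core enclosed by two dissimilar borders.
        return "numt_candidate"
    if left_flank < min_flank and right_flank < min_flank:
        # The read aligns (almost) end to end.
        return "mitochondrial"
    # Only one long unaligned border: not enough evidence either way.
    return "ambiguous"
```

Requiring dissimilar borders on both sides is deliberately conservative: a read with a single unaligned border could be a numt truncated by the read end, but it could also be an alignment artefact, so this sketch leaves it unclassified.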

RESULTS & DISCUSSION

We propose a systematic identification of potential numts: through alignments, we identify 10 mitochondrial reads and 34 reads with a potential numt for one particular mitochondrial region (the widely studied cytochrome oxidase I gene). As exploratory research, this work highlights the usefulness of Pacific Biosciences data for identifying numts when no nuclear reference is available: the approach requires only PacBio reads and a mitochondrial assembly. It is more efficient than identifying numts through short reads, which would require the complete reconstruction of both the mitochondrial and nuclear genomes. A systematic identification of numts in non-model organisms should help avoid misinterpretations in studies where numts could be a source of bias. Our current criteria for distinguishing numts from mitochondrial reads are quite simple; a detailed analysis of this distinction is a direction for future improvement.
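Per-region counts such as those reported for the cytochrome oxidase I gene could be obtained along the following lines. This is a minimal sketch, assuming that each alignment hit has already been labelled and carries its coordinates on the mitochondrial assembly. The function names, the tuple layout and the toy values are all hypothetical; the region bounds used below are not the real coordinates of the gene.

```python
from collections import Counter


def overlaps(start, end, region_start, region_end):
    """True if half-open intervals [start, end) and [region_start, region_end) intersect."""
    return start < region_end and region_start < end


def tally_region(labelled_hits, region_start, region_end):
    """Count read labels among hits overlapping a region of the assembly.

    labelled_hits: iterable of (read_id, ref_start, ref_end, label) tuples,
    where ref_start and ref_end are coordinates on the mitochondrial assembly.
    Each read is counted once, even if several of its hits fall in the region.
    """
    label_by_read = {}
    for read_id, ref_start, ref_end, label in labelled_hits:
        if overlaps(ref_start, ref_end, region_start, region_end):
            label_by_read.setdefault(read_id, label)
    return Counter(label_by_read.values())


# Toy input for illustration only; coordinates and labels are invented.
toy_hits = [
    ("read_a", 100, 900, "mitochondrial"),
    ("read_b", 400, 700, "numt_candidate"),
    ("read_b", 650, 800, "numt_candidate"),
    ("read_c", 2000, 2600, "numt_candidate"),
]
counts = tally_region(toy_hits, region_start=300, region_end=1000)
```

Counting each read once keeps the tally in units of reads rather than alignment hits, which matches how the counts above are expressed.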

REFERENCES

Hazkani-Covo E. et al. PLOS Genetics 6, 1-11 (2010).

Song H. et al. PNAS 105, 13486-13491 (2008).

Yao Y. G. et al. Journal of Medical Genetics 45, 769-772 (2008).
