BeNeLux Bioinformatics Conference – Antwerp, December 7-8 2015<br />
Abstract ID: P<br />
Poster<br />
10th Benelux Bioinformatics Conference bbc 2015<br />
P8. IDENTIFICATION OF NUMTS THROUGH NGS DATA<br />
Vincent Branders¹,²*, Chedly Kastally² & Patrick Mardulyn².<br />
Machine Learning Group, Institute of Information and Communication Technologies, Electronics and Applied<br />
Mathematics (ICTEAM), Université catholique de Louvain¹; Evolutionary Biology and Ecology, Université libre de<br />
Bruxelles². *vincent.branders@uclouvain.be<br />
Numts are copies of mitochondrial DNA sequences that have been transferred into the nuclear genome. Due to their
similarity to mitochondrial DNA sequences, numts have led to many misinterpretations, ranging from overestimation of
diversity to an erroneous association between cystic fibrosis and mitochondrial genome variation. To avoid such
numt-induced bias, these sequences have to be identified. Current methodologies compare existing nuclear
and mitochondrial sequences and search for similarities. The new Pacific Biosciences (PacBio) technology generates
sequencing reads that span thousands of base pairs, which makes it possible to identify numts by looking for reads
containing a region similar to mitochondrial sequences and surrounded by regions highly dissimilar to them. This should allow the
systematic identification of numts without a complete known nuclear reference.<br />
INTRODUCTION<br />
The transfer of DNA from mitochondria to the nucleus<br />
generates nuclear copies of mitochondrial DNA (numts).<br />
Numts have been found in many species including yeasts,<br />
rodents and plants. Due to their similarity to mitochondrial<br />
DNA, numts are responsible for many misinterpretations,<br />
both in mitochondrial disease studies and phylogenetic<br />
reconstructions (Hazkani-Covo et al., 2010). Numt<br />
variation has commonly been misreported as<br />
mitochondrial mutations in patients (Yao et al., 2008).<br />
Moreover, DNA barcoding was found to overestimate the<br />
number of species when numts are coamplified (Song et<br />
al., 2008). Current methods identify such sequences by<br />
aligning mitochondrial sequences against the nuclear<br />
genome and identifying similar regions (Figure 1, left).<br />
The PacBio technology allows the sequencing of DNA<br />
fragments spanning thousands of base pairs. This read length<br />
should allow the identification of numts without the need<br />
for a complete nuclear reference (none exists for the insect<br />
species Gonioctena intermedia, for example). Indeed, it should be<br />
possible to use a mitochondrial assembly to identify<br />
PacBio reads with a central region similar to the<br />
mitochondrial sequence, flanked by nuclear regions that<br />
are dissimilar to it (Figure 1, right).<br />
FIGURE 1. Identification of numts – Existing methods (left) and proposed<br />
method (right). Comparison of mitochondrial sequence to nuclear<br />
sequence (left) or long reads (right).<br />
METHODS<br />
The proposed approach aligns PacBio reads to a<br />
mitochondrial genome (here, de novo assemblies of PacBio<br />
reads and of Illumina HiSeq 2000 reads are used). In these<br />
long reads, a numt appears as a region similar<br />
to the mitochondrial genome, flanked by regions<br />
that are not similar to it. We introduce different criteria to<br />
distinguish reads that presumably contain numts from reads of<br />
mitochondrial origin (Figure 2). The DNA sequences come<br />
from an insect (Gonioctena intermedia) that has no reference<br />
genome.<br />
FIGURE 2. Mitochondrial reads and numts with nuclear borders.<br />
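The distinction between mitochondrial reads and reads carrying a numt with nuclear borders can be illustrated with a minimal sketch. The abstract does not state the exact criteria or thresholds, so everything below is an assumption made for illustration: the `ReadHit` record, the `classify` function, and the `min_flank` and `max_overhang` values are hypothetical, and the alignment coordinates are assumed to have been produced beforehand by aligning each long read to the mitochondrial assembly.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadHit:
    """Hypothetical record: one read's alignment to the mitochondrial assembly."""
    read_id: str
    read_length: int  # total read length in bases
    hit_start: int    # 0-based start of the mitochondrial-like region on the read
    hit_end: int      # end (exclusive) of that region on the read


def classify(hit: ReadHit, min_flank: int = 500, max_overhang: int = 50) -> str:
    """Label a read from the size of its unaligned flanks.

    Thresholds are illustrative assumptions, not values from the abstract.
    """
    left_flank = hit.hit_start
    right_flank = hit.read_length - hit.hit_end
    # Mitochondrial-like core with long unaligned borders on both sides.
    if left_flank >= min_flank and right_flank >= min_flank:
        return "potential_numt"
    # Read aligns to the mitochondrial assembly over (almost) its full length.
    if left_flank <= max_overhang and right_flank <= max_overhang:
        return "mitochondrial"
    # Anything else (e.g. only one long flank) is left undecided.
    return "ambiguous"


# Toy reads, invented purely to exercise the three outcomes.
hits = [
    ReadHit("read_a", 6000, 2000, 3500),  # flanks of 2000 and 2500 bases
    ReadHit("read_b", 4000, 10, 3990),    # flanks of 10 and 10 bases
    ReadHit("read_c", 5000, 0, 3000),     # flanks of 0 and 2000 bases
]
labels = {h.read_id: classify(h) for h in hits}
counts = Counter(labels.values())
print(labels)
print(counts)
```

Keeping an explicit "ambiguous" class is a design choice of this sketch: a read with a single unaligned flank could be a numt truncated at the read end or a chimeric read, and the simple flank test cannot tell these apart.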
RESULTS & DISCUSSION<br />
A systematic identification of potential numts is proposed:<br />
through alignments, we identify 10 mitochondrial reads<br />
and 34 reads with potential numts for one particular<br />
mitochondrial region (the widely studied cytochrome<br />
oxidase I gene). In this exploratory study, we highlight<br />
the usefulness of Pacific Biosciences data in the<br />
identification of numts when no nuclear reference is<br />
available. The approach requires only PacBio reads and a<br />
mitochondrial assembly. It is more<br />
efficient than identifying numts through short<br />
reads, which would require the complete reconstruction of<br />
both the mitochondrial and nuclear genomes. A systematic<br />
identification of numts in non-model organisms should<br />
help avoid misinterpretations in studies where numts could be<br />
sources of bias. Our current criteria for distinguishing numts from<br />
mitochondrial reads are quite simple. A more detailed analysis of<br />
this distinction is a possible avenue for improvement.<br />
REFERENCES<br />
Hazkani-Covo E. et al. PLOS Genetics 6, 1-11 (2010).<br />
Song H. et al. PNAS 105, 13486-13491 (2008).<br />
Yao Y. G. et al. Journal of Medical Genetics 45, 769-772 (2008).<br />