Poster <strong>Abstracts</strong><br />
the results are presented. Analysis results are<br />
shown in a highly accessible manner, allowing<br />
the user to gain a quick overview as well<br />
as permitting deep analysis. The performance<br />
of the PAIPline was benchmarked on real and<br />
artificial datasets of known compositions and<br />
compared to competing tools. The results and<br />
discussed features show that the presented approach<br />
is a viable strategy for the identification<br />
of pathogen sequences in NGS datasets.<br />
10<br />
SEPARATION OF FOREGROUND AND<br />
BACKGROUND READS IN MIXED NGS<br />
DATASETS<br />
S. Tausch, A. Nitsche, B. Renard, P. Dabrowski;<br />
Robert Koch Institute, Berlin, GERMANY.<br />
NGS is a valuable technology for rapid and in-depth<br />
analysis of clinical samples, as it allows<br />
sequencing of a pathogen’s whole genome<br />
directly from patient material within as little<br />
as 26 hours. However, the follow-up analysis<br />
is severely slowed down by the abundance of<br />
reads originating from the host. Thus, in order<br />
to exploit the full potential of the technology<br />
for rapid diagnostics, a method for rapid in<br />
silico removal of host reads is necessary. Commonly,<br />
a mapping-based approach is used to<br />
separate reads: either reads mapping to a background<br />
reference or reads not mapping to a<br />
foreground reference are discarded. However,<br />
while the former approach is highly specific<br />
in discarding only true background reads and<br />
the latter is highly sensitive in only keeping<br />
foreground reads, neither offers a good balance.<br />
Hence, we aimed to develop a<br />
novel tool specifically geared towards both<br />
specific and sensitive separation of foreground<br />
and background reads. In order to determine<br />
whether a read belongs to the foreground or<br />
the background, we train Markov chains of<br />
order k, with k ranging from 4 to 12, on user-provided sets<br />
of foreground and background reference sequences,<br />
where each state is a k-mer<br />
and each transition is one of the four possible<br />
bases A, C, G and T. We then calculate the<br />
difference between the log-likelihoods of each transition<br />
observed within a read under<br />
the foreground and the background Markov<br />
chains. This difference is then used as a score<br />
for the separation of reads, with scores smaller<br />
than 0 indicating a background read and scores<br />
larger than 0 indicating a foreground read.<br />
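The scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the add-one pseudocount, the uniform fallback for k-mers absent from a model, the summation of per-transition differences over the read, and the handling of a score of exactly 0 are all assumptions made for the example.

```python
import math
from collections import defaultdict

BASES = "ACGT"
# Assumption: transitions from k-mers never seen in training get a uniform probability.
UNSEEN_LOG_PROB = math.log(1.0 / len(BASES))


def train_markov_chain(sequences, k, pseudocount=1.0):
    """Estimate log P(next base | preceding k-mer) from reference sequences."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0))
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k):
            kmer, nxt = seq[i:i + k], seq[i + k]
            # Skip windows containing ambiguous bases such as N.
            if nxt in BASES and all(b in BASES for b in kmer):
                counts[kmer][nxt] += 1
    model = {}
    for kmer, c in counts.items():
        total = sum(c.values()) + pseudocount * len(BASES)
        model[kmer] = {b: math.log((c[b] + pseudocount) / total) for b in BASES}
    return model


def score_read(read, fg_model, bg_model, k):
    """Sum, over all transitions in the read, log P_foreground - log P_background."""
    read = read.upper()
    score = 0.0
    for i in range(len(read) - k):
        kmer, nxt = read[i:i + k], read[i + k]
        if nxt not in BASES:
            continue
        fg = fg_model.get(kmer, {}).get(nxt, UNSEEN_LOG_PROB)
        bg = bg_model.get(kmer, {}).get(nxt, UNSEEN_LOG_PROB)
        score += fg - bg
    return score


def classify(read, fg_model, bg_model, k):
    """Positive score: foreground; negative: background; zero: undetermined (assumption)."""
    score = score_read(read, fg_model, bg_model, k)
    if score > 0:
        return "foreground"
    if score < 0:
        return "background"
    return "undetermined"


# Toy usage with made-up reference sequences (hypothetical data, k = 4).
fg_model = train_markov_chain(["ACGT" * 10], k=4)
bg_model = train_markov_chain(["A" * 20 + "T" * 20], k=4)
label = classify("ACGTACGTACGT", fg_model, bg_model, k=4)
```

Because each read is scored with one dictionary lookup per base in each model, the cost grows linearly with read length, which is consistent with the emphasis on speed in the abstract.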
We have tested our tool on several datasets,<br />
including Cowpoxvirus sequenced from a<br />
human host. In all cases, our tool was faster<br />
than any competing tool (achieving speeds of<br />
up to 10 megabases per second using 4 CPUs),<br />
including Kraken and mapping via Bowtie 2.<br />
At the same time, we consistently achieved<br />
the best F-score of all tested tools. Our tool is<br />
developed in Python and Java and available for<br />
download from http://sourceforge.net/projects/rambok/.<br />
We have developed a freely available,<br />
easy-to-use, rapid, and both highly sensitive and<br />
specific tool for the separation of foreground<br />
and background reads in mixed NGS datasets.<br />
We believe that this will be highly useful as an<br />
initial filtering step for anyone analyzing viral<br />
sequences via NGS.<br />
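For readers unfamiliar with the F-score used for the comparison above, the following sketch shows how it combines precision and recall for a read-separation task. The abstract does not state which variant was used; the balanced F1 (beta = 1) is assumed here, and the function name and the example counts are hypothetical.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """F-beta score computed from counts of classified reads.

    A true positive is a foreground read kept as foreground, a false positive
    is a background read kept as foreground, and a false negative is a
    foreground read discarded as background.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts: 90 foreground reads kept, 10 background reads wrongly
# kept, 30 foreground reads wrongly discarded.
example = f_score(90, 10, 30)
```

With these made-up counts, precision is 0.9 and recall is 0.75, so a tool that is only specific or only sensitive is penalized relative to one that balances both.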
11<br />
A RAPID AND SCALABLE SINGLE<br />
NUCLEOTIDE POLYMORPHISM DISCOVERY<br />
AND VALIDATION PIPELINE FOR OUTBREAK<br />
INVESTIGATION OF BACTERIAL PATHOGENS<br />
B. Rusconi<sup>1</sup>, A. L. Rodriguez<sup>2</sup>, S. S. Koenig<sup>1</sup>,<br />
M. Eppinger<sup>1</sup>;<br />
<sup>1</sup>University of Texas at San Antonio - South<br />
Texas Center For Emerging Infectious Diseases<br />
(STCEID), San Antonio, TX, <sup>2</sup>University of<br />
Texas at San Antonio - Computational Biology<br />
Initiative, San Antonio, TX.<br />
Background: Ensuring a timely and effective<br />
response in the control of bacterial outbreaks<br />
is challenging, as discriminatory power becomes<br />
particularly important for distinguishing<br />
outbreak isolates that form tight clonal complexes<br />
with only a few genetic polymorphisms.<br />
The increase of throughput and concomitant<br />
ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic<br />
Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens<br />