Abstracts
Oral Presentation <strong>Abstracts</strong><br />
S4:6<br />
CFSAN SNP PIPELINE: A WHOLE GENOME<br />
SEQUENCE DATA ANALYSIS PIPELINE FOR<br />
FOOD-BORNE PATHOGENS<br />
Y. Luo, J. Pettengill, J. Baugher, H. Rand, S.<br />
Davis;<br />
FDA/CFSAN, College Park, MD.<br />
In support of the analysis of whole genome sequence<br />
(WGS) data for closely related pathogens<br />
in food-borne outbreaks, the Center for<br />
Food Safety and Applied Nutrition (CFSAN)<br />
at the FDA has developed a reference-based<br />
software pipeline for high quality SNP identification<br />
and analysis. This software pipeline<br />
combines into a single package the mapping of<br />
WGS reads to a reference genome, processing<br />
of those mapping files, identification of variant<br />
sites, and production of a SNP matrix. Additional<br />
features include a summary table of the<br />
results, soft-links to minimize data storage, and<br />
the ability to switch between workstations and<br />
computer clusters with minimal effort. The CFSAN<br />
SNP Pipeline is currently used in production<br />
mode to analyze WGS data from isolates<br />
related to food-borne illnesses. The pipeline is<br />
used when outbreak investigations are ongoing<br />
to link samples and to provide information for<br />
decision-makers. It is also used retrospectively<br />
to aid in the analysis of closed outbreaks. The<br />
CFSAN SNP Pipeline is reference-based,<br />
and so a reference must be provided. Isolate<br />
sequence data must be in FASTQ format but can<br />
be either paired-end or single-read. All<br />
analysis steps run automatically and<br />
depend only on the proper organization of the input<br />
files and identification of a suitable reference.<br />
Additionally, each of the analysis steps can be<br />
run using individual shell scripts. The addition<br />
of new samples is very straightforward, and result<br />
files from previous portions of the analysis<br />
that do not need to be regenerated are reused.<br />
This greatly reduces the computational time<br />
when adding new samples as the mapping and<br />
pileup steps are not redone. The pipeline will<br />
run without problems on current workstations,<br />
and will run on high performance computing<br />
clusters with either Torque or Grid Engine job<br />
schedulers. The CFSAN SNP Pipeline is written<br />
in a combination of Bash and Python and is<br />
designed to run on Linux platforms where both<br />
are available. It requires the Biopython package<br />
and three external executables:<br />
Bowtie2, SAMtools, and<br />
VarScan. Substantial effort has been devoted to<br />
making the software robust, well-documented,<br />
and easy to use. The following links provide<br />
access to the source code, the documentation,<br />
and the Python package, followed by<br />
the current publication reference.<br />
Source code: https://github.com/CFSAN-Biostatistics/snp-pipeline<br />
Documentation: http://snp-pipeline.rtfd.org<br />
PyPI package: https://pypi.python.org/pypi/snp-pipeline<br />
Reference publication:<br />
Pettengill JB, Luo Y, Davis S, Chen Y, Gonzalez-Escalona<br />
N, Ottesen A, Rand H, Allard<br />
MW, Strain E. An evaluation of alternative<br />
methods for constructing phylogenies from<br />
whole genome sequence data: A case study<br />
with Salmonella.<br />
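The last two stages described in the abstract above (identification of variant sites and production of a SNP matrix) can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the function names `build_snp_matrix` and `snp_distance`, the in-memory input format, and the simplification that a sample without a call at a site carries the reference base are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def build_snp_matrix(
    reference: str, variant_calls: Dict[str, Dict[int, str]]
) -> Tuple[List[int], Dict[str, str]]:
    """Build a toy SNP matrix.

    reference     -- reference sequence as a string (hypothetical input)
    variant_calls -- sample name -> {0-based position: called base}
                     (hypothetical format, not the pipeline's file layout)
    Returns the sorted variant sites and one row of bases per sample.
    """
    # A site is variant if at least one sample differs from the reference there.
    sites = sorted(
        {
            pos
            for calls in variant_calls.values()
            for pos, base in calls.items()
            if base != reference[pos]
        }
    )
    # Assumption: a sample with no call at a site is given the reference base.
    matrix = {
        sample: "".join(calls.get(pos, reference[pos]) for pos in sites)
        for sample, calls in variant_calls.items()
    }
    return sites, matrix


def snp_distance(row_a: str, row_b: str) -> int:
    """Count the matrix columns at which two samples differ."""
    return sum(a != b for a, b in zip(row_a, row_b))


if __name__ == "__main__":
    ref = "ACGTACGT"
    calls = {"s1": {1: "T", 5: "A"}, "s2": {5: "A"}, "s3": {}}
    sites, matrix = build_snp_matrix(ref, calls)
    print("variant sites:", sites)
    for name, row in matrix.items():
        print(name, row)
    print("s1 vs s3:", snp_distance(matrix["s1"], matrix["s3"]))
```

Pairwise distances computed this way are one simple means of judging how closely isolates are related; a production analysis would also need to account for positions with missing or low-quality coverage, which this sketch ignores.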
S4:7<br />
ASSEMBLING WHOLE GENOMES FROM<br />
MIXED MICROBIAL COMMUNITIES USING<br />
HI-C<br />
I. Liachko<sup>1</sup>, J. N. Burton<sup>1</sup>, L. Sycuro<sup>2</sup>, A. H. Wiser<sup>2</sup>, D. N. Fredricks<sup>2</sup>, M. J. Dunham<sup>1</sup>, J. Shendure<sup>1</sup>;<br />
<sup>1</sup>University of Washington, Seattle, WA, <sup>2</sup>Fred Hutchinson Cancer Research Center, Seattle, WA.<br />
Assembly of whole genomes from next-generation<br />
sequencing is inhibited by the lack of<br />
contiguity information in short-read sequencing.<br />
This limitation also impedes metagenome<br />
assembly, since one cannot tell which sequences<br />
originate from the same species within<br />
a population. We have overcome these bottlenecks<br />
by adapting a chromosome conformation<br />
capture technique (Hi-C) for the deconvolution<br />
of metagenomes and the scaffolding of de novo<br />
assemblies of individual genomes. In modeling<br />
the 3D structure of a genome, chromosome<br />
ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic<br />
Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens<br />