
Oral Presentation Abstracts

S4:6

CFSAN SNP PIPELINE: A WHOLE GENOME SEQUENCE DATA ANALYSIS PIPELINE FOR FOOD-BORNE PATHOGENS

Y. Luo, J. Pettengill, J. Baugher, H. Rand, S. Davis;

FDA/CFSAN, College Park, MD.

In support of the analysis of whole genome sequence (WGS) data for closely related pathogens in food-borne outbreaks, the Center for Food Safety and Applied Nutrition (CFSAN) at the FDA has developed a reference-based software pipeline for high quality SNP identification and analysis. This software pipeline combines into a single package the mapping of WGS reads to a reference genome, processing of those mapping files, identification of variant sites, and production of a SNP matrix. Additional features include a summary table of the results, soft-links to minimize data storage, and the ability to switch between workstations and computer clusters with minimal effort. The CFSAN SNP Pipeline is currently used in production mode to analyze WGS data from isolates related to food-borne illnesses. The pipeline is used when outbreak investigations are ongoing to link samples and to provide information for decision-makers. It is also used retrospectively to aid in the analysis of closed outbreaks. The CFSAN SNP Pipeline is reference-based, and so a reference must be provided. Isolate sequence data must be in fastq format but can be either paired-end or single-read data. All analysis steps are run automatically, and depend only on the proper organization of the input files and identification of a suitable reference. Additionally, each of the analysis steps can be run using individual shell scripts. The addition of new samples is straightforward, and result files from previous portions of the analysis that do not need to be regenerated are reused. This greatly reduces the computational time when adding new samples, as the mapping and pileup steps are not redone. The pipeline will run without problems on current workstations, and will run on high performance computing clusters with either Torque or Grid Engine job schedulers. The CFSAN SNP Pipeline is written in a combination of Bash and Python. The code is designed to run on Linux platforms with Bash and Python. Biopython must be installed along with three executable software dependencies: Bowtie2, SAMtools, and VarScan. Substantial effort has been devoted to making the software robust, well-documented, and easy to use. The following links provide access to the source code, the documentation, and the Python package. Also provided is the current publication reference. Source code: https://github.com/CFSAN-Biostatistics/snp-pipeline. Documentation: http://snp-pipeline.rtfd.org. PyPI package: https://pypi.python.org/pypi/snp-pipeline. Reference publication: Pettengill JB, Luo Y, Davis S, Chen Y, Gonzalez-Escalona N, Ottesen A, Rand H, Allard MW, Strain E. An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: A case study with Salmonella.
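
As an illustration of the final step described above (turning per-sample variant calls into a SNP matrix), the following is a minimal Python sketch. It is not the CFSAN SNP Pipeline's code: the function name `build_snp_matrix`, the input layout, and the rule that a sample without a call at a site carries the reference base are all assumptions made for this example. The actual pipeline works from mapping and pileup files, which this sketch does not model.

```python
def build_snp_matrix(reference, calls):
    """Hypothetical helper: collect variant sites and one row of bases per sample.

    reference: reference sequence as a string (0-based positions).
    calls: {sample_name: {position: alternate_base}} with each sample's SNP calls.
    Returns (sorted list of variant positions, {sample_name: bases at those positions}).
    """
    # A site enters the matrix if at least one sample differs from the reference there.
    sites = sorted({pos for sample_calls in calls.values() for pos in sample_calls})
    matrix = {}
    for sample, sample_calls in calls.items():
        # Assumption for this sketch: no call at a site means the reference base.
        matrix[sample] = "".join(
            sample_calls.get(pos, reference[pos]) for pos in sites
        )
    return sites, matrix


# Toy data, invented for illustration only.
reference = "ACGTACGTAC"
calls = {
    "isolate_A": {2: "T", 7: "C"},
    "isolate_B": {2: "T"},
    "isolate_C": {5: "A"},
}

sites, matrix = build_snp_matrix(reference, calls)
for sample, row in matrix.items():
    # FASTA-like output: one header line and one row of SNP bases per sample.
    print(f">{sample}\n{row}")
```

Each row has one column per variant site, so rows from different isolates can be compared position by position, for example as input to a phylogenetic analysis.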

S4:7

ASSEMBLING WHOLE GENOMES FROM MIXED MICROBIAL COMMUNITIES USING HI-C

I. Liachko¹, J. N. Burton¹, L. Sycuro², A. H. Wiser², D. N. Fredricks², M. J. Dunham¹, J. Shendure¹;

¹University of Washington, Seattle, WA, ²Fred Hutchinson Cancer Research Center, Seattle, WA.

Assembly of whole genomes from next-generation sequencing is inhibited by the lack of contiguity information in short-read sequencing. This limitation also impedes metagenome assembly, since one cannot tell which sequences originate from the same species within a population. We have overcome these bottlenecks by adapting a chromosome conformation capture technique (Hi-C) for the deconvolution of metagenomes and the scaffolding of de novo assemblies of individual genomes. In modeling the 3D structure of a genome, chromosome

ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens
