Final Program and Abstracts<br />
ASM Conference on<br />
Rapid Next-Generation Sequencing<br />
and Bioinformatic Pipelines for<br />
Enhanced Molecular Epidemiologic<br />
Investigation of Pathogens<br />
September 24 – 27, 2015<br />
Washington, DC
© 2015 American Society for Microbiology<br />
1752 N Street, N.W.<br />
Washington, DC 20036-2904<br />
Phone: 202-737-3600<br />
World Wide Web: www.asm.org<br />
All Rights Reserved<br />
Printed in the United States of America
Table of Contents<br />
ASM Conferences Information<br />
Conference Organization<br />
Acknowledgments<br />
General Information<br />
Travel Grants<br />
Scientific Program<br />
Oral Presentation Abstracts<br />
Poster Abstracts<br />
Index<br />
ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic<br />
Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens<br />
ASM Conferences Committee<br />
Sean Whelan, Chair<br />
Harvard Medical School<br />
Joanna Goldberg, Vice Chair<br />
Emory University<br />
Victor DiRita<br />
University of Michigan<br />
Lora Hooper<br />
University of Texas Southwestern<br />
Medical Center<br />
Petra Levin<br />
Washington University in Saint Louis<br />
Gary Procop*<br />
Cleveland Clinic<br />
Curtis Suttle<br />
University of British Columbia<br />
Theodore White<br />
University of Missouri – Kansas City<br />
*Indicates Committee Liaison for this Conference<br />
ASM Conferences Mission<br />
To identify emerging or underrepresented topics of broad scientific significance.<br />
To facilitate interactive exchange in meetings of 100 to 500 people.<br />
To encourage student and postdoctoral participation.<br />
To recruit individuals in disciplines not already involved in ASM to ASM<br />
membership.<br />
To foster interdisciplinary and international exchange and collaboration with<br />
other scientific organizations.<br />
ASM Conferences
Program Committee<br />
Marc Allard<br />
U.S. Food and Drug<br />
Administration<br />
Silver Spring, MD<br />
Eric Brown<br />
U.S. Food and Drug<br />
Administration<br />
Silver Spring, MD<br />
Dag Harmsen<br />
University of Muenster<br />
Muenster, Germany<br />
Acknowledgments<br />
The Conference Program Committee and the American Society for Microbiology<br />
acknowledge the following for their support of the ASM Conference on Rapid<br />
Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular<br />
Epidemiologic Investigation of Pathogens. On behalf of our leadership and members,<br />
we thank them for their financial support:<br />
Platinum Supporters<br />
Illumina<br />
OpGen<br />
ThermoFisher<br />
Gold Supporters<br />
Applied Maths<br />
PathoNGenTrace<br />
Qiagen<br />
Ridom Bioinformatics<br />
Silver Supporters<br />
Geneious<br />
Genologics<br />
Bronze Supporters<br />
Microbial Genomics – a journal from the Society for General Microbiology (SGM)<br />
New England Biolabs<br />
Sponsored Seminars<br />
The Platinum sponsors are hosting additional educational opportunities during the<br />
conference. Please stop by the tabletop displays of Illumina, OpGen and ThermoFisher<br />
to learn more about how you can participate. Space is limited.<br />
General Information<br />
REGISTRATION AND NAME BADGES<br />
ASM Staff will be available at the<br />
registration desk in the Blue Room<br />
Pre-function at the Omni Shoreham<br />
Hotel during posted registration hours.<br />
Participants may collect name badges and<br />
program materials at the registration desk.<br />
A name badge is required for entry into all<br />
sessions and meals. Each participant may<br />
register one guest to attend the Welcome<br />
Reception at the National Zoo and/or<br />
the Conference Party. Guest tickets are<br />
$50 per event. Guests may not attend<br />
sessions, poster sessions, lunches or<br />
coffee breaks.<br />
GENERAL SESSIONS<br />
All general sessions will be held in the<br />
Blue Room in the Omni Shoreham Hotel.<br />
POSTER SESSIONS<br />
Poster boards are located in the Hampton<br />
Room at the Omni Shoreham Hotel.<br />
Posters will be available for viewing<br />
informally throughout the conference,<br />
with official poster sessions scheduled on<br />
Friday and Saturday.<br />
All posters may be mounted starting on<br />
Thursday after 3:00 pm, and should be<br />
available to view by no later than 10:00<br />
am Friday morning. Posters are to be<br />
removed after 6:30 pm on Saturday,<br />
September 26, but by no later than 12:30<br />
pm on Sunday, September 27. Posters not<br />
removed in time may be discarded.<br />
Odd-numbered posters (1,3,5…) will<br />
be officially presented in Session A on<br />
Friday, September 25, and even-numbered<br />
posters (2,4,6…) will be officially<br />
presented in Session B on Saturday,<br />
September 26.<br />
Please check your assigned number in the<br />
abstract index. The same number is used<br />
for the presentation and board number.<br />
EXHIBITS<br />
Please be sure to visit the supporters’<br />
display tables in the Blue Room<br />
Pre-Function. Supporting company<br />
representatives will be available to talk<br />
with you during coffee breaks.<br />
NETWORKING MEALS AND SOCIAL<br />
EVENTS<br />
Registration includes attendance at the<br />
Welcome Reception at the National<br />
Zoo on Thursday evening, Networking<br />
Lunches on Friday and Saturday, and the<br />
Conference Party with DJ and Dancing<br />
on Saturday night. Ample time has also<br />
been scheduled for participants to network<br />
during coffee breaks.<br />
CERTIFICATE OF ATTENDANCE<br />
Certificates of Attendance can be found<br />
in the registration packet received at the<br />
registration desk.<br />
Note: Certificates of Attendance do not<br />
list session information.<br />
CAMERAS AND RECORDINGS<br />
POLICY<br />
Audio/video recorders and cameras are<br />
not allowed in session rooms or in the<br />
poster areas. Taking photographs with<br />
any device is prohibited.<br />
CHILD POLICY<br />
Children are not permitted in session<br />
rooms, poster sessions, conference meals<br />
or social events. Please contact the hotel<br />
concierge to arrange for babysitting<br />
services in your hotel room.<br />
Travel Grants<br />
ASM STUDENT TRAVEL GRANTS<br />
ASM encourages the participation of graduate students and new postdocs at ASM<br />
Conferences. To support the cost of attending the conference, ASM has awarded travel<br />
grants of $500 to each of the following individuals:<br />
Levent Albayrak<br />
Nabil-Fareed Alikhan<br />
Philip Ashton<br />
Ellsworth Campbell<br />
Laura Carroll<br />
Hattie Chung<br />
Madeline Galac<br />
John Haydek<br />
Mathis Hjelmsø<br />
Sung Im<br />
Marianne Kjeldsen<br />
Denis Kutnjak<br />
Ana Lauer<br />
Kara Levinson<br />
An-Dong Li<br />
Helena Jaramillo Mesa<br />
Muhammad Shafiq<br />
Dylan Storey<br />
Anni Zhang<br />
ASM-LINK UNDERGRADUATE FACULTY RESEARCH INITIATIVE<br />
FELLOWSHIPS<br />
The ASM-LINK Undergraduate Faculty Research Initiative (UFRI) Fellowship is a<br />
professional development resource that trains STEM faculty to initiate and sustain<br />
successful research partnerships. Through interactive training, structured mentoring,<br />
and deliberate networking at ASM-sponsored research conferences, UFRI fellows gain<br />
access to resources and networks to advance their undergraduate research programs.<br />
We congratulate the 2015 ASM Conference on Rapid Next-Generation Sequencing<br />
UFRI Fellows:<br />
Olga Calderon<br />
LaGuardia Community College, CUNY, Long Island City, NY<br />
Robert Furler<br />
Florida SouthWestern State College, Ft. Myers, FL<br />
Olabisi Ojo<br />
Southern University at New Orleans, New Orleans, LA<br />
Scientific Program<br />
Thursday, September 24, 2015<br />
5:00 pm – 7:00 pm Opening Keynote Session<br />
Blue Room<br />
Session Chair: Steven Musser<br />
5:00 – 5:15 pm Welcome Remarks<br />
Joseph Campos; Secretary, American Society for<br />
Microbiology, Washington, DC<br />
Gary Procop; ASM Conferences Committee, Washington, DC<br />
Eric Brown; US Food and Drug Administration, Silver<br />
Spring, MD<br />
5:15 – 6:00 pm Microbial Genomics and Beyond<br />
George Weinstock; Jackson Laboratory, Farmington, CT<br />
6:00 – 6:45 pm Translating Genomics from Research to Clinical Application<br />
Julian Parkhill; Univ. of Cambridge, Cambridge, United<br />
Kingdom<br />
7:15 – 9:00 pm Welcome Reception<br />
National Zoo, Bird House (10 min. walk from Omni)<br />
Hors d’oeuvres and an open bar will be provided. You must<br />
have your name badge to enter.<br />
To get there: Walking out the front entrance of the Omni<br />
Shoreham, walk to the right on Calvert Street (the street<br />
in front of the hotel), and at the major intersection with<br />
Connecticut Avenue turn left to walk up Connecticut Avenue.<br />
The National Zoo entrance will be on your right. The Bird<br />
House is the first major building you will reach as you enter<br />
the Zoo. Staff will be on hand to direct you once you arrive at<br />
the Zoo.<br />
A small shuttle van will also run on a continuous loop between<br />
the hotel and the zoo for attendees who elect not to walk (note<br />
the van is small so wait times could be significant).<br />
Friday, September 25, 2015<br />
8:00 – 8:45 am Surveillance Keynote I<br />
Blue Room<br />
Session Chair: Eric Brown<br />
Priming the Innovation Pump: FDA’s Role in Advancing and<br />
Using NGS<br />
Stephen Ostroff; US Food and Drug Administration, Silver<br />
Spring, MD<br />
8:45 – 10:00 am Session 1: Genomics for Food and Veterinary Pathogen Surveillance<br />
Blue Room<br />
Session Chair: Eric Brown<br />
8:45 – 9:15 am GenomeTrakr: A Pathogen Database to Build a Global<br />
Genomic Network for Pathogen Traceback and Outbreak<br />
Detection<br />
Marc Allard; US Food and Drug Administration, Silver<br />
Spring, MD<br />
9:15 – 9:45 am Global Microbial Identifier – is Global Harmonization and<br />
Comparison of WGS Data Feasible?<br />
Frank Aarestrup; Technical Univ. of Denmark, Lyngby,<br />
Denmark<br />
9:45 – 10:00 am Three Months of Surveillance of S. Typhimurium and<br />
S. 1,4,[5],12:i:- in Denmark Based on Whole-genome<br />
Sequencing and MLVA Typing<br />
Marianne Kjeldsen; Statens Serum Institut, Copenhagen,<br />
DENMARK<br />
10:00 – 10:30 am Coffee Break<br />
Blue Prefunction Room<br />
and Patio<br />
10:30 – 12:00 pm Session 2: One Health, Regulation and Assuring the Quality of NGS for Pathogen Surveillance<br />
Blue Room<br />
Session Chair: Eric Brown<br />
10:30 – 11:00 am Assuring the Quality of Next-Generation Sequencing in<br />
Clinical and Public Health Laboratories<br />
Amy Gargis; Centers for Disease Control and Prevention<br />
11:00 – 11:15 am Long Reads Sequencing for Better Short Reads SNP Analysis<br />
Deborah Moine; Nestlé Institute of Health Sciences,<br />
Lausanne, SWITZERLAND<br />
11:15 – 11:30 am Subtractive-hybridization for Enrichment of Non-host Nucleic<br />
Acid for Improvement of Sequence-based Detection of<br />
Pathogens<br />
Roger Barrette; Plum Island Animal Disease Center (USDA/<br />
APHIS), Greenport, NY<br />
11:30 – 11:45 am The Salmonella in silico Typing Resource (SISTR): Rapid<br />
Analysis of Salmonella Draft Genome Sequence Data<br />
Catherine Yoshida; Public Health Agency of Canada, Guelph,<br />
ON, CANADA<br />
11:45 – 12:00 pm Revolutionising Public Health Reference Microbiology Using<br />
Whole Genome Sequencing: A Case Study with Salmonella<br />
Philip Ashton; Public Health England, London, UNITED<br />
KINGDOM<br />
12:00 – 1:15 pm Lunch<br />
Palladian Ballroom<br />
1:15 – 3:15 pm Session 3: Genomics for Public Health Pathogen Surveillance<br />
Blue Room<br />
Session Chair: Amy Gargis<br />
1:15 – 1:45 pm The Application of Genomics to Public Health—an<br />
Epidemiologist’s Point of View<br />
Greg Armstrong; Centers for Disease Control and Prevention,<br />
Atlanta, GA<br />
1:45 – 2:15 pm Tracing Evolution and Spread of Mycobacterium tuberculosis<br />
Strains in Times of Antibiotic Treatment<br />
Stefan Niemann; Research Center Borstel, Borstel, Germany<br />
2:15 – 2:30 pm Benchmark Datasets for Validating Foodborne Outbreak<br />
Investigations: Integrating WGS and Phylogenomic Analyses<br />
Ruth Timme; US Food and Drug Administration, College<br />
Park, MD<br />
2:30 – 2:45 pm Integrating Core Genome Phylogenetic Relationships<br />
and Isolate Geographic Data to Trace the 2012 Neisseria<br />
meningitidis Outbreak in New York City<br />
Madeline Galac; Univ. of North Carolina at Charlotte,<br />
Charlotte, NC<br />
2:45 – 3:00 pm Whole Genome Sequencing Provides Rapid Traceback of<br />
Clinical to Food Sources During a Foodborne Outbreak of<br />
Salmonellosis<br />
Maria Hoffmann; US Food and Drug Administration, College<br />
Park, MD<br />
3:00 – 3:15 pm Transforming Public Health Microbiology in the United States<br />
with Whole Genome Sequencing (WGS) - PulseNet and<br />
Beyond<br />
Eija Trees; CDC, Atlanta, GA<br />
3:15 – 3:30 pm Coffee Break<br />
Blue Prefunction Room<br />
and Patio<br />
3:30 – 6:00 pm Session 4: Bioinformatic Pipelines and Tools for Genomic Pathogen Surveillance<br />
Blue Room<br />
Session Chair: Bill Klimke<br />
3:30 – 4:00 pm Pathogen Genomics at NCBI<br />
David Lipman; National Center for Biotechnology<br />
Information, Bethesda, MD<br />
4:00 – 4:30 pm Overview of Tools for Microbial NGS Data Analysis<br />
Dag Harmsen; Univ. of Munster, Munster, Germany<br />
4:30 – 5:00 pm Community and Social Data / Applications for Genomic<br />
Pathogen Surveillance<br />
David Aanensen; Imperial College, London, United Kingdom<br />
5:00 – 5:15 pm Salmonella Serotype Determination Utilizing High-throughput<br />
Genome Sequencing Data<br />
Xiangyu Deng; Univ. of Georgia, Griffin, GA<br />
5:15 – 5:30 pm PATRIC Pipeline<br />
Fangfang Xia; Univ. of Chicago, Chicago, IL<br />
5:30 – 5:45 pm CFSAN SNP Pipeline: A Whole Genome Sequence Data<br />
Analysis Pipeline for Food-borne Pathogens<br />
Yan Luo; FDA/CFSAN, College Park, MD<br />
5:45 – 6:00 pm Assembling Whole Genomes from Mixed Microbial<br />
Communities Using Hi-C<br />
Ivan Liachko; Univ. of Washington, Seattle, WA<br />
6:00 – 7:30 pm Poster Session A<br />
Hampton Room<br />
Odd-numbered posters will be officially presented.<br />
Saturday, September 26, 2015<br />
8:00 – 8:45 am Surveillance Keynote II<br />
Blue Room<br />
Session Chair: Greg Armstrong<br />
Integrating Advanced Molecular Technologies into Public<br />
Health<br />
Rima Khabbaz; Centers for Disease Control and Prevention,<br />
Atlanta, GA<br />
8:45 – 9:45 am Session 5: Omni-Omics for Pathogen Surveillance<br />
Blue Room<br />
Session Chair: Martin Maiden<br />
8:45 – 9:15 am SURPI: A Deep Sequencing Clinical Analysis Tool for<br />
Infectious Disease<br />
Charles Y. Chiu; Univ. of California, San Francisco, San<br />
Francisco, CA<br />
9:15 – 9:45 am Genomics & Transcriptomics in the Clinical Microbiology<br />
Laboratory<br />
Randall J. Olsen; Houston Methodist, Houston, TX<br />
9:45 – 10:15 am Coffee Break<br />
Blue Prefunction Room<br />
and Patio<br />
10:15 – 11:45 am Session 6: Genomics for Microbial Taxonomy<br />
Blue Room<br />
Session Chair: Dag Harmsen<br />
10:15 – 10:45 am A New Genomics Driven Taxonomy. Are We There, Yet?<br />
George Garrity; Michigan State Univ., East Lansing, MI<br />
10:45 – 11:15 am Beyond Typing and Phylogeny: the Population and Functional<br />
Genomics of the Neisseria<br />
Martin Maiden; Univ. of Oxford, Oxford, United Kingdom<br />
11:15 – 11:30 am Next Generation Sequencing of Brucella melitensis Isolates<br />
from Kuwait and Comparative Genome Analyses<br />
Abu Mustafa; Kuwait Univ., Jabriya, KUWAIT<br />
11:30 – 11:45 am Microbial Genomic Taxonomy at GenBank<br />
Scott Federhen; NCBI, Bethesda, MD<br />
11:45 am – 1:15 pm Lunch<br />
Empire Ballroom<br />
1:15 – 3:45 pm Session 7: Pathogen Surveillance Software Demonstration<br />
Blue Room<br />
Session Chairs: Kathryn Holt, Bill Klimke, Errol Strain<br />
1:15 – 1:30 pm What Do We Need from Microbial Genomics Surveillance<br />
Software?<br />
Kathryn Holt; Univ. of Melbourne, Melbourne, AUSTRALIA<br />
1:30 – 1:39 pm A Biosurveillance Analysis Pipeline for Genomic Sequence<br />
Data<br />
Christian Olsen; Biomatters, Inc., Newark, NJ<br />
1:39 – 1:48 pm A Universal Whole Genome Sequencing Approach Using<br />
Whole Genome MLST and Whole Genome SNP Analysis in<br />
the Cloud<br />
Hannes Pouseele; Applied Maths NV, Sint-Martens-Latem,<br />
BELGIUM<br />
1:48 – 1:57 pm Typing and Epidemiological Clustering of Common Pathogens<br />
Based on Whole Genome NGS Data<br />
Arne Materna; QIAGEN, Aarhus, DENMARK<br />
1:57 – 2:06 pm wgsa.net: Whole Genome Sequence Analysis<br />
David Aanensen; Imperial College London, London, UNITED<br />
KINGDOM<br />
2:06 – 2:15 pm SeqSphere+ Software for Prospective Bacterial Genomic<br />
Surveillance and Resistome or Virulome Analysis<br />
Jörg Rothgänger; Ridom GmbH, Münster, GERMANY<br />
2:15 – 2:24 pm Nullarbor: Rapid Analysis of Bacterial Outbreak Sequence<br />
Data<br />
Torsten Seemann; Univ. of Melbourne, Melbourne,<br />
AUSTRALIA<br />
2:24 – 2:33 pm EnteroBase: A Powerful, User-friendly Online Resource for<br />
Analysing Genomic Variation<br />
Nabil-Fareed Alikhan; Univ. of Warwick, Coventry, UNITED<br />
KINGDOM<br />
2:33 – 2:42 pm SnapperDB: A Scalable Database for Routine Sequencing of<br />
Bacterial Isolates<br />
Philip Ashton; Public Health England, London, UNITED<br />
KINGDOM<br />
2:42 – 2:51 pm Phylogenetic Reconstruction and Outbreak Investigation<br />
Using IRIDA and SNVPhyl<br />
Aaron Petkau; Public Health Agency of Canada, Winnipeg,<br />
MB, CANADA<br />
2:51 – 3:00 pm PanCore: A Flexible Workflow for the Comparison and<br />
Assignment of Genomes to Outcomes<br />
Dylan Storey; Univ. of California, Davis, CA<br />
3:00 – 3:09 pm Reference-free Pan-genomic Epidemiology using Cortex<br />
Zamin Iqbal; Wellcome Trust Centre for Human Genetics,<br />
Univ. of Oxford, Oxford, UNITED KINGDOM<br />
3:10 – 3:15 pm Quality Issues and Standards for Next Generation Sequencing<br />
Bill Klimke; NCBI/NLM/NIH, Bethesda, MD<br />
3:15 – 3:35 pm Retrospective WGS of Two Foodborne Outbreaks<br />
Errol Strain; US Food and Drug Administration, College<br />
Park, MD<br />
3:35 – 4:00 pm Coffee Break<br />
Blue Prefunction Room<br />
and Patio<br />
4:00 – 5:00 pm Session 8: Genomics for Microbial Forensics<br />
Blue Room<br />
Session Chair: Marc Allard<br />
4:00 – 4:30 pm Microbial Forensics and Its Needs for Standards and<br />
Standardization<br />
Bruce Budowle; Univ. of Texas, Austin, TX<br />
4:30 – 5:00 pm Anthrax – Molecular Epidemiology and Forensics from Whole<br />
Genome Sequencing and Metagenomic Sampling of Complex<br />
Specimens<br />
Paul Keim; Northern Arizona Univ., Flagstaff, AZ<br />
5:00 – 6:30 pm Poster Session B<br />
Hampton Room<br />
Even-numbered posters will be officially presented.<br />
7:00 – 10:00 pm Conference Party<br />
Blue Room<br />
Enjoy a relaxed and fun evening with your colleagues. A DJ<br />
will play a range of music and dancing is encouraged. Heavy<br />
hors d’oeuvres, a carving station and a tickets/cash bar will be<br />
provided.<br />
Sunday, September 27, 2015<br />
8:00 – 10:00 am Session 9: Genomics for Clinical Microbiology Pathogen Surveillance<br />
Blue Room<br />
Session Chair: Stefan Niemann<br />
8:00 – 8:30 am Full Genome Sequencing to Track Carbapenem Producing<br />
Organisms within a Hospital<br />
Julie Segre; National Institutes of Health, Bethesda, MD<br />
8:30 – 9:00 am Prospective Genome Sequencing for Real-time Surveillance of<br />
Resistant Bacteria in a Univ. Hospital<br />
Alexander Mellmann; Univ. of Munster, Munster, Germany<br />
9:00 – 9:15 am Whole-genome Sequence Analysis of Pseudomonas<br />
aeruginosa in Acute Infection Reveals Widespread within-population<br />
Diversity and Rapid Transmission within the Body<br />
Hattie Chung; Harvard Medical School, Boston, MA<br />
9:15 – 9:30 am Beyond the SNV: Integrating Multiple Data Types into<br />
Genomic Epidemiology<br />
Meredith Wright; J. Craig Venter Institute, La Jolla, CA<br />
9:30 – 9:45 am Defining Clonality in Acinetobacter baumannii Using Whole<br />
Genome Sequencing of Outbreak Strains Associated with the<br />
Conflict in Iraq<br />
Erik Snesrud; Walter Reed Army Institute of Research, Silver<br />
Spring, MD<br />
9:45 – 10:00 am Direct from Sputum: Next Gen Analysis of Mycobacterium<br />
tuberculosis in Clinical Samples<br />
David Engelthaler; TGen North, Flagstaff, AZ<br />
10:00 – 10:30 am Coffee Break<br />
Blue Prefunction Room<br />
and Patio<br />
10:30 – 12:30 pm Session 10: Genomics for Virus Surveillance<br />
Blue Room<br />
Session Chair: Charles Y. Chiu<br />
10:30 – 11:00 am Genomic Surveillance of the Ebola Epidemic in Western<br />
Africa<br />
Shirlee Wohl; Harvard Univ., Cambridge, MA<br />
11:00 – 11:30 am Small Game Hunting<br />
W. Ian Lipkin; Columbia Univ., New York, NY<br />
11:30 – 11:45 am Development of an Efficient Next-generation Sequencing<br />
Platform for Charting the Evolution of Norovirus Strains<br />
Gabriel Parra; National Institutes of Health, Bethesda, MD<br />
11:45 – 12:00 pm Genome-wide Comparison of Cowpox Viruses Reveals a New<br />
Clade Related to Variola Virus<br />
Livia Schuenadel; Robert Koch Institute, Berlin, GERMANY<br />
12:00 – 12:15 pm Virome Analyses Among Children with Acute Respiratory<br />
Infection in China<br />
Wenjie Tan; National Institute for Viral Disease Control and<br />
Prevention, China CDC, Beijing, CHINA<br />
12:15 – 12:30 pm Using Wastewater to Monitor Viral Pathogens in the Slum City<br />
of Kibera<br />
Mathis Hjelmsø; Technical Univ. of Denmark, Kgs. Lyngby,<br />
DENMARK<br />
12:30 – 12:35 pm Concluding Remarks and Plans for ASMNGS 2017<br />
Blue Room<br />
Dag Harmsen<br />
Oral Presentation Abstracts<br />
S1:3<br />
THREE MONTHS OF SURVEILLANCE OF S.<br />
TYPHIMURIUM AND S. 1,4,[5],12:I:- IN<br />
DENMARK BASED ON WHOLE-GENOME<br />
SEQUENCING AND MLVA TYPING<br />
M. Kjeldsen, P. Gymoese, M. Torpdahl;<br />
Statens Serum Institut, Copenhagen, DENMARK.<br />
Introduction: Salmonella enterica subsp. enterica<br />
Typhimurium (S. Typhimurium) and its<br />
monophasic variant 1,4,[5],12:i:- are zoonotic<br />
pathogens of significance in both humans and<br />
animals worldwide. In Europe, Salmonella<br />
causes the majority of food-borne outbreaks.<br />
Currently, several laboratories primarily use<br />
pulsed-field gel electrophoresis (PFGE) and<br />
Multiple-locus variable-number tandem repeat<br />
analysis (MLVA) for surveillance and outbreak<br />
investigations of Salmonella. Surveillance<br />
studies based on whole-genome sequences<br />
(WGS) show good results and are promising<br />
alternatives to conventional methods. In this<br />
study, we evaluate SNP analysis in comparison<br />
to MLVA for surveillance of S. Typhimurium<br />
and S. 1,4,[5],12:i:-. Materials and Methods:<br />
We analyzed all S. Typhimurium and S.<br />
1,4,[5],12:i:- human clinical isolates from the<br />
Danish surveillance program from January to<br />
March 2015. This collection comprises 40<br />
monophasic S. Typhimurium and 66 S. Typhimurium<br />
isolates, including three outbreaks<br />
defined by MLVA-typing and epidemiological<br />
findings. The relatedness of the strains was<br />
examined by core genome SNP analysis, and<br />
results were compared with those of MLVA<br />
and Multi-locus sequence typing (MLST).<br />
Results: WGS analysis on the collection of<br />
106 strains resulted in close to 5900 SNPs.<br />
A clear correlation between SNP and MLST<br />
analysis was observed. S. Typhimurium ST36<br />
was separated by a deep branch from ST19<br />
and ST34. Isolates of ST34 mainly comprised<br />
monophasic variants and were separated by<br />
440 SNPs, indicating a close relationship<br />
within this group. In correspondence with the<br />
MLVA-defined S. Typhimurium outbreaks,<br />
the SNP-based tree revealed three clusters of<br />
closely related strains with a few SNP differences.<br />
In one of the outbreaks, MLVA included<br />
35 isolates while the SNP analysis added two<br />
potential outbreak isolates to this cluster. In the<br />
ST34 group, SNP analysis dispersed all MLVA<br />
clusters, including the outbreak cluster (of<br />
eight isolates) located within this group; the SNP<br />
analysis questioned whether one of the eight defined<br />
outbreak isolates should be included. Conclusion: Our<br />
results show that strains with identical MLVA<br />
profiles can be either unrelated or closely related<br />
based on SNP distance determined from<br />
WGS. Using WGS analysis for outbreak detection<br />
seems reliable and, in addition, it provides<br />
a higher resolution of the strains’ relationships.<br />
At present, defining an outbreak solely on SNP<br />
differences is problematic, since the number<br />
of SNP differences allowed within a cluster<br />
has to be considered. This study highlights<br />
the challenges with both SNP- and MLVA-based<br />
cluster detection and emphasizes the importance<br />
of combining molecular methods with<br />
epidemiological data.<br />
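The threshold problem raised in the conclusion can be illustrated with a minimal sketch. It assumes isolates are given as equal-length aligned core-genome sequences and that clusters are formed by single linkage under a chosen maximum SNP distance. The function names, the toy sequences, and the single-linkage rule are illustrative assumptions, not the pipeline used in the study.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a non-ACGT character
    (for example N or a gap) are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def cluster_by_threshold(seqs: dict[str, str], max_snps: int) -> list[set[str]]:
    """Single-linkage clustering: two isolates share a cluster if they are
    connected by a chain of pairs that each differ by at most max_snps."""
    parent = {name: name for name in seqs}

    def find(n: str) -> str:
        # Follow parent links to the cluster representative,
        # shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the abstract.
    isolates = {
        "iso1": "ACGTACGTAC",
        "iso2": "ACGTACGTAT",
        "iso3": "ACGTACGAAT",
        "iso4": "TTGTACCAGG",
    }
    for threshold in (0, 1):
        print(threshold, cluster_by_threshold(isolates, threshold))
```

With the toy data, a threshold of 0 leaves every isolate on its own, while a threshold of 1 joins iso1, iso2 and iso3 even though iso1 and iso3 differ by two SNPs. The cluster membership depends directly on the allowed SNP distance, which is why the threshold has to be chosen with epidemiological data in hand.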
S2:2<br />
LONG READS SEQUENCING FOR BETTER<br />
SHORT READS SNP ANALYSIS<br />
D. Moine¹, L. Baert², C. Barretto², C. Ngom-Bru², M. Kasam¹,<br />
C. Fournier¹, L. Michot², J. Gimonet², C. Chilton²;<br />
¹Nestlé Institute of Health Sciences, Lausanne,<br />
SWITZERLAND, ²Nestlé Research Center,<br />
Lausanne, SWITZERLAND.<br />
Whole genome sequencing (WGS) is an<br />
emerging tool for foodborne pathogen characterization.<br />
It can help to identify and type<br />
the bacteria for investigative purposes (source<br />
attribution), factory ecology and trend analysis<br />
in the food industry. This novel approach is<br />
more precise and faster than former methods<br />
like ribotyping or 16S gene sequencing. The<br />
sequencing of the isolates (e.g. Salmonella,<br />
Listeria, Cronobacter) was performed using<br />
Illumina short read sequencing technology.<br />
Regarding bioinformatic analysis, calling<br />
single nucleotide polymorphisms (SNPs)<br />
against a reference is a good way to obtain<br />
precise results. One key parameter when doing<br />
SNP analysis is to have a suitable reference<br />
genome that is closely related to the targeted<br />
strain. Often those complete reference genomes<br />
are not available. Only a few strains of<br />
Salmonella, Listeria or Cronobacter have complete<br />
publicly available genomes of high quality.<br />
To create the reference genomes, we used<br />
the Pacific Biosciences long-read sequencing<br />
technology followed by the hierarchical genome<br />
assembly process (HGAP) for de novo genome<br />
assembly. The assemblies were then validated<br />
using a quality-checking pipeline (MegaBLAST<br />
and dot plot comparison).<br />
S2:3<br />
SUBTRACTIVE-HYBRIDIZATION FOR<br />
ENRICHMENT OF NON-HOST NUCLEIC ACID<br />
FOR IMPROVEMENT OF SEQUENCE-BASED<br />
DETECTION OF PATHOGENS<br />
R. W. Barrette, F. R. Grau, M. T. McIntosh;<br />
Plum Island Animal Disease Center (USDA/<br />
APHIS), Greenport, NY.<br />
Next generation sequencing is a powerful tool<br />
for detection and characterization of pathogens.<br />
This technology is rapidly reducing the<br />
sequencing cost per base while simultaneously<br />
increasing the amount of sequence data<br />
produced. Next generation sequencing is<br />
particularly well suited to screening complex<br />
mixtures of nucleic acids, which can be exploited<br />
for pathogen identification. However,<br />
such nucleic acid mixtures are often biased to<br />
host-derived, rather than pathogen-derived,<br />
nucleic acid. This can greatly reduce the<br />
number of sequence reads that are relevant to<br />
infectious agents. By hybridization of random<br />
cDNA from samples to host RNA immobilized<br />
on paramagnetic beads, we have developed a<br />
subtractive-hybridization method that may be<br />
used to enrich microbial or viral cDNA from<br />
diagnostic specimens. To test this method,<br />
samples from cattle exhibiting clinical signs<br />
similar to foot-and-mouth disease (FMD) were<br />
cultured in primary lamb kidney cells (LK) and<br />
a monkey kidney cell line (Vero) for isolation<br />
of potential viruses. FMD virus was rapidly<br />
ruled out by conventional diagnostic methods<br />
including real-time RT-PCR of the original and<br />
cultured samples; however, electron microscopy<br />
of LK and Vero cell cultures revealed the<br />
presence of picornavirus-like particles. LK cultured<br />
material was subsequently subjected to<br />
random cDNA amplification, host subtraction<br />
and next-generation sequencing. Total cDNA<br />
was simultaneously sequenced to compare the<br />
effectiveness of the new method. Results from<br />
the host-subtracted cDNA library resolved 74%<br />
of the genome of a novel isolate of Enterovirus<br />
Type 1A, compared with 34% virus<br />
genome coverage by direct NGS. Phylogenetic<br />
analysis showed only 84% nucleotide sequence<br />
identity between the newly<br />
identified enterovirus and the next most closely<br />
related virus. This work illustrates the utility<br />
of next generation sequencing and host nucleic<br />
acid subtraction in sequence-based detection<br />
and characterization of new viruses.<br />
S2:4<br />
THE SALMONELLA IN SILICO TYPING<br />
RESOURCE (SISTR): RAPID ANALYSIS OF<br />
SALMONELLA DRAFT GENOME SEQUENCE<br />
DATA<br />
C. Yoshida 1, P. Kruczkiewicz 2, J. H. Nash 1, E. N. Taboada 2;<br />
1 Public Health Agency of Canada, Guelph, ON,<br />
CANADA, 2 Public Health Agency of Canada,<br />
Lethbridge, AB, CANADA.<br />
Salmonella is an important food safety and<br />
public health concern worldwide. Serotyping<br />
has historically been essential in human<br />
disease surveillance and outbreak prevention<br />
and control. Although laboratories around the<br />
world rely on serotyping as a primary means<br />
of isolate characterization, there is significant<br />
consensus that current means of Salmonella<br />
subtyping often lack the specificity and discriminatory<br />
power required in the context of<br />
epidemiologic investigations. Whole-genome<br />
sequencing (WGS) is increasingly being adopted<br />
as a front-line laboratory tool, promising to<br />
revolutionize our ability to perform advanced<br />
pathogen characterization in support of enhanced<br />
public health surveillance and epidemiologic<br />
investigations. In an effort to overcome<br />
the significant barrier that exists for laboratories<br />
without the necessary bioinformatics infrastructure,<br />
we have developed the Salmonella In<br />
Silico Typing Resource (SISTR), an open web-accessible<br />
bioinformatics platform for rapidly<br />
analyzing draft Salmonella genome sequence<br />
data. In addition to serovar prediction by<br />
genoserotyping, Multi-Locus Sequence Typing<br />
(MLST) and ribosomal MLST (rMLST), the<br />
SISTR platform incorporates a novel core genome<br />
MLST (cgMLST) scheme that we have<br />
developed, the first such scheme described for<br />
Salmonella. The platform incorporates metadata-driven<br />
visualizations to examine the phylogenetic,<br />
geospatial and temporal distribution of<br />
genome-sequenced isolates. The resource also<br />
incorporates a database comprising over 4,000<br />
publicly available genomes, allowing users to<br />
place their isolates in a broader phylogenetic<br />
and epidemiological context. We have used a<br />
dataset comprising 4,188 finished genomes<br />
and WGS draft assemblies to examine the correlation<br />
between an isolate’s cgMLST cluster<br />
and its serovar and to show how phylogenetic<br />
context from cgMLST analysis can supplement<br />
the genoserotyping analysis to increase<br />
the accuracy of in silico serovar prediction<br />
to over 94.6%. As sequencing of Salmonella<br />
isolates at public health laboratories becomes<br />
increasingly common, rapid in silico analysis<br />
of minimally processed draft genome assemblies<br />
provides a powerful replacement for<br />
current methods of isolate characterization.<br />
The SISTR platform can be used to generate<br />
the requisite serovar information and subtyping<br />
data currently used by epidemiologists and<br />
public health practitioners, while also providing<br />
some of the genome-based analyses that<br />
are the primary motivation for a move towards<br />
WGS. This type of integrated analysis allows<br />
for continuity with historical serotyping and<br />
subtyping data as we transition towards the<br />
increasing adoption of genomic epidemiology<br />
in public health. The SISTR web-application is<br />
accessible online at https://lfz.corefacility.ca/sistr-app/.<br />
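As a rough illustration of how two isolates can be compared under a cgMLST scheme, the sketch below counts the loci at which their allele profiles differ. It is not SISTR's implementation: the locus names, allele numbers, function name, and the rule of skipping loci that are uncalled in either isolate are all assumptions made for the example.<br />

```python
from typing import Dict, Optional

# locus name -> allele number (None means the locus was not called)
Profile = Dict[str, Optional[int]]


def cgmlst_distance(a: Profile, b: Profile) -> int:
    """Count loci at which two allele profiles carry different alleles.

    Loci that are absent or None in either profile are skipped. That is a
    common convention for draft assemblies, assumed here for illustration.
    """
    shared = [
        locus for locus in a
        if locus in b and a[locus] is not None and b[locus] is not None
    ]
    return sum(1 for locus in shared if a[locus] != b[locus])


# Hypothetical three-locus profiles, for illustration only.
isolate_1 = {"locus_A": 1, "locus_B": 4, "locus_C": 7}
isolate_2 = {"locus_A": 1, "locus_B": 5, "locus_C": None}
print(cgmlst_distance(isolate_1, isolate_2))  # 1
```

Pairwise distances of this kind are the usual input to clustering, which is one way an isolate's cgMLST cluster could be related to its serovar.<br />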
S2:5<br />
REVOLUTIONISING PUBLIC HEALTH<br />
REFERENCE MICROBIOLOGY USING WHOLE<br />
GENOME SEQUENCING: A CASE STUDY<br />
WITH SALMONELLA<br />
P. Ashton, S. Nair, T. Peters, A. Waldram, E. de<br />
Pinna, R. Elson, K. Grant, T. Dallman;<br />
Public Health England, London, UNITED<br />
KINGDOM.<br />
Background: Salmonella is a major human<br />
pathogen and a global public health issue.<br />
Currently, presumptive Salmonella isolates<br />
received by Public Health England (PHE) are<br />
typed by slide agglutination against specific<br />
anti-sera to determine their serotype based<br />
on their lipopolysaccharide and flagella. The<br />
microbiological typing data generated in<br />
the laboratory is fed into the public health<br />
workflow at PHE that involves the use of a<br />
sophisticated statistical algorithm for detection<br />
of pathogen outbreaks. A more discriminatory<br />
typing method would lead to more accurate<br />
and focussed public health investigation. Here,<br />
we address the question of whether whole<br />
genome sequencing (WGS) can improve on<br />
results obtained via this current workflow.<br />
Methods: Public Health England have adopted<br />
WGS for the routine identification and<br />
epidemiological investigation of Salmonella.<br />
The sequence type (ST) of each isolate is determined<br />
via a reference mapping approach.<br />
For the more common STs, the whole genome<br />
is analysed for Single Nucleotide Polymorphisms<br />
that are used to construct phylogenetic<br />
trees, giving the highest possible resolution.<br />
To date, more than 11000 isolates have been<br />
analysed in this fashion. Findings: There are<br />
three primary results to report. Firstly, as there<br />
is a link between underlying genetics and serotype,<br />
the serotype can be determined from<br />
the sequence type with 98.5% concordance.<br />
This allows for backwards compatibility with<br />
existing outbreak detection methods. Secondly,<br />
clusters of genetically related cases can be<br />
identified; between 01/04/2014 and 11/08/2014<br />
there were seven clusters of more than 10 Salmonella<br />
Enteritidis isolates within a 10-SNP<br />
cluster. Only one of these was identified as an<br />
outbreak by traditional means. These clusters<br />
were retrospectively investigated for common<br />
exposures. Finally, a WGS analysis of clusters<br />
identified based on traditional typing will be<br />
presented. Interpretation: Although WGS is<br />
being embraced and adopted by PHE, it is not<br />
being treated as a panacea for public health<br />
microbiology, but rather as a paradigm shift in<br />
microbiological typing which has the potential<br />
to transform, rather than replace, traditional<br />
epidemiological investigation.<br />
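The idea of a 10-SNP cluster can be sketched as single-linkage clustering over pairwise SNP distances. This is a minimal sketch, not PHE's pipeline: the isolate names and distances are invented, the function name is hypothetical, and treating "within 10 SNPs" as "distance less than or equal to 10" is an assumption.<br />

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def single_linkage_clusters(
    isolates: List[str],
    snp_distance: Dict[FrozenSet[str], int],
    threshold: int = 10,
) -> List[Set[str]]:
    """Group isolates so that each is within `threshold` SNPs of at least
    one other member of its cluster (single linkage), using union-find."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if snp_distance[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for name in isolates:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Invented pairwise distances for four hypothetical isolates.
names = ["iso1", "iso2", "iso3", "iso4"]
dist = {
    frozenset(("iso1", "iso2")): 3,
    frozenset(("iso1", "iso3")): 14,
    frozenset(("iso2", "iso3")): 9,
    frozenset(("iso1", "iso4")): 60,
    frozenset(("iso2", "iso4")): 58,
    frozenset(("iso3", "iso4")): 55,
}
clusters = single_linkage_clusters(names, dist)
print(clusters)
```

With single linkage, iso1 and iso3 land in the same cluster even though they differ by 14 SNPs, because iso2 links them. Whether that chaining is desirable depends on the surveillance question.<br />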
S3:3<br />
BENCHMARK DATASETS FOR VALIDATING<br />
FOODBORNE OUTBREAK INVESTIGATIONS:<br />
INTEGRATING WGS AND PHYLOGENOMIC<br />
ANALYSES<br />
R. E. Timme 1, H. Rand 1, E. Trees 2, R. Agarwala 3,<br />
S. David 1, M. Shumway 3, M. Simmons 4,<br />
G. Tillman 4, P. Bronstein 4, H. Carleton 2,<br />
S. Defibaugh-Chavez 4, W. Klimke 3, L. S. Katz 2;<br />
1 US Food and Drug Administration, College<br />
Park, MD, 2 Centers for Disease Control and Prevention, Decatur,<br />
GA, 3 National Center for Biotechnology<br />
Information, Bethesda, MD, 4 USDA-FSIS,<br />
Athens, GA.<br />
Background: As US regulatory agencies begin<br />
to rely on whole genome sequencing (WGS)<br />
data and analyses for foodborne pathogen<br />
surveillance, and public health applications for<br />
WGS expand, a critical need has emerged for<br />
benchmark datasets of well-curated outbreaks.<br />
Such datasets can serve as validation standards<br />
for training purposes, ensuring analytical consistency<br />
among participating labs, and allowing<br />
researchers to compare parameter choices<br />
and understand the inherent biases of different<br />
methods and workflows. Having analytical<br />
consistency and the flexibility to use a variety<br />
of methods will give epidemiologists and regulators<br />
the best possible support for their efforts.<br />
What we have done: We identified one retrospective<br />
outbreak dataset for each of the most<br />
common foodborne pathogens (Salmonella,<br />
Listeria, and E. coli) that met criteria for well-vetted<br />
epidemiological data, sequence quality,<br />
and completeness of metadata. Chosen outbreaks<br />
represented the size (8-30 isolates) and<br />
degrees of sequence divergence seen in recent<br />
years. The isolates from which these genomes<br />
came are vouchered in culture collections at<br />
the FDA and CDC, allowing further validation<br />
of identified variants. Our standard format<br />
provides BioSample, SRA, and GenBank<br />
accession numbers, a classification of the isolates<br />
as in or out of the outbreak, a suggested<br />
reference genome, and a phylogenetic tree in<br />
Newick format. To ensure that labs can use the<br />
same materials for validating current or future<br />
analysis tools, we created detailed instructions<br />
to exactly replicate these datasets. Results: Using<br />
our detailed instructions, we downloaded<br />
each standardized dataset on a wide variety<br />
of Unix-like platforms (CentOS 6, Ubuntu 14.04,<br />
Mac OS, etc.) and verified that these downloads<br />
were both correct (using sha256sum values)<br />
and compatible with many current analysis<br />
methods for SNP finding and phylogeny<br />
reconstruction (CFSAN SNP Pipeline, Lyve-SET,<br />
BioNumerics, wgMLST, Wombac, kSNP,<br />
and SNVPhyl). We then created a strategy<br />
for comparing the results of different analysis<br />
methods. Conclusion: Government, university,<br />
and industry labs now have a shared baseline<br />
against which we can assess new methods,<br />
enabling researchers to work together more<br />
effectively and with greater confidence in each<br />
other’s results. The process of building this<br />
first benchmark dataset has helped four US<br />
government agencies (CDC, FDA, USDA,<br />
NCBI) work together and refine their methodologies<br />
together. Additional curated datasets<br />
can and should be incorporated into this schema.<br />
Phylogenetic data are context-dependent:<br />
as more sequence data accumulate, we will<br />
need to revise our analyses. In this way, our<br />
work can ensure a sound footing for WGS-based<br />
regulatory decision-making.<br />
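The download check mentioned in the Results (comparing files against sha256sum values) can be reproduced with Python's standard library. The sketch below is an assumption about how such a check might be scripted, not the authors' instructions; the file name and contents are made up for the demonstration.<br />

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published value (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a throwaway file with made-up FASTQ content.
payload = b"@read1\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.fastq"
    demo.write_bytes(payload)
    ok = verify_download(demo, hashlib.sha256(payload).hexdigest())
    bad = verify_download(demo, "0" * 64)
print(ok, bad)  # True False
```

Reading in chunks keeps memory use flat for multi-gigabyte read files, which is the typical size of sequencing data.<br />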
S3:4<br />
INTEGRATING CORE GENOME<br />
PHYLOGENETIC RELATIONSHIPS AND<br />
ISOLATE GEOGRAPHIC DATA TO TRACE THE<br />
2012 NEISSERIA MENINGITIDIS OUTBREAK<br />
IN NEW YORK CITY<br />
M. R. Galac 1, I. Ezeoke 2, M. D. Krepps 3,<br />
Y. Lin 2, J. Kornblum 2, H. S. Gibbons 3, D. Weiss 2,<br />
D. A. Janies 1;<br />
1 University of North Carolina at Charlotte,<br />
Charlotte, NC, 2 New York City Department<br />
of Health and Mental Hygiene, Queens, NY,<br />
3 Edgewood Chemical and Biological Center,<br />
Aberdeen, MD.<br />
Between 2012 and 2014, a serogroup C Neisseria<br />
meningitidis (Nm) outbreak was detected<br />
among men who have sex with men (MSM)<br />
in New York City (NYC). To determine if<br />
geographic paths of isolates from this outbreak<br />
were distinct, we compared them to previous<br />
and concurrent isolates of Nm from NYC. We<br />
sequenced 102 isolates (79 serogroup C) covering<br />
an 11-year period (2003-2013). Bacterial<br />
whole genomes were sequenced with Illumina<br />
at ~250x coverage, assembled de novo, and<br />
annotated by xBASE. OrthoMCL identified<br />
orthologous clusters in our 102 genomes plus<br />
14 complete Nm genomes from GenBank.<br />
Sequences from clusters containing a gene<br />
from every isolate were individually aligned<br />
with MAFFT and concatenated to produce<br />
one core genome alignment. Based on this alignment, we<br />
employed RAxML to infer a phylogenetic tree<br />
under the maximum likelihood optimality criterion;<br />
the tree was then integrated with geocoded patient<br />
addresses. We then focused on 73 NYC<br />
genomes with geographic data, excluding<br />
multiple samples from the same patient. To<br />
build a network of transmission events, we<br />
used PAUP* to calculate the ancestor-descendant<br />
changes in geographic location on the tree<br />
and to count the changes and directionality of<br />
isolation location at ancestral nodes. This created<br />
a network of bacterial movement. At the<br />
NYC borough resolution, Brooklyn is integral<br />
to the movement of all Nm city-wide over the<br />
11 years examined. We then subdivided the<br />
boroughs into neighborhoods. We found that<br />
almost all transmission events were unique<br />
movements between neighborhoods and did<br />
not represent repeated or reciprocal movement<br />
of the bacteria from one neighborhood to another.<br />
The non-outbreak<br />
isolate most closely related to the monophyletic MSM outbreak<br />
isolates originated in Brooklyn-neighborhood-A,<br />
then spread to Manhattan-neighborhood-B.<br />
From Manhattan-neighborhood-B, the infection<br />
moved to different neighborhoods in Manhattan,<br />
Brooklyn and the Bronx. This included<br />
Manhattan-neighborhood-C from which 6<br />
isolates with nearly identical core genomes<br />
radiated outward to additional neighborhoods<br />
in 4 boroughs. We used GeoGenes, which assigns<br />
higher network betweenness to locations<br />
that repeatedly lie on the shortest paths between<br />
all possible pairs of locations. We found that<br />
Manhattan-neighborhood-B and C were hubs<br />
for the spread of the MSM outbreak. In summary,<br />
whole genome sequencing, phylogenetic,<br />
and betweenness analyses allowed us<br />
to determine that the outbreak was the result<br />
of a single highly similar group of Nm and<br />
enabled us to track the spread of the bacteria<br />
throughout NYC. These analyses could be<br />
an integral component of the public health<br />
response against meningococcal transmission.<br />
The results can further elucidate the relation<br />
between cases and the phylogeographic related<br />
networks, allowing for more targeted and efficient<br />
interventions.<br />
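Betweenness, as used above, scores a location by how often it lies on shortest paths between other locations. The sketch below computes unnormalised betweenness on a small directed network with Brandes' algorithm. It is an illustration only and makes no claim about how GeoGenes computes the measure; the neighborhood labels and edges are hypothetical.<br />

```python
from collections import deque
from typing import Dict, List


def betweenness(graph: Dict[str, List[str]]) -> Dict[str, float]:
    """Brandes' algorithm for an unweighted directed graph (unnormalised).

    `graph` maps every node to the list of nodes it points to; nodes with
    no outgoing edges must still appear as keys with an empty list.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        order: List[str] = []               # nodes in order of discovery
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}       # number of shortest paths
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):           # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


# Hypothetical movement network: an edge means "spread from -> to".
network = {
    "nbhd_A": ["nbhd_B"],
    "nbhd_B": ["nbhd_C", "nbhd_D"],
    "nbhd_C": ["nbhd_E"],
    "nbhd_D": [],
    "nbhd_E": [],
}
scores = betweenness(network)
print(scores)
```

In this toy network nbhd_B scores highest because every path out of nbhd_A passes through it, which is the sense in which a location acts as a hub.<br />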
S3:5<br />
WHOLE GENOME SEQUENCING PROVIDES<br />
RAPID TRACEBACK OF CLINICAL TO<br />
FOOD SOURCES DURING A FOODBORNE<br />
OUTBREAK OF SALMONELLOSIS<br />
M. Hoffmann 1, Y. Luo 1, S. R. Monday 1,<br />
T. Muruvanda 1, D. Janies 2, I. Senturk 3,<br />
U. V. Catalyurek 3, W. J. Wolfgang 4, R. Myers 5,<br />
P. S. Evans 1, J. Meng 6, M. W. Allard 1, E. W. Brown 1;<br />
1 US Food and Drug Administration, College<br />
Park, MD, 2 University of North Carolina,<br />
Charlotte, NC, 3 Ohio State University, Columbus,<br />
OH, 4 New York State Department of<br />
Health, Albany, NY, 5 Department of Health and<br />
Mental Hygiene, Baltimore, MD, 6 University of<br />
Maryland, College Park, MD.<br />
Salmonella serovar Bareilly is responsible for<br />
numerous outbreaks among humans. In 2012<br />
a widespread foodborne outbreak associated<br />
with scraped tuna imported from India occurred<br />
in the United States. A comparative<br />
genomic analysis within the serovar was<br />
performed to explore, on a global scale, how<br />
effectively whole-genome sequencing (WGS)<br />
can differentiate outbreak isolates of S. Bareilly<br />
from non-outbreak isolates sharing the<br />
same XbaI PFGE pattern. We sequenced, on<br />
different platforms, 100 S. Bareilly isolates<br />
including 41 isolates from the 2012 outbreak.<br />
A single isolate was sequenced on the Pacific<br />
Biosciences RS II Sequencer to determine the<br />
first complete genome sequence of S. Bareilly<br />
that served as the reference genome. Subsequent<br />
raw reads were mapped to this reference<br />
genome to build a single-nucleotide polymorphism<br />
(SNP) matrix and construct a phylogenetic<br />
tree. Pathogen genomes were linked<br />
with geography by projecting the phylogeny<br />
on a virtual globe. Using the phylogenetic tree<br />
and the pathogen metadata, a transmission network<br />
was reconstructed for S. Bareilly. Using<br />
SNP analyses, we were able to distinguish<br />
and separate highly clonal S. Bareilly strains<br />
sharing the same XbaI PFGE pattern. The isolates<br />
from the recent 2012 outbreak clustered<br />
together, with only a few SNP differences<br />
between them. Our results revealed a common<br />
origin for the outbreak strains, indicating that<br />
the patients in the U.S. were infected from<br />
sources originating at the India facility. In addition,<br />
we identified a unique arsenic resistance<br />
operon carried by many of these strains. Our<br />
data strongly suggest that WGS, combined<br />
with geographic mapping and the novel use of<br />
transmission networks for genetic data, vastly<br />
improves the rapid source tracking and surveillance<br />
of bacterial outbreaks, both of which are critically<br />
important for characterizing outbreak dynamics<br />
and, ultimately, protecting public<br />
health.<br />
S3:6<br />
TRANSFORMING PUBLIC HEALTH<br />
MICROBIOLOGY IN THE UNITED STATES<br />
WITH WHOLE GENOME SEQUENCING (WGS)<br />
- PULSENET AND BEYOND<br />
E. K. Trees, H. Carleton, P. Gerner-Smidt;<br />
CDC, Atlanta, GA.<br />
Background: A number of different methods<br />
and technologies are used in public health<br />
laboratories to characterize foodborne bacterial<br />
pathogens, ranging from phenotypic tests<br />
(e.g., growth characteristics, serotyping and<br />
antimicrobial susceptibility testing) to PCR<br />
and other molecular methods for detection<br />
of, e.g., virulence genes, and for subtyping in<br />
outbreak investigations (e.g., pulsed-field gel<br />
electrophoresis, PFGE). This traditional strain<br />
characterization is labor- and resource-intensive<br />
and has a turnaround time (TAT) of up to<br />
several months. Most of this information may<br />
be extracted from the genome sequence of any<br />
organism. With the introduction of next generation<br />
sequencing technologies and advances in<br />
bioinformatics it is now possible to characterize<br />
foodborne pathogens in a single workflow<br />
with a TAT of ≤ four working days in a cost-efficient<br />
manner. Methods: The Enteric Diseases<br />
Laboratory Branch at CDC is working<br />
with partners in federal agencies, public health<br />
laboratories in the states and internationally to<br />
build applications to extract information about<br />
genus, species, lineage, serotype, pathotype,<br />
virulence profile and antimicrobial resistance<br />
markers and subtyping using gene by gene approaches<br />
(whole genome multi-locus sequence<br />
typing, wgMLST) from the genome sequences.<br />
The PulseNet foodborne disease surveillance<br />
network infrastructure and analytical platform<br />
(BioNumerics, Applied Maths) and database<br />
will be used because of its versatility and<br />
because the public health laboratories have<br />
worked with it for > 15 years. Additionally,<br />
the use of this analytic software does not require<br />
extensive bioinformatics skills or local<br />
high performance computing capacity. The<br />
method will also be validated and will replace<br />
conventional CLIA related reference identification<br />
activities in the branch and public health<br />
laboratories. Results: WGS has successfully<br />
been tested for the US national surveillance<br />
of listeriosis since September 2013 and the<br />
wgMLST analysis has been implemented in<br />
the daily routine. The technology has proven<br />
its power supplementing traditional methods in<br />
outbreak investigations by adding phylogenetic<br />
relevance and increased resolution compared<br />
with current methods thereby enhancing case<br />
definitions and source tracking. The wgMLST<br />
method is currently being piloted in-house<br />
for surveillance of Campylobacteraceae,<br />
Shiga toxin-producing E. coli, and Salmonella<br />
with the remainder of foodborne bacteria to<br />
be added over the next three years. A beta-testing<br />
pilot with the PulseNet public health<br />
laboratory partners will start in summer 2015.<br />
Conclusion: WGS will revolutionize the surveillance<br />
of both sporadic and outbreak related<br />
foodborne infections by replacing and enhancing<br />
current laboratory methods in use in public<br />
health laboratories.<br />
S4:4<br />
SALMONELLA SEROTYPE DETERMINATION<br />
UTILIZING HIGH-THROUGHPUT GENOME<br />
SEQUENCING DATA<br />
S. Zhang 1, Y. Yin 2, M. Jones 3, Z. Zhang 4,<br />
B. Deatherage Kaiser 5, B. Dinsmore 6,<br />
C. Fitzgerald 6, P. Fields 6, X. Deng 1;<br />
1 University of Georgia, Griffin, GA, 2 Illinois<br />
Institute of Technology, Chicago, IL, 3 J. Craig<br />
Venter Institute, Rockville, MD, 4 University of<br />
Michigan, Ann Arbor, MI, 5 Pacific Northwest<br />
National Laboratory, Richland, WA, 6 Centers<br />
for Disease Control and Prevention, Atlanta,<br />
GA.<br />
Serotyping forms the basis of national and<br />
international surveillance networks for Salmonella,<br />
one of the most prevalent foodborne<br />
pathogens worldwide. Public health microbiology<br />
is currently being transformed by whole<br />
genome sequencing (WGS) which opens the<br />
door to serotype determination using WGS<br />
data. SeqSero (www.denglab.info/SeqSero)<br />
is a novel web-based tool for determining<br />
Salmonella serotypes using high-throughput<br />
genome sequencing data. SeqSero is based<br />
on curated databases of Salmonella serotype<br />
determinants (rfb gene cluster, fliC and fljB<br />
alleles) and is predicted to determine serotype<br />
rapidly and accurately for nearly the full spectrum<br />
of Salmonella serotypes (more than 2,300<br />
serotypes), from both raw sequencing reads<br />
and genome assemblies. The performance of<br />
SeqSero was evaluated by testing: 1) raw reads<br />
from genomes of 308 Salmonella isolates of<br />
known serotype; 2) raw reads from genomes<br />
of 3,306 Salmonella isolates sequenced and<br />
made publicly available by GenomeTrakr, a<br />
U.S. national monitoring network operated<br />
by the Food and Drug Administration; and 3)<br />
354 other publicly available draft or complete<br />
Salmonella genomes. We also demonstrated<br />
Salmonella serotype determination from raw<br />
sequencing reads of fecal metagenomes from<br />
mice orally infected with this pathogen. SeqSero<br />
can help to maintain the well-established<br />
utility of Salmonella serotyping when integrated<br />
into a platform of WGS-based pathogen<br />
subtyping and characterization.<br />
S4:5<br />
PATRIC PIPELINE<br />
F. Xia 1, T. Brettin 2, S. Boisvert 2, N. R. Conrad 2,<br />
J. J. Davis 1, T. Disz 1, J. Edirisinghe 2, R. A. Edwards 3,<br />
C. Henry 1, R. W. Kenyon 4, D. Machi 4,<br />
C. Mao 4, G. J. Olsen 5, R. Olson 2, R. Overbeek 6,<br />
B. Parrello 6, G. D. Pusch 6, M. P. Shukla 2, B. W. Sobral 4,<br />
R. L. Stevens 1, V. Vonstein 6, A. Warren 4,<br />
R. Will 4, H. Yoo 4, A. R. Wattam 4;<br />
1 University of Chicago, Chicago, IL, 2 Argonne<br />
National Laboratory, Lemont, IL, 3 San Diego<br />
State University, San Diego, CA, 4 Virginia<br />
Tech, Blacksburg, VA, 5 University of Illinois<br />
at Urbana-Champaign, Urbana, IL, 6 Fellowship<br />
for Interpretation of Genomes, Burr<br />
Ridge, IL.<br />
Recent advances in DNA sequencing technology,<br />
accompanied by plummeting per-base cost,<br />
are making sequence-based applications more<br />
accessible. While a plethora of bioinformatics<br />
databases and workflows exist, their capabilities<br />
are often hampered by the inconsistent<br />
use of analysis tools. PATRIC, the NIAID-funded<br />
comprehensive bacterial bioinformatics<br />
resource, has integrated more than 30,000<br />
consistently annotated prokaryote genomes<br />
with a focus on human pathogenic species.<br />
Here we present PATRIC’s new computational<br />
services that support the assembly, annotation<br />
and metabolic modeling of user-supplied<br />
genomes in the same consistent fashion. These<br />
services, integrated with PATRIC’s collections<br />
of specialty genes such as antibiotic resistance<br />
determinants and virulence factors, will enable<br />
users to rapidly process newly sequenced<br />
pathogens and investigate key pathogenic<br />
determinants in foodborne outbreaks using the<br />
powerful visualization and comparative analysis<br />
tools in PATRIC. We have implemented<br />
the new services with three principles in mind.<br />
(1) Controlled vocabulary. At the heart of<br />
PATRIC’s annotation service is a controlled<br />
vocabulary for functional annotation derived<br />
from the curated subsystems and protein families<br />
in the RAST and SEED systems [2]. Similarly,<br />
the new model reconstruction service<br />
relies on our curated biochemistry data [5].<br />
These curation efforts ensure newly sequenced<br />
genomes can be automatically annotated and<br />
modeled and readily compared with existing<br />
reference data. (2) Modular design. In genome<br />
assembly as well as other bioinformatic analyses,<br />
there is often no single tool best suited<br />
for all occasions [4]. We have added support<br />
for more than 30 tools for error correction,<br />
contig assembly, scaffolding, contig evaluation,<br />
consensus building, gene calling, overlap<br />
removal, as well as many custom algorithms<br />
[3]. These modules are condensed into a few<br />
curated workflows to ensure convenient and<br />
efficient execution as well as consistent quality<br />
control. (3) Integrated analysis. The new<br />
workspace allows users to upload their own<br />
data for analysis, and upon completion the<br />
private results are immediately integrated into<br />
PATRIC. This enables users to take advantage<br />
of PATRIC’s data (drug targets, omics, AMR<br />
and other clinical metadata) and comparative<br />
tools (protein family sorter, phylogeny, heat<br />
maps, etc). In addition to these services, we<br />
are actively building support for batch analysis<br />
and SNP-level comparative analysis for closely<br />
related genomes. URL: https://www.patricbrc.org.<br />
References: [1] Gillespie, et al. “PATRIC:<br />
... (2011). [2] Overbeek, et al. “The SEED<br />
... (RAST).” Nucleic acids research (2014):<br />
D206-D214. [3] Brettin, et al. “RASTtk ...”<br />
Scientific reports 5 (2015). [4] Earl, et al. “Assemblathon<br />
...” Genome research (2011). [5]<br />
Henry, et al. “High-throughput ... models.”<br />
Nature biotechnology (2010).<br />
S4:6<br />
CFSAN SNP PIPELINE: A WHOLE GENOME<br />
SEQUENCE DATA ANALYSIS PIPELINE FOR<br />
FOOD-BORNE PATHOGENS<br />
Y. Luo, J. Pettengill, J. Baugher, H. Rand, S.<br />
Davis;<br />
FDA/CFSAN, College Park, MD.<br />
In support of the analysis of whole genome sequence<br />
(WGS) data for closely related pathogens<br />
in food-borne outbreaks, the Center for<br />
Food Safety and Applied Nutrition (CFSAN)<br />
at the FDA has developed a reference-based<br />
software pipeline for high quality SNP identification<br />
and analysis. This software pipeline<br />
combines into a single package the mapping of<br />
WGS reads to a reference genome, processing<br />
of those mapping files, identification of variant<br />
sites, and production of a SNP matrix. Additional<br />
features include a summary table of the<br />
results, soft-links to minimize data storage, and<br />
the ability to switch between workstations and<br />
computer clusters with minimal effort. The CFSAN<br />
SNP Pipeline is currently used in production<br />
mode to analyze WGS data from isolates<br />
related to food-borne illnesses. The pipeline is<br />
used when outbreak investigations are ongoing<br />
to link samples and to provide information for<br />
decision-makers. It is also used retrospectively<br />
to aid in the analysis of closed outbreaks. The<br />
CFSAN SNP Pipeline is reference-based,<br />
and so a reference must be provided. Isolate<br />
sequence data must be in FASTQ format but can<br />
be either paired-end or single-read data. All<br />
analysis steps are run automatically, and only<br />
depend on the proper organization of the input<br />
files and identification of a suitable reference.<br />
Additionally, each of the analysis steps can be<br />
run using individual shell scripts. The addition<br />
of new samples is very straightforward, and result<br />
files from previous portions of the analysis<br />
that do not need to be regenerated are reused.<br />
This greatly reduces the computational time<br />
when adding new samples as the mapping and<br />
pileup steps are not redone. The pipeline will<br />
run without problems on current workstations,<br />
and will run on high performance computing<br />
clusters with either Torque or Grid Engine job<br />
schedulers. The CFSAN SNP Pipeline is written<br />
in a combination of Bash and Python. The<br />
code is designed to run on Linux platforms<br />
with bash and python. BioPython must be<br />
installed in tandem with three executable software<br />
dependencies, Bowtie2, SAMtools, and<br />
VarScan. Substantial effort has been devoted to<br />
making the software robust, well-documented,<br />
and easy to use. The following links provide<br />
for access to the source code, the documentation,<br />
and the Python package. Also provided is<br />
the current publication reference. Source code:<br />
https://github.com/CFSAN-Biostatistics/snp-pipeline.<br />
Documentation: http://snp-pipeline.rtfd.org.<br />
PyPI package: https://pypi.python.org/pypi/snp-pipeline.<br />
Reference publication:<br />
Pettengill JB, Luo Y, Davis S, Chen Y, Gonzalez-Escalona<br />
N, Ottesen A, Rand H, Allard<br />
MW, Strain E. An evaluation of alternative<br />
methods for constructing phylogenies from<br />
whole genome sequence data: A case study<br />
with Salmonella.<br />
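The final step described above, producing a SNP matrix from per-sample variant calls, can be sketched in a few lines. This is not the CFSAN SNP Pipeline's code or interface: the function name, the input layout, and the rule that unlisted positions match the reference are assumptions made for the example, and the sequences are invented.<br />

```python
from typing import Dict, List, Tuple


def build_snp_matrix(
    reference: str,
    calls: Dict[str, Dict[int, str]],
) -> Tuple[List[int], Dict[str, str]]:
    """Collect variant positions across samples and emit one row per sample.

    `calls` maps sample name -> {0-based position: called base}. Positions a
    sample does not list are assumed to match the reference; a real pipeline
    would also track missing data and call quality.
    """
    variant_sites = sorted(
        {
            pos
            for sample_calls in calls.values()
            for pos, base in sample_calls.items()
            if base != reference[pos]
        }
    )
    matrix = {
        sample: "".join(
            sample_calls.get(pos, reference[pos]) for pos in variant_sites
        )
        for sample, sample_calls in calls.items()
    }
    return variant_sites, matrix


# Invented reference and calls for three hypothetical samples.
reference = "ACGTACGTAC"
calls = {
    "sample_1": {2: "T"},
    "sample_2": {2: "T", 7: "A"},
    "sample_3": {},
}
sites, matrix = build_snp_matrix(reference, calls)
print(sites)   # [2, 7]
print(matrix)  # {'sample_1': 'TT', 'sample_2': 'TA', 'sample_3': 'GT'}
```

Each row of the matrix has one column per variant site, so the rows can be compared directly to count pairwise SNP differences or passed to a phylogeny program as an alignment.<br />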
S4:7<br />
ASSEMBLING WHOLE GENOMES FROM<br />
MIXED MICROBIAL COMMUNITIES USING<br />
HI-C<br />
I. Liachko 1, J. N. Burton 1, L. Sycuro 2, A. H. Wiser 2,<br />
D. N. Fredricks 2, M. J. Dunham 1, J. Shendure 1;<br />
1 University of Washington, Seattle, WA, 2 Fred<br />
Hutchinson Cancer Research Center, Seattle,<br />
WA.<br />
Assembly of whole genomes from next-generation<br />
sequencing is inhibited by the lack of<br />
contiguity information in short-read sequencing.<br />
This limitation also impedes metagenome<br />
assembly, since one cannot tell which sequences<br />
originate from the same species within<br />
a population. We have overcome these bottlenecks<br />
by adapting a chromosome conformation<br />
capture technique (Hi-C) for the deconvolution<br />
of metagenomes and the scaffolding of de novo<br />
assemblies of individual genomes. In modeling<br />
the 3D structure of a genome, chromosome<br />
conformation capture techniques such as Hi-C<br />
are used to measure long-range interactions of<br />
DNA molecules in physical space. These tools<br />
employ crosslinking of chromatin in intact<br />
cells followed by intra-molecular ligation,<br />
joining DNA fragments that were physically<br />
nearby at the time of crosslink. Subsequent<br />
deep sequencing of these DNA junctions generates<br />
a genome-wide contact probability map<br />
that allows the 3D modeling of genomic conformation<br />
within a cell. The strong enrichment<br />
in Hi-C signal between genetically neighboring<br />
loci allows the scaffolding of entire chromosomes<br />
from fragmented draft assemblies.<br />
Hi-C signal also preserves the cellular origin<br />
of each DNA fragment and its interacting partner,<br />
allowing for deconvolution and assembly<br />
of multi-chromosome genomes from a mixed<br />
population of organisms. We have used Hi-C<br />
to scaffold whole genomes of animals, plants,<br />
and fungi, as well as bacteria and archaea. We<br />
have also been able to use these data to annotate<br />
functional features of microbial genomes, such<br />
as centromeres in many fungal species. Additionally,<br />
we have applied our technology to<br />
diverse metagenomic populations such as craft<br />
beer, bacterial vaginosis infections, soil, and<br />
tree endophyte samples to discover and assemble<br />
the genomes of novel strains of known<br />
species as well as novel prokaryotes and<br />
eukaryotes. The high quality of Hi-C-based<br />
assemblies allows the simultaneous closing of<br />
numerous unculturable genomes, placement of<br />
plasmids within host genomes, and microbial<br />
strain deconvolution in a way not possible<br />
with other methods. Reference: Burton JN*,<br />
Liachko I*, Dunham MJ, Shendure J. Species-level<br />
deconvolution of metagenome assemblies<br />
with Hi-C-based contact probability maps. G3.<br />
2014, May 22;4(7):1339-46.<br />
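The deconvolution idea described above, that contigs from the same cell share many more Hi-C contacts than contigs from different cells, can be illustrated with a toy grouping step. This is a minimal sketch, not the authors' method: it assumes Hi-C read pairs have already been tallied per contig pair, and the function name `cluster_contigs`, the `min_links` cutoff, and the contig names are hypothetical. It uses plain connected components where a real tool would use a more careful clustering.

```python
def cluster_contigs(contacts, min_links=5):
    """Group contigs into putative genomes using Hi-C contact counts.

    contacts: dict mapping (contig_a, contig_b) -> number of Hi-C read pairs
    min_links: hypothetical cutoff; weaker contacts are treated as noise
    """
    graph = {}
    for (a, b), links in contacts.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if links >= min_links:
            graph[a].add(b)
            graph[b].add(a)

    clusters, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk over contigs linked by strong contacts.
        members, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters


# Toy input: two groups of contigs joined only by one weak contact.
toy_contacts = {
    ("c1", "c2"): 40,
    ("c2", "c3"): 25,
    ("c4", "c5"): 30,
    ("c3", "c4"): 1,
}
print(cluster_contigs(toy_contacts))
```

Raising `min_links` splits groups more aggressively, while lowering it risks merging contigs from different organisms.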
S6:3<br />
NEXT GENERATION SEQUENCING OF<br />
BRUCELLA MELITENSIS ISOLATES FROM<br />
KUWAIT AND COMPARATIVE GENOME<br />
ANALYSES<br />
A. S. Mustafa, F. Shaheed, N. Habibi, M. W.<br />
Khan;<br />
Kuwait University, Jabriya, KUWAIT.<br />
Human brucellosis is a zoonotic disease of<br />
worldwide occurrence. In Kuwait, almost all<br />
cases are caused by Brucella melitensis. Three<br />
different strains (biovars) of B. melitensis have<br />
been identified using classical techniques but<br />
the presence of very limited variation across<br />
strains makes it difficult to identify a particular<br />
strain using the classical molecular methods,<br />
e.g. PCR and Sanger sequencing. The aim of<br />
this study was to exploit the potential of next<br />
generation sequencing to identify the strain(s)<br />
of B. melitensis present in Kuwait, and also<br />
to find the extent of differences among the<br />
isolates of the same strain by comparative<br />
genome analyses. B. melitensis was isolated<br />
from 15 patients with suspected brucellosis.<br />
The bacterial colonies from culture<br />
plates were suspended in saline and heated at<br />
95°C for 10 minutes. DNA released from the<br />
bacterial cells was purified using the QIAamp<br />
DNA Mini Kit (Qiagen). The isolated DNA<br />
was quantitated and checked for purity using<br />
a spectrophotometer (Epoch) and a fluorometer<br />
(Qubit), respectively. DNA libraries were<br />
prepared using the Nextera XT DNA Sample<br />
Preparation Kit (Illumina) and sequenced on<br />
the MiSeq next-generation sequencing platform<br />
(Illumina) using standard procedures.<br />
The obtained sequence reads were aligned to<br />
the sequences of three known biovars of B.<br />
melitensis available in the NCBI database,<br />
i.e. biovar 1 str. 16M, biovar 2 str. 63/9, and<br />
biovar 3 str. Ether. The alignment and variant<br />
calling were performed using ‘bwa mem’<br />
and SAMtools/VCFtools, respectively. The<br />
results showed that the genome size of all the<br />
isolates was around 3.3 mega base pairs, and<br />
all of them belonged to B. melitensis biovar<br />
2 str. 63/9. A neighbor-joining tree analysis<br />
identified one of the isolates as an outlier. Furthermore,<br />
variants (SNPs and indels) were<br />
spread across the genome, but 138 SNPs<br />
were common among the 14 isolates, supporting<br />
a shared ancestral origin. In addition,<br />
2 to 478 SNPs unique to each isolate were<br />
also identified, which divided the B. melitensis<br />
biovar 2 isolates into two major variant groups. In<br />
conclusion, this study suggests that biovar 2 is<br />
the most prevalent biovar of B. melitensis in<br />
Kuwait. Furthermore, at least two major variant<br />
groups exist within biovar 2. Supported<br />
by Kuwait University Research Sector grant<br />
SRUL02/13.<br />
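The comparison of SNPs common to all isolates with SNPs unique to each isolate can be expressed with set operations once variants have been called. This is a minimal sketch, assuming each isolate's calls are already available as a set of (contig, position, alternate base) tuples. The function name `shared_and_unique` and the isolate names and variants are hypothetical toy data, not results from the study.

```python
def shared_and_unique(snps_by_isolate):
    """Split SNP calls into those shared by all isolates and those private to one.

    snps_by_isolate: dict isolate name -> set of (contig, position, alt) tuples
    Returns (shared_set, {isolate: private_set}).
    """
    all_sets = list(snps_by_isolate.values())
    shared = set.intersection(*all_sets) if all_sets else set()

    unique = {}
    for name, snps in snps_by_isolate.items():
        # Everything seen in any other isolate is not private to this one.
        others = set().union(
            *(s for other, s in snps_by_isolate.items() if other != name)
        )
        unique[name] = snps - others
    return shared, unique


# Hypothetical toy calls for three isolates.
toy_calls = {
    "K1": {("chr1", 100, "A"), ("chr1", 250, "T")},
    "K2": {("chr1", 100, "A"), ("chr2", 40, "G")},
    "K3": {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 900, "C")},
}
shared, unique = shared_and_unique(toy_calls)
print(len(shared), {name: len(snps) for name, snps in unique.items()})
```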
S6:4<br />
MICROBIAL GENOMIC TAXONOMY AT<br />
GENBANK<br />
S. Federhen;<br />
NCBI, Bethesda, MD.<br />
Incorrectly identified genomes at GenBank<br />
are a problem for users of the data. Some<br />
genomes are submitted with incorrect species<br />
identifications. Others were correctly identified<br />
when they were submitted but should now<br />
be updated based on a subsequent taxonomic<br />
publication, for example the description of a<br />
new species. GenBank has traditionally relied<br />
on the submitters to provide the correct<br />
taxonomic identifications for their sequence<br />
submissions. Two developments have combined<br />
to change this situation in the domain<br />
of microbial genomes. First, the curation of<br />
type material in the NCBI taxonomy database<br />
allows us to flag sequences from type in the<br />
nucleotide and genome domains of Entrez.<br />
Second, current sequencing technology makes<br />
it fast and easy to generate microbial genomes.<br />
It has been clear for some time that the current<br />
paradigm of species delimitation by 16S rRNA<br />
sequence and DNA-DNA hybridization (DDH)<br />
would eventually be replaced with a model<br />
based on whole genome analysis. We present<br />
a proposal to find and correct misidentified<br />
genomes based on average nucleotide identity<br />
(ANI) from type and proxytype. Sequences<br />
from type are reliably identified (by definition)<br />
once we have verified that they are free from<br />
contamination and are actually from the strain<br />
with which they are annotated. All other identifications<br />
are a matter of opinion, and will be<br />
subject to verification. We have genomes from<br />
type (both finished and WGS) for 4000 species,<br />
including 3500 bacteria. This represents<br />
25% of bacterial species with validly published<br />
names. The other 75% of bacterial species<br />
will generally have an assortment of short sequences<br />
from type in GenBank - at least a 16S<br />
sequence, but often more. These sequences are<br />
used to probe our existing genomes and predict<br />
where the genome from type will appear once<br />
we do get one. In many cases we can designate<br />
a proxy for the missing type from among<br />
the genomes that we do have - we call these<br />
‘proxytype’ genomes. Taken together, these<br />
genomes from type and proxytype represent a<br />
scaffold of reliably identified sequences that<br />
we can use in conjunction with some simple<br />
genome-wide comparison measures to validate<br />
the identifications in our other genomes.<br />
Once we have identified genomes that need<br />
taxonomic updates, we plan to correct the entries,<br />
add a structured comment detailing the<br />
evidence for the update, and notify the submitters<br />
of the change. This represents a significant<br />
change in policy for GenBank - a new genomic<br />
paradigm for validating taxonomic identifications,<br />
some new types of analysis, as well as<br />
a shift in the boundary for database-driven<br />
source feature updates. We convened a workshop<br />
to present the proposal, with representation<br />
from a broad spectrum of the bacterial<br />
taxonomic community (GenBank genomic<br />
taxonomy workshop, 12-13 May 2015). This<br />
group unanimously endorsed our genomic approach<br />
to validating taxonomic identifications<br />
in genomes at GenBank.<br />
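The validation step described above, comparing a submitted genome against genomes from type and proxytype, can be sketched as a simple decision rule. This is a minimal illustration, assuming ANI values have already been computed by an external tool. The function name `check_identification`, the 96% cutoff, and the species labels are hypothetical; the abstract does not state the threshold or the exact procedure GenBank uses.

```python
def check_identification(declared_species, ani_to_types, cutoff=96.0):
    """Compare a submitter's species label with the best ANI match.

    declared_species: species name supplied with the submission
    ani_to_types: dict species name -> ANI (%) against that species'
                  type or proxytype genome
    cutoff: hypothetical ANI threshold for calling a species-level match
    Returns (status, best_matching_species_or_None).
    """
    if not ani_to_types:
        return "unverified", None
    best_species, best_ani = max(ani_to_types.items(), key=lambda item: item[1])
    if best_ani < cutoff:
        # No type or proxytype genome is close enough to judge the label.
        return "unverified", None
    if best_species == declared_species:
        return "confirmed", best_species
    return "mismatch", best_species


# Hypothetical example: the label disagrees with the closest type genome.
print(check_identification("Genus alpha", {"Genus alpha": 91.2, "Genus beta": 98.4}))
```

A "mismatch" result corresponds to the case where an entry would be flagged for a taxonomic update and the submitter notified.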
S7:2<br />
A BIOSURVEILLANCE ANALYSIS PIPELINE<br />
FOR GENOMIC SEQUENCE DATA<br />
C. Olsen<sup>1</sup>, K. Qaadri<sup>1</sup>, H. Shearman<sup>2</sup>, R. Moir<sup>2</sup>,<br />
M. Kearse<sup>2</sup>;<br />
<sup>1</sup>Biomatters, Inc., Newark, NJ, <sup>2</sup>Biomatters,<br />
Ltd., Auckland, NEW ZEALAND.<br />
Next-generation sequencing (NGS) approaches<br />
have numerous applications for biosurveillance<br />
programs and outbreak investigation.<br />
However, analyzing the data accurately and in a<br />
timely fashion is a significant challenge without<br />
high-performance computing resources.<br />
Often the users are not<br />
bioinformaticians who are comfortable running<br />
sequence analysis pipelines. Biomatters’<br />
Geneious R9 is a bioinformatics software<br />
platform that allows researchers the use of<br />
industry-leading algorithms for their genomic<br />
and protein sequence analyses. Geneious offers<br />
a comprehensive suite of functions, including<br />
a robust collection of peer-reviewed tools,<br />
that enable researchers to be more efficient<br />
with their sequence analysis workflows. The<br />
recent addition of the 16S Biodiversity tool<br />
and Sequence Classifier plugin provides tools<br />
which can be incorporated into an easy to use<br />
pathogen identification workflow. The 16S<br />
Biodiversity tool identifies high-throughput<br />
16S rRNA amplicons from environmental<br />
samples using the RDP database, and visualizes<br />
biodiversity as an interactive chart using<br />
a secure web viewer. The Sequence Classifier<br />
plugin taxonomically classifies an organic<br />
sample by how similar its DNA is to your<br />
own database of known sequences using a<br />
BLAST-like algorithm with multiple loci and<br />
trees to assist with identification. By utilizing<br />
Geneious R9, biologists can easily streamline<br />
their sequence analysis workflows for mixed<br />
sample analysis. This demonstration is for a<br />
sequence analysis pipeline for identification of<br />
bacterial pathogens from mixed metagenomic<br />
data generated from outbreaks. The pipeline<br />
can also be extended to include eukaryotic and<br />
fungal pathogens.<br />
S7:3<br />
A UNIVERSAL WHOLE GENOME<br />
SEQUENCING APPROACH USING WHOLE<br />
GENOME MLST AND WHOLE GENOME SNP<br />
ANALYSIS IN THE CLOUD<br />
H. Pouseele, K. De Bruyne, B. Pot, K. Janssens;<br />
Applied Maths NV, Sint-Martens-Latem,<br />
BELGIUM.<br />
Introduction: While whole genome sequencing<br />
(WGS) is becoming very attractive for<br />
research as well as for routine analyses, the current<br />
challenge is to extract whole genome typing<br />
information relevant for e.g. outbreak surveillance<br />
without the need for cluster facilities or<br />
highly specialized staff. Methodology: Here<br />
we present a cloud-based, high throughput<br />
(HT) environment for WGS data processing.<br />
Raw sequences are processed with standardized<br />
and validated pipelines to produce WG<br />
multi-locus sequence typing (wgMLST) and<br />
WG single nucleotide polymorphism (wgSNP)<br />
results. Whereas wgMLST is typically<br />
used to identify potential outbreak clusters,<br />
wgSNP analysis is considered useful for final<br />
subtyping. The wgMLST pipeline uses WGS<br />
data (assembled or not) to perform MLST on<br />
a genome-wide scale. For each sample, locus<br />
presence is analyzed and allelic variants are<br />
determined. Per locus, new sequences are<br />
submitted, curated and assigned new allele<br />
numbers. WgMLST provides the possibility to<br />
obtain traditional typing results such as MLST,<br />
rMLST, etc, yielding full compatibility with<br />
former typing schemes, without additional<br />
cost or time delay. A ‘pan-genome’ wgMLST<br />
scheme has the advantage over the core genome<br />
of maximizing resolution and leads to<br />
stability, as adding new samples does not influence<br />
existing information. The core schema is<br />
implemented as a subschema of the pan-genome<br />
and was shown to have a high epidemiological<br />
relevance and stability, necessary for<br />
long-term surveillance and outbreak investigation.<br />
For the wgSNP approach, a dual reference-based<br />
approach is used: reads are mapped<br />
on organism- and outbreak-specific references,<br />
yielding maximum resolution for detecting<br />
outbreak isolates. Once allele assignments and/<br />
or SNPs are calculated, traditional BioNumerics®<br />
analysis tools are used for phylogenetic,<br />
statistical and comparative follow-up analyses.<br />
Access to the Amazon cloud is integrated in<br />
BioNumerics®, allowing easy point-and-click<br />
analysis. Confidential metadata remain at all<br />
times at the local BioNumerics® database,<br />
while anonymous WGS data are processed<br />
in the cloud. Results: We will evaluate the<br />
wgMLST and wgSNP approaches for L. monocytogenes<br />
and Salmonella using the Amazon<br />
cloud solution. wgMLST combined with wgSNP<br />
analysis reliably identified strains belonging<br />
to documented outbreaks. Public NCBI<br />
SRA data will provide additional sequences<br />
that add context to more precisely understand<br />
and position the outbreak. Conclusion: NGS<br />
combined with automated data analysis and<br />
interpretation tools holds great promise for<br />
rapid, accurate and comprehensive identification<br />
of outbreaks. BioNumerics® 7.6* together<br />
with its scalable HT calculation environment,<br />
offers a powerful, user-friendly, easy-access<br />
environment where both wgMLST and wgSNP<br />
analyses can be performed and interpreted,<br />
lowering the thresholds for the use of WGS in<br />
routine applications.<br />
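The per-locus allele numbering described in the methodology, where each new sequence at a locus receives a new allele number and existing assignments never change, can be sketched as follows. This is a minimal illustration under stated assumptions and not the BioNumerics implementation: the class name `AlleleDatabase`, the use of a SHA-1 digest as the sequence key, and the locus names are all hypothetical, and the curation step is omitted.

```python
import hashlib


class AlleleDatabase:
    """Toy per-locus allele registry: identical sequences share one number."""

    def __init__(self):
        # locus name -> {sequence digest -> allele number}
        self._alleles = {}

    def assign(self, locus, sequence):
        """Return the allele number for `sequence` at `locus`, adding it if new."""
        digest = hashlib.sha1(sequence.upper().encode("ascii")).hexdigest()
        known = self._alleles.setdefault(locus, {})
        if digest not in known:
            # Next unused number for this locus; earlier numbers are untouched.
            known[digest] = len(known) + 1
        return known[digest]


db = AlleleDatabase()
profile = {
    "locus_1": db.assign("locus_1", "ATGCATGC"),
    "locus_2": db.assign("locus_2", "ATGGGTAA"),
}
print(profile)
```

Because numbers are only ever appended, adding new samples does not alter profiles that were computed earlier, which is the stability property the abstract attributes to the scheme.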
S7:4<br />
TYPING AND EPIDEMIOLOGICAL<br />
CLUSTERING OF COMMON PATHOGENS<br />
BASED ON WHOLE GENOME NGS DATA<br />
A. Materna, K. Einer-Jensen, P. Liboriussen,<br />
J. Johansen, L. Schauser, A. C. Materna;<br />
QIAGEN, Aarhus, DENMARK.<br />
Next generation sequencing (NGS) data from<br />
whole pathogen genomes is frequently used for<br />
enhanced surveillance and outbreak detection<br />
of common pathogens. Version 1.5 of the CLC<br />
Microbial Genomics Module for CLC Genomics<br />
Workbench and CLC Genomics Server<br />
introduces new functionality for molecular<br />
typing and epidemiological analysis of bacterial<br />
isolates. The latest update to the module<br />
enables the user to perform stepwise (tool-by-tool)<br />
analysis or to take advantage of included<br />
multistep workflows. With a few clicks workflows<br />
can be optimized for routine analysis of<br />
a specific pathogen or outbreak. New features<br />
include, for instance, streamlined tools for NGS-based<br />
Multilocus Sequence Typing (MLST),<br />
resistance typing, as well as detection of genus<br />
and species information. New tools for phylogenetic<br />
tree reconstruction generate trees based<br />
on single nucleotide polymorphisms (SNPs) or<br />
infer K-mer trees from NGS reads or genomes.<br />
A new table format, acting as a database, collects<br />
typing results and associates these results<br />
with metadata such as sample information,<br />
geographic origin, treatment outcome, etc. Results<br />
generated using e.g. MLST and resistance<br />
typing can furthermore be associated with the<br />
original sample metadata. Users can filter for<br />
results and metadata to find and select relevant<br />
subsets of samples for downstream analysis.<br />
Results and metadata available during tree<br />
generation can further be used to explore phylogeny<br />
in the context of this epidemiologically<br />
relevant information. Version 1.5 of the CLC<br />
Microbial Genomics Module aims to facilitate<br />
tasks commonly carried out during outbreak<br />
investigation, such as typing or source tracking<br />
based on whole genome data. Preconfigured<br />
workflows and simple deployment via integration<br />
into the widely used CLC Genomics<br />
Workbench and CLC Genomics Server ecosystem<br />
offer a user-friendly platform for scientists<br />
engaged in outbreak prevention and control.<br />
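A k-mer tree is built from pairwise distances between the k-mer content of reads or genomes. The sketch below shows one common way to compute such a distance, the Jaccard distance between k-mer sets. It is an illustration only: the abstract does not say which distance the module uses, and the function names, the value k=4, and the toy sequences are assumptions.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of `sequence`."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """1 - |shared k-mers| / |all k-mers|; 0.0 means identical k-mer content."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Toy sequences; real genomes would use a much larger k.
print(jaccard_distance("ACGTACGTTT", "ACGTACGTAA"))
```

A matrix of these distances over all samples could then be passed to any distance-based tree method, such as neighbor joining.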
S7:5<br />
WGSA.NET: WHOLE GENOME SEQUENCE<br />
ANALYSIS<br />
D. M. Aanensen<sup>1</sup>, S. Argimon<sup>2</sup>, C. A. Yeats<sup>1</sup>, A.<br />
Fedosejev<sup>1</sup>, C. Glasner<sup>2</sup>, R. Goater<sup>2</sup>, D. Garcia<sup>2</sup>,<br />
J. NT<sup>2</sup>;<br />
<sup>1</sup>Imperial College London, London, UNITED<br />
KINGDOM, <sup>2</sup>Centre for Genomic Pathogen<br />
Surveillance, Wellcome Genome Campus,<br />
Cambridgeshire, UNITED KINGDOM.<br />
WGSA.net [1] provides an intuitive interface<br />
for the uploading, processing, clustering and<br />
visualization of microbial genomic assemblies.<br />
Uploaded data are run through a number of<br />
analysis modules, including MLST, detection<br />
of genes and variants for identifying<br />
potential antibiotic resistance and virulence,<br />
and profiling of gene family membership for<br />
the production of clustering and identification<br />
of strict core and non-core genes. Metadata<br />
supplied during upload comprise required (e.g.<br />
location and date) and optional data types.<br />
Once processed, assemblies are presented<br />
to users first in a population context and<br />
then, through selection of specific<br />
subsets of a population, in relation to other<br />
very closely related genomes, allowing further<br />
investigation by a user (e.g. outbreak detection).<br />
All results are available for download, allowing<br />
further investigation and reuse. An intuitive<br />
user interface presenting inferred clustering<br />
(using PhyloCanvas), geographic location (using<br />
Google Maps) and data tables allows<br />
both user metadata and the results of analysis<br />
modules to be overlaid on the clustering, based on<br />
the visualization tool microreact.org [2]. WGSA.net<br />
is available via the web and runs in any<br />
modern browser. [1] http://www.wgsa.net [2]<br />
http://microreact.org<br />
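The identification of strict core and non-core genes from gene family membership can be sketched with set operations. This is a minimal illustration, assuming each assembly has already been reduced to a set of gene family identifiers. The function name `core_and_accessory` and the genome and family names are hypothetical, and this is not the WGSA.net implementation.

```python
def core_and_accessory(gene_families):
    """Split gene families into strict core and non-core.

    gene_families: dict genome name -> set of gene family identifiers
    Strict core = families present in every genome; non-core = all others.
    """
    genomes = list(gene_families.values())
    if not genomes:
        return set(), set()
    core = set.intersection(*genomes)
    pan = set.union(*genomes)
    return core, pan - core


# Hypothetical toy membership table.
toy_membership = {
    "genome_1": {"famA", "famB", "famC"},
    "genome_2": {"famA", "famB", "famD"},
    "genome_3": {"famA", "famB"},
}
print(core_and_accessory(toy_membership))
```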
S7:6<br />
SEQSPHERE+ SOFTWARE FOR<br />
PROSPECTIVE BACTERIAL GENOMIC<br />
SURVEILLANCE AND RESISTOME OR<br />
VIRULOME ANALYSIS<br />
J. Rothgänger;<br />
Ridom GmbH, Münster, GERMANY.<br />
SeqSphere+ was introduced in the year 2013<br />
(Nat Biotechnol. 31: 294, 2013) and supports<br />
genome-wide allele and single nucleotide<br />
variant (SNV) calling from whole genome<br />
sequence (WGS) data either on core genome<br />
and/or accessory genome level. However,<br />
the recommended (initial) analysis is a core<br />
genome MLST (cgMLST) allele typing as a<br />
global and uniform nomenclature service is<br />
maintained to ensure a ‘molecular typing<br />
Esperanto’. A number of cgMLST schemes using<br />
the software have been published recently;<br />
e.g., for M. tuberculosis (JCM 52: 2479, 2014)<br />
or L. monocytogenes (JCM 54: Jul 1. pii:<br />
JCM.01193-15, 2015 [ahead of print]). These<br />
schemes are available for download within<br />
the software. In addition, users can define<br />
their own ‘ad hoc’ schemes on the fly with<br />
the included cgMLST Target Definer. Furthermore,<br />
the software supports setup of resistome and<br />
virulome schemes. Place, time, ‘person’, and<br />
type dimensions can be visualized with built-in<br />
geographic information system (GIS), epicurve,<br />
coloring, and phylogenetic tree (among<br />
others the minimum spanning tree algorithm is<br />
supported) functionality. All dimension views<br />
are inter-linked and exportable in publication<br />
quality SVG format. Finally, the software<br />
can generate sample-reports for senders with<br />
a summary of the analytical results (e.g.,<br />
MLST, cgMLST), QC/QA data, and extensive<br />
documentation of the analytical procedure.<br />
SeqSphere+ is designed for distributed workgroups<br />
(client/server model with encryption of<br />
all data in transmission) and requires no scripting<br />
or bioinformatics skills. It allows automatic<br />
processing and analyzing of next generation<br />
sequence (NGS) and Sanger data for prospective<br />
bacterial genomic surveillance. De novo<br />
assembly or reference mapping of NGS read<br />
data is achieved with the incorporated Velvet<br />
or BWA tools, respectively. A pipeline that<br />
down-samples, assembles, and analyzes data<br />
can be defined and started to process NGS data<br />
fully automatically, e.g., by fetching the raw reads from a<br />
benchtop sequencer as soon as data are generated.<br />
To speed up and scale up the analysis,<br />
an additional computer can simply be added<br />
to process data in parallel. Experiment<br />
and epidemiologic metadata are stored in an<br />
integrated searchable SQL database together<br />
with the DNA data. New sequence entries can<br />
be compared against stored data and automatic<br />
cluster alerts of possible outbreaks can be triggered.<br />
Metadata and sequence data can be submitted<br />
(semi-)automatically to the EBI ENA archive.<br />
A backup plan for all data can be defined and<br />
automatically executed. An audit trail of all<br />
user actions including the execution of analysis<br />
and (manual) data editing is also maintained.<br />
SeqSphere+ is commercially available from<br />
Ridom GmbH (Münster, Germany) for Windows<br />
and Linux operating systems. Further<br />
information is available, and a fully functional<br />
trial version can be requested, at<br />
http://www.ridom.de/seqsphere/.<br />
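The automatic cluster alert, in which a new entry is compared against stored data, can be sketched as an allelic distance check between cgMLST profiles. This is a minimal illustration and not the SeqSphere+ implementation: the function names, the `max_distance` threshold, and the sample and locus names are hypothetical, and missing loci are simply ignored.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci typed in both profiles that carry different allele numbers."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared_loci if profile_a[locus] != profile_b[locus])


def cluster_alerts(new_profile, stored_profiles, max_distance=10):
    """List stored samples within `max_distance` alleles of the new sample.

    new_profile: dict locus -> allele number for the incoming sample
    stored_profiles: dict sample name -> profile dict
    max_distance: hypothetical alert threshold; it would be organism-specific
    Returns (sample, distance) pairs sorted by distance, then name.
    """
    hits = []
    for sample, profile in stored_profiles.items():
        distance = allelic_distance(new_profile, profile)
        if distance <= max_distance:
            hits.append((sample, distance))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical stored profiles and one incoming sample.
stored = {
    "S1": {"loc1": 1, "loc2": 1, "loc3": 1},
    "S2": {"loc1": 2, "loc2": 2, "loc3": 2},
}
incoming = {"loc1": 1, "loc2": 1, "loc3": 3}
print(cluster_alerts(incoming, stored, max_distance=1))
```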
S7:7<br />
NULLARBOR: RAPID ANALYSIS OF<br />
BACTERIAL OUTBREAK SEQUENCE DATA<br />
T. Seemann, J. Kwong, D. M. Bulach, B. P.<br />
Howden;<br />
University of Melbourne, Melbourne,<br />
AUSTRALIA.<br />
The modern public health microbiology laboratory<br />
has embraced whole genome sequencing<br />
as the primary assay for pathogen surveillance<br />
and outbreak analysis. Here we present Nullarbor,<br />
a software pipeline for turning a set<br />
of isolate sequence data into a single report<br />
summarizing the key information about each<br />
isolate and the relationship between isolates.<br />
This report is then used by epidemiologists<br />
and laboratory staff to make a final recommendation<br />
or actionable decision. Nullarbor<br />
first performs quality control on each isolate.<br />
The reads are adaptor- and quality-trimmed,<br />
then measured for yield, and low-coverage<br />
isolates are quarantined. The cleaned reads are<br />
scanned with Kraken to identify the likely species<br />
and the level of contamination, and off-target<br />
or mixed samples are quarantined. The<br />
reads are then de novo assembled into contigs<br />
using MegaHit and annotated using Prokka<br />
(*). The contigs are also used to calculate the<br />
MLST with the mlst tool (*) and the resistome<br />
profile using ABRicate (*). Next the relationship<br />
between the isolates is determined. Each<br />
isolate is aligned against a reference genome<br />
and variants called using Snippy (*) and a<br />
core genome SNP alignment produced. The<br />
reference can be provided or the isolate assemblies<br />
can be used, and the SNP alignment<br />
may be optionally filtered of recombination<br />
using ClonalFrameML. A phylogenetic tree is<br />
created using FastTree and various statistics<br />
including SNP distances are calculated. The<br />
annotated genomes are used to calculate the<br />
pan-genome with Roary and visualized using<br />
FriPan (*). The use of the pan-genome augments<br />
the investigation with data on mobile<br />
genetic elements otherwise missed by core<br />
SNP analysis. Nullarbor then generates a report<br />
which can be rendered in multiple formats<br />
using PanDoc. The Nullarbor pipeline follows<br />
the Unix philosophy of using and combining<br />
existing standalone tools in an efficient manner.<br />
The input is a spreadsheet-like text file,<br />
and the default output is a clean HTML report.<br />
The pipeline utilises the Unix make dependency<br />
system to enable highly parallel analyses<br />
on a single machine, to maximize efficiency<br />
for the typical laboratory having only a single<br />
bioinformatics workstation. It is currently used<br />
by the Microbiological Diagnostics Unit Public<br />
Health Laboratory in Australia routinely for all<br />
investigations. Nullarbor is open-source software<br />
released under a GPL licence and runs<br />
on Unix systems. It is available from<br />
https://github.com/tseemann/nullarbor. Simple installation<br />
of Nullarbor and its dependencies may<br />
be achieved via Homebrew Science<br />
(https://github.com/Homebrew/homebrew-science).<br />
Future plans include a Docker image based on<br />
CoreOS, and a virtual machine image for use<br />
with Amazon, OpenStack and VirtualBox. (*)<br />
denotes software written by the first author.<br />
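The SNP distance statistics mentioned above can be computed directly from a core genome SNP alignment. This is a minimal sketch of that one step, not Nullarbor's own code: the function name `snp_distances`, the choice to ignore gap and N positions, and the toy alignment are assumptions.

```python
from itertools import combinations


def snp_distances(alignment):
    """Pairwise SNP counts from a core SNP alignment.

    alignment: dict isolate name -> aligned sequence (all the same length)
    Positions where either sequence has a gap or N are skipped.
    Returns {(isolate_a, isolate_b): distance} with names in sorted order.
    """
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    ignored = {"-", "N"}
    distances = {}
    for a, b in combinations(sorted(alignment), 2):
        distances[(a, b)] = sum(
            1
            for x, y in zip(alignment[a].upper(), alignment[b].upper())
            if x != y and x not in ignored and y not in ignored
        )
    return distances


# Hypothetical toy alignment of three isolates.
toy_alignment = {"iso1": "ACGTAC", "iso2": "ACGTTC", "iso3": "ANGTTG"}
for pair, distance in snp_distances(toy_alignment).items():
    print(pair, distance)
```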
S7:8<br />
ENTEROBASE: A POWERFUL, USER-<br />
FRIENDLY ONLINE RESOURCE FOR<br />
ANALYSING GENOMIC VARIATION<br />
N. Alikhan, M. Sergeant, Z. Zhou, A. Millard,<br />
M. J. Pallen, M. Achtman;<br />
University of Warwick, Coventry, UNITED<br />
KINGDOM.<br />
The decreasing cost of next-generation sequencing<br />
promises to revolutionise molecular<br />
epidemiology. For example, Salmonella<br />
enterica has over 40,000 sets of short reads available<br />
within GenBank. Sequencing at this scale<br />
can potentially encompass sufficient genomic<br />
variation across a species to elucidate evolutionary<br />
lineages and virulence factors such<br />
as antimicrobial resistance. These data could<br />
provide a basis for a global perspective of<br />
microbial pathogens, which not only identifies<br />
outbreak-associated strains but also long-term<br />
transmission trends and novel environmental<br />
reservoirs. However, the paucity of analytical<br />
tools to handle data at such scales remains a<br />
key limitation. Here we present EnteroBase,<br />
which addresses such logistical challenges and<br />
facilitates access to data for a general audience<br />
of clinicians and epidemiologists. EnteroBase<br />
is a user-friendly online resource, where users<br />
can upload their own sequencing data for<br />
de novo assembly by a stream-lined pipeline.<br />
The assemblies are used for calling MLST and<br />
wgMLST patterns, allowing users to compare<br />
their strains to publicly available genotyping<br />
data from other EnteroBase users, GenBank<br />
and classical MLST databases. EnteroBase<br />
was designed to exclude low quality data. Assemblies<br />
are screened for contamination, poor<br />
sequencing quality and misassemblies, and<br />
linked to standardised and curated strain metadata,<br />
including geographic, host and temporal<br />
information. Curation will be by experts from<br />
the microbial research community. Curated<br />
metadata will support exploring associations<br />
between strain genotype and phenotypic or<br />
geographic features. EnteroBase will also<br />
include SNP and pan-genome based genome<br />
comparisons, including virulence factors and<br />
antimicrobial resistance. Visualisation approaches<br />
will be integrated, including minimum<br />
spanning trees and geographic mapping. Many<br />
approaches implemented in EnteroBase will<br />
also be applicable to metagenomic data, support for<br />
which we intend to implement. We will provide<br />
integration with existing databases, such as<br />
BIGSdb, and existing analytical tools, such as<br />
Bionumerics, in addition to providing a standardized<br />
ontology implemented through APIs<br />
allowing for easy interoperability with other<br />
similar resources. EnteroBase is accessible<br />
through all modern web browsers, and is available<br />
at http://enterobase.warwick.ac.uk.<br />
S7:9<br />
SNAPPERDB: A SCALABLE DATABASE FOR<br />
ROUTINE SEQUENCING OF BACTERIAL<br />
ISOLATES<br />
P. Ashton, A. Al-Shahib, A. Jironkin, A. Underwood,<br />
T. Dallman;<br />
Public Health England, London, UNITED<br />
KINGDOM.<br />
As routine sequencing of bacterial isolates becomes<br />
a reality, scalable data storage solutions<br />
are required. Analysis of bacterial populations<br />
often requires re-computing the likely variants<br />
across all isolates in a dataset and this is<br />
not feasible in rapidly growing, large datasets.<br />
Public Health England has embarked on the<br />
implementation of high throughput sequencing<br />
for the surveillance of several important<br />
human pathogens and aims to leverage the<br />
high discriminatory power of single nucleotide<br />
polymorphisms (SNPs) to detect linked<br />
cases and outbreaks of infectious disease. For<br />
this software demonstration we will present<br />
SnapperDB, a set of tools to store and query<br />
bacterial variant data to facilitate reproducible<br />
and scalable analysis of bacterial populations.<br />
The use of a relational database enables highly<br />
efficient queries that can generate SNPs in the<br />
core genome for phylogenetic analysis, or the<br />
whole genome consensus sequence for output<br />
into e.g. recombination detection tools. As part<br />
of SnapperDB, a pairwise distance matrix is<br />
maintained from which hierarchical clustering<br />
is performed. This allows the assignment of a<br />
‘SNP address’, which locates the isolate within<br />
‘SNP space’ and enables the rapid identification<br />
of closely related isolates. SnapperDB is<br />
a stable application which is easily installed<br />
from the GitHub repository by Unix power<br />
users. For those who are less familiar with the<br />
command line, there are pre-configured instances<br />
on both the MRC CLIMB (http://www.climb.ac.uk/)<br />
and Amazon Web Services cloud<br />
computing infrastructures.<br />
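The 'SNP address' idea, hierarchical clustering of a pairwise distance matrix so that each isolate gets one cluster label per distance level, can be sketched as below. This is a minimal illustration and not the SnapperDB code: it assumes single-linkage clustering, and the function names, the thresholds (10, 5, 0), and the toy distances are hypothetical.

```python
def single_linkage(names, dist, threshold):
    """Label isolates so that any pair within `threshold` SNPs shares a cluster."""
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[frozenset((a, b))] <= threshold:
                parent[find(a)] = find(b)

    labels, cluster_ids = {}, {}
    for name in names:
        root = find(name)
        cluster_ids.setdefault(root, len(cluster_ids) + 1)
        labels[name] = cluster_ids[root]
    return labels


def snp_addresses(names, dist, thresholds=(10, 5, 0)):
    """Join the cluster labels at each threshold into a dotted address."""
    per_level = [single_linkage(names, dist, t) for t in thresholds]
    return {n: ".".join(str(level[n]) for level in per_level) for n in names}


# Hypothetical pairwise SNP distances between three isolates.
toy_names = ["A", "B", "C"]
toy_dist = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 8,
    frozenset(("B", "C")): 9,
}
print(snp_addresses(toy_names, toy_dist))
```

Isolates whose addresses agree in the leading fields fall within the coarser distance thresholds of each other, so closely related isolates can be found by comparing address prefixes.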
S7:10<br />
PHYLOGENETIC RECONSTRUCTION AND<br />
OUTBREAK INVESTIGATION USING IRIDA<br />
AND SNVPHYL<br />
A. Petkau<sup>1</sup>, P. Mabon<sup>1</sup>, L. S. Katz<sup>2</sup>, F. Bristow<sup>1</sup>,<br />
T. Matthews<sup>1</sup>, J. Adam<sup>1</sup>, J. Cabral<sup>3</sup>, C. Sieffert<sup>1</sup>,<br />
N. Knox<sup>1</sup>, D. Dooley<sup>4</sup>, E. Griffiths<sup>5</sup>, G. Winsor<sup>5</sup>,<br />
M. R. Laird<sup>5</sup>, M. Courtot<sup>5</sup>, P. Kruczkiewicz<sup>6</sup>,<br />
E. Taboada<sup>6</sup>, J. A. Carriço<sup>7</sup>, A. Keddy<sup>8</sup>, R. G.<br />
Beiko<sup>8</sup>, C. Berry<sup>1</sup>, A. Reimer<sup>1</sup>, M. Graham<sup>1</sup>, W.<br />
Hsiao<sup>4</sup>, F. Brinkman<sup>5</sup>, G. Van Domselaar<sup>1</sup>;<br />
<sup>1</sup>Public Health Agency of Canada, Winnipeg,<br />
MB, CANADA, <sup>2</sup>Centers for Disease Control<br />
and Prevention, Atlanta, GA, <sup>3</sup>University of<br />
Manitoba, Winnipeg, MB, CANADA, <sup>4</sup>BC<br />
Public Health Microbiology and Reference<br />
Laboratory, Vancouver, BC, CANADA, <sup>5</sup>Simon<br />
Fraser University, Burnaby, BC, CANADA,<br />
<sup>6</sup>Laboratory for Foodborne Zoonoses, Lethbridge,<br />
AB, CANADA, <sup>7</sup>University of Lisbon,<br />
Lisbon, PORTUGAL, <sup>8</sup>Dalhousie University,<br />
Halifax, NS, CANADA.<br />
Whole Genome Sequencing (WGS) based<br />
methods for disease surveillance and outbreak<br />
investigation are poised to replace existing<br />
typing methods such as pulsed-field gel electrophoresis<br />
(PFGE) and multi-locus sequence<br />
typing (MLST). The wealth of information<br />
obtained from WGS provides a typing method<br />
enabling significantly increased resolution<br />
between outbreak-associated and non-outbreak<br />
concurrent isolates. However, the routine<br />
use of WGS data for outbreak investigation<br />
has been hindered by complexity in the<br />
management and quality assessment of WGS<br />
data, execution of analysis pipelines, and the<br />
visualization and interpretation of results.<br />
SNVPhyl is a pipeline for building whole genome<br />
phylogenies from single nucleotide variants<br />
(SNVs). SNVPhyl accepts a set of pathogen<br />
WGS sequence reads, an assembled reference<br />
genome, and a collection of QA/QC parameters.<br />
The sequence reads are mapped to the<br />
reference genome, high-quality variants are<br />
identified within the core genome, and a table<br />
of all variants together with a quality report of<br />
the data is generated. The identified variants<br />
are used to generate a multiple sequence alignment<br />
of variant sites along with a maximum<br />
likelihood phylogeny. SNVPhyl is an integrated<br />
suite of tools that are implemented within<br />
a Galaxy workflow. Galaxy, a web-based<br />
bioinformatics analysis platform, supports<br />
execution of workflows on a variety of different<br />
high-performance computing environments<br />
which enables, together with parallelization<br />
of the pipeline tools, rapid analysis of large<br />
datasets. SNVPhyl is also integrated into<br />
IRIDA (Integrated Rapid Infectious Disease<br />
Analysis), a genomic epidemiology platform.<br />
IRIDA provides a web interface for the storage<br />
and management of WGS data and epidemiological<br />
metadata, a simplified interface for<br />
executing pipelines such as SNVPhyl, and the<br />
storage and visualization of analysis results. In<br />
addition, IRIDA provides support for integration<br />
with external tools using a REST API. In<br />
particular, a plugin has been developed for the<br />
software GenGIS, a phylogeographic analysis<br />
and visualization tool, allowing the in-depth<br />
exploration of SNVPhyl pipeline results.<br />
SNVPhyl is the culmination of nearly five years<br />
of development on a pipeline to construct<br />
whole genome phylogenies at the National<br />
Microbiology Laboratory (NML) in Canada,<br />
with contributions from the US Centers for<br />
Disease Control and Prevention. Both IRIDA<br />
and SNVPhyl are actively in use at the NML<br />
for outbreak response and are available as free<br />
and open source software. More information<br />
can be found at http://irida.ca.<br />
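The central steps described above (keep only positions that pass quality filters in every<br />
sample, then build an alignment of variant sites) can be sketched in Python as follows.<br />
This is a simplified illustration and not SNVPhyl's implementation: read mapping and<br />
variant calling are assumed to have been done already, and the function names and input<br />
layout are hypothetical.<br />

```python
def variant_site_alignment(reference, calls, low_quality):
    """Build an alignment of variant sites.

    reference   -- reference sequence as a string (0-based positions)
    calls       -- {sample: {position: called_base}} for high-quality SNVs
    low_quality -- {sample: set of positions failing QC in that sample}

    A position is dropped if it fails QC in any sample, which is a crude
    stand-in for restricting the analysis to the core genome.
    """
    excluded = set().union(*low_quality.values())
    positions = sorted(
        {pos for sample_calls in calls.values() for pos in sample_calls} - excluded
    )
    alignment = {"reference": "".join(reference[p] for p in positions)}
    for sample, sample_calls in calls.items():
        # Samples without a call at a position carry the reference base.
        alignment[sample] = "".join(
            sample_calls.get(p, reference[p]) for p in positions
        )
    return positions, alignment


def snv_distance(seq_a, seq_b):
    """Number of variant sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Toy data (invented): three samples against a 10 bp reference.
reference = "ACGTACGTAC"
calls = {"s1": {2: "T", 7: "A"}, "s2": {2: "T"}, "s3": {5: "G"}}
low_quality = {"s1": set(), "s2": {5}, "s3": set()}

positions, alignment = variant_site_alignment(reference, calls, low_quality)
```

Position 5 is excluded because it fails QC in sample s2, so only positions 2 and 7 enter<br />
the alignment. The resulting sequences are the kind of input from which a pairwise SNV<br />
distance matrix or a maximum likelihood phylogeny could then be computed.<br />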
S7:11<br />
PANCORE: A FLEXIBLE WORKFLOW FOR<br />
THE COMPARISON AND ASSIGNMENT OF<br />
GENOMES TO OUTCOMES<br />
D. B. Storey, B. C. Weimer;<br />
University of California, Davis, CA.<br />
The identification and assignment of bacterial<br />
pathogens to specific outbreaks is a task<br />
of importance for the continued security and<br />
safety of our food supply. As high throughput<br />
sequencing technologies continue to increase<br />
in throughput, decrease in cost, and increase<br />
in mobility, it is clear that they will play an<br />
increasing role in our ability to identify and<br />
characterize bacteria in our food supply in a<br />
proactive manner. By applying robust data science<br />
techniques to these problems we can<br />
attempt to regulate our food supply in<br />
a prescriptive manner instead of a reactive one.<br />
With these goals in mind we present PanCore,<br />
a workflow that uses high-throughput sequencing<br />
data, reference-free assembly, global<br />
annotations, machine learning techniques, and<br />
predictive algorithms to classify and assign<br />
bacterial genomes to a phenotype and identify<br />
outlier isolates in a fast and robust manner.<br />
The workflow makes use of publicly available<br />
sequencing data from the SRA/ENA and data<br />
generated as part of the 100K Pathogen Genome<br />
Project as the underlying data for building<br />
global models of the potential genetic<br />
landscape of an organism of interest. We<br />
incorporate a number of measures, including<br />
genomic distance, k-mer distributions, genetic<br />
content, gene polymorphisms, and allele frequencies,<br />
into this database. These expansive<br />
databases are subjected to dimensionality<br />
reduction techniques, and informative features<br />
are extracted. Using these reduced data representations,<br />
user input data (e.g., serotype,<br />
presence in an outbreak, geographic location)<br />
and a training set, the program undergoes<br />
a second round of clustering and feature extraction.<br />
These refined features are then used<br />
to train a Naive Bayesian classifier, which can<br />
be directly applied to new incoming sequencing<br />
data. This two-step approach allows for the<br />
classification of isolates directly into classes<br />
and identification of outliers in a data-dependent<br />
manner. This means that unknown and/or<br />
mis-classified isolates can be quickly identified,<br />
subjected to more directed analyses, and re-included<br />
in the underlying database in an efficient<br />
manner. By front-loading computing and<br />
applying dimensionality reduction techniques<br />
prior to training the classifier we also make it<br />
possible to provide fast scalable classification<br />
that doesn’t require large infrastructure for<br />
support. We have successfully applied these<br />
methods as a way to identify Campylobacter<br />
isolates that have been mis-classified by biochemical<br />
methods and identify potential markers<br />
for identifying isolate host range.<br />
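The final classification step described above can be illustrated with a minimal Gaussian<br />
naive Bayes classifier in plain Python. This is a sketch under stated assumptions, not<br />
PanCore's code: the two-dimensional feature vectors stand in for the reduced data<br />
representations, the class labels are invented, and the choice of a Gaussian likelihood<br />
is an assumption that the abstract does not specify.<br />

```python
import math
from collections import defaultdict


def train_gaussian_nb(features, labels, var_floor=1e-6):
    """Estimate a log prior, per-feature means and variances for each class."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(labels)), means, variances)
    return model


def classify(model, row):
    """Return the best class and its log joint score for one feature vector."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
        scores[label] = log_prior + log_lik
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented training data: reduced features for two hypothetical classes.
features = [
    (1.0, 0.9), (1.2, 1.1), (0.8, 1.0),
    (-1.0, -1.1), (-0.9, -0.8), (-1.2, -1.0),
]
labels = ["class_a"] * 3 + ["class_b"] * 3
model = train_gaussian_nb(features, labels)

near_label, near_score = classify(model, (1.1, 1.0))
far_label, far_score = classify(model, (10.0, 10.0))
```

A new isolate that sits far from every class, such as the second query here, receives a<br />
very low score even for its best class. Thresholding that score is one simple way in<br />
which outliers could be flagged for more directed analysis.<br />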
S7:12<br />
REFERENCE-FREE PAN-GENOMIC<br />
EPIDEMIOLOGY USING CORTEX<br />
Zamin Iqbal 1, Henk den Bakker 2, Phelim<br />
Bradley 1, Rachel Norris 1, Jennifer Gardy 3,<br />
Sarah Walker 4, Tim Peto 4, Derrick Crook 4;<br />
1 Wellcome Trust Centre for Human Genetics,<br />
Univ. of Oxford, UK, 2 Department of Animal<br />
and Food Sciences, Texas Tech University,<br />
Lubbock, Texas, 3 Communicable Disease<br />
Prevention and Control Services, British Columbia<br />
Centre for Disease Control, Vancouver,<br />
BC, Canada, 4 Nuffield Department of Medicine,<br />
University of Oxford, UK<br />
Bacterial outbreak studies based on genetic<br />
epidemiology alone are fundamentally focussed<br />
on looking at genetic similarity and<br />
differences within sets of samples, both in<br />
the core genome, and also at shared accessory<br />
genome elements (e.g. due to plasmid<br />
transfer). Standard approaches require one to<br />
compare all samples against a reference genome,<br />
which can add artefacts and noise, and<br />
cost compute time. We therefore developed a<br />
reference-free multi-sample approach called<br />
Cortex [1], allowing partial assembly of many<br />
samples into a joint de Bruijn graph, followed<br />
by rapid and accurate determination of SNP,<br />
indel and structural variants segregating within<br />
the samples, and extremely simple assaying<br />
of accessory genome content. In terms of the<br />
species involved in this challenge, Cortex has<br />
been used for outbreak analysis of Salmonella<br />
[2] and sequence presence/absence testing in<br />
Listeria [3]. In our experience, this approach<br />
leads to highly accurate and sensitive call sets<br />
for bacteria (where coverage is not generally<br />
limiting) without the need for manual curation<br />
or parameter tuning, allowing the user to focus<br />
on subsequent analyses.<br />
The software runs on Linux or Mac OS X, and<br />
is freely available under the GPLv3 license at<br />
http://github.com/iqbal-lab/cortex. All analyses<br />
done for this bioinformatics challenge will be<br />
available on a secondary GitHub repository.<br />
[1] De novo assembly and genotyping of variants<br />
using colored de Bruijn graphs. Iqbal et al.,<br />
Nature Genetics (2012)<br />
[2] Rapid whole-genome sequencing for surveillance<br />
of Salmonella enterica serovar Enteritidis.<br />
den Bakker et al., EID (2014)<br />
[3] Whole genome sequencing allows for<br />
improved identification of persistent Listeria<br />
monocytogenes in food associated environments.<br />
Stasiewicz et al., AEM (2015)<br />
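The idea of a joint, multi-sample ("colored") de Bruijn graph can be illustrated with a<br />
short Python sketch. It is a toy illustration and not Cortex itself: only the graph nodes<br />
(k-mers) and their sample colors are stored, and reverse complements, coverage and error<br />
cleaning are all ignored. The function names and sequences are hypothetical.<br />

```python
from collections import defaultdict


def colored_kmers(samples, k):
    """Map every k-mer to the set of samples ('colors') that contain it."""
    colors = defaultdict(set)
    for name, sequences in samples.items():
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                colors[seq[i:i + k]].add(name)
    return colors


def segregating_kmers(colors, n_samples):
    """k-mers that are absent from at least one sample."""
    return {kmer for kmer, names in colors.items() if len(names) < n_samples}


def sequence_present(colors, sample, query, k):
    """Presence/absence test: is every k-mer of `query` seen in `sample`?"""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    return all(
        sample in colors.get(query[i:i + k], ())
        for i in range(len(query) - k + 1)
    )


# Two invented isolates that differ by a single substitution.
samples = {"iso1": ["ACGTACGA"], "iso2": ["ACGTTCGA"]}
k = 3
colors = colored_kmers(samples, k)
variable = segregating_kmers(colors, len(samples))
```

The k-mers shared by all samples form the common backbone of the graph, while the<br />
segregating k-mers mark where the samples diverge. The same color lookup gives a simple<br />
presence/absence test for accessory sequence without any reference genome.<br />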
S9:3<br />
WHOLE-GENOME SEQUENCE ANALYSIS OF<br />
PSEUDOMONAS AERUGINOSA IN ACUTE<br />
INFECTION REVEALS WIDESPREAD WITHIN-<br />
POPULATION DIVERSITY AND RAPID<br />
TRANSMISSION WITHIN THE BODY<br />
H. Chung 1, K. B. Flett 2, M. Anderson 2, R. Kishony 3,<br />
G. P. Priebe 2;<br />
1 Harvard Medical School, Boston, MA, 2 Boston<br />
Children’s Hospital, Boston, MA, 3 Technion<br />
- Israel Institute of Technology, Haifa,<br />
ISRAEL.<br />
Bacterial pathogen populations mutate and<br />
adapt during the course of an infection. Strong<br />
selective pressures such as host-adaptation<br />
and antibiotic treatment lead to genetic diversification<br />
within a population. Examining<br />
this diversity in pathogen populations originating<br />
from chronic infections such as cystic<br />
fibrosis has revealed that many mutations are<br />
polymorphic rather than fixed. Theoretical<br />
work has shown that such within-population<br />
diversity can hinder accurate reconstruction<br />
of epidemiological transmission networks.<br />
Here we show that even in an acute infection,<br />
we observe within-population diversity due to<br />
pre-existing polymorphisms in the infecting<br />
population, as well as de novo mutations that<br />
arise rapidly during treatment. We describe a<br />
pipeline for rapidly collecting and sequencing<br />
populations of a bacterial pathogen from<br />
patients over multiple time points. Focusing on<br />
ventilator-associated tracheitis in mechanically<br />
ventilated children, we sampled Pseudomonas<br />
aeruginosa populations from the respiratory<br />
tract (and in some cases the gut) of eight patients<br />
prior to and after antibiotic treatment.<br />
Using a low-cost library preparation method<br />
we developed, we prepared the whole<br />
genomes of 636 Pseudomonas isolates in just 4 days<br />
for next-generation sequencing on the Illumina<br />
HiSeq platform. Comparing diversity<br />
of pathogen populations between the airways<br />
and the gut reveals fast transmission of newly<br />
generated genotypic variants across the body.<br />
While analyzing de novo mutations is useful<br />
for inferring genes that confer selective advantage<br />
in adapting to the human host, uncovering<br />
pre-existing diversity of a pathogen across<br />
the body prior to treatment could be key for<br />
tailoring patient-specific treatment strategies.<br />
Furthermore, incorporating within-population<br />
diversity into current epidemiological models<br />
will improve the accuracy of reconstructing<br />
transmission events, especially in rapidly occurring<br />
outbreaks.<br />
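The distinction drawn above between fixed and polymorphic mutations, and between<br />
pre-existing and de novo variants, can be made concrete with a small Python sketch. The<br />
mutation identifiers and the input layout (one set of mutations per sequenced isolate)<br />
are hypothetical and are not taken from the authors' pipeline.<br />

```python
from collections import Counter


def classify_mutations(isolates):
    """Label each mutation 'fixed' (in every isolate) or 'polymorphic'."""
    n = len(isolates)
    counts = Counter(m for genotype in isolates for m in genotype)
    return {m: ("fixed" if c == n else "polymorphic") for m, c in counts.items()}


def de_novo_candidates(before, after):
    """Mutations seen after treatment but in no pre-treatment isolate."""
    seen_before = set().union(*before)
    seen_after = set().union(*after)
    return seen_after - seen_before


# Invented example: each set is the mutation content of one isolate
# sampled from the same patient before or after antibiotic treatment.
before = [{"m1", "m2"}, {"m1"}, {"m1", "m2"}]
after = [{"m1", "m3"}, {"m1", "m2", "m3"}]

pre_status = classify_mutations(before)
candidates = de_novo_candidates(before, after)
```

Absence from the sampled pre-treatment isolates does not prove that a variant was absent<br />
from the whole population, so the second function returns candidates only. Sampling many<br />
isolates per time point, as in the study above, reduces that uncertainty.<br />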
S9:4<br />
BEYOND THE SNV: INTEGRATING<br />
MULTIPLE DATA TYPES INTO GENOMIC<br />
EPIDEMIOLOGY<br />
M. S. Wright 1, G. G. Sutton 2, R. A. Bonomo 3,<br />
M. D. Adams 1;<br />
1 J. Craig Venter Institute, La Jolla, CA, 2 J.<br />
Craig Venter Institute, Rockville, MD, 3 University<br />
Hospitals Case Medical Center, Louis<br />
Stokes Cleveland Department of Veterans Affairs<br />
Medical Center, Cleveland, OH.<br />
Single nucleotide variant (SNV) analyses can<br />
be useful for identifying transmission routes<br />
during outbreaks, characterizing pathogen<br />
population structure, and providing taxonomic<br />
discrimination for strain identification. The<br />
inclusion of gene content analysis adds further<br />
resolution to evolutionary relationships and<br />
yields phenotypically significant information<br />
during investigations that use genomic<br />
epidemiology. To demonstrate this we have<br />
sequenced >50 Klebsiella pneumoniae (Kp)<br />
and >200 Acinetobacter baumannii (Ab)<br />
isolates from the Midwestern US. Core SNV<br />
phylogenies for both species indicated population<br />
mixing across hospital locations, with the<br />
maintenance of lineages distinct from other<br />
geographical locations. Gene content analysis<br />
additionally revealed population-specific<br />
plasmids in both species. A significant founder<br />
effect was observed in Kp, where all ST258b<br />
strains likely originated from a common ancestor<br />
that had an entS deletion unique to this<br />
lineage. Mapping of insertion sequence (IS)<br />
locations identified lineage- and strain-specific<br />
IS events. Longitudinal sequence analysis of<br />
Ab isolates originating from the same patient<br />
over time highlighted gene content variability<br />
mediated by isolate-specific IS events including<br />
the loss of phenotypically relevant antibiotic<br />
resistance genes, as well as antibiotic<br />
resistance plasmid gain events that would be<br />
missed by conventional SNV analysis. Using<br />
patient-specific SNVs, IS events, and gene<br />
content analyses, we were able to determine<br />
whether persistent Ab infections were the result<br />
of treatment failure or reinfection by new<br />
strains. Thus the combination of SNV, gene<br />
content, and IS mapping data leads to a more<br />
complete picture of transmission dynamics,<br />
pathogen populations, and evolution. Challenges<br />
remain in integrating these data types in<br />
clinically relevant settings, given the requirements<br />
for rapid assessments and minimal data<br />
curation efforts. Computational tools are under<br />
development so that new strains can be rapidly<br />
placed in the genotypic and phenotypic context<br />
of local and global strains.<br />
S9:5<br />
DEFINING CLONALITY IN ACINETOBACTER<br />
BAUMANNII USING WHOLE GENOME<br />
SEQUENCING OF OUTBREAK STRAINS<br />
ASSOCIATED WITH THE CONFLICT IN IRAQ<br />
E. Snesrud, P. Mc Gann, L. Appalla, F.<br />
Onmus-Leone, A. C. Ong, R. Maybank, R.<br />
Clifford, M. K. Hinkle, P. E. Waterman, E. P.<br />
Lesho;<br />
Walter Reed Army Institute of Research, Silver<br />
Spring, MD.<br />
Multi-drug resistant (MDR) A. baumannii<br />
emerged as a significant source of infection<br />
during the conflict in Iraq. Unravelling the<br />
epidemiology of these strains using conventional<br />
typing methods, such as multi-locus<br />
sequence typing (MLST), is difficult, as these<br />
methods lack the resolution to detect small<br />
genetic changes, such as single nucleotide<br />
polymorphisms (SNPs). Whole genome sequencing<br />
(WGS) offers the prospect of providing<br />
definitive data on strain relatedness, but<br />
studies to define what constitutes clonality<br />
are lacking. Here, WGS was employed on<br />
a large collection of A. baumannii cultured<br />
from patients treated at the Walter Reed Army<br />
Medical Center (WRAMC) from 2003-2011 in<br />
an effort to determine a baseline for clonality<br />
in this species. From 2003 to 2011, carbapenem<br />
resistance among clinical isolates of A.<br />
baumannii rose from 12% to >95%. WGS<br />
was performed using the Illumina MiSeq and<br />
NextSeq benchtop sequencers on every deduplicated<br />
carbapenem-resistant A. baumannii<br />
(CRAB) archived during this period (N=394).<br />
An additional 142 carbapenem-sensitive A.<br />
baumannii from the same time period were<br />
sequenced in tandem. Comparative genomics<br />
were performed on all isolates to determine<br />
clonality. Carbapenem resistance was mediated<br />
by the Class D oxacillinases in all isolates,<br />
with blaOXA-23 identified in 312 strains (79.2%).<br />
In silico MLST revealed 12 different sequence<br />
types (ST) carrying blaOXA-23, with ST-1, 2, 20,<br />
25, 81 and 94 the most prevalent. SNP-based<br />
analysis revealed that ST-1 was composed of<br />
4 different clonal clusters that were temporally<br />
separated. Within each clonal group, SNP accumulation<br />
occurred over time, with an average<br />
of 23 SNPs separating the first and last<br />
isolate identified in each cluster. In contrast,<br />
isolates of ST-2, 20, 25, 81, and 94 each<br />
derived from a single clone that persisted and<br />
spread throughout the healthcare facility over<br />
1 to 6 years. Accumulation of SNPs in these<br />
strains was consistent with that observed for<br />
ST-1; strains varied by 0 to 25 SNPs, with the<br />
number of SNPs increasing as time progressed.<br />
At the height of the Iraq conflict (2004-2006),<br />
a new patient with a CRAB infection was<br />
being identified almost daily. Remarkably,<br />
despite the large number of patients involved,<br />
WGS revealed that the majority of infections<br />
over the 9 years were caused by just 9 different<br />
strains, which appear to have entered,<br />
persisted and disseminated within the facility<br />
over periods ranging from 1 to 6 years. SNP-based<br />
phylogeny demonstrated that every ST<br />
accumulated SNPs at a comparable rate, with<br />
an average of 8 SNPs accumulating every year.<br />
Thus, we define a baseline for clonality in A.<br />
baumannii as two isolates differing by ≤ 8 (± 3)<br />
SNPs over the course of a year.<br />
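One possible reading of this baseline is shown in the Python sketch below. The rate of<br />
8 SNPs per year and the tolerance of 3 come from the abstract; scaling the threshold<br />
linearly with the time between isolates, and applying a minimum window of one year, are<br />
assumptions made here for illustration only. The function names are hypothetical.<br />

```python
SNPS_PER_YEAR = 8  # average accumulation rate reported in the abstract
TOLERANCE = 3      # the "± 3" reported in the abstract


def max_clonal_snps(years_apart):
    """Largest SNP distance treated as consistent with clonality.

    Assumption: the threshold grows linearly with the time separating the
    two isolates, and never drops below the one-year baseline.
    """
    return SNPS_PER_YEAR * max(years_apart, 1.0) + TOLERANCE


def consistent_with_clonality(snp_distance, years_apart):
    """True if the observed distance is within the assumed threshold."""
    return snp_distance <= max_clonal_snps(years_apart)


one_year_limit = max_clonal_snps(1.0)
two_year_limit = max_clonal_snps(2.0)
```

Under these assumptions, two isolates collected within a year of each other are treated as<br />
clonal if they differ by at most 11 SNPs, and isolates collected two years apart if they<br />
differ by at most 19.<br />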
S9:6<br />
DIRECT FROM SPUTUM: NEXT GEN<br />
ANALYSIS OF MYCOBACTERIUM<br />
TUBERCULOSIS IN CLINICAL SAMPLES<br />
D. M. Engelthaler 1, R. E. Colman 1, V. Crudu 2,<br />
D. Catanzaro 3, A. Catanzaro 4, P. Keim 5, T.<br />
Cohen 6, T. C. Rodwell 4;<br />
1 TGen North, Flagstaff, AZ, 2 Phthisiopneumology<br />
Institute, Chișinău, MOLDOVA, REPUBLIC OF,<br />
3 University of Arkansas, Little Rock,<br />
AR, 4 University of California San Diego, San<br />
Diego, CA, 5 Northern Arizona University,<br />
Flagstaff, AZ, 6 Yale University, New Haven,<br />
CT.<br />
The incidence of drug-resistant (DR) tuberculosis<br />
(TB) continues to increase worldwide.<br />
With the presence of multi-drug resistance it<br />
has become critical to quickly identify the appropriate<br />
treatment regimen, in order to effectively<br />
treat disease and prevent further transmission<br />
of DR-TB. Tabletop DNA sequencers<br />
now allow for rapid and robust sequencing<br />
of pathogen isolates as well as direct analysis<br />
of clinical samples. However, applying this<br />
technology directly to complex biological<br />
samples, such as sputum, currently has<br />
limitations. We have<br />
developed accessible tools and methodologies<br />
for direct sequencing of clinical sputum samples<br />
which enable us to detect Mycobacterium<br />
tuberculosis (Mtb), produce a rapid drug susceptibility<br />
profile, detect heteroresistance and<br />
conduct additional analyses related to the nature<br />
of TB infection and transmission. Targeted<br />
sequencing allows for Next Gen Drug Susceptibility<br />
Testing (Next Gen-DST), an amplicon<br />
sequencing method for generating a rapid and<br />
inexpensive DST profile straight from positive<br />
sputum samples. Additionally we have devised<br />
Single Molecule Overlapping Read (SMOR)<br />
analysis – an advanced amplicon sequencing<br />
approach for detecting and measuring heteroresistance<br />
(i.e., resistance allele mixtures) down to a<br />
0.1% minor resistance component. Lastly, a<br />
more detailed analysis of the data generated with<br />
these techniques allows for haplotype analysis<br />
leading to an understanding of the nature of an<br />
infection (e.g., whether a patient has a superinfection<br />
of multiple strains or is infected with<br />
multiple lineages of the same strain). We have<br />
employed these techniques on DNA extracted<br />
from >150 remnant clinical Mtb sputum<br />
samples from The Republic of Moldova. The<br />
Next Gen-DST assay provided drug sensitivity<br />
profiles comparable to culture-based DST<br />
on 36 well-established target loci; the SMOR<br />
analysis identified the presence of mixtures<br />
in samples at
S10:3<br />
DEVELOPMENT OF AN EFFICIENT NEXT-<br />
GENERATION SEQUENCING PLATFORM FOR<br />
CHARTING THE EVOLUTION OF NOROVIRUS<br />
STRAINS<br />
G. I. Parra, C. K. Karangwa, S. V. Sosnovtsev,<br />
K. Y. Green;<br />
National Institutes of Health, Bethesda, MD.<br />
Noroviruses (NoV) are important pathogens<br />
of acute gastroenteritis. An effective vaccine<br />
could save thousands of lives each year, but<br />
the number of antigenic components needed<br />
for the development of efficacious vaccines<br />
as well as the immune correlates of protection<br />
against NoV infections are still unknown. Like<br />
many other RNA viruses, NoV are genetically<br />
diverse with seven major genogroups containing<br />
over 30 different genotypes. To gain insight<br />
into the rules that govern intra- and inter-host<br />
NoV evolution and antigenic diversity,<br />
we developed an efficient platform to analyze<br />
complete NoV genomes by Next-Generation<br />
Sequencing (NGS), and analyzed differences<br />
in the population dynamics of NoV infecting<br />
healthy individuals. The first set of samples<br />
was collected from patients with GII.3, GII.4,<br />
or GII.6 infection that, despite resolving symptoms<br />
within days, shed NoV for up to 4 weeks.<br />
The GII.6 and GII.3 strains were stable and did<br />
not show evidence of adaptive changes during<br />
the prolonged shedding phase, while the<br />
GII.4 viruses showed a number of nucleotide<br />
changes as infection progressed. A second set<br />
of samples was obtained from cases of person-to-person<br />
transmission of non-GII.4 strains,<br />
which showed that although minor changes<br />
could be detected during transmission (acute<br />
phase), the virus would often revert to the original<br />
virus sequence during the recovery phase<br />
of the infection. Finally, the new amplification<br />
and genome sequencing method we developed allowed<br />
us not only to amplify archival samples<br />
but also to describe and characterize a new GII.17<br />
norovirus (Hu/GII.17/GaithersburgD1/2014/USA)<br />
that is currently causing large outbreaks<br />
of gastroenteritis in Asian countries. Taken<br />
together, our data suggest different patterns of<br />
evolution among NoV strains, with some viruses<br />
(like GII.4) more prone to change, while<br />
others remain static over time, limiting their<br />
antigenic diversity and prevalence. Population<br />
dynamics offers a new tool in the development<br />
of NoV vaccines.<br />
S10:4<br />
GENOME-WIDE COMPARISON OF COWPOX<br />
VIRUSES REVEALS A NEW CLADE RELATED<br />
TO VARIOLA VIRUS<br />
P. Dabrowski, A. Radonic, A. Kurth, L.<br />
Schuenadel, A. Nitsche;<br />
Robert Koch Institute, Berlin, GERMANY.<br />
Zoonotic infections caused by several Orthopoxviruses<br />
(OPV) like Monkeypox virus or<br />
Vaccinia virus have a significant impact on<br />
human health. In Europe, the number of diagnosed<br />
infections with Cowpox viruses (CPXV)<br />
is increasing in animals as well as in humans.<br />
CPXV used to be enzootic in cattle; however,<br />
such infections have not been diagnosed in<br />
recent decades. Instead, individual cases of<br />
cowpox are being found in cats or exotic zoo<br />
animals that transmit the infection to humans.<br />
Both animals and humans present with local exanthema<br />
on arms and legs or on the face. Although<br />
cowpox is generally regarded as a self-limiting<br />
disease, immunosuppressed patients can develop<br />
a lethal systemic disease resembling smallpox.<br />
To date, only limited information on the<br />
complex and, compared to other OPV, sparsely<br />
conserved CPXV genomes is available. Since<br />
CPXV displays the widest host range of all<br />
OPV known, it seems important to comprehend<br />
the genetic repertoire of CPXV which in<br />
turn may help elucidate specific mechanisms<br />
of CPXV pathogenesis and origin. Therefore,<br />
about 50 genomes of independent CPXV<br />
strains from clinical cases involving several humans,<br />
rats, cats, jaguarundis, beaver, elephant,<br />
mara and mongoose were sequenced. All<br />
samples were collected as part of the routine<br />
diagnostics at the German Consultant Laboratory<br />
for Poxviruses over the last decade. The<br />
first genomes were obtained using massively<br />
parallel pyrosequencing (GS FLX), while Illumina<br />
sequencing in combination with Nextera<br />
library generation was utilized later on. The<br />
extensive phylogenetic analysis showed that<br />
the CPXV strains sequenced clearly cluster<br />
into several distinct clades, some of which are<br />
closely related to Vaccinia viruses while others<br />
represent different clades in a CPXV cluster.<br />
In particular, one CPXV clade is more closely<br />
related to Camelpox virus, Taterapox virus and<br />
Variola virus than to any other known OPV.<br />
These results support and extend recent data<br />
from other groups who postulate that CPXV<br />
does not form a monophyletic clade and should<br />
be divided into multiple lineages.<br />
S10:5<br />
VIROME ANALYSES AMONG CHILDREN<br />
WITH ACUTE RESPIRATORY INFECTION IN<br />
CHINA<br />
W. Tan, Y. Wang;<br />
National Institute for Viral Disease Control<br />
and Prevention, China CDC, Beijing, CHINA.<br />
Acute respiratory infection (ARI) in children<br />
is known to be caused by several recognized<br />
respiratory viruses. Although infection by individual<br />
viruses is well characterized, a global understanding<br />
of the respiratory tract virome in children with<br />
ARI is limited, especially in Beijing, China.<br />
To define the virome of the respiratory<br />
tract in children with ARI, we carried<br />
out next-generation sequencing (NGS) of<br />
nasopharyngeal swabs on the Illumina HiSeq 2500,<br />
followed by phylogenetic analysis. A total of<br />
42,951,290 reads were obtained, with 25%<br />
of all the generated reads assigned to recognized<br />
respiratory viruses. The respiratory tract of<br />
children with ARI contains complex viral<br />
populations that are dominated by<br />
seven families: Paramyxoviridae,<br />
Coronaviridae, Parvoviridae, Orthomyxoviridae,<br />
Picornaviridae, Adenoviridae and Anelloviridae.<br />
Viruses of various genotypes<br />
were detected in respiratory samples. In detail,<br />
HRSV, HCoVs (HCoV-229E, HCoV-OC43,<br />
HCoV-HKU1), HBoV1, influenza A/B, HRVs,<br />
HAdVs, anelloviruses (TTVs and TTMVs)<br />
and HPIVs were the most abundant<br />
and common viruses in the childhood<br />
respiratory tract in Beijing. In contrast, HMPV,<br />
HCoV-NL63 and measles virus were only occasionally<br />
detected. Contig sequence analysis<br />
indicated that the HRSV detected in this study<br />
mainly belonged to the BA and GA2 subtypes. At<br />
least three genotypes of HCoV-OC43 circulating<br />
in Beijing were identified (the B,<br />
C/D and UNT subgroups), with genotype UNT<br />
predominant in recent years.<br />
Influenza A, B and C viruses were all detected<br />
in this study and mainly included H1N1 and<br />
H5N1. Most of the reads related to HRVs belonged<br />
to HRV-A and HRV-C. Diverse types of<br />
anelloviruses (TTVs and TTMVs) were found<br />
in respiratory samples, including TTV5, TTV7,<br />
TTV10, TTV21, TTMV5, TTMV7, TTMV8<br />
and TTV-like mini virus. The viral population<br />
and dominant genotypes detected differed significantly<br />
between our study and previous reports.<br />
This research provides the first comprehensive<br />
understanding of the virome of children with ARI<br />
in China and indicates a high heterogeneity of<br />
known viruses present in the respiratory tract of<br />
children, which may benefit detection and prevention<br />
of respiratory disease in China.<br />
S10:6<br />
USING WASTEWATER TO MONITOR VIRAL<br />
PATHOGENS IN THE SLUM CITY OF KIBERA<br />
M. H. Hjelmsø 1, O. Lukjancenko 1, L. Bergmark 1,<br />
E. Ngeno 2, F. M. Aarestrup 1, R. S. Hendriksen 1;<br />
1 Technical University of Denmark, Kgs. Lyngby,<br />
DENMARK, 2 Center for Disease Control,<br />
Nairobi, KENYA.<br />
Pathogenic viruses are a huge burden to<br />
mankind. Enteric viruses, the major cause of<br />
gastroenteritis, indispose millions of people<br />
each year and kill hundreds of thousands in<br />
developing countries. Other important diseases<br />
caused by viruses include MERS, SARS, AIDS<br />
and Ebola. To combat these diseases, health<br />
authorities need an understanding of the spread<br />
and epidemiology of the viruses in question.<br />
Classic surveillance relies on data from local<br />
health centers and diagnostic labs. This<br />
is problematic as many people in the world<br />
either fail to contact or have no access to such<br />
facilities, leading to an under-reporting of the<br />
problem. Alternative monitoring methods are<br />
needed to limit the spread of viral pathogens in<br />
the future. The objective of this study was to<br />
establish if deep sequencing of wastewater can<br />
be used as a surveillance tool for viral pathogens.<br />
Infected individuals shed virus particles<br />
in large numbers in their feces, which end up<br />
contaminating the wastewater. In this regard,<br />
wastewater is most often seen as a problem,<br />
but in this project we use it to monitor the<br />
health state of the population it originates<br />
from. To test this novel approach, a proof-of-concept<br />
study was conducted in the slum city of<br />
Kibera, Kenya. This city has very poor sanitary<br />
conditions and limited medical facilities, leading<br />
to a high incidence of infectious diseases.<br />
Wastewater was sampled daily at two central<br />
locations for three months. Samples were<br />
frozen at -80 °C directly after sampling and<br />
shipped frozen to our laboratory in Denmark.<br />
The virus particles were then isolated from the<br />
rest of the wastewater content, concentrated<br />
with PEG8000 precipitation and treated with<br />
nucleases to degrade extracellular DNA and<br />
RNA. The nucleic acids from the purified viral<br />
concentrate were then extracted with the NucleoSpin<br />
RNA XS kit and the High Pure Viral<br />
Nucleic Acid Kit, selecting for RNA and DNA<br />
viruses, respectively. The nucleic acids were<br />
then amplified with 40 cycles of PCR using a<br />
random primer before library preparation with the<br />
Nextera XT kit and sequencing on the Illumina<br />
MiSeq, generating 250 bp paired-end reads. The<br />
reads were then mapped to a custom database<br />
containing all viral sequences and genomes<br />
from the NCBI and ViPR databases. We were<br />
able to detect 456 different virus species,<br />
including the human pathogens: Enterovirus,<br />
Rotavirus, Norovirus, Astrovirus, Hepatitis C<br />
virus and Human Herpes Virus. Several of the<br />
human viral pathogens exhibited a clear rise<br />
and fall in numbers during the study period.<br />
Similar epicurves were seen at both sampling<br />
points, suggesting that our results were indeed<br />
a reflection of an outbreak in the city. Unfortunately,<br />
no clinical data exist to confirm this.<br />
With this approach, the public health of large<br />
populations can be monitored cost-effectively.<br />
Better monitoring could be instrumental in<br />
combating these diseases locally and in<br />
preventing global outbreaks of existing and<br />
emerging viral pathogens.<br />
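The epicurves described above can be illustrated with a minimal sketch. It assumes that reads have already been mapped and counted per virus and per sampling day; the function name, the input layout and the reads-per-million normalization are illustrative assumptions, not details taken from the abstract.

```python
from collections import defaultdict


def epicurve(daily_hits, daily_totals):
    """Return normalized read counts (reads per million) per virus and day.

    daily_hits:   {day: {virus_name: mapped_read_count}}  (hypothetical layout)
    daily_totals: {day: total_reads_sequenced_that_day}
    """
    curves = defaultdict(dict)
    for day, hits in daily_hits.items():
        total = daily_totals[day]
        for virus, count in hits.items():
            # Normalizing by sequencing depth is an assumption made here so
            # that days with more reads do not look like outbreaks.
            curves[virus][day] = 1e6 * count / total
    return dict(curves)


# Toy data, invented for illustration only.
hits = {
    1: {"Norovirus": 10, "Rotavirus": 5},
    2: {"Norovirus": 40},
    3: {"Norovirus": 20, "Rotavirus": 5},
}
totals = {1: 1_000_000, 2: 2_000_000, 3: 1_000_000}
curves = epicurve(hits, totals)
```

Plotting `curves[virus]` against the sampling day for each sampling point would give one curve per virus and location, which is the kind of comparison the abstract reports.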
Poster <strong>Abstracts</strong><br />
n 1<br />
A BIOSURVEILLANCE ANALYSIS PIPELINE<br />
FOR GENOMIC SEQUENCE DATA<br />
C. Olsen 1 , K. Qaadri 1 , H. Shearman 2 , R. Moir 2 ,<br />
M. Kearse 2 , S. Markowitz 2 , J. Kuhn 2 , S. Dunn 2 ,<br />
A. Cooper 2 ;<br />
1 Biomatters, Inc., Newark, NJ, 2 Biomatters,<br />
Ltd., Auckland, NEW ZEALAND.<br />
Next-generation sequencing (NGS) approaches<br />
have numerous applications for biosurveillance<br />
programs and outbreak investigation.<br />
However, there are significant challenges in<br />
analyzing the data accurately and in a timely<br />
fashion without the aid of high-performance<br />
computing resources. Often the users are not<br />
bioinformaticians who are comfortable running<br />
sequence analysis pipelines. Biomatters’<br />
Geneious R9 is a bioinformatics software<br />
platform that allows researchers the use of<br />
industry-leading algorithms for their genomic<br />
and protein sequence analyses. Geneious offers<br />
a comprehensive suite of functions, including<br />
a robust collection of peer-reviewed tools,<br />
that enable researchers to be more efficient<br />
with their sequence analysis workflows. The<br />
recent addition of the 16S Biodiversity tool<br />
and Sequence Classifier plugin provides tools<br />
which can be incorporated into an easy to use<br />
pathogen identification workflow. The 16S<br />
Biodiversity tool identifies high-throughput<br />
16S rRNA amplicons from environmental<br />
samples using the RDP database, and visualizes<br />
biodiversity as an interactive chart using<br />
a secure web viewer. The Sequence Classifier<br />
plugin taxonomically classifies an organic<br />
sample by how similar its DNA is to your<br />
own database of known sequences using a<br />
BLAST-like algorithm with multiple loci and<br />
trees to assist with identification. By utilizing<br />
Geneious R9, biologists can easily streamline<br />
their sequence analysis workflows for mixed<br />
sample analysis. This poster aims to describe a<br />
sequence analysis pipeline for identification of<br />
bacterial pathogens from mixed metagenomic<br />
data generated from outbreaks. The pipeline<br />
can also be extended to include eukaryotic and<br />
fungal pathogens.<br />
n 2<br />
THE IMPORTANCE OF ASSESSING THE<br />
PERFORMANCE OF METHODS FOR VARIANT<br />
DETECTION AND CONSTRUCTING SNP<br />
MATRICES: INSIGHTS FROM A VALIDATION<br />
EXPERIMENT OF THE CFSAN SNP PIPELINE<br />
J. B. Pettengill, S. Davis, Y. Luo, H. Rand, E.<br />
Strain;<br />
FDA, College Park, MD.<br />
The CFSAN SNP Pipeline combines into a single<br />
package the steps necessary to generate a<br />
SNP matrix with which a phylogenetic tree can<br />
be inferred to assist in traceback and foodborne<br />
outbreak investigations. The pipeline works<br />
with next-generation sequencing reads from<br />
a group of individuals and uses a reference-based<br />
approach to identify variant sites. Given<br />
the importance of decisions that may be based<br />
on the topology inferred from the SNP matrix,<br />
it is paramount that the method be validated.<br />
With this objective in mind, we developed a<br />
simple Python package that, when given a reference<br />
genome, generates variants of known<br />
position against which we validate our pipeline.<br />
We created 1000 simulated Salmonella<br />
enterica subsp. enterica serovar Agona genomes<br />
at 100x and 20x coverage, each containing<br />
500 SNPs, 20 single-base insertions and 20<br />
single-base deletions. For the 100x dataset,<br />
the CFSAN SNP Pipeline recovered 98.9% of<br />
the introduced SNPs and had a false positive<br />
rate of 1.04 × 10⁻⁶; for the 20x dataset, 98.8%<br />
of SNPs were recovered and the false positive<br />
rate was 8.34 × 10⁻⁷. Interestingly, failure to<br />
meet the consensus frequency threshold rather<br />
than the coverage threshold was the primary<br />
explanation for false negatives. Additionally,<br />
false negatives were not randomly distributed<br />
across the genome, which suggests that<br />
hotspots within the genome may exist where it<br />
will be difficult to accurately detect differences<br />
between two samples. These results provide<br />
critical metrics that show the CFSAN SNP<br />
Pipeline to be a robust method for constructing<br />
a SNP matrix and further reinforce the utility<br />
and importance of validation exercises.<br />
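The recovery and false positive figures above come from comparing called variants with variants of known position. A minimal sketch of that comparison follows; it is not the CFSAN validation package itself, and the definition of the false positive rate (wrong calls divided by the number of sites without a simulated SNP) is an assumption, since the abstract does not state the denominator.

```python
def score_calls(truth, called, genome_length):
    """Compare called SNP positions with simulated (known) SNP positions.

    Returns (recall, false_positive_rate, missed_positions).
    """
    truth, called = set(truth), set(called)
    true_pos = truth & called
    false_neg = truth - called
    false_pos = called - truth
    recall = len(true_pos) / len(truth)
    # Assumed definition: false calls per site that carries no simulated SNP.
    fpr = len(false_pos) / (genome_length - len(truth))
    return recall, fpr, sorted(false_neg)


# Toy positions, invented for illustration only.
truth = [100, 250, 400, 900]
called = [100, 250, 900, 1500]
recall, fpr, missed = score_calls(truth, called, genome_length=10_004)
```

Keeping the list of missed positions, rather than only the rates, is what makes it possible to check whether false negatives cluster in particular regions of the genome.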
n 3<br />
TGS-TB: TOTAL GENOTYPING SOLUTION<br />
FOR MYCOBACTERIUM TUBERCULOSIS<br />
USING SHORT-READ WHOLE-GENOME<br />
SEQUENCING<br />
T. Sekizuka 1 , A. Yamashita 1 , Y. Murase 2 , T.<br />
Iwamoto 3 , S. Mitarai 2 , S. Kato 2 , M. Kuroda 1 ;<br />
1 National Institute of Infectious Diseases, Shinjyuku-ku,<br />
JAPAN, 2 Japan Anti-Tuberculosis Association,<br />
Kiyose-shi, JAPAN, 3 Kobe Institute<br />
of Health, Kobe-shi, JAPAN.<br />
Background: Whole-genome sequencing<br />
(WGS) with next-generation DNA sequencing<br />
(NGS) is an increasingly accessible and affordable<br />
method for genotyping hundreds of Mycobacterium<br />
tuberculosis (Mtb) isolates, leading<br />
to more effective epidemiological studies<br />
involving single nucleotide variations (SNVs)<br />
in the core genomic sequences based on molecular<br />
evolution. Methods: We developed an<br />
all-in-one web-based tool for genotyping Mtb,<br />
referred to as Total Genotyping Solution for<br />
TB (TGS-TB), to facilitate multiple genotyping<br />
platforms using NGS for spoligotyping and<br />
the detection of phylogenies with core genomic<br />
single nucleotide variations (SNVs), IS6110<br />
insertion sites, and VNTRs (our customized<br />
short TR on 43 loci) through a user-friendly<br />
simple click interface. In addition, this methodology<br />
is implemented with a KvarQ script<br />
to predict MTBC lineages/sublineages and<br />
potential antimicrobial resistance. Findings:<br />
The results of in silico analyses using TGS-TB<br />
are completely consistent with those obtained<br />
using conventional molecular genotyping<br />
methods, suggesting that MiSeq NGS short<br />
reads could provide multiple genotypes to<br />
discriminate multiple strains of Mtb. Indeed,<br />
seven Mtb isolates showing the same VNTR<br />
profile were accurately discriminated through<br />
median joining network analysis using specific<br />
SNVs unique to those isolates. Furthermore,<br />
an additional IS6110 insertion was detected<br />
in one of those isolates as supportive genetic<br />
information in addition to core genomic SNVs.<br />
The results obtained from all in silico analyses<br />
can be downloaded from the website. Interpretation:<br />
TGS-TB provides more accurate<br />
and discriminative strain typing for clinical<br />
and epidemiological investigations; NGS strain<br />
typing offers a total genotyping solution for<br />
Mtb outbreak and surveillance. The genotype<br />
information obtained for all Mtb isolates can<br />
be deposited into an integrated database for<br />
the surveillance of future outbreaks and global<br />
infections. TGS-TB web site: http://gph.niid.go.jp/tgs-tb<br />
Funding: This research was<br />
funded through a Grant-in-Aid for Research<br />
on Emerging and Re-emerging Infectious Diseases<br />
(H25-Shinko-Ippan-015) from the Ministry<br />
of Health, Labour and Welfare Programs<br />
of Japan.<br />
n 4<br />
MARA: THE MULTI-ANTIBIOTIC RESISTANCE<br />
ANNOTATOR<br />
S. Partridge 1 , G. Tsafnat 2 ;<br />
1 Westmead Millennium Institute, Sydney, AUSTRALIA,<br />
2 Centre for Health Informatics, Macquarie<br />
University, Sydney, AUSTRALIA.<br />
Much of the increasingly problematic multiresistance<br />
in Gram-negative bacteria is due<br />
to resistance genes associated with different<br />
mobile elements (mainly gene cassettes/<br />
integrons, insertion sequences, transposons)<br />
that tend to cluster together in complex multiresistance<br />
regions (MRR). MRR in turn are<br />
found on plasmids that can spread between<br />
cells, including different species, or sometimes<br />
in islands integrated into the chromosome.<br />
Increasing numbers of MRR sequences are<br />
becoming available as part of large projects<br />
using next-generation methods, enabling<br />
comparative analysis to better understand how<br />
resistance genes are spreading. However, many<br />
such sequences are poorly and inconsistently<br />
annotated, as available software often only<br />
focuses on identifying genes and the putative<br />
functions of the proteins that they encode. In<br />
the resistance domain, gene functions are often<br />
well known, but resistance gene nomenclature<br />
is confusing, with different naming systems<br />
for the same genes and relationships between<br />
genes often not evident from their names<br />
alone. Consistently naming genes, identifying<br />
minor variations that have important effects<br />
on resistance phenotype and simultaneously<br />
identifying boundaries of mobile genetic elements<br />
are needed to provide the most useful<br />
annotations. We previously developed an automated<br />
tool, Attacca, which uses a database<br />
of ‘features’ and computational grammars<br />
to accurately and consistently annotate gene<br />
cassettes and integrons, and the Repository of<br />
Antibiotic-resistance Cassettes (RAC) website<br />
(http://www.rac.aihi.mq.edu.au/), which provides<br />
a database of resistance and other gene<br />
cassettes and allows sequences to be submitted<br />
for annotation by Attacca. We have now<br />
extended Attacca to annotate other resistance<br />
genes and associated mobile elements using<br />
an expanded database (compiled from existing<br />
resistance gene repositories, where possible)<br />
and additional grammar rules. Attacca annotations<br />
were compared with manually annotated<br />
sequences to refine the grammars. Attacca is<br />
able to accurately and consistently annotate<br />
resistance genes, the boundaries of complete<br />
mobile elements and their fragments, insertions<br />
of one mobile element inside another and<br />
direct repeats indicative of insertion, as well<br />
as providing diagrammatic representations of<br />
annotations. The Multi-Antibiotic Resistance<br />
Annotator (MARA) website (linked to RAC)<br />
will provide access to the extended resistance<br />
gene database, as browsable lists that include<br />
information such as relationships within different<br />
resistance gene families, and a service for<br />
annotating MRR sequences.<br />
n 5<br />
MOLECULAR EPIDEMIOLOGY OF<br />
CAMPYLOBACTER ISOLATES FROM<br />
SOUTH AFRICA INVESTIGATED USING<br />
WEB-BASED BIOINFORMATICS PIPELINES<br />
DEVELOPED AT THE CENTER FOR GENOMIC<br />
EPIDEMIOLOGY OF THE TECHNICAL<br />
UNIVERSITY OF DENMARK<br />
A. M. Smith, M. Thobela, A. Ismail, K. H.<br />
Keddy;<br />
National Institute for Communicable Diseases,<br />
Johannesburg, SOUTH AFRICA.<br />
Background: Campylobacter is a leading<br />
cause of bacterial gastroenteritis globally. The<br />
molecular epidemiology of Campylobacter is<br />
well described in developed countries. However,<br />
in developing countries very little data exist<br />
on the molecular epidemiology of Campylobacter.<br />
Objectives: A proof-of-concept study<br />
was conducted to determine the feasibility of<br />
investigating the molecular epidemiology of<br />
Campylobacter by using web-based bioinformatics<br />
pipelines to analyze whole-genome<br />
sequence (WGS) data. Methods: Sixteen human<br />
isolates of Campylobacter jejuni were<br />
investigated. Isolates were recovered in 2014<br />
from the Gauteng and Western Cape provinces<br />
of South Africa. Genomic DNA was isolated<br />
from strains using the QIAGEN QIAamp DNA<br />
Mini Kit and sequenced using Illumina MiSeq<br />
next generation sequencing technology. Raw<br />
data generated on the MiSeq were assembled<br />
into contiguous DNA sequences using CLC<br />
Genomics Workbench Software. Assembled<br />
data were analyzed using ‘multi-locus sequence<br />
typing (MLST)’ and ‘acquired antimicrobial<br />
resistance gene finder’ pipelines developed at<br />
the Center for Genomic Epidemiology (CGE)<br />
of the Technical University of Denmark. Results:<br />
Genomic data were successfully uploaded<br />
to CGE servers and the resulting analyses<br />
were returned via e-mail messages containing<br />
website links to the analyzed data. Analysis<br />
of a single data file was typically completed<br />
within 5-10 minutes. Our 16 isolates showed<br />
the following MLST subtypes: ST-4063 (n=2),<br />
ST-1737 (n=1), ST-4624 (n=1), ST-6091 (n=1),<br />
ST-5809 (n=1), ST-658 (n=1), ST-356 (n=1),<br />
ST-257 (n=1), ST-883 (n=1), ST-474 (n=1),<br />
ST-829 (n=1), ST-51 (n=1), and ST-unknown<br />
(n=3). Analysis for acquired antimicrobial resistance<br />
genes showed the following: 3 isolates<br />
had no evidence of known resistance genes; 9<br />
isolates contained blaOXA-61; 1 isolate contained<br />
tet(O); 2 isolates contained both blaOXA-61 and<br />
tet(O); and 1 isolate contained blaOXA-61, tet(O),<br />
tet(B), sul2, strA and strB. Conclusions:<br />
Our proof-of-concept study was successful.<br />
Analysis of WGS data using web-based bioinformatics<br />
pipelines at the CGE provided a<br />
single, rapid and cost-effective approach to<br />
investigate the molecular epidemiology of<br />
Campylobacter. We showed a wide diversity<br />
of MLST subtypes among 16 Campylobacter<br />
isolates suggesting that a heterogeneous population<br />
of Campylobacter may exist in South<br />
Africa. A diversity of acquired antimicrobial<br />
resistance genes was shown in the population<br />
with blaOXA-61 (coding for β-lactam resistance)<br />
most commonly described.<br />
n 6<br />
GENETIC TYPING AND EPIDEMIOLOGICAL<br />
CLUSTERING OF COMMON PATHOGENS<br />
BASED ON WHOLE GENOME NGS DATA<br />
K. Einer-Jensen, P. Liboriussen, J. Johansen,<br />
L. Schauser, A. C. Materna;<br />
QIAGEN, Aarhus, DENMARK.<br />
Next generation sequencing (NGS) data from<br />
whole pathogen genomes is frequently used for<br />
enhanced surveillance and outbreak detection<br />
of common pathogens. Version 1.5 of the CLC<br />
Microbial Genomics Module for CLC Genomics<br />
Workbench and CLC Genomics Server<br />
introduces new functionality for molecular<br />
typing and epidemiological analysis of bacterial<br />
isolates. The latest update to the module<br />
enables the user to perform stepwise (tool-by-tool)<br />
analysis or to take advantage of included<br />
multistep workflows. With a few clicks workflows<br />
can be optimized for routine analysis of<br />
a specific pathogen or outbreak. New features<br />
include, for instance, streamlined tools for NGS-based<br />
Multilocus Sequence Typing (MLST),<br />
resistance typing, as well as detection of genus<br />
and species information. New tools for phylogenetic<br />
tree reconstruction generate trees based<br />
on single nucleotide polymorphisms (SNPs) or<br />
infer K-mer trees from NGS reads or genomes.<br />
A new table format, acting as a database, collects<br />
typing results and associates these results<br />
with metadata such as sample information,<br />
geographic origin, treatment outcome, etc. Results<br />
generated using e.g. MLST and resistance<br />
typing can furthermore be associated with the<br />
original sample metadata. Users can filter for<br />
results and metadata to find and select relevant<br />
subsets of samples for downstream analysis.<br />
Results and metadata available during tree<br />
generation can further be used to explore phylogeny<br />
in the context of this epidemiologically<br />
relevant information. Version 1.5 of the CLC<br />
Microbial Genomics Module aims to facilitate<br />
tasks commonly carried out during outbreak<br />
investigation, such as typing or source tracking<br />
based on whole genome data. Preconfigured<br />
workflows and simple deployment via integration<br />
into the widely used CLC Genomics<br />
Workbench and CLC Genomics Server ecosystem<br />
offer a user-friendly platform for scientists<br />
engaged in outbreak prevention and control.<br />
n 7<br />
NEEDS ASSESSMENT AND ONTOLOGY<br />
DEVELOPMENT FOR INTEGRATING WHOLE<br />
GENOME SEQUENCING INTO ROUTINE<br />
OUTBREAK INVESTIGATIONS<br />
E. J. Griffiths 1 , M. Courtot 1 , D. Dooley 2 , J.<br />
Adam 3 , F. Bristow 3 , J. A. Carrico 4 , B. Dhillon<br />
1 , M. Graham 3 , M. Laird 1 , T. Matthews 3 , A.<br />
Petkau 3 , L. Schriml 5 , J. Shay 1 , E. Taboada 6 , G.<br />
Winsor 1 , G. Van Domselaar 3 , F. Brinkman 1 , W.<br />
Hsiao 2 ;<br />
1 Simon Fraser University, Burnaby, BC,<br />
CANADA, 2 BC Public Health Microbiology<br />
and Reference Laboratory, Vancouver, BC,<br />
CANADA, 3 National Microbiology Laboratory,<br />
Winnipeg, MB, CANADA, 4 University of<br />
Lisbon, Lisbon, PORTUGAL, 5 University of<br />
Maryland School of Medicine, Baltimore, MD,<br />
6 Laboratory for Foodborne Zoonoses, Lethbridge,<br />
AB, CANADA.<br />
Whole-genome sequencing (WGS) can provide<br />
increased pathogen typing discriminatory<br />
power as well as additional genomic information<br />
using comparative genomics tools. Canada’s<br />
Integrated Rapid Infectious Disease Analysis<br />
(IRIDA) platform will equip public health<br />
workers with user-friendly tools for incorporating<br />
WGS into isolate typing and epidemiological<br />
pipelines to support real-time infectious<br />
disease investigation. In order to understand<br />
the practical requirements for implementing<br />
this platform within the Canadian health care<br />
network, a needs assessment was conducted<br />
by interviewing public health stakeholders<br />
and domain experts. User activities, lab<br />
management software, pathogen-tracking and<br />
reporting systems were profiled to better characterize<br />
levels of user computational expertise,<br />
required functionalities and information flow.<br />
Key gaps were identified in personnel training,<br />
data processing and sharing, data integration,<br />
as well as governance. Consistent capture of<br />
structured metadata is crucial for integrated<br />
data analyses, human health risk assessments,<br />
source attribution, ecosystems modelling and,<br />
in the simplest terms, to make sense of the genome<br />
data. In addition to the needs assessment,<br />
an ontology resource review was conducted<br />
to assess the utility of different community<br />
standards for fulfilling the needs of a genomic<br />
epidemiology program. No single ontology is<br />
sufficient to cover all attributes required for<br />
genomic epidemiology and the very breadth<br />
of many ontologies hinders their practical use<br />
in real-time by users with little bioinformatics<br />
expertise. User profiles and data requirements<br />
were harmonized with different sources<br />
in order to produce an OWL file containing<br />
metadata fields and terms describing isolate<br />
source attribution, lab analytics, sequencing/<br />
assembly/annotation processes as well as<br />
quality metrics, and patient demographics (or<br />
environmental descriptors). By adhering to<br />
the best practices of the Open Biomedical and<br />
Biological Ontology (OBO) Consortium, the<br />
application ontology we are developing allows<br />
consolidation of various existing ontological<br />
efforts into a resource directly compatible with<br />
IRIDA. The data integration capabilities of<br />
the IRIDA ontology are currently being tested<br />
in a Canadian government Genome Research<br />
and Development Initiative and will shortly be<br />
expanded to include antimicrobial resistance.<br />
This research is a key component of the IRIDA<br />
platform allowing for automated data integration<br />
alleviating the burden of manual analyses.<br />
Furthermore, outbreak response can be expedited<br />
after auto-detecting deviations above<br />
pre-defined biosurveillance thresholds. The<br />
lessons learned from the needs assessment and<br />
ontology development reported here should be<br />
informative for other countries with autonomous<br />
health regions.<br />
n 8<br />
IRIDA: A FEDERATED PLATFORM ENABLING<br />
RICHER GENOMIC EPIDEMIOLOGY<br />
ANALYSIS IN A PUBLIC HEALTH<br />
ENVIRONMENT<br />
F. Bristow 1 , J. Adam 1 , J. A. Carrico 2 , M. Courtot<br />
3 , B. Dhillon 3 , D. Dooley 4 , E. Griffiths 3 , J.<br />
Isaac-Renton 5 , A. Keddy 6 , P. Kruczkiewicz 7 , M.<br />
Laird 3 , T. Matthews 1 , A. Petkau 1 , L. Schriml 8 ,<br />
J. Shay 3 , E. Taboada 7 , P. Tang 4 , J. Thiessen 1 ,<br />
G. Winsor 3 , R. G. Beiko 6 , M. Graham 1 , G. Van<br />
Domselaar 1 , W. Hsiao 4 , F. S. Brinkman 3 ;<br />
1 National Microbiology Laboratory, Winnipeg,<br />
MB, CANADA, 2 University of Lisbon, Lisbon,<br />
PORTUGAL, 3 Simon Fraser University, Burnaby,<br />
BC, CANADA, 4 BC Public Health Microbiology<br />
and Reference Laboratory, Vancouver,<br />
BC, CANADA, 5 BC Public Health Microbiology<br />
and Reference Laboratory, Burnaby, BC,<br />
CANADA, 6 Dalhousie University, Halifax, NS,<br />
CANADA, 7 Laboratory for Foodborne Zoonoses,<br />
Lethbridge, AB, CANADA, 8 University of<br />
Maryland School of Medicine, Baltimore, MD.<br />
Most public data analysis tools/pipelines for<br />
genomic epidemiology require considerable<br />
technical knowledge to execute, or cannot easily<br />
integrate rich epidemiologic data, providing<br />
a barrier to wider use by public health workers.<br />
Canada’s Integrated Rapid Infectious Disease<br />
Analysis (IRIDA) web-based bioinformatics<br />
platform provides a model approach that<br />
aims to address these issues and complement<br />
other international bioinformatics pipelines<br />
for genomic epidemiology. The IRIDA development<br />
team comprises five interconnected<br />
working groups: 1) Ontology and Database;<br />
2) Microbial Typing; 3) Architecture and API;<br />
4) Tools Development; 5) User Experience.<br />
Teams are embedded in Canadian national and<br />
provincial public health agencies, and in academia,<br />
to engage end users and stakeholders<br />
during design and implementation phases of<br />
the project. IRIDA implements secure storage<br />
of WGS data, epidemiological and application<br />
metadata, data analysis pipelines, visualization<br />
of results, and a federated data sharing model<br />
intended to facilitate secure communication<br />
within and between provincial and federal public<br />
health institutions in Canada. Metadata is<br />
encoded in an application ontology following<br />
community recognized standards and extending<br />
existing OBO domain ontologies (http://www.obofoundry.org/)<br />
to promote interoperability.<br />
Data analysis pipelines and execution<br />
provenance are transparently implemented<br />
using Galaxy, and federated data sharing and<br />
analysis is realized with a common REST<br />
API across platform instances. Data analysis<br />
tools being further developed include SNVPhyl<br />
for phylogeographic analysis, in silico<br />
microbial typing capability, and IslandViewer/<br />
visualization tools for antimicrobial resistance,<br />
virulence factor, and genomic island analysis.<br />
Linkage with other international genomic epidemiology<br />
initiatives, involving public genomic<br />
data release with more limited metadata, is<br />
also envisaged. A publicly available academic<br />
version of IRIDA that does not provide access<br />
to potentially sensitive epidemiologic metadata<br />
will provide IRIDA’s analysis tools for wider<br />
research use. An initial IRIDA version is being<br />
tested in the public health environment<br />
using current outbreak data, enabling further<br />
refinement of ontology and tool development.<br />
IRIDA is free, open-source software that may<br />
be a useful platform for other countries with<br />
autonomous health regions that wish to empower<br />
their public health workers’ genomic<br />
analysis capabilities. See http://www.irida.ca<br />
for more information.<br />
n 9<br />
PIPELINE FOR THE AUTOMATIC<br />
IDENTIFICATION OF PATHOGENS<br />
A. Andrusch, P. Dabrowski, A. Nitsche;<br />
Robert Koch Institute, Berlin, GERMANY.<br />
Special diagnostics of infectious diseases<br />
nowadays rely on the gold standard of nucleic<br />
acid detection, the polymerase chain<br />
reaction (PCR). Specific detection by PCR<br />
is limited in that every pathogen has to be<br />
tested for with its own assay. This limitation<br />
can be overcome by adopting the molecular<br />
open view capacities that next-generation<br />
sequencing (NGS) can provide. NGS-based<br />
methods allow for the representative sequencing<br />
of all nucleic acids contained in clinical<br />
samples, enabling the downstream analysis<br />
of all generated reads for various known<br />
pathogens at once. This comes at the price of<br />
necessary filtering steps for the removal of<br />
background reads originating from the patient.<br />
Beyond that, NGS can not only extend the<br />
diagnostic possibilities provided by PCR, but<br />
also serve as a stepping stone in the detection<br />
of hitherto unknown and novel pathogens. The<br />
‘Pipeline for the Automatic Identification of<br />
Pathogens’ (PAIPline) presented here is a new<br />
complete workflow for the pathogen search in<br />
NGS datasets. It includes several steps for the<br />
preprocessing and quality control of the raw<br />
data to ensure that only information-rich reads<br />
are evaluated. It furthermore includes steps<br />
for the assignment of reads to their respective<br />
taxa based on reliable, established reference-based<br />
algorithms. Filtering of background<br />
reads, contaminants and organisms of low<br />
interest as well as the evaluation of ambiguous<br />
read information is automatically done before<br />
the results are presented. Analysis results are<br />
shown in a highly accessible manner, allowing<br />
the user to gain a quick overview as well<br />
as permitting deep analysis. The performance<br />
of the PAIPline was benchmarked on real and<br />
artificial datasets of known compositions and<br />
compared to competing tools. The results and<br />
discussed features show that the presented approach<br />
is a viable strategy for the identification<br />
of pathogen sequences in NGS datasets.<br />
n 10<br />
SEPARATION OF FOREGROUND AND<br />
BACKGROUND READS IN MIXED NGS<br />
DATASETS<br />
S. Tausch, A. Nitsche, B. Renard, P. Dabrowski;<br />
Robert Koch Institute, Berlin, GERMANY.<br />
NGS is a valuable technology for rapid and in-depth<br />
analysis of clinical samples, as it allows<br />
sequencing of a pathogen’s whole genome<br />
directly from patient material within as little<br />
as 26 hours. However, the follow-up analysis<br />
is severely slowed down by the abundance of<br />
reads originating from the host. Thus, in order<br />
to exploit the full potential of the technology<br />
for rapid diagnostics, a method for rapid in<br />
silico removal of host reads is necessary. Commonly,<br />
a mapping-based approach is used to<br />
separate reads: either reads mapping to a background<br />
reference or reads not mapping to a<br />
foreground reference are discarded. However,<br />
while the former approach is highly specific<br />
in discarding only true background reads and<br />
the latter is highly sensitive in only keeping<br />
foreground reads, neither offers a good balance.<br />
Hence we have aimed at developing a<br />
novel tool specifically geared towards both<br />
specific and sensitive separation of foreground<br />
and background reads. In order to determine<br />
whether a read belongs to the foreground or<br />
the background, we train Markov chains of<br />
order k (from 4 to 12) on user-provided sets<br />
of foreground and background reference sequences,<br />
where each state is a k-mer<br />
and each transition is one of the four possible<br />
bases A, C, G and T. We then calculate the<br />
difference of log likelihoods of each transition<br />
observed within a read with regard to<br />
the foreground and the background Markov<br />
chains. This difference is then used as a score<br />
for the separation of reads, with scores smaller<br />
than 0 indicating a background read and scores<br />
larger than 0 indicating a foreground read.<br />
We have tested our tool on several datasets,<br />
including Cowpoxvirus sequenced from a<br />
human host. In all cases, our tool was faster<br />
than any competing tool (achieving speeds of<br />
up to 10 Megabases/second using 4 CPUs),<br />
including Kraken and mapping via bowtie2.<br />
At the same time, we consistently achieved<br />
the best F-Score of all tested tools. Our tool is<br />
developed in Python and Java and available for<br />
download from http://sourceforge.net/projects/rambok/.<br />
We have developed a freely available,<br />
easy to use, rapid and both highly sensitive and<br />
specific tool for the separation of foreground<br />
and background reads in mixed NGS datasets.<br />
We believe that this will be highly useful as an<br />
initial filtering step for anyone analyzing viral<br />
sequences via NGS.<br />
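The scoring scheme described in this abstract can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' tool: the function names, the add-one pseudocount for unseen transitions and the use of a single fixed k are choices made here, since the abstract does not specify smoothing or how the orders 4 to 12 are combined.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def train(sequences, k, pseudocount=1.0):
    """Fit an order-k Markov chain: state = k-mer, transition = next base."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1.0

    def log_prob(kmer, base):
        # Pseudocounts (an assumption) keep unseen transitions finite.
        row = counts.get(kmer, {})
        total = sum(row.get(b, 0.0) for b in BASES) + len(BASES) * pseudocount
        return math.log((row.get(base, 0.0) + pseudocount) / total)

    return log_prob


def score(read, k, fg_log_prob, bg_log_prob):
    """Sum of log-likelihood differences over all transitions in the read."""
    return sum(
        fg_log_prob(read[i:i + k], read[i + k])
        - bg_log_prob(read[i:i + k], read[i + k])
        for i in range(len(read) - k)
    )


# Toy references and reads, invented for illustration only.
K = 4
foreground = train(["ACGT" * 10], K)
background = train(["AAAATTTT" * 10], K)
fg_score = score("ACGTACGTAC", K, foreground, background)
bg_score = score("AAAATTTTAA", K, foreground, background)
```

With these toy references the first read scores above 0 and would be kept as foreground, while the second scores below 0 and would be discarded as background, matching the decision rule stated in the abstract.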
n 11<br />
A RAPID AND SCALABLE SINGLE<br />
NUCLEOTIDE POLYMORPHISM DISCOVERY<br />
AND VALIDATION PIPELINE FOR OUTBREAK<br />
INVESTIGATION OF BACTERIAL PATHOGENS<br />
B. Rusconi¹, A. L. Rodriguez², S. S. Koenig¹,<br />
M. Eppinger¹;<br />
¹University of Texas at San Antonio - South<br />
Texas Center For Emerging Infectious Diseases<br />
(STCEID), San Antonio, TX, ²University of<br />
Texas at San Antonio - Computational Biology<br />
Initiative, San Antonio, TX.<br />
Background: Assuring a timely and effective<br />
response in the control of bacterial outbreaks<br />
is challenging, as discriminatory power becomes<br />
of particular importance to distinguish<br />
outbreak isolates that form tight clonal complexes<br />
with only a few genetic polymorphisms.<br />
The increase in throughput and concomitant<br />
cost reduction of next generation sequencing<br />
has allowed researchers to migrate from<br />
analyzing distinct loci in only a few prototypical<br />
isolates to whole genome sequence typing<br />
(WGST) of large bacterial collections. The<br />
growing sequence databases fortified with<br />
strain-associated metadata require efficient<br />
and well-integrated bioinformatics tools that<br />
will explore and harvest the information<br />
content of WGS data to enhance the traceability<br />
of microbes in support of informed and<br />
timely countermeasures. In particular, single<br />
nucleotide polymorphism (SNP) discovery<br />
and typing often surpass labor-intensive molecular<br />
typing schemes in offering both robust<br />
long- and short-term high-resolution variant<br />
markers. Methods: Our group has developed<br />
a bioinformatics SNP Discovery and Validation<br />
Pipeline (SNPDV) implemented on the<br />
open source platform Galaxy coded in Python<br />
that makes use of the wealth of sequence data<br />
for the whole genome sequence typing of<br />
bacterial pathogens. Its modular architecture<br />
guarantees high flexibility and scalability for<br />
SNP discovery and validation, and it can be run<br />
in serial, multiprocessor, or threaded mode<br />
depending on available server capabilities.<br />
This pipeline allows rapid typing of strains of<br />
unknown provenance by testing allelic states<br />
in already established SNP panels, as well as unbiased<br />
de novo discovery. Results: Given the<br />
clonal nature of outbreaks, we have successfully<br />
deployed this pipeline in the investigation<br />
of the 2010 Haiti cholera and 2006 spinach<br />
outbreaks, and diverse other enteric and<br />
emerging pathogens, such as Yersinia pestis,<br />
Bacillus spp., Vibrio spp., and Acinetobacter<br />
baumannii, achieving improved phylogenetic<br />
accuracy and resolution. Discussion: Following<br />
the concept of genomic epidemiology,<br />
the gathered phylogenomic data have major<br />
public health relevance in utilizing sequence-based<br />
information for improved surveillance,<br />
strain attribution, and source identification<br />
to contain outbreaks. Canonical SNPs can be<br />
implemented in efficient typing assays offering<br />
robust phylogenetic signals for outbreak inclusion and<br />
exclusion that surpass classical technologies.<br />
The phylogenomic framework is a further<br />
critical resource to precisely cluster pathogens<br />
into subgroups distinguished by different genotypic,<br />
epidemiological, and phenotypic traits,<br />
such as whether particular genotypes reflect<br />
highly pathogenic subpopulations. Future<br />
developments are directed towards the cloud<br />
implementation of this pipeline in collaboration<br />
with the UTSA Cloud and Big Data Laboratory<br />
(NSF Cloud).<br />
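Typing a strain against an established SNP panel, as mentioned in the Methods, amounts to<br />
looking up the allele at each panel position. The sketch below is illustrative only and is<br />
not SNPDV code: the function name type_against_panel, the panel layout, and the SNP names<br />
are hypothetical, and the genome is assumed to be already aligned to reference coordinates.<br />

```python
def type_against_panel(genome, panel):
    """Report the allelic state of a genome at each panel SNP.

    genome: sequence string in reference coordinates (assumption).
    panel:  {snp_name: (zero_based_position, ancestral_base, derived_base)}
            This layout is hypothetical and chosen only for the example.
    """
    calls = {}
    for name, (pos, ancestral, derived) in panel.items():
        # Positions outside the sequence are treated as missing data.
        base = genome[pos].upper() if 0 <= pos < len(genome) else "N"
        if base == derived:
            calls[name] = "derived"
        elif base == ancestral:
            calls[name] = "ancestral"
        else:
            calls[name] = "no-call"
    return calls


if __name__ == "__main__":
    example_panel = {
        "snpA": (1, "C", "T"),
        "snpB": (4, "G", "A"),
    }
    print(type_against_panel("ACGTACGT", example_panel))
```

The resulting pattern of ancestral and derived calls can then be compared with the known<br />
patterns of established lineages to place the strain.<br />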
■ 12<br />
COMBINING LABORATORY AND<br />
EPIDEMIOLOGICAL DATA FOR OUTBREAK<br />
INVESTIGATION AND PUBLIC HEALTH<br />
SURVEILLANCE: LESSONS FROM AN<br />
OUTBREAK OF HIV-1 INFECTION LINKED TO<br />
INJECTION DRUG USE OF OXYMORPHONE -<br />
INDIANA, 2015<br />
E. M. Campbell¹, R. R. Galang¹, J. Gentry², P.<br />
J. Peters¹, S. J. Blosser², E. L. Chapman², C.<br />
Conrad², J. W. Duwve², L. Ganova-Raeva³, W.<br />
Heneine¹, D. Hillman², H. Jia¹, L. Lui², J. Lovchik²,<br />
A. Perez², P. Peyrani⁴, P. Pontones², S.<br />
Ramachandran³, J. C. Roseberry², M. Sandoval²,<br />
A. Shankar¹, H. Thai³, G. Xia³, Y. Khudyakov³,<br />
W. H. Switzer¹;<br />
¹Division of HIV/AIDS Prevention, National<br />
Center for HIV/AIDS, Viral Hepatitis, STD,<br />
and TB Prevention, CDC, Atlanta, GA, ²Indiana<br />
State Department of Health, Indianapolis,<br />
IN, ³Division of Viral Hepatitis, National Center<br />
for HIV/AIDS, Viral Hepatitis, STD, and<br />
TB Prevention, CDC, Atlanta, GA, ⁴Division of<br />
Infectious Diseases, University of Louisville,<br />
Louisville, KY.<br />
In January 2015, a cluster of HIV-1 infections<br />
was detected in rural Indiana among persons<br />
who reported injecting the prescription opioid,<br />
oxymorphone. As of May, HIV-1 infection<br />
was diagnosed in 153 individuals. Molecular<br />
analyses of HIV-1 and HCV sequences were<br />
combined with epidemiological data via a<br />
novel bioinformatics pipeline to infer the<br />
timing of HIV transmission relative to HCV<br />
and to explore risk factors associated with<br />
the inferred transmission network. Interviews<br />
were conducted with HIV-1-positive patients<br />
to capture high-risk contacts with respect to<br />
needle-sharing events and sexual encounters.<br />
HIV polymerase (pol) and HCV NS5B gene<br />
sequences obtained from blood specimens<br />
from newly diagnosed infections were phylogenetically<br />
analyzed. Clusters were defined<br />
when HIV-1 pol sequences were highly genetically<br />
related (0.99).<br />
Genetic distances < 1.5% were used to infer<br />
the HIV-1 transmission network. Molecular,<br />
demographic, behavioral, and high-risk contact<br />
data were combined to discern transmission<br />
and haplotype networks. Interactive exploration<br />
of these networks through open source<br />
tools informed public health response and<br />
helped to prioritize resources. One large cluster<br />
of HIV-1 subtype B infections was identified<br />
(n = 55) that comprised three primary<br />
haplotypes. Of 36 HIV-infected specimens<br />
with HCV antibody results, 34 (94%) were<br />
HCV co-infected. Among all HCV infections,<br />
genotype 1a (n=82) was most common, followed<br />
by 3a (n=29), 2b (n=5), and 1b (n=3).<br />
Three unique clusters of HCV strains were<br />
identified (Cluster 1, n = 45; Cluster 2, n = 9;<br />
Cluster 3, n = 7). Of 118 HCV-infected specimens<br />
with HIV antibody results, 38 (32.2%)<br />
were HIV co-infected. The overwhelming<br />
majority of HIV transmission events (98.3%)<br />
occurred during needle-sharing rather than<br />
sexual encounters (1.7%). Positive assortativity<br />
(r=0.17) among needle-sharing encounters<br />
was observed between individuals who<br />
identified as commercial sex workers. In this<br />
outbreak, a single strain of HIV-1 was introduced<br />
into a population infected with multiple<br />
HCV strains. Due to the lack of molecular<br />
diversity, transmission network reconstruction<br />
was uninformative until contextualized<br />
with epidemiological data afforded by patient<br />
interviews. The heterogeneity of HCV strains<br />
(clustering and non-clustering) suggests earlier<br />
introduction of HCV compared with HIV.<br />
These data demonstrate the outbreak potential<br />
with introduction of HIV-1 into a community<br />
where HCV prevalence is high among persons<br />
who inject prescription opioids. These results<br />
also highlight the utility of combining multiple<br />
bioinformatics methods into a single pipeline<br />
to rapidly synthesize data from local, state, and<br />
federal public health institutions for characterizing<br />
transmission networks and to provide a<br />
rapid public health response.<br />
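The distance-threshold step used to infer the transmission network can be illustrated as<br />
follows. This is a sketch under stated assumptions, not the pipeline described above: the<br />
function names are introduced here, sequences are assumed to be pre-aligned, and the<br />
uncorrected p-distance is a stand-in because the abstract does not name the distance model.<br />

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are ignored.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_network(seqs, threshold=0.015):
    """Link every pair of sequences whose distance is below the threshold.

    seqs: {sample_id: aligned_sequence}. The default of 1.5% mirrors the
    cutoff quoted in the abstract.
    """
    edges = []
    for (i, a), (j, b) in combinations(sorted(seqs.items()), 2):
        d = p_distance(a, b)
        if d < threshold:  # NaN compares False, so empty overlaps are skipped
            edges.append((i, j, d))
    return edges


def clusters(ids, edges):
    """Group samples into connected components with a small union-find."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


if __name__ == "__main__":
    ref = "ACGT" * 50
    demo = {"p1": ref, "p2": "T" + ref[1:], "p3": "T" * 12 + ref[12:]}
    net = distance_network(demo)
    print(net)
    print(clusters(demo, net))
```

Edges found this way could then be annotated with interview data, such as reported<br />
needle-sharing contacts, before the network is explored.<br />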
■ 13<br />
AN AUTOMATED PIPELINE FOR MICROBIAL<br />
GENOME SEQUENCE ASSEMBLY AND<br />
CHARACTERIZATION<br />
F. Onmus-Leone, E. Snesrud, L. Appalla, P.<br />
McGann, A. Ong, R. Maybank, M. Hinkle, E.<br />
Lesho, R. Clifford;<br />
Walter Reed Army Institute of Research, Silver<br />
Spring, MD.<br />
Background: The Multidrug-resistant organism<br />
Repository & Surveillance Network<br />
(MRSN) collects and characterizes multidrug<br />
resistant bacteria isolated throughout the US<br />
military healthcare system. To complement<br />
species identification, antibiotic susceptibility<br />
testing and molecular assays, the MRSN has<br />
begun to employ whole genome sequencing<br />
(WGS) for routine bacterial characterization.<br />
Objective: To perform WGS on the hundreds<br />
of isolates received each month, MRSN has<br />
developed an automated analysis pipeline built<br />
from commercial, open source and custom<br />
software. Methods: Sequencing platforms<br />
used by MRSN are the Illumina MiSeq, Illumina<br />
NextSeq and PacBio RSII. Assembly begins<br />
with the merging of overlapping sequencing<br />
reads using FLASh and the filtering of low-quality<br />
portions of reads with Btrim. Filtered<br />
reads are then assembled with GS Assembler<br />
software. Assembly quality is measured by<br />
read coverage, the number of contigs and contig<br />
N50. Species identification is performed by<br />
searching against the SILVA 16S rDNA sequence<br />
database; contamination is indicated by<br />
the presence of 16S sequences from multiple<br />
species or the existence of multiple contigs<br />
with a complete 16S gene from one species.<br />
Assemblies that pass quality control continue<br />
through the pipeline for the next phase of analysis,<br />
which includes identification of resistance<br />
genes, identification of virulence genes, plasmid<br />
Inc typing and multilocus sequence typing.<br />
These analyses are BLAST-based searches<br />
against nucleotide and protein databases from<br />
public sources that are quality checked, expanded<br />
with the addition of novel genes reported<br />
from the literature and curated by MRSN.<br />
The RAST annotation server and Prokka are<br />
used for “working” whole genome annotation.<br />
Final genome annotation is performed by the<br />
NCBI upon sequence submission. Determining<br />
strain relatedness is critical for hospital infection<br />
control and disease epidemiology. For<br />
whole-genome phylogenetic analysis, sequencing<br />
reads from isolates are aligned to a reference<br />
genome using Bowtie. High quality SNPs<br />
and indels in the core genome are used to build<br />
evolutionary trees. Overall genome similarity<br />
(chromosomes and plasmids) among strains is<br />
measured using BLAST-based methods; similarity<br />
matrices are used for UPGMA clustering.<br />
Results: Our pipeline allows the automated<br />
analysis of 50 genomes per week and can be<br />
readily scaled up. We have processed ~1800<br />
bacterial genomes sequenced on Illumina platforms<br />
to date. Custom scripts control the pipeline,<br />
handle file management, produce reports<br />
and prepare results for loading into the MRSN<br />
database. Conclusion: The value of WGS data<br />
to infectious disease surveillance is unequaled,<br />
allowing comprehensive identification of resistance<br />
loci and definitive determination of strain<br />
relatedness. The analytic pipeline developed<br />
by MRSN will be used to characterize all bacteria<br />
in our repository.<br />
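Contig N50, one of the assembly quality measures named in the Methods, is the length of<br />
the contig at which the running total of lengths, taken from longest to shortest, first<br />
reaches half of the assembly size. A minimal sketch follows; the function name n50 is<br />
introduced here and this is not the MRSN implementation.<br />

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative sum reaches half the total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 together with a lower contig count generally points to a more contiguous<br />
assembly, although neither measure says anything about correctness.<br />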
■ 14<br />
ARMOR-D: A DATABASE LINKING<br />
DEMOGRAPHIC, PHENOTYPIC AND<br />
GENOMIC DATA FOR MULTIDRUG-<br />
RESISTANT BACTERIA<br />
R. Clifford¹, M. Julius¹, Y. Kwak¹, F. Onmus-Leone¹,<br />
L. Appalla¹, E. Snesrud¹, J. Padilla¹,<br />
G. Ward¹, M. Sparks¹, P. Waterman², M. Hinkle¹,<br />
E. Lesho¹;<br />
¹Walter Reed Army Institute of Research, Silver<br />
Spring, MD, ²Armed Forces Health Surveillance<br />
Center, Silver Spring, MD.<br />
Background: In response to Executive Order<br />
#13676 Combating Antibiotic Resistant Bacteria,<br />
interagency efforts to link phenotypic,<br />
demographic and sequence data in a central<br />
database similar to GenBank have recently<br />
begun. Objective: To assist, we describe the<br />
Antimicrobial Resistance Monitoring and Research<br />
Database (ARMoR-D) developed for<br />
the Department of Defense (DoD) healthcare<br />
system. Methods: ARMoR-D is a relational<br />
database built upon SQL Server®, C#, .NET®<br />
and other enterprise technologies. Data objects<br />
are highly normalized and modeled after those<br />
naturally falling in this domain - primarily isolates<br />
and results. Related business rules are not<br />
part of the data model, but are implemented in<br />
the application code. User managed “dictionary”<br />
database tables enhance data quality control.<br />
ARMoR-D has a web-based user interface<br />
to support characterization and archiving of<br />
isolates. It imports data from submitting facilities<br />
and test results from the central referral<br />
laboratory, and manages repository inventory.<br />
ARMoR-D stores high-level whole genome<br />
sequencing data from a semi-automated pipeline,<br />
including 16S-based species identification,<br />
MLST typing, antibiotic resistance and<br />
virulence genes, quality control metrics, and<br />
the locations of sequence assembly files on the<br />
system. ARMoR-D has a generic data model<br />
that can be easily extended to support new<br />
organisms, sample types, laboratory test results<br />
and other data that pertain to the characterization<br />
of biological isolates and/or isolate repositories.<br />
Due to its unique mission, the ARMoR<br />
Program typically collects multiple results<br />
for the same test per isolate; thus there is no<br />
limit on the number of data records associated<br />
with a given data object. Special features<br />
in ARMoR-D flag and resolve discrepancies<br />
among replicate test results. ARMoR-D also<br />
allows users to work effectively with de-identified<br />
data while retaining restricted access to<br />
personally identifiable information necessary<br />
for epidemiology and transmission analyses.<br />
Results: Currently, ARMoR-D contains 2 million<br />
test results from the centralized analyses<br />
of more than 30,000 clinical isolates, along<br />
with their associated demographic information<br />
and repository locations. The database also<br />
contains sequence-based results for more than<br />
1000 of these bacterial samples. Conclusion:<br />
ARMoR-D provides a useful model for those<br />
seeking to link demographic, phenotypic, and<br />
genomic data as they respond to the national<br />
strategy to combat antibiotic resistant bacteria.<br />
Future versions of ARMoR-D will include a<br />
public-facing webpage with decision support<br />
tools and calculators for optimized antibiotic<br />
dosing, and machine learning algorithms<br />
for identifying genetic determinants of antimicrobial<br />
resistance and predicting drug-resistant<br />
phenotypes from the genome sequences of<br />
isolates that have not yet undergone automated<br />
susceptibility testing.<br />
■ 15<br />
A WEB-HOSTED R WORKFLOW TO<br />
SIMPLIFY AND AUTOMATE THE ANALYSIS<br />
OF 16S NGS DATA<br />
J. Bradshaw¹, T. Purucker², K. Wong¹, M.<br />
Molina²;<br />
¹ORISE, Athens, GA, ²U.S. EPA, Athens, GA.<br />
Next-Generation Sequencing (NGS) produces<br />
large data sets that include tens-of-thousands<br />
of sequence reads per sample. For analysis of<br />
bacterial diversity, 16S NGS sequences are<br />
typically analyzed in a workflow that contains<br />
best-of-breed bioinformatics packages that<br />
may leverage multiple programming languages<br />
(e.g., Python, R, Java). The process to<br />
transform raw NGS data to usable operational<br />
taxonomic units (OTUs) can be tedious due to<br />
the number of quality control (QC) steps used<br />
in QIIME and other software packages for<br />
sample processing. Therefore, the purpose of<br />
this work was to simplify the analysis of 16S<br />
NGS data from a large number of samples by<br />
integrating QC, demultiplexing, and QIIME<br />
(Quantitative Insights Into Microbial Ecology)<br />
analysis in an accessible R project. User command<br />
line operations for each of the pipeline<br />
steps were automated into a workflow. In addition,<br />
the R server allows multi-user access<br />
to the automated pipeline via separate user<br />
accounts while providing access to the same<br />
large set of underlying data. We demonstrate<br />
the applicability of this pipeline automation<br />
using 16S NGS data from approximately<br />
100 stormwater runoff samples collected<br />
in a mixed land-use watershed in northeast<br />
Georgia. OTU tables were generated for each<br />
sample and the relative taxonomic abundances<br />
were compared for different periods over storm<br />
hydrographs to determine how the microbial<br />
ecology of a stream changes with rise and fall<br />
of stream stage. Our approach simplifies the<br />
pipeline analysis of multiple 16S NGS samples<br />
by automating multiple preprocessing, QC,<br />
analysis and post-processing command line<br />
steps that are called by a sequence of R scripts.<br />
■ 16<br />
DOING PIPELINES BETTER<br />
M. J. Stanton-Cook, T. J. Robinson, N. L. Ben<br />
Zakour, S. A. Beatson;<br />
The University of Queensland, Brisbane, AUSTRALIA.<br />
Over the last three years we have developed<br />
the Banzai Microbial Genomics Pipeline<br />
(https://github.com/mscook/Banzai-MicrobialGenomics-Pipeline).<br />
Banzai aims to<br />
simplify the analysis of microbial next-gen<br />
sequencing (NGS) datasets. It was specifically<br />
designed to distribute workload over internal<br />
and external High Performance Computing<br />
(HPC) resources. Banzai (in most cases) does<br />
not provide new NGS algorithms, but it harnesses<br />
the power of tried and tested NGS tools.<br />
Banzai simplifies, automates and distributes<br />
computational workloads, which are the typical<br />
bottleneck in analysis of large NGS datasets.<br />
Our local Banzai Microbial Genomics Pipeline<br />
install was initially tightly coupled to an aging<br />
HPC resource. Here, I will discuss how we<br />
future-proofed and migrated our pipeline using<br />
DevOps approaches, in particular how we<br />
restructured the code base, abstracted out the<br />
details of the underlying compute resources,<br />
and how we now treat the entire Banzai Pipeline<br />
and underlying compute resources under an<br />
Infrastructure as Code model. This is a<br />
first-hand lesson in how we had to significantly<br />
change our development practice to consider<br />
long-term sustainability of our pipeline. In less<br />
than three years, the Beatson Laboratory has<br />
had to scale from the simultaneous analysis of<br />
10 to 100 to 1000 genomes. By understanding<br />
and employing DevOps-based approaches the<br />
genomics community will be able to build and<br />
scale reproducible analysis pipelines with minimal<br />
modification to their existing processes.<br />
■ 17<br />
DEVELOPMENT OF A PLATFORM FOR<br />
PLANT VIRUS DIAGNOSTICS AND<br />
CHARACTERIZATION USING NEXT<br />
GENERATION SEQUENCING<br />
P. A. Gutiérrez, L. Muñoz Baena, H. Jaramillo<br />
Mesa, D. Muñoz Escudero, M. A. Marín Montoya;<br />
Universidad Nacional de Colombia, Medellin,<br />
COLOMBIA.<br />
Next generation sequencing methods are becoming<br />
an essential tool for the discovery and<br />
characterization of plant viruses as well as in<br />
the diagnostics of infected material in seed<br />
certification and virus management programs.<br />
Unfortunately, for traditional plant pathologists,<br />
efficient use and interpretation of the<br />
massive amount of data obtained from NGS<br />
is still a daunting task that generally requires<br />
expert bioinformaticians to extract useful<br />
information. In this work, PVDT (Plant Virus<br />
Diagnostic Tool), an automated diagnostic<br />
platform for plant virus detection and characterization<br />
is presented. PVDT performs raw<br />
data quality checking and read sampling, and<br />
outputs an easy-to-interpret diagnosis of virus<br />
genera and their relative levels in the sample.<br />
A secondary analysis allows for virus assembly<br />
using either a sequential de novo method<br />
or mapping with a reference genome. Finally,<br />
PVDT determines whether detected viruses<br />
could be a novel species using ICTV criteria<br />
or an uncharacterized variant. The program<br />
was tested successfully using simulated data<br />
sets and plant transcriptomes of Potato (Solanum<br />
tuberosum, S. phureja), Cape gooseberry<br />
(Physalis peruviana), Tamarillo (S. betaceum),<br />
Bell pepper (Capsicum annuum), Tomato (S.<br />
lycopersicum), Angel's trumpet (Brugmansia<br />
candida) and Pinto peanut (Arachis pintoi).<br />
This platform is a contribution to the development<br />
of user-friendly tools for diagnostic<br />
purposes using NGS data and hopefully will<br />
be useful to agricultural decision-making institutions.<br />
This work was funded by Universidad<br />
Nacional de Colombia (Grant VRI: 19438) and<br />
International Foundation for Science (Sweden,<br />
Grant: C/4634-2).<br />
■ 18<br />
COMPARISON OF PHYLOGENETIC<br />
METHODS FOR IDENTIFYING PUTATIVE<br />
TRANSMISSIONS OF ENTEROCOCCUS<br />
FAECIUM IN A HOSPITAL<br />
H. C. Lin¹, T. Mi¹, G. Wang², W. Huang², P.<br />
Mayigowda¹, K. Murugesan¹, A. Gupta¹, J. T.<br />
Fallon², N. Dimitrova¹;<br />
¹Philips Research North America, Briarcliff<br />
Manor, NY, ²New York Medical College, Valhalla,<br />
NY.<br />
Background: Enterococcus faecium is a major<br />
cause of hospital-acquired infection in the<br />
US. Recently, whole genome sequencing data<br />
has been used for molecular typing, antibiotic<br />
resistance characterization and transmission<br />
route reconstruction. For the purpose of reconstructing<br />
putative transmission routes, a<br />
variety of phylogeny building methods have<br />
been developed and can help identify potential<br />
transmission events within a hospital environment.<br />
Method: We sequenced 149 E. faecium<br />
isolates of MLST type ST736 from 106 patients<br />
over 55 months at Westchester Medical<br />
Center, and reconstructed phylogenies on the<br />
samples with various distance-based methods.<br />
To construct the phylogenies, we used a variant<br />
calling pipeline to determine likely SNPs<br />
within the samples, and then we applied various<br />
filters to remove inaccurate SNPs, such<br />
as removing SNPs in repetitive regions and<br />
phage regions. Afterwards, a pairwise distance<br />
matrix was built based on the SNP calls, and<br />
finally, we applied various phylogeny building<br />
methods to the distance matrix. The methods<br />
we experimented with were neighbor-joining,<br />
maximum likelihood, and Prim’s (undirected)<br />
and Edmonds’ (directed) minimum spanning<br />
tree (MST) algorithms. The results of these<br />
methods were then compared to each other to<br />
determine how robust and accurate our results<br />
may be. Lastly, we also evaluated how frequently<br />
antibiotic resistance increased along<br />
the predicted transmission events in our tree.<br />
Results: The trees we generated mostly have<br />
good concordance with each other, often with<br />
more than 80% of subtrees matching between<br />
results. We also computed how<br />
frequently antibiotic resistance increased along<br />
transmissions suggested by the MST methods.<br />
Both MST methods showed that antibiotic<br />
resistance generally increased along transmission<br />
edges in the trees, and Edmonds’ directed<br />
MST method showed a higher percentage of<br />
the predicted transmission events resulting in<br />
increased daptomycin resistance, with about<br />
3/4 of transmissions indicating an increase in<br />
resistance. As we expect resistance to increase<br />
as transmissions occur, this may indicate that<br />
Edmonds’ directed MST produced a more<br />
accurate tree. Conclusion: Our preliminary<br />
results show that Edmonds’ directed MST may<br />
produce an accurate phylogenetic tree, although<br />
further verification may be needed, as other<br />
methods yield slightly different trees. In the future,<br />
we plan to examine clinical data in order<br />
to help validate the correctness of the transmissions<br />
predicted by our phylogenetic trees.<br />
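The undirected variant of the approach, a pairwise SNP distance matrix followed by Prim's<br />
minimum spanning tree, can be sketched as below. This is an illustration rather than the<br />
authors' code: the function names are introduced here, and each isolate is assumed to be<br />
represented by a string of alleles at the same filtered SNP sites.<br />

```python
def snp_distance_matrix(profiles):
    """Pairwise count of differing SNP sites between allele strings.

    profiles: list of equal-length strings, one per isolate (assumption).
    """
    return [[sum(a != b for a, b in zip(p, q)) for q in profiles]
            for p in profiles]


def prim_mst(dist):
    """Prim's algorithm on a symmetric distance matrix.

    Returns a list of (parent, child, distance) edges, grown from node 0.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known link into the tree
    parent = [None] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest link to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


if __name__ == "__main__":
    isolates = ["AAAA", "AAAT", "AATT", "TTTT"]
    matrix = snp_distance_matrix(isolates)
    for edge in prim_mst(matrix):
        print(edge)
```

The edges of such a tree are undirected; inferring a direction of transmission needs extra<br />
information such as sampling dates, which is what a directed method would draw on.<br />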
■ 19<br />
ANALYSIS OF AN ENTEROVIRUS D68<br />
OUTBREAK THROUGH METAGENOMIC<br />
SEQUENCING<br />
H. Lin¹, Q. Wan¹, W. Huang², G. Wang², J.<br />
Zhuge², S. M. Nolan², J. T. Fallon², N. Dimitrova¹;<br />
¹Philips Research North America, Briarcliff<br />
Manor, NY, ²New York Medical College, Valhalla,<br />
NY.<br />
Background: Metagenomic sequencing can be<br />
used to detect the presence of microbial organisms.<br />
Many metagenomic techniques have<br />
been developed to explore and verify microbial<br />
ecology, evolution and diversity, especially for<br />
investigating infectious viruses and bacteria.<br />
By obtaining the population distribution of<br />
classified microbial genomes in certain samples,<br />
metagenomic tools have the potential to<br />
provide insights into the causality of infectious<br />
disease outbreaks. During the Enterovirus D68<br />
outbreak in 2014, 93 nasopharyngeal swab<br />
samples were obtained from patients with<br />
symptoms of respiratory infection and were<br />
sequenced at Westchester Medical Center.<br />
Methods: We used Kraken to<br />
classify reads taxonomically.<br />
Identifying microbial reads with Kraken<br />
enables the investigation<br />
of infectious disease outbreaks. To<br />
obtain the best performance from Kraken, we<br />
built a pipeline that automatically fetches publicly<br />
available reference genomes from NCBI<br />
on demand to keep Kraken’s classification<br />
results up to date for detecting pathogens present<br />
within a sample. The pipeline is not only<br />
able to extract reads by viral, bacterial, and<br />
fungal species, but can also generate aggregate<br />
analyses indicating the most prevalent microorganisms<br />
within any taxonomic group. We also<br />
provide analyses of possible cohabitation<br />
of different pathogens, in terms of a pairwise<br />
correlation ranking between any species, as<br />
well as an overall plot of co-existing species<br />
within samples. Moreover, the distribution<br />
of reads classified into any species across all<br />
patients can be plotted. Results: Our pipeline<br />
detects 1148 species among reads from the<br />
93 patient samples. Overall, the number of<br />
identified Enterovirus D68 reads ranks 4th<br />
(8.79%) among all microbial species, but dominates<br />
viral sequences found in our samples<br />
with a proportion of 96.40% of viral reads. The<br />
next highest occurrence of viral reads consists<br />
of Human respiratory syncytial virus (2.37%)<br />
and Rhinovirus B (0.32%). Additionally, the<br />
pipeline reveals a possible cohabitation between<br />
Enterovirus D68 and other bacteria and<br />
viruses, such as Rhinovirus, which can both<br />
cause symptoms of respiratory infection. We<br />
observe that Enterovirus D68 is exceeded in read share by Moraxella<br />
catarrhalis (41.34%), Haemophilus influenzae<br />
(20.79%) and Pseudomonas aeruginosa<br />
(16.55%), which can be pathogenic bacteria.<br />
Conclusion: We built an automatic metagenomic<br />
analysis pipeline designed for detecting<br />
pathogens that can cause infectious disease<br />
outbreaks at hospitals. The pipeline provides<br />
tools to visualize and investigate sequence read<br />
classification to conveniently discover pathogens<br />
residing within samples and discover<br />
microbial cohabitations.<br />
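The pairwise correlation ranking used to flag possible cohabitation can be illustrated<br />
with per-sample read counts. The sketch below is not the pipeline's code: the function<br />
names and the input layout are introduced here, and Pearson correlation is an assumption<br />
because the abstract does not state which correlation measure was used.<br />

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers.

    Returns 0.0 when either list has no variance (a convention chosen
    for this sketch so that constant profiles do not raise an error).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_cooccurrence(counts):
    """Rank species pairs by correlation of their per-sample read counts.

    counts: {species_name: [reads in sample 1, reads in sample 2, ...]}
    Returns (correlation, species_a, species_b) tuples, highest first.
    """
    pairs = [(pearson(counts[a], counts[b]), a, b)
             for a, b in combinations(sorted(counts), 2)]
    return sorted(pairs, key=lambda t: -t[0])


if __name__ == "__main__":
    demo = {
        "species_A": [1, 2, 3, 4],
        "species_B": [2, 4, 6, 8],
        "species_C": [4, 3, 2, 1],
    }
    for r, a, b in rank_cooccurrence(demo):
        print(f"{a} vs {b}: r = {r:.2f}")
```

A high correlation only indicates that two species tend to be abundant in the same<br />
samples; it does not by itself show a biological interaction.<br />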
■ 20<br />
MUTATION RATE ANALYSIS TO AID<br />
IN IDENTIFYING TRANSMISSIONS OF<br />
ENTEROCOCCUS FAECIUM IN A HOSPITAL<br />
SETTING<br />
P. Mayigowda¹, A. Gupta¹, H. Lin¹, K. Murugesan¹,<br />
T. Mi¹, G. Wang², A. Dhand², J. Fallon²,<br />
N. Dimitrova¹;<br />
¹Philips Research North America, Briarcliff<br />
Manor, NY, ²New York Medical College, Valhalla,<br />
NY.<br />
Next-generation sequencing (NGS) has proved<br />
instrumental in tracing the spread and evolution<br />
of bacterial populations through time and<br />
geographical locations. With genome-wide<br />
analysis of single nucleotide polymorphisms<br />
(SNPs) we can measure genetic evolution by<br />
comparing SNP changes in a genome and estimate<br />
the mutation rate over time of a particular<br />
pathogen of interest. This estimated mutation<br />
rate can then be used to determine if there may<br />
have been a transmission between any two<br />
patients. Namely, if the number of changes<br />
occurring over time in a pathogen sequenced<br />
from two patients is within the typical number<br />
of changes we expect to see based on the expected<br />
mutation rate, then we can infer that a<br />
transmission may have occurred. If the number<br />
of changes is outside the typical mutation rate,<br />
then we can conclude that a transmission likely<br />
did not occur. We analyzed 149 non-duplicate<br />
Enterococcus faecium isolates belonging to<br />
clone ST736 to estimate the mutation rate for<br />
E. faecium isolates to help identify a probable<br />
path of the spread of the pathogen. These isolates<br />
were collected from 106 patients at Westchester<br />
Medical Center, New York, on various<br />
dates from 2009 to 2013. Out of these 106<br />
patients, 28 had two or more isolates, which<br />
were compared pairwise to estimate the mutation<br />
rate using linear regression on the number<br />
of SNP differences over time for 59 pairs of<br />
isogenic samples. To visualize the mutation<br />
rates of multidrug resistant isolates, separate<br />
mutation rates were calculated for 7 pairs of<br />
samples that maintained their status as daptomycin<br />
susceptible (dap S), and 32 pairs that<br />
maintained daptomycin non-susceptible (dap<br />
NS) status. We also analyzed pairs of isolates<br />
that converted from dap S to dap NS (8 pairs)<br />
or vice-versa (12 pairs). The mutation rate was<br />
observed to be 10.0 Substitutions per Mb per<br />
Year (SMY) after filtering for outliers (mean ±<br />
3SD) through the mean shift approach. Paired<br />
isolates maintaining dap S status had a mean<br />
mutation rate of 5.85 SMY, while the paired<br />
dap NS isolates had a mutation rate of 12.3<br />
SMY. For conversion of isolates from dap S<br />
to dap NS the mutation rate was 12.6 SMY<br />
while dap NS to dap S had a particularly high<br />
mutation rate of 53.2 SMY. These results are<br />
comparable to a recent study which reported<br />
mutation rates for different clades of a phylogenetic<br />
tree constructed from 73 enterococci<br />
isolates. The rate was 49 ± 3 SMY for a clade<br />
belonging to clonal complex CC17, and was<br />
3.6 ± 0.6 SMY and 13 ± 2 SMY for two<br />
clades with mixed STs. We have developed<br />
a method for mutation rate inference using<br />
SNPs obtained from NGS data. Standardizing<br />
the filtering process will allow expanding this<br />
approach to other species. With studies reporting<br />
variation in mutation rates within species,<br />
such a standardized methodology becomes all the<br />
more important. This bioinformatics approach<br />
to identify disease transmission paths can help<br />
recognize, control, and diagnose bacterial<br />
clinical outbreaks.<br />
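The regression step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the isolate pairs and the 3.0 Mb genome size are hypothetical, an ordinary least-squares fit with an intercept is assumed, and the outlier filtering step is omitted.

```python
from statistics import mean


def slope_per_year(years, snp_diffs):
    """Ordinary least-squares slope of SNP differences against elapsed years."""
    mx, my = mean(years), mean(snp_diffs)
    sxx = sum((x - mx) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("need at least two distinct time intervals")
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, snp_diffs))
    return sxy / sxx


def mutation_rate_smy(years, snp_diffs, genome_mb):
    """Slope converted to substitutions per Mb per year (SMY)."""
    return slope_per_year(years, snp_diffs) / genome_mb


def expected_snps(rate_smy, genome_mb, years_apart):
    """SNP differences expected between two isolates sampled years_apart apart."""
    return rate_smy * genome_mb * years_apart


# Hypothetical isolate pairs: (years between sampling dates, SNP differences).
pairs = [(0.5, 15), (1.0, 30), (2.0, 60), (3.0, 90)]
years = [p[0] for p in pairs]
snps = [p[1] for p in pairs]

GENOME_MB = 3.0  # assumed genome size, for illustration only
rate = mutation_rate_smy(years, snps, GENOME_MB)
print(f"estimated rate: {rate:.1f} SMY")
print(f"expected SNPs after 1 year: {expected_snps(rate, GENOME_MB, 1.0):.0f}")
```

With these made-up pairs the fitted rate is 10.0 SMY. An observed SNP distance between two patients' isolates could then be compared with the expected count for their sampling interval.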
■ 21<br />
COMPARISON OF RNA AND DNA<br />
EXTRACTION KITS FOR VIRUS DETECTION<br />
BY NEXT GENERATION SEQUENCING (NGS)<br />
J. Klenner, C. Kohl, P. Dabrowski, A. Nitsche;<br />
Robert Koch Institute, Berlin, GERMANY.<br />
A crucial step in the molecular detection of<br />
viruses in clinical specimens is the efficient<br />
extraction of viral nucleic acids. The total yield<br />
of viral nucleic acid from a clinical specimen<br />
depends on the specimen’s volume, the<br />
initial virus concentration and the effectiveness<br />
of the extraction method. Recent<br />
Next Generation Sequencing (NGS)-based<br />
diagnostic approaches provide a molecular<br />
‘open view’ into the sample, as they generate<br />
sequence reads of any nucleic acid present<br />
in a specimen in a statistically representative<br />
manner. However, since a higher virus-related<br />
read output promises better sensitivity in the<br />
subsequent bioinformatic analysis, the extraction<br />
method selected determines the reliability<br />
of diagnostic NGS. In this study four commercially<br />
available nucleic acid extraction<br />
kits (QIAGEN, Hilden, Germany: QIAamp<br />
Viral RNA Mini Kit, QIAamp DNA Blood<br />
Mini Kit, QIAamp cador Pathogen Mini Kit<br />
and QIAamp MinElute Virus Spin Kit) were<br />
evaluated by NGS. The nucleic acid yields<br />
and sequence read output were compared for<br />
four different model viruses comprising Reovirus,<br />
Orthomyxovirus, Orthopoxvirus and<br />
Paramyxovirus, each at defined but varying<br />
concentrations in the same sample. The total<br />
nucleic acid extracted was divided into two<br />
aliquots; one was subjected to RNA and the<br />
other to DNA processing for NGS. The yield<br />
of nucleic acids was determined by Qubit and<br />
virus-specific quantitative real-time PCR. NGS<br />
libraries were prepared for sequencing on<br />
the Illumina HiSeq 1500 system. Finally, the<br />
percentage of reads which could be assigned<br />
to each virus after extraction with the different<br />
kits was determined via mapping. As presented<br />
here, evaluation of the different commercial<br />
nucleic acid extraction kits indicates that the<br />
numbers of RNA and DNA reads deviate only<br />
slightly between kits.<br />
■ 22<br />
TRANSMISSION OF HIGH RISK K.<br />
PNEUMONIAE CLONES IN HEALTH CARE<br />
NETWORKS LARGELY CHALLENGES THE<br />
CURRENT INFECTION PREVENTION AND<br />
CONTROL SYSTEM<br />
K. Zhou, M. Lokate, R. H. Deurenberg, G. C.<br />
Raangs, H. Grundmann, A. W. Friedrich, J. W.<br />
Rossen;<br />
University of Groningen, University Medical<br />
Center Groningen, Groningen, NETHERLANDS.<br />
Controlling dissemination of multidrug-resistant<br />
pathogens remains one of the major<br />
challenges in hospitals and public health. Here<br />
we describe an inter-institutional transmission<br />
of an ESBL-producing ST15 Klebsiella pneumoniae<br />
between patients caused by patient<br />
referral. An epidemiological link between<br />
the patient isolates was supported by patient<br />
contact tracing and phylogenetic analysis of<br />
the isolates obtained from May to November<br />
2012 using next generation sequencing (NGS).<br />
By May 2013, a patient treated in two institutions<br />
in two cities was involved in expanding<br />
the cluster. A clone-specific multiplex PCR<br />
was developed for patient screening by which<br />
another patient was identified in September<br />
2013. Environmental surface contamination<br />
and lack of consistent patient screening were<br />
identified as risk factors. Our study highlights<br />
the challenge of controlling the transmission<br />
of K. pneumoniae high risk clones (HiRiCs),<br />
suggesting the necessity for active surveillance<br />
and inter-institutional collaboration for<br />
outbreak management. In addition, the use of<br />
NGS for typing and for developing an outbreak-specific<br />
multiplex PCR facilitated rapid<br />
patient screening procedures and was important<br />
for optimizing outbreak management.<br />
■ 23<br />
WHOLE GENOME SEQUENCING OF FOUR<br />
ENTEROPATHOGENIC E. COLI STRAINS<br />
ASSIGNED TO A NEW SEQUENCE TYPE<br />
ST4554<br />
M. Ferdous<sup>1</sup>, K. Zhou<sup>1</sup>, A. M. Kooistra-Smid<sup>2</sup>,<br />
A. W. Friedrich<sup>1</sup>, J. W. Rossen<sup>1</sup>;<br />
<sup>1</sup>University of Groningen, University Medical<br />
Center Groningen, Groningen, NETHERLANDS,<br />
<sup>2</sup>University Medical Center Groningen<br />
and Certe Laboratory for Infectious Diseases,<br />
Groningen, NETHERLANDS.<br />
Enteropathogenic Escherichia coli (EPEC) is a<br />
leading cause of infantile diarrhoea in developing<br />
countries. EPEC comprises a large,<br />
heterogeneous group of strains and serotypes.<br />
EPEC strains assigned to a new sequence type<br />
(ST4554) were isolated from four different<br />
patients with gastrointestinal complaints in<br />
two different regions of the Netherlands during<br />
July 2013-February 2014. No epidemiological<br />
link was found between the four patients.<br />
All four isolates were non-motile, beta-glucuronidase<br />
negative, sorbitol fermenting and<br />
of phylogenetic group B2. All were resistant<br />
to ampicillin, three to trimethoprim and trimethoprim/sulfamethoxazole<br />
whereas the<br />
fourth one was resistant to tetracycline. Whole<br />
genome sequencing (WGS) was performed on<br />
the four strains for detailed characterization<br />
and to compare them with their closest relative<br />
E. coli strain. The serogenotype and the<br />
presence of antibiotic resistance and virulence<br />
genes were determined using the Centre for<br />
Genomic Epidemiology (CGE) web tool. Phylogenetic<br />
analysis was done using Ridom SeqSphere+.<br />
The serogenotype of the isolates was<br />
O157:H39. Virulence profiling revealed the<br />
presence of adhesin genes eae (type kappa), tir,<br />
iha, and secretion system genes espA, espC,<br />
espI, espJ, and etpD. Virulence genes were<br />
located on a pathogenicity island, plasmid or<br />
insertion element. The antibiotic resistance<br />
genes strA, strB, sul and dfrA were located on<br />
a pCERC1-like resistance plasmid found in E.<br />
coli strain S1.2.T2R, whereas the tetA gene<br />
was located on transposon Tn121. Phylogenetic<br />
analysis using core genome MLST revealed<br />
that three of our isolates contained no different<br />
alleles but differed from the fourth one in 62<br />
alleles. Subsequent SNP analysis revealed 20<br />
SNPs among the three isolates: 5 non-synonymous,<br />
4 synonymous and 11 in intergenic<br />
regions. E. coli strain E2348/69 was<br />
the strain most closely related to our isolates in<br />
a phylogeny that included 60 complete<br />
genomes of E. coli available in NCBI. A<br />
genome-wide comparison of the four isolates<br />
with E. coli E2348/69 shows that they lacked<br />
most of the mobile genetic elements (MGEs)<br />
found in E. coli E2348/69 except one prophage<br />
pp1, one insertion element IE5 and two pathogenicity<br />
islands (LEE and espC pathogenicity<br />
island). They possess additional MGEs such as<br />
prophage p16 of EPEC O26:H11 strain 11368,<br />
plasmid pO55 of E. coli O55:H7 and plasmid<br />
pSS_O46 of Shigella sonnei. Therefore, dissemination<br />
of virulence genes by MGEs may<br />
have resulted in these new pathogenic strains.<br />
WGS allowed us to gain insight into the virulence<br />
and resistance determinants of the new EPEC<br />
isolates and to reveal their phylogenetic relationship<br />
with known E. coli strains.<br />
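The allele-based comparison used in core genome MLST can be illustrated with a short sketch. It assumes each isolate is represented as a mapping from locus name to allele number and that loci missing in either isolate are ignored, which is one common convention rather than a description of the SeqSphere+ implementation. All locus names and allele numbers below are hypothetical.

```python
def allele_differences(profile_a, profile_b):
    """Count shared loci whose allele numbers differ.

    Loci absent from either profile, or with a value of None (not called),
    are skipped rather than counted as differences.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical cgMLST profiles (locus -> allele number).
isolate_1 = {"locus1": 1, "locus2": 4, "locus3": 7, "locus4": None}
isolate_2 = {"locus1": 1, "locus2": 4, "locus3": 7, "locus4": 2}
isolate_4 = {"locus1": 1, "locus2": 5, "locus3": 9, "locus4": 2}

print(allele_differences(isolate_1, isolate_2))  # 0
print(allele_differences(isolate_1, isolate_4))  # 2
```

Isolates with zero allele differences, like the three closely related isolates reported here, can still differ at the SNP level, which is why a follow-up SNP analysis adds resolution.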
■ 24<br />
MOLECULAR CHARACTERIZATION OF SHIGA<br />
TOXIN PRODUCING ESCHERICHIA COLI<br />
(STEC) ISOLATES IN THE NETHERLANDS<br />
M. Ferdous<sup>1</sup>, A. M. Kooistra-Smid<sup>2</sup>, R. F. de<br />
Boer<sup>2</sup>, P. D. Croughs<sup>3</sup>, I. H. Friesema<sup>4</sup>, J. W.<br />
Rossen<sup>1</sup>, A. W. Friedrich<sup>1</sup>;<br />
<sup>1</sup>University of Groningen, University Medical<br />
Center Groningen, Groningen, NETHERLANDS,<br />
<sup>2</sup>University Medical Center Groningen<br />
and Certe Laboratory for Infectious<br />
Diseases, Groningen, NETHERLANDS, <sup>3</sup>Star-MDC,<br />
Rotterdam, NETHERLANDS, <sup>4</sup>National<br />
Institute for Public Health and the Environment,<br />
Bilthoven, NETHERLANDS.<br />
Shiga toxin producing Escherichia coli (STEC)<br />
is one of the major causes of human gastrointestinal<br />
disease and has been implicated in<br />
sporadic cases and outbreaks of diarrhoea,<br />
haemorrhagic colitis and haemolytic uraemic<br />
syndrome worldwide. Whole genome sequencing<br />
(WGS) was performed on 131 STEC<br />
isolates obtained from faeces of patients with<br />
gastrointestinal complaints from the regions<br />
Groningen and Rotterdam of the Netherlands,<br />
during April 2013 to March 2014 as part of the<br />
STEC-ID-net study. Multilocus sequence type,<br />
serogenotype, Shiga toxin encoding gene (stx)<br />
subtype and the presence of antibiotic resistance<br />
and virulence genes were determined<br />
using the Centre for Genomic Epidemiology<br />
(CGE) web tool. Phylogenetic analysis was<br />
done using Ridom SeqSphere+. Based on clinical<br />
symptoms (available for 76 patients), patients<br />
were divided into severe (n=28), moderate<br />
(n=35) and mild (n=13) groups. Statistical<br />
analyses were done by the Pearson Chi-Square<br />
and Fisher’s exact test. Based on WGS data, a<br />
diversity of serotypes and sequence types (STs)<br />
was found, with serotypes O91:H14 (14.5%),<br />
O157:H7 (13%), O26:H11 (11%), O103:H2<br />
(8%), O128:H2 (5%), O63:H6 (4%), O5:H9<br />
(3%) and sequence types ST33 (14.5%), ST11<br />
(13%), ST21 (13%), ST17 (7.6%), ST25<br />
(4.5%), ST583 (4%) being the most predominant<br />
ones. Several stx subtypes and subtype combinations<br />
were found among the isolates, with stx1a (41%), stx2c (8%), stx2f<br />
(8%), stx1c+stx2b (8%), stx1a+stx2c (7%),<br />
stx1c (7%), stx1a+stx2a (6%) and stx2a (6%)<br />
being predominant. The presence of the stx1 and stx2 genes<br />
together was significantly (P=0.02) associated<br />
with the severe patient group (46%). Among<br />
virulence genes other than stx, the toxin encoding<br />
genes toxB, astA and non-LEE-encoded<br />
effector protein encoding gene nleC were<br />
detected with high prevalence in this group<br />
(P
■ 25<br />
RAPID DIAGNOSTIC TESTING OF<br />
MYCOBACTERIUM TUBERCULOSIS<br />
CULTURES USING WHOLE-GENOME<br />
SEQUENCING<br />
P. Lapierre, J. Shea, T. Halse, M. Shudt, P. Van<br />
Roey, V. Escuyer, K. Musser;<br />
NY DOH Wadsworth Center, Albany, NY.<br />
Mycobacterium tuberculosis (MTB) remains<br />
an important pathogen today, infecting more<br />
than a third of the world population. The cost<br />
associated with the diagnosis and treatment<br />
of MTB can be considerable, particularly in<br />
cases involving MDR or XDR strains. Weeks<br />
to months are usually required to identify<br />
mycobacterial species, determine drug susceptibilities,<br />
and generate molecular genotyping data<br />
for epidemiological purposes using traditional<br />
methods due to the slow growth rate of<br />
MTB. Whole genome sequencing (WGS) is<br />
perceived as a potential alternative for MTB<br />
diagnostics that will greatly reduce the time<br />
required for the complete molecular profiling<br />
and antibiotic resistance prediction of new<br />
MTB cases. We have conducted a retrospective<br />
study on 87 clinical isolates, consisting of 51<br />
pure isolates on solid media and 36 early positive<br />
liquid cultures (MGIT), to determine the<br />
feasibility of using WGS as part of our routine<br />
testing algorithm for MTB samples. We have<br />
developed an efficient DNA extraction and<br />
library preparation method that yields consistent<br />
depth and data quality from WGS, utilizing<br />
Illumina MiSeq instrumentation. We built<br />
a bioinformatics pipeline that compares the<br />
Illumina reads against M. tuberculosis H37Rv<br />
reference strain for identification, in silico<br />
spoligotyping, resistance profiling and phylogenetic<br />
analysis. Our results show an almost<br />
complete concordance for strain identification<br />
and spoligotype determination when compared<br />
with current molecular methods, as well as<br />
more than 90% concordance between our in<br />
silico resistance prediction and our current<br />
molecular and conventional drug susceptibility<br />
testing methods. We estimate the total time<br />
required using WGS as a complete diagnostic<br />
test for culture samples to be approximately 7<br />
days. Given the diagnostic accuracy and reproducibility<br />
of this method, and the significant<br />
reduction in time from receipt of specimen to<br />
WGS results to predict antibiotic resistance, it<br />
is financially appealing to adopt this technology<br />
in a public health setting.<br />
■ 26<br />
CHARACTERIZATION OF AN IMI-1<br />
CARBAPENEMASE-PRODUCING COLISTIN-<br />
RESISTANT ENTEROBACTER CLOACAE BY<br />
GENOME SEQUENCING AND INSERTIONAL<br />
MUTAGENESIS<br />
Z. Zong;<br />
West China Hospital, Sichuan University,<br />
Chengdu, CHINA.<br />
Background: A carbapenem-resistant Enterobacter<br />
cloacae, WCHECl-1060, was recovered<br />
from the blood of a patient after stem cell transplantation.<br />
It was found by PCR and sequencing to carry blaIMI-1, a<br />
carbapenemase gene,<br />
and was highly resistant to colistin (MIC,<br />
64 mg/L). However, the genetic context of<br />
blaIMI-1 and the mechanisms of colistin resistance<br />
were unclear. Methods: WCHECl-1060<br />
was subjected to whole genome sequencing<br />
with ca. 100× coverage using the HiSeq 2500<br />
sequencer. Reads were assembled into contigs<br />
using SPAdes. MLST was performed using<br />
the assembled contigs. Phages and genomic<br />
islands were predicted using PHAST and IslandViewer<br />
tools. Insertional mutagenesis with the<br />
Tn5 transposon was performed and colistin-susceptible<br />
mutants were selected using replica<br />
plating. Inverse PCR and sequencing were<br />
used to identify genes interrupted by Tn5<br />
insertion. Results: A total of 5,591,800 reads<br />
and 698,975,000 bases were obtained with<br />
a 55.8% GC content. Reads were assembled<br />
into 21 contigs of ≥500 bp (N50,<br />
714,400 bp) containing 4,808,707 bases.<br />
WCHECl-1060 belonged to a new ST, ST420.<br />
In addition to blaIMI-1, WCHECl-1060 also carried the<br />
quinolone-resistance genes oqxA and oqxB<br />
and the fosfomycin-resistance gene fosA. blaIMI-1<br />
was located on the chromosome, and the 16.2-kb<br />
region containing blaIMI-1 (8.1 kb upstream<br />
and 7.3 kb downstream of blaIMI-1) had a much<br />
lower GC content (37.1%), suggesting a foreign<br />
origin. Indeed, the 16.2-kb region was<br />
identified as a genomic island by IslandViewer.<br />
Two genes responsible for colistin resistance,<br />
phoP and yqjA, were identified by Tn5 insertional<br />
mutagenesis. phoP is part of the PhoP/PhoQ<br />
two-component system, and mutations<br />
of phoP have been reported to lead to colistin resistance in<br />
other Enterobacteriaceae species.<br />
yqjA encodes an inner membrane protein<br />
but has not been associated with colistin<br />
resistance before. The 675-bp phoP and 660-bp<br />
yqjA of WCHECl-1060 have at least 6 and 3<br />
nucleotide differences from those of other<br />
Enterobacter strains with sequences available<br />
in GenBank, respectively. Conclusions:<br />
This is the first genome sequence of a blaIMI-1-carrying<br />
Enterobacter strain. blaIMI-1 was<br />
on a genomic island, which may explain why<br />
only a few E. cloacae carry this gene. Colistin<br />
resistance in WCHECl-1060 is likely due to<br />
mutations of phoP and yqjA.<br />
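Two of the assembly metrics reported above, N50 and GC content, can be computed as in the sketch below. The contig lengths and sequences are hypothetical and are not the WCHECl-1060 data. The same GC calculation, applied to a window around a gene, is one simple way to flag a region whose composition differs from the rest of the genome.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


# Hypothetical contig lengths (bp): total 1700, half 850.
lengths = [700, 400, 300, 200, 100]
print(n50(lengths))  # 400

# Hypothetical sequences standing in for a genome and a candidate island.
print(round(gc_content("ATGCGC"), 3))    # 0.667
print(round(gc_content("ATATGCAT"), 3))  # 0.25
```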
■ 27<br />
COMPUTATIONAL PIPELINE FOR NEXT<br />
GENERATION SEQUENCING-BASED<br />
PATHOGEN DETECTION IN CLINICAL<br />
SETTINGS<br />
L. Albayrak<sup>1</sup>, M. Rojas<sup>1</sup>, K. Khanipov<sup>1</sup>, S. M.<br />
Chumakov<sup>2</sup>, G. Golovko<sup>1</sup>, M. Pimenova<sup>1</sup>, Y.<br />
Fofanov<sup>1</sup>;<br />
<sup>1</sup>University of Texas Medical Branch, Galveston,<br />
TX, <sup>2</sup>University of Guadalajara, Guadalajara,<br />
Jalisco, MEXICO.<br />
In the recent past, next-generation sequencing<br />
(NGS) was performed exclusively in<br />
large sequencing centers; however, dramatic<br />
progress in sequencing technology has reduced<br />
costs and improved quality and<br />
throughput, allowing NGS to be used by universities<br />
and even individual labs. Moreover, the<br />
latest “benchtop” sequencers have progressed<br />
into extremely cost-effective tools that can be<br />
used for pathogen detection in clinical settings.<br />
In order to transform NGS technology from<br />
being an exclusively research tool to being<br />
routinely used for clinical diagnostics, major<br />
challenges must be resolved, including: 1. The<br />
vast amount of data and the computational complexity<br />
of NGS analysis tools, which require data to<br />
be stored away from its place of origin, separating<br />
sequencing from analysis and the decision-making<br />
process (as well as raising security<br />
and privacy concerns); 2. The absence of standardized<br />
analysis and task-specific reference<br />
databases; 3. The large and often complicated<br />
reports generated by NGS data analysis pipelines,<br />
which usually require Ph.D.-level scientists<br />
to interpret. To address those challenges,<br />
we present a bioinformatics pipeline capable<br />
of evaluating NGS samples quickly and accurately,<br />
with relatively low computational infrastructure<br />
requirements. By utilizing a novel<br />
data format that reduces the size of sequenced<br />
reads files and keeps only high quality sequences<br />
in binary format, we can easily store,<br />
manipulate, and transfer large amounts of data<br />
produced by NGS instruments. The use of a<br />
robust collection of reference gene sequences<br />
instead of complete genomes greatly reduces<br />
the size of the reference database without<br />
compromising the ability to detect and identify<br />
pathogens. Clustering together genes based on<br />
nucleotide similarity and using just a single<br />
representative sequence further reduces redundancy<br />
among gene sequences. We propose to<br />
use a reference-by-reference search against<br />
sequenced reads rather than a read-by-read<br />
search algorithm as used by BWA and Bowtie,<br />
which generate read-centric files (e.g., BAM/<br />
SAM). The output of the proposed algorithm<br />
is much smaller than a read-centric approach<br />
and it is tied to the reference sequences instead<br />
of sequencing reads. The proposed methods<br />
of clustering, curation, compression and<br />
reference-by-reference sequence search pave the<br />
way to bringing NGS-based pathogen identification<br />
methods to clinical and field diagnostic<br />
settings.<br />
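The difference between a read-centric and a reference-centric search can be sketched with a toy k-mer lookup. This is not the authors' algorithm or data format, which the abstract does not specify. It assumes exact k-mer matching, ignores reverse complements and sequencing errors, and uses hypothetical gene names and sequences. The point is that the report holds one entry per reference rather than one record per read.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_read_index(reads, k):
    """Collect every k-mer observed in the reads (built once per sample)."""
    index = set()
    for read in reads:
        index |= kmers(read, k)
    return index


def reference_coverage(references, read_index, k):
    """For each reference, the fraction of its k-mers found in the reads."""
    report = {}
    for name, seq in references.items():
        ref_kmers = kmers(seq, k)
        if not ref_kmers:
            report[name] = 0.0
            continue
        hits = sum(1 for kmer in ref_kmers if kmer in read_index)
        report[name] = hits / len(ref_kmers)
    return report


# Hypothetical reference genes and reads.
references = {"geneA": "ACGTACGGTCA", "geneB": "TTTTGGGGCCCC"}
reads = ["ACGTACGG", "TACGGTCA"]

index = build_read_index(reads, k=5)
print(reference_coverage(references, index, k=5))
# {'geneA': 1.0, 'geneB': 0.0}
```

The report grows with the number of reference sequences, not with sequencing depth, which matches the size argument made in the abstract.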
■ 28<br />
EVALUATION OF METAGENOMIC RNA-SEQ<br />
FOR DETECTION OF ENTEROVIRUS D68<br />
AND RESPIRATORY VIRAL PATHOGENS IN<br />
CLINICAL SAMPLES<br />
W. Huang<sup>1</sup>, G. Wang<sup>1</sup>, H. Lin<sup>2</sup>, J. Zhuge<sup>3</sup>, S. M.<br />
Nolan<sup>1</sup>, N. Dimitrova<sup>2</sup>, J. T. Fallon<sup>1</sup>;<br />
<sup>1</sup>New York Medical College, Valhalla, NY,<br />
<sup>2</sup>Philips Research North America, Briarcliff<br />
Manor, NY, <sup>3</sup>Westchester Medical Center, Valhalla,<br />
NY.<br />
Background: Next-generation sequencing<br />
(NGS) is becoming a novel approach<br />
for identifying the causative pathogens of<br />
infectious diseases directly from clinical<br />
specimens. However, practical application of<br />
NGS is hindered by the lack of proven sensitive<br />
and reliable preparation protocols and by the<br />
bioinformatics challenge of building accurate<br />
and complete databases for data analysis.<br />
Here, we explored the potential application of<br />
NGS in detection of viral pathogens in clinical<br />
samples. Methods: A simple metagenomic<br />
RNA-Seq protocol, with reverse transcription<br />
and Illumina Nextera XT technology incorporated<br />
but without amplification or enrichment,<br />
was employed to analyze 93 nasopharyngeal<br />
swab specimens collected from patients in<br />
the lower Hudson Valley, New York during<br />
an outbreak of enterovirus D68 (EV-D68)-<br />
associated respiratory illness in 2014. Among<br />
these samples, 72 were positive for EV-D68<br />
using real-time reverse-transcriptase PCR<br />
(rRT-PCR) (J. Clin. Microbiol., 2015; 53:1915-<br />
1920). To detect EV-D68 and other pathogenic<br />
viruses, we employed three bioinformatics<br />
tools for analysis of the NGS data: alignment/<br />
mapping using Illumina MiSeq Reporter, the<br />
PathSEQ Virome, and the sequence-based<br />
ultra-rapid pathogen identification (SURPI)<br />
viral software. Results: Sequencing<br />
yielded an average of 103,531 clusters passing<br />
filter per sample. Compared to the results from<br />
rRT-PCR, we detected 65, 66 and 69 samples<br />
positive for EV-D68 using the three bioinformatics<br />
tools, respectively. We also found<br />
that the scores in the PathSEQ Virome were<br />
inversely correlated with the cycle thresholds<br />
in the rRT-PCR assay. Both PathSEQ Virome<br />
and SURPI viral excelled in the simultaneous<br />
detection of multiple viruses. PathSEQ stood<br />
out for its final report, which is convenient for clinical<br />
use, whereas SURPI viral was more sensitive<br />
in identification of EV-D68 and other viruses.<br />
In addition, from these NGS sequence reads<br />
we were able to detect a variety of resident<br />
bacteria by using SURPI Bacteria. Conclusion:<br />
Our results support the NGS approach as a<br />
comprehensive and efficient tool for pathogen<br />
detection. With improved sequencing protocols<br />
and bioinformatics tools, the NGS-based assay<br />
has great potential for actionable identification<br />
of pathogens in clinical and public health<br />
laboratory settings.<br />
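The detection counts reported in the Results can be expressed as a share of the rRT-PCR-positive samples. The short calculation below assumes that every NGS-positive sample was among the 72 rRT-PCR positives and that the counts follow the order in which the three tools are listed in the Methods. The abstract states neither explicitly, so the percentages are indicative only.

```python
def percent_detected(ngs_positive, pcr_positive):
    """Share of rRT-PCR-positive samples also called positive by an NGS tool."""
    if pcr_positive <= 0:
        raise ValueError("pcr_positive must be greater than zero")
    return 100.0 * ngs_positive / pcr_positive


PCR_POSITIVE = 72  # rRT-PCR-positive samples reported in the abstract

# Counts from the abstract; the tool order is assumed to match the Methods.
ngs_calls = {"MiSeq Reporter": 65, "PathSEQ Virome": 66, "SURPI viral": 69}

for tool, n in ngs_calls.items():
    pct = percent_detected(n, PCR_POSITIVE)
    print(f"{tool}: {n}/{PCR_POSITIVE} = {pct:.1f}%")
# MiSeq Reporter: 65/72 = 90.3%
# PathSEQ Virome: 66/72 = 91.7%
# SURPI viral: 69/72 = 95.8%
```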
■ 29<br />
A NEXT GENERATION SEQUENCING<br />
METHOD FOR THE PAN-IDENTIFICATION<br />
OF VIRAL PATHOGENS IN CSF FROM<br />
ENCEPHALITIS CASES<br />
Y. Sun<sup>1</sup>, C. Parikh<sup>1</sup>, Y. Ku<sup>1</sup>, D. Lamson<sup>2</sup>, J. Au-Young<sup>1</sup>,<br />
K. St. George<sup>2</sup>, A. Felton<sup>1</sup>;<br />
<sup>1</sup>Thermo Fisher Scientific, South San Francisco,<br />
CA, <sup>2</sup>New York State Department of Health,<br />
Albany, NY.<br />
Encephalitis is commonly caused by viral infections<br />
and can be associated with significant<br />
morbidity and mortality. While variable at<br />
different times of the year, overall, the causative<br />
agent fails to be identified in more than<br />
75% of cases with current diagnostic methods.<br />
PCR- and serology-based approaches are<br />
limited by the requirement of prior knowledge<br />
and decisions on the pathogens to be tested.<br />
We demonstrate here a deep-sequencing based<br />
method with the potential for a more universal<br />
approach to testing of viral encephalitis<br />
cases. Initially, synthetic cerebrospinal fluid<br />
(sCSF) samples were spiked with several human<br />
encephalitic viruses, at a range of viral<br />
loads commonly seen in clinical CSF samples.<br />
Cultured, genomically quantitated, DNA and<br />
RNA viruses were used for spiking. Testing on<br />
the new platform was performed in a blinded<br />
manner. Samples were sequenced on the Ion<br />
Proton platform at a depth of approximately<br />
70 million reads per sample. Based on these data, we<br />
developed a bioinformatics pipeline to detect<br />
viral genomes in the samples. Sequencing reads<br />
of human origin were filtered by aligning to<br />
the human genome (hg19). The remaining reads<br />
were mapped to a reference database comprising<br />
viral genomes from NCBI. From a panel<br />
of 13 samples, virus was successfully and<br />
accurately identified in all but one of the 11<br />
virus-positive samples; the missed sample contained a very<br />
low viral titer. In addition, the read depth of<br />
viral sequences correlated inversely with the<br />
Ct values obtained from virus-specific qPCR<br />
assays of the samples. To establish the limit of<br />
detection with this approach, the 70 million<br />
sequencing reads were subsampled randomly<br />
to a range of 5-50 million for each sample. The<br />
bioinformatics analysis demonstrated that 5<br />
million reads is sufficient for detection of viral<br />
genomes in these samples. Once the workflow<br />
pipeline was established, the same approach<br />
was applied in a blinded manner, to 8 human<br />
CSF samples from patients with encephalitis.<br />
The samples had been tested previously for 14<br />
pathogens in a qPCR panel designed to detect<br />
viral encephalitic agents. Using 5 million<br />
reads, viral genomes were accurately identified<br />
in 4 samples. When read depth was increased<br />
to 40-45 million, viral genome was accurately<br />
identified in one of the remaining 4 samples.<br />
The other 3 samples, which had tested negative<br />
for 14 target agents in the qPCR panel,<br />
remained negative for viral genomes in the<br />
new assay. The described new technique was<br />
demonstrated to be very effective for the detection<br />
of viral pathogens in CSF samples. Future<br />
experiments are planned to refine the technology<br />
for the detection and identification of<br />
causative pathogens in CSF from patients with<br />
encephalitis.<br />
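The random subsampling used to explore the limit of detection can be sketched as below. The reads are toy records carrying a label from a hypothetical upstream mapping step (human, unassigned, or a virus name). The actual alignment to hg19 and to the viral reference database is not reproduced here, and the read counts are made up and far smaller than the millions of reads in the study.

```python
import random


def subsample(reads, n, seed=0):
    """Draw n reads uniformly at random without replacement (reproducible)."""
    if n > len(reads):
        raise ValueError("cannot draw more reads than are available")
    return random.Random(seed).sample(reads, n)


def viral_counts(reads):
    """Count reads per assigned virus, skipping human and unassigned reads."""
    counts = {}
    for _read_id, label in reads:
        if label not in ("human", "unassigned"):
            counts[label] = counts.get(label, 0) + 1
    return counts


# Toy data: (read id, label). Labels and proportions are hypothetical.
reads = (
    [(f"h{i}", "human") for i in range(900)]
    + [(f"u{i}", "unassigned") for i in range(80)]
    + [(f"v{i}", "virus_A") for i in range(20)]
)

# Repeat the viral read count at decreasing depths.
for depth in (1000, 500, 100):
    print(depth, viral_counts(subsample(reads, depth, seed=1)))
```

Repeating the draw with several seeds at each depth would show how often a low-titer virus drops out as the read count shrinks.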
■ 30<br />
APPLYING WHOLE GENOME SEQUENCING<br />
TO DETECT MOLECULAR EVENTS LEADING<br />
TO THE ACQUISITION OF CARBAPENEM<br />
RESISTANCE IN CLINICAL SAMPLES<br />
L. Appalla<sup>1</sup>, P. Mc Gann<sup>1</sup>, E. Snesrud<sup>1</sup>, A.<br />
C. Ong<sup>1</sup>, F. Onmus-Leone<sup>1</sup>, M. Koren<sup>2</sup>, Y. I.<br />
Kwak<sup>1</sup>, P. E. Waterman<sup>1</sup>, E. P. Lesho<sup>1</sup>;<br />
<sup>1</sup>Walter Reed Army Institute of Research, Silver<br />
Spring, MD, <sup>2</sup>Walter Reed National Military<br />
Medical Center, Bethesda, MD.<br />
Background: Multi-drug resistant organisms<br />
(MDROs) have emerged as a global threat to<br />
public health. Tracking the dissemination of<br />
MDROs and controlling their transmission are<br />
major challenges for the healthcare community<br />
worldwide. We have applied whole genome<br />
sequencing (WGS) and analysis to a collection<br />
of serial isolates from a single patient to<br />
identify the molecular events that led to the<br />
acquisition of carbapenem resistance during<br />
treatment. Initial cultures from the patient<br />
contained extended-spectrum β-lactamase<br />
(ESBL)-producing, carbapenem-susceptible,<br />
Escherichia coli. Multiple rounds of ertapenem<br />
treatment were administered, but the infection<br />
recurred after each course of antibiotics. Subsequent<br />
cultures yielded carbapenem-resistant<br />
E. coli and Morganella morganii. Methods:<br />
All isolates were sequenced using the Illumina<br />
MiSeq desktop sequencer. Sequencing data<br />
was assembled using an in-house pipeline that<br />
incorporates FLASh, Btrim and the GS De<br />
Novo Assembler. Assemblies were then passed<br />
through quality control filters that check read<br />
coverage, contig number and contig size. Further<br />
analysis using the bioinformatics pipeline<br />
included 16S species identification, multilocus<br />
sequence typing, comprehensive identification<br />
of antibiotic resistance genes and the detection<br />
of single nucleotide changes, insertion/deletion<br />
events and large-scale genomic rearrangements<br />
that distinguish the isolates. Results: All E.<br />
coli isolates were multilocus sequence type<br />
131 and carried 9 antibiotic resistance genes<br />
(including blaCTX-M-27, which confers an<br />
ESBL phenotype) on an IncF plasmid. The<br />
isolates were identical by genome sequencing,<br />
with the exception of 150 kb of plasmid<br />
DNA present only in the carbapenem resistant<br />
isolates. This DNA sequence included a sixty<br />
kilobase IncN plasmid carrying the carbapenemase<br />
gene blaOXA-181, present in M. morganii.<br />
In the M. morganii plasmid, blaOXA-181<br />
was flanked by IS3000 and ISKpn19, but in all<br />
but one of the carbapenem resistant E. coli isolates,<br />
a second copy of ISKpn19 had inserted<br />
adjacent to IS3000. Conclusion: blaOXA-181<br />
was acquired by a member of the virulent<br />
sequence type 131 E. coli clonal group via an<br />
IncN plasmid from M. morganii. Because M.<br />
morganii tends to have high intrinsic resistance<br />
to imipenem, and because blaOXA-181 has<br />
relatively weak carbapenemase activity and a<br />
substrate profile that includes penicillins but<br />
not extended-spectrum cephalosporins, the<br />
presence of blaOXA-181 in these strains was<br />
nearly overlooked. However, WGS and advanced<br />
sequence analysis techniques revealed<br />
that this gene was responsible for the acquisition<br />
of carbapenem resistance by E. coli. WGS<br />
represents a powerful approach for the surveillance<br />
of multidrug resistant microbes.<br />
■ 31<br />
THE UTILITY OF WHOLE GENOME<br />
SEQUENCING IN CHARACTERIZING<br />
ACINETOBACTER EPIDEMIOLOGY AND<br />
ANALYZING HOSPITAL OUTBREAKS<br />
M. A. Fitzpatrick, E. A. Ozer, A. R. Hauser;<br />
Northwestern University, Chicago, IL.<br />
Background: Acinetobacter baumannii is a<br />
frequent cause of nosocomial infections and<br />
outbreaks. Whole genome sequencing (WGS)<br />
is a promising new technique for bacterial<br />
strain typing and outbreak investigation. Here<br />
we compare the performance of conventional<br />
methods with WGS for strain typing<br />
clinical Acinetobacter isolates and analyzing<br />
a carbapenem-resistant A. baumannii (CRAB)<br />
outbreak. Methods: We performed band-based<br />
typing, multi-locus sequence typing (MLST),<br />
and WGS on 148 Acinetobacter calcoaceticus-<br />
Acinetobacter baumannii complex bloodstream<br />
isolates collected from 2005 to 2012. Clustering<br />
dendrograms and phylogenetic trees were<br />
constructed using the results of band-based<br />
and sequence-based typing, respectively.<br />
Discriminatory power and level of agreement<br />
of the techniques were compared. WGS was<br />
then used to analyze an ICU CRAB outbreak<br />
that occurred in our hospital during the study<br />
period. Results: Phylogenetic trees inferred<br />
from core genome SNPs confirmed three<br />
Acinetobacter species within this collection.<br />
Four major A. baumannii sequence types (STs)<br />
circulated in our hospital over the course of the<br />
study, three of which have a global distribution<br />
pattern and one of which is novel. WGS<br />
indicated that a threshold of 2500 core SNPs<br />
accurately distinguished A. baumannii isolates<br />
with the same ST from those with different<br />
STs. Conventional band-based techniques performed<br />
less well in accurately assigning isolates<br />
to ST lineages and exhibited poor agreement<br />
overall with sequence-based techniques.<br />
When WGS was applied to a CRAB outbreak,<br />
we found that a threshold of 2.5 core SNPs distinguished<br />
non-outbreak strains from outbreak<br />
strains. WGS was more discriminatory than<br />
conventional band-based techniques and was<br />
used to construct a more accurate transmission<br />
map that resolved many of the plausible<br />
transmission routes suggested by PFGE and<br />
epidemiologic links. More detailed accessory<br />
genome analysis identified a plasmid that was<br />
circulating among isolates over the course of<br />
the outbreak. Conclusion: Our study demonstrates<br />
that WGS is superior to conventional<br />
techniques for A. baumannii strain typing and<br />
outbreak analysis. These findings support incorporation<br />
of WGS into healthcare infection<br />
prevention efforts.<br />
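The outbreak threshold described in this abstract can be illustrated with a minimal Python sketch.<br />
It assumes single-linkage grouping on pairwise core-SNP distances, which the abstract does not specify.<br />
The function name, isolate labels and distances are hypothetical; only the 2.5-SNP cutoff comes from the text.<br />

```python
from itertools import combinations


def cluster_by_snp_threshold(isolates, distances, threshold):
    """Group isolates whose pairwise core-SNP distance is <= threshold.

    Single-linkage grouping is an assumption of this sketch, not a method
    stated in the abstract. `distances` maps frozenset({a, b}) -> SNP count.
    """
    parent = {iso: iso for iso in isolates}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if distances[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for iso in isolates:
        clusters.setdefault(find(iso), []).append(iso)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical isolates and distances, for illustration only.
isolates = ["A", "B", "C", "D"]
distances = {
    frozenset(("A", "B")): 1,
    frozenset(("A", "C")): 2,
    frozenset(("B", "C")): 2,
    frozenset(("A", "D")): 40,
    frozenset(("B", "D")): 41,
    frozenset(("C", "D")): 39,
}
clusters = cluster_by_snp_threshold(isolates, distances, threshold=2.5)
print(clusters)
```

With these made-up distances, isolates A, B and C fall into one putative outbreak cluster and D is left on its own.<br />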
n 32<br />
KLEBSIELLA PNEUMONIAE IN ITALY:<br />
INSIGHTS FROM SHORT- AND LONG-TERM<br />
SCALE STUDIES<br />
F. Comandatore;<br />
University of Milan and University of Pavia,<br />
Milan and Pavia, ITALY.<br />
The diffusion of multi-drug resistant (MDR)<br />
pathogenic bacteria represents one of the most<br />
important issues for global public health.<br />
Indeed, during the last twenty years, a dramatic<br />
increase of nosocomial infections and<br />
outbreaks due to MDR pathogens has been<br />
reported world-wide. Klebsiella pneumoniae<br />
(Kp) isolates resistant to third-generation<br />
cephalosporins and carbapenems have been<br />
reported in countries spanning from Asia<br />
through Europe to America. Kp infections result<br />
in high morbidity and mortality in people<br />
with weak immune systems, such as inpatients<br />
of hospital intensive care units (ICUs).<br />
In Italy, since the early 2000s, an increasing<br />
number of MDR Kp nosocomial infections<br />
has been reported. We built an epidemiological<br />
network, connecting together six hospital<br />
microbiology units, a public health veterinary<br />
laboratory, and two university bioinformatic<br />
groups. During the first year (2014-2015),<br />
we developed in-house pipelines to perform<br />
SNP and genome-wide analyses on hundreds<br />
(up to thousands) of bacterial genomes. Our first<br />
research project (AAC, 2015, vol. 53(4), 389-<br />
396) examined 89 isolates from hospital collections.<br />
We included those genomes in a worldwide<br />
database of 319 Kp genomes, spanning<br />
the genetic variability across the species. The<br />
analysis of this genomic database provided<br />
important insights into the origin of the pathogenic<br />
clonal group Kp CG258, and about the<br />
emergence of Kp in Italy. Indeed, we described<br />
a large genomic recombination event (~1.3 Mb)<br />
that occurred during the emergence of<br />
Kp CG258. The time-calibrated phylogenetic<br />
analysis allowed us to date that recombination<br />
around 1985. On the basis of that phylogenetic<br />
reconstruction, we were able to describe the<br />
four major Kp CG258 clades in Italy, and date<br />
their emergences from 2009 to 2010. The second<br />
research project focused on a short time<br />
scale: we used a genomic epidemiology<br />
approach to study a Kp outbreak that occurred<br />
in an Italian hospital during 2013 (JCM, 2015,<br />
00545-15). We were able to identify, and genetically<br />
describe, this pathogenic Kp clone. It<br />
belonged to Kp CG258 and was<br />
phylogenetically associated with one of the four<br />
major Italian lineages described above. Furthermore,<br />
the SNP analysis showed that this<br />
Kp clone spread across the ICU through a single<br />
carrier, and not from inpatient to inpatient. In<br />
conclusion, genomic studies allowed us to obtain<br />
insights into Kp epidemiology at short and<br />
long time scales, providing useful information<br />
on outbreak control and genome evolution of<br />
this important nosocomial pathogen.<br />
n 33<br />
COMPARATIVE GENOMIC ANALYSIS OF<br />
THE FIRST TWO VAND-TYPE VANCOMYCIN-<br />
RESISTANT ENTEROCOCCUS FAECIUM IN<br />
THE NETHERLANDS<br />
M. Rogers, J. Sinnige, J. Top, E. Brouwer, M.<br />
Bonten, R. Willems;<br />
UMC Utrecht, Utrecht, NETHERLANDS.<br />
Introduction: Enterococcus faecium has<br />
rapidly become an important nosocomial<br />
pathogen. Vancomycin-resistant E. faecium<br />
(VRE) strains are of particular importance, as<br />
these are often multi-resistant, which drastically<br />
limits treatment options. To date, nine<br />
different types (vanA-G, vanL-N)<br />
of vancomycin resistance gene clusters have been<br />
described, of which vanA and vanB are most<br />
frequently found. Recently, two epidemiologically<br />
unrelated vanD VRE isolates were found<br />
in two Dutch hospitals. Here we report on the<br />
phylogenetic analyses of the first vanD VRE<br />
in the Netherlands. Methods: Whole genome<br />
sequencing of these two strains was performed<br />
using the Nextera XT DNA Library Prep<br />
Kit (Illumina) for library preparation and<br />
the MiSeq System (Illumina) for sequencing, with<br />
a 2x250 bp MiSeq Reagent Kit v2 (Illumina).<br />
Quality filtering of the reads was performed<br />
using Nesoni 0.109 and reads were assembled<br />
using SPAdes 2.5.1 genome assembler. Genes<br />
were predicted and annotated using PROKKA.<br />
For the phylogenetic analysis, a total of 104<br />
strains were used (73 publicly available strains,<br />
29 strains from our lab and the 2 vanD-positive<br />
strains). Core alignment was performed using<br />
Bowtie2 and SAMtools was used for SNP calling<br />
between the two vanD-type VRE strains<br />
(E7962 and E8043). OrthAgogue was used<br />
for prediction of gene orthology relations, followed<br />
by clustering of orthogroups via MCL.<br />
A phylogenetic tree was constructed based on<br />
the core genome of the 104 strains using<br />
RAxML. Results: Phylogenetic inferences revealed<br />
that both vanD VRE clustered in clade A1 containing<br />
mostly clinical strains. Whole genome<br />
analysis revealed a considerable SNP difference<br />
(2981 SNPs; 1.1 x 10^-3 SNPs/Mb) between the<br />
recombination-free core genomes of the two strains,<br />
indicating that the two strains were not clonally<br />
related. Furthermore, both strains carried the<br />
entire vanD gene cluster (vanRD, vanSD,<br />
vanYD, vanHD, vanD, vanXD) on their largest<br />
scaffold (size: ~222kb for E7962 and ~192kb<br />
for E8043) and SNP analysis of these scaffolds<br />
revealed only 9 SNPs (4.6 x 10^-5 SNPs/Mb) between<br />
them. Further analysis showed the presence<br />
of a 76kb core-region (present in all 104<br />
strains) in these vanD-scaffolds, followed by<br />
an accessory-region (of 116kb for E8043 and<br />
146 kb for E7962) containing the vanD gene<br />
cluster and phage-related genes, present only<br />
in the two vanD-VRE. Conclusion: These<br />
results suggest that the vanD gene cluster is<br />
located on a mobile genetic element that was<br />
acquired by two clonally unrelated strains from<br />
a common third source or from each other.<br />
n 34<br />
ACUITAS® RESISTOME TEST - A HIGH<br />
THROUGHPUT TRIAGE TOOL FOR STRAIN<br />
TYPING BY WHOLE GENOME SEQUENCING<br />
R. K. Kersey, G. T. Walker, T. Rockweiler, W.<br />
Chang;<br />
OpGen, Gaithersburg, MD.<br />
Multi-drug resistant organisms (MDROs) are a<br />
global healthcare issue associated with an increase<br />
in morbidity and mortality. The Acuitas® Resistome<br />
Test, a high-throughput multiplex PCR<br />
test, detects approximately 50 antibiotic resistance<br />
genes in Gram-negative bacilli, including<br />
genes for carbapenemases, extended-spectrum<br />
beta-lactamases (ESBLs) and AmpC enzymes<br />
carried by MDROs. The Resistome Test is<br />
useful for genotyping carbapenem and cephalosporin<br />
resistance genes for surveillance of<br />
transmission events in hospitals. We validated<br />
gene specificity of the Resistome Test through<br />
evaluation of 265 culture isolates with reported<br />
gene subtypes that were adjudicated by whole<br />
genome sequencing to resolve discrepancies,<br />
which fell into three categories (missed genes,<br />
incorrect genotype and additional genes reported).<br />
Genomic DNA was extracted from broth<br />
cultures of isolates followed by Nextera library<br />
preparation, Illumina MiSeq sequencing, genomic<br />
assembly and sequence analysis using<br />
Ridom SeqSphere+ (MLST+) software.<br />
Results showed 100% concordance between<br />
gene results from the Resistome Test and<br />
whole genome sequencing. Our second objective<br />
was to illustrate that the Resistome Test<br />
was also useful as a triage tool to select culture<br />
isolates for strain typing by whole genome<br />
sequencing (WGS). We tested 65 Klebsiella<br />
pneumoniae isolates from two studies by the<br />
Resistome Test. Each K. pneumoniae isolate<br />
was positive for two to five of the following<br />
antibiotic resistance genes in various combinations:<br />
KPC, CTX-M-1, ACT, CMY, OXA-50,<br />
OXA-2, FOX, SHV and TEM plus ESBL<br />
variants of TEM and SHV. The Resistome<br />
Test provided a level of genotypic resolution<br />
that resolved clinical strains of K. pneumoniae<br />
prior to whole genome sequencing. Clinical<br />
isolates with distinct Resistome Test results<br />
were shown to be distinct strains by whole<br />
genome sequencing while isolates with identical<br />
Resistome Test results were often identical<br />
strains by whole genome sequencing. The Resistome<br />
Test was able to resolve 40% of the 65<br />
isolates as distinct strains, thereby identifying<br />
two potential groups of strain types for higher<br />
resolution by whole genome sequencing. We<br />
concluded that the Acuitas® Resistome Test is useful<br />
for detecting carbapenem and cephalosporin<br />
resistance in Gram-negative bacilli and as<br />
a triage tool to select culture isolates for strain<br />
typing by WGS.<br />
n 35<br />
PATHOGEN DISCOVERY IN TRAVELERS’<br />
DIARRHEA OF UNKNOWN ETIOLOGY BY<br />
METAGENOMIC SEQUENCING<br />
Q. Zhu, M. Jones, S. Highlander;<br />
J. Craig Venter Institute, La Jolla, CA.<br />
Infectious diarrhea is responsible for about<br />
million deaths each year. We are studying<br />
travelers’ diarrhea (TD) where the known<br />
causative agents are members of Enterobacteriaceae,<br />
such as enterotoxigenic Escherichia<br />
coli, Shigella and Salmonella, viruses such<br />
as norovirus, and parasites such as Giardia.<br />
Nevertheless, in over 40% of cases, a known<br />
pathogen cannot be identified by traditional<br />
clinical tests. “Pathogen negative” diarrhea is<br />
enigmatic, although this is due, in part, to a lack<br />
of appropriate cultivation and screening tests<br />
and poor sensitivity of the tests. DNA sequencing<br />
is increasingly being applied in attempts to<br />
characterize agents of infectious disease. We<br />
hypothesize that unrecognized pathogens are<br />
responsible for a significant proportion of TD.<br />
These may be known organisms with unrecognized<br />
pathogenic potential or may be completely<br />
new species with new mechanisms of<br />
virulence. We have performed deep NextSeq<br />
paired-end sequencing of DNAs from stools of<br />
eight pathogen-negative and two healthy traveler<br />
controls in an attempt to identify pathogens.<br />
A total of 132.8 Gb (ca. 12-20 Gb raw<br />
data/sample) of sequencing data were retained<br />
after quality filtering. Reads were mapped to<br />
the NCBI RefSeq genomic database, resulting<br />
in an average mapping rate of 72.4% (min:<br />
33.1%, max: 93.8%). In the samples where<br />
mapping was low, we believe that many of<br />
the unmapped reads likely represent new uncharacterized<br />
organisms. Taxonomic profiles<br />
were generated based on the mapping results,<br />
and revealed significantly uneven distribution<br />
of microbial groups among samples. The low<br />
complexity samples appear to be dominated by<br />
a single pathogen, while the high complexity<br />
samples may be the result of a mixed etiology.<br />
Three TD samples are enriched for E. coli sequences,<br />
despite the fact that enterotoxins were<br />
not detected in clinical screens. Two of these<br />
carry genes for the Shiga toxin. Additional<br />
TD samples had, for example, high abundance<br />
of Akkermansia muciniphila, Streptococcus<br />
spp., Campylobacter jejuni, or Alistipes shahii<br />
reads, while the two healthy traveler controls<br />
had high abundance of reads that mapped to<br />
several Bifidobacterium species, which were<br />
not present in the diarrheal samples. De novo<br />
assembly was performed for each sample<br />
(average N50 statistic: 6516.4), followed by<br />
contig binning and scaffolding. Near complete<br />
draft genomes were successfully recovered<br />
from the metagenomes. Some represent known<br />
species (such as E. coli and Campylobacter),<br />
while others could not be taxonomically placed<br />
in proximity to any known bacterial groups.<br />
Our results provide a glimpse into the microbiome<br />
diversity and composition and potential etiological<br />
sources in “no pathogen identified” TD<br />
samples, and demonstrate the power of high-throughput<br />
DNA sequencing in the discovery<br />
of pathogens in infectious disease.<br />
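The N50 statistic quoted for the assemblies in this abstract can be computed as in the short Python sketch below.<br />
The contig lengths and sample names are invented for illustration, and the abstract does not say which tool produced its figure.<br />
The sketch uses the common definition of N50 and then averages it across samples, as the abstract reports an average.<br />

```python
from statistics import mean


def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical per-sample contig lengths (bp), for illustration only.
assemblies = {
    "sample_1": [80, 70, 50, 40, 30, 20],
    "sample_2": [10, 10, 10],
}
per_sample_n50 = {name: n50(lengths) for name, lengths in assemblies.items()}
average_n50 = mean(per_sample_n50.values())
print(per_sample_n50, average_n50)
```

Sorting contigs from longest to shortest and stopping at the first one that pushes the running total past half of the assembly is the usual way to obtain this value.<br />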
n 36<br />
WHOLE GENOME SEQUENCING OF PORCINE<br />
EPIDEMIC DIARRHEA VIRUS BY ILLUMINA<br />
MISEQ PLATFORM<br />
L. Wang<sup>1</sup>, T. Stuber<sup>2</sup>, M. Prarat<sup>1</sup>, P. Camp<sup>2</sup>, S.<br />
Robbe-Austerman<sup>2</sup>, Y. Zhang<sup>1</sup>;<br />
<sup>1</sup>Animal Disease Diagnostic Laboratory, Ohio<br />
Department of Agriculture, Reynoldsburg, OH,<br />
<sup>2</sup>National Veterinary Services Laboratories,<br />
Animal and Plant Health Inspection Service,<br />
United States Department of Agriculture,<br />
Ames, IA.<br />
Porcine epidemic diarrhea virus (PEDV) belongs<br />
to the genus Alphacoronavirus of the<br />
family Coronaviridae. PEDV was identified as<br />
an emerging pathogen in US pig populations in<br />
2013. Since then, this virus has been detected<br />
in at least 31 states in the US and has caused<br />
significant economic loss to the swine industry.<br />
Active surveillance and characterization of<br />
PEDV are essential for monitoring the virus.<br />
Obtaining comprehensive information about<br />
the PEDV genome can improve our understanding<br />
of the evolution of PEDV and<br />
the emergence of new strains, and can improve<br />
vaccine design. This study investigated the<br />
use of deep sequencing by the next-generation<br />
sequencing (NGS) Illumina MiSeq platform to<br />
obtain complete genome sequence information<br />
from PEDV isolates. Clinical samples<br />
were first subjected to a real-time RT-PCR assay<br />
specific for PEDV. Positive samples were<br />
then amplified using 19 pairs of PEDV-specific<br />
primers (targeted amplification method). The<br />
amplified PCR products were mixed and used<br />
as input DNA to prepare a DNA library using<br />
the Nextera XT kit for NGS. Alternatively,<br />
a random-priming method was applied to<br />
prepare the input DNA for clinical samples<br />
with high viral loads (real-time PCR with Ct<br />
value
Poster <strong>Abstracts</strong><br />
to cluster strains based on distribution of 1,281<br />
accessory genes. We identified three major<br />
clades (A - C) characterized by a large variation<br />
in r/m ratio: 22.7 (all uncommon STs from<br />
UK), 0.9 and 3.7, respectively. Within Clades<br />
B and C, with few exceptions (e.g. ST-11 and<br />
ST-230), sequence types (STs) did not form<br />
monophyletic lineages and were composed of<br />
numerous BAPS populations characterized by<br />
a clonal structure, limited genetic variation and<br />
no temporal or geographical signals. Further<br />
GWAS analysis, which includes both core<br />
and accessory genome, did not detect overall<br />
genetic features correlated to geographical<br />
separation. In contrast, both phylogenetic<br />
analyses based on the distribution of accessory<br />
genes showed geographical clustering of the<br />
isolates within each BAPS group. They also<br />
revealed several events of consecutive gene<br />
flow between isolates of the two countries,<br />
suggesting migration within a population.<br />
Our genomic study supported the theory of<br />
homogeneous global distribution of C. jejuni<br />
genotypes probably associated with rapid<br />
animal and/or human movement. Contrary to<br />
what was expected, several lineages within ST45cc<br />
appear to be genetically monomorphic pathogens<br />
and were persistently detected over the<br />
years independently of geographical origin.<br />
Epidemiological investigations based on core<br />
genes might be limited by low resolution<br />
due to the clonal nature of certain lineages, and<br />
a pangenome approach is recommended.<br />
n 38<br />
COMPLETE GENOME SEQUENCE OF<br />
STAPHYLOCOCCUS AUREUS STRAIN FROM<br />
A PIG, A UNIQUE T324-ST541-V KOREAN<br />
METHICILLIN RESISTANT S. AUREUS<br />
CLONE<br />
S. Lim, D. Moon, G. Jang, H. Lee;<br />
Animal and Plant Quarantine Agency, Anyang,<br />
KOREA, REPUBLIC OF.<br />
Methicillin-resistant Staphylococcus aureus<br />
(MRSA) has been a major causative agent<br />
of nosocomial infection, and it has also been<br />
reported from non-human sources. Livestock-associated<br />
MRSA, such as sequence type (ST)<br />
398 and ST541, has been reported in pigs at<br />
a high frequency in Korea. In particular, the clone with spa<br />
type t324, sequence type ST541, and staphylococcal<br />
cassette chromosome mec element<br />
(SCCmec) type V (t324-ST541-V) was one of the<br />
predominant clones in the pig production industry<br />
in Korea. To better understand its occurrence,<br />
genetic repertoire, and relatedness to other<br />
MRSA types, we sequenced and assembled<br />
the complete genome of this predominant<br />
clone. A t324-ST541-V MRSA isolate designated<br />
K12PJN53 was isolated from a healthy<br />
pig in 2012. The draft genome sequence of<br />
K12PJN53 was obtained by combining<br />
the results of the Illumina MiSeq and Roche<br />
454 FLX sequencing systems. The sequencing<br />
reads were assembled with CLC Genomics<br />
Workbench 5.5 and GS Assembler 2.6. A<br />
total of 458 S. aureus subsp. aureus genome<br />
sequences obtained from the EzGenome<br />
database were compared with K12PJN53 by<br />
calculating average nucleotide identity (ANI)<br />
values. The genome of K12PJN53 consists of<br />
a single circular 2,880,108 bp chromosome<br />
with 32.88% GC content and two plasmids. A<br />
total of 2,042 protein coding regions, 57 tRNA<br />
genes, and 10 rRNA genes were detected.<br />
Among the annotated contigs, 14, 17, and 20<br />
contigs were annotated to antibiotic resistance,<br />
adherence and toxin genes, respectively.<br />
Tetracycline, macrolide, lincosamide and<br />
streptogramin B, and aminoglycoside resistance<br />
genes were found outside of the SCCmec elements<br />
with insertion sequence (IS) 256 or 431.<br />
In addition, metal resistance genes were<br />
identified in the internal and external regions<br />
of SCCmec elements. Several virulence genes,<br />
such as those encoding elastin-binding protein, fibrinogen-binding<br />
protein, clumping factor A, hemolysin,<br />
and exfoliative toxin A, were present<br />
in K12PJN53; however, Panton-Valentine<br />
leukocidin was not detected. Based on ANI,<br />
the genome of strain K12PJN53<br />
was similar to those of the ST398 strains, which have<br />
emerged in European countries. This study is<br />
the first report of the draft genome sequence of a<br />
novel livestock-associated MRSA ST541 strain<br />
isolated from a pig in Korea. This genome<br />
sequence helps to elucidate features of the<br />
ST541 lineage, including antibiotic resistance<br />
and virulence genes.<br />
n 39<br />
WHOLE GENOME SEQUENCING OF<br />
SALMONELLA NEWPORT CLONE<br />
JJPX01.0061 REVEALS PHYLOGENETIC<br />
EVIDENCE FOR ENDEMIC PERSISTENCE<br />
AND EXTENSIVE MICROEVOLUTIONARY<br />
DIVERSIFICATION AMONG EASTERN SHORE<br />
SURFACE WATERS<br />
R. L. Bell, C. Ferreira, E. Reed, C. Wang, E.<br />
Burrows, T. Muruvanda, J. Zheng, M. W. Allard,<br />
J. Pettengill, E. Strain, E. Brown;<br />
FDA, College Park, MD.<br />
Recurrent outbreaks of Salmonella enterica<br />
serovar Newport, XbaI PFGE pattern<br />
JJPX01.0061, have been linked to the consumption<br />
of tomatoes grown along the Eastern<br />
Shore of Virginia (VES) at least 6 times since<br />
2002. Environmental surveys of this region<br />
suggest that this subtype is endemic, persisting<br />
in surface waters of the VES microcosm.<br />
The genomic diversity of a large population of<br />
JJPX01.0061 isolated from disparate surface<br />
waters across the VES was investigated using<br />
whole genome sequencing (WGS) approaches.<br />
More than 70 environmental JJPX01.0061<br />
isolates spanning seven years from the VES<br />
were subjected to whole genome shotgun<br />
sequencing, assembled, and aligned using<br />
a reference-based mapping approach to a<br />
closed Newport genome (CFSAN024225). A<br />
maximum likelihood tree was then constructed<br />
based on total SNP variation and evaluated<br />
in light of geographic variation based on the<br />
location from which each isolate was collected.<br />
Genomic diversity within this PFGE<br />
subtype clustered the strains into four distinct<br />
clades or “genomovars” separated by a range<br />
of only 5 to more than 100 SNPs. Two clades<br />
(C and D) sorted isolates uniquely based on<br />
specific creeks and were identical (C, 0 SNPs<br />
intraclade variation) or nearly identical (D, 1<br />
SNP). Two additional clades (A and B) were<br />
polyphyletic with respect to creek location,<br />
suggesting two independent introductions<br />
of JJPX01.0061 variants into each of these<br />
locales. Finally, clade B, comprised largely<br />
of Newports from 2007, was separated from<br />
more recent Newport groupings by at least 100<br />
SNPs, suggesting that substantial microevolutionary<br />
change has accrued within this lineage,<br />
a finding consistent with its establishment and<br />
prolonged environmental persistence. Genetic<br />
diversification of JJPX01.0061 supports its<br />
long-term and endemic persistence within this<br />
regional microcosm. Occasional reintroduction<br />
of distinct genomic variants into common<br />
creek environments is also seen, pointing to<br />
a potential role for geese or other waterfowl<br />
species in the local mixing of isolates.<br />
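The SNP counts that separate the clades in this abstract are pairwise differences between aligned genomes.<br />
A minimal Python sketch of such a count is given below; it assumes two sequences already aligned to the same reference and skips any position with a gap or ambiguous base.<br />
The function name and the toy sequences are hypothetical, and the abstract does not describe how its own SNP matrix was filtered.<br />

```python
def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different
    unambiguous bases. Positions with gaps, N, or other ambiguity
    codes in either sequence are ignored (an assumption of this sketch)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in valid and base_b in valid and base_a != base_b:
            differences += 1
    return differences


# Toy aligned sequences, for illustration only.
reference_like = "ACGTACGTAC"
isolate_1 = "ACGTACGTAC"   # identical to the first sequence
isolate_2 = "ACGAACGTNC"   # one real difference, one ambiguous site
print(snp_distance(reference_like, isolate_1))
print(snp_distance(reference_like, isolate_2))
```

Ignoring ambiguous sites keeps low-coverage positions from inflating the distance, at the cost of possibly undercounting true differences.<br />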
n 40<br />
MOLECULAR TYPING OF BRUCELLA<br />
MELITENSIS BY WHOLE GENOME<br />
SEQUENCING<br />
G. Garofolo<sup>1</sup>, J. T. Foster<sup>2</sup>, K. Drees<sup>2</sup>, I. Platone<sup>1</sup>,<br />
K. Zilli<sup>1</sup>, M. Marcacci<sup>1</sup>, M. Ancora<sup>1</sup>, C.<br />
Cammà<sup>1</sup>, I. Mangone<sup>1</sup>, F. De Massis<sup>1</sup>, P. Calistri<sup>1</sup>,<br />
E. Di Giannatale<sup>1</sup>;<br />
<sup>1</sup>Istituto Zooprofilattico Sperimentale<br />
dell’Abruzzo e del Molise, Teramo, ITALY,<br />
<sup>2</sup>Department of Molecular, Cellular, and Biomedical<br />
Sciences, University of New Hampshire,<br />
Durham, NH, USA.<br />
Brucella melitensis is the causative agent of<br />
brucellosis in sheep and goats, causing public<br />
health concerns through the consumption of<br />
contaminated milk or in people in close contact<br />
with livestock. Despite concerted eradication<br />
campaigns, the disease remains endemic<br />
throughout much of the Mediterranean basin.<br />
Brucella is highly clonal, so single nucleotide<br />
polymorphisms (SNPs) in a phylogenetic<br />
framework can be used for determining its<br />
evolution. Determining which lineages are<br />
present throughout Italy is crucial to understanding<br />
brucellosis epidemiology as well as to<br />
placing Italian samples into a global evolutionary<br />
context. The aim of this study was to evaluate<br />
an NGS approach using different analyses<br />
to investigate B. melitensis diversity in Italy.<br />
Previous genetic assessment using variable<br />
number tandem repeats (VNTRs) revealed the<br />
presence of the west Mediterranean lineage<br />
structured in several clades that were sometimes<br />
geographically constrained. We selected<br />
16 B. melitensis isolates for whole genome<br />
sequencing representing maximum genetic and<br />
geographic diversity. Libraries were sequenced<br />
with both paired-end Illumina and Ion Torrent<br />
sequencing, and our analyses included 59<br />
publicly available B. melitensis genomes. We<br />
performed SNP analysis in read alignments<br />
and whole genomes using the NASP pipeline, and<br />
in parallel we applied a gene-based approach<br />
using MLST+. Approximately 22,000 putative<br />
SNPs were identified among the B. melitensis<br />
samples. The MLST+ revealed 1,748 targets<br />
for the first chromosome and 876 targets for<br />
the second chromosome totaling 2,624 loci.<br />
Both approaches found that the Italian isolates<br />
formed 3 sub-clades as part of the B. melitensis<br />
strain Ether lineage, a strain that was isolated<br />
in Italy fifty years ago, suggesting that this<br />
lineage has been well established and successful<br />
for at least half a century in the Italian<br />
peninsula. Finally, we compared our results to<br />
a recently published shotgun metagenomic B.<br />
melitensis sequence from bones from a medieval<br />
grave in Sardinia (Italy) and confirmed that<br />
this same lineage has been present in the region<br />
for centuries. This study is a step forward<br />
in the understanding of Brucella evolution in the<br />
Mediterranean area, and demonstrates the utility<br />
of WGS SNP analysis, and the feasibility<br />
of MLST+ as a fast and reliable typing system<br />
for analyzing the epidemiology of Brucella in<br />
endemic regions.<br />
n 41<br />
GENOMIC EPIDEMIOLOGY AND<br />
TRANSMISSION OF SALMONELLA<br />
CHOLERAESUIS VAR. KUNZENDORF IN<br />
EUROPEAN PIGS AND WILD BOAR<br />
P. Leekitcharoenphon, F. M. Aarestrup, R. S.<br />
Hendriksen;<br />
Technical University of Denmark, Kgs. Lyngby,<br />
DENMARK.<br />
Salmonella Choleraesuis is a relatively infrequent<br />
serovar adapted to pigs that also has a<br />
propensity to cause extraintestinal infections<br />
in humans. S. Choleraesuis var. Kunzendorf<br />
is responsible for the majority of outbreaks<br />
among pigs. The global transmission was<br />
believed to be a result of imported breeding<br />
pigs from Canada and the USA into Taiwan.<br />
In Europe, S. Choleraesuis is a relatively rare<br />
serovar, both in slaughter pigs and in breeding<br />
herds. In Denmark, only a few outbreaks have<br />
been reported among pig herds within the last<br />
decade; 1999 - 2000 and 2012 - 2013 and in<br />
both cases it has been impossible to identify<br />
the route of transmission and source of infection.<br />
In order to understand transmission and<br />
epidemiology of S. Choleraesuis, we have<br />
sequenced 108 S. Choleraesuis isolates from<br />
pigs and wild boar from 12 European countries<br />
and the USA. We applied SNP-based phylogenetic<br />
methods based on whole genome sequences to<br />
identify the population structure. We used Bayesian<br />
phylogeny to estimate dates of divergence<br />
and phylogeographic analyses of lineages by<br />
using BEAST with a Bayesian Skyline model of<br />
population size change and an uncorrelated<br />
lognormal relaxed clock as the molecular clock.<br />
The S. Choleraesuis isolates yielded 2,428<br />
SNPs. We estimated that the ancestral emergence<br />
of S. Choleraesuis was in 1946. The<br />
mean evolutionary rate was estimated to<br />
be 1.58 x 10^-6 SNPs/site/year, corresponding<br />
to 7.5 SNPs/year. The isolates were divided<br />
into two complex clusters and they resided<br />
in sub-clusters according to countries and<br />
neighbouring countries of isolation. According to<br />
the source of isolation, the wild boar isolates<br />
from Austria were clustered together but the<br />
wild boar isolates from Germany, Hungary<br />
and Estonia were clustered with strains from<br />
pig. The outbreak isolates from Denmark were<br />
distantly divided into two groups according to<br />
outbreak period. The recent outbreak isolates<br />
(2012-2013) contained an extra Q1 replicon,<br />
whereas some earlier outbreak isolates (1999-<br />
2000) had an extra I1 replicon. Some of the<br />
earlier outbreak strains were isolated from<br />
different organs from the same animal. Those<br />
isolates clustered according to the pig from which they were<br />
isolated. Danish isolates together with<br />
farm geographical information were subjected<br />
to the discrete phylogeographic analysis using<br />
a standard Continuous-Time Markov Chain<br />
(CTMC). The spatial and temporal transmission<br />
of S. Choleraesuis isolates between different<br />
farms in Denmark was epidemiologically<br />
concordant with the data showing the contact<br />
between farms. These results demonstrate the advantage<br />
of using WGS for elucidating the evolution<br />
and transmission of S. Choleraesuis and<br />
emphasize the usefulness of phylodynamic<br />
approaches to monitor the emergence and<br />
spread over time of these particular strains.<br />
These findings may help to promote and<br />
establish future prevention and control<br />
measures against similar successful clones.<br />
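The two rate figures quoted in this abstract (per site and per genome) are related by the number of sites analysed, as the small Python sketch below shows.<br />
The abstract does not state the genome length used; the 4.75 Mb value here is an assumption chosen because it is consistent with the two reported numbers.<br />
The function names are hypothetical.<br />

```python
def snps_per_genome_per_year(rate_per_site_per_year, genome_length_bp):
    """Convert a per-site substitution rate into expected SNPs per genome per year."""
    return rate_per_site_per_year * genome_length_bp


def implied_genome_length(rate_per_site_per_year, snps_per_year):
    """Back-calculate the number of sites implied by the two reported rates."""
    return snps_per_year / rate_per_site_per_year


RATE_PER_SITE = 1.58e-6                # SNPs/site/year, from the abstract
REPORTED_SNPS_PER_YEAR = 7.5           # SNPs/year, from the abstract
ASSUMED_GENOME_LENGTH_BP = 4_750_000   # assumption, not stated in the abstract

per_genome_rate = snps_per_genome_per_year(RATE_PER_SITE, ASSUMED_GENOME_LENGTH_BP)
implied_length = implied_genome_length(RATE_PER_SITE, REPORTED_SNPS_PER_YEAR)
print(round(per_genome_rate, 1), round(implied_length))
```

Under this assumed length the per-site rate reproduces roughly the reported per-genome figure, and the back-calculation gives a genome size close to 4.75 Mb.<br />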
n 42<br />
USING WHOLE GENOME SEQUENCING<br />
AND PHYLOGENETIC METHODOLOGIES<br />
TO CLUSTER SALMONELLA ENTERITIDIS<br />
ISOLATES BY SOURCE<br />
E. L. Stevens;<br />
Food and Drug Administration, College Park,<br />
MD.<br />
The Center for Food Safety and Applied Nutrition<br />
(CFSAN) collaborated with other state<br />
and federal labs to develop GenomeTrakr<br />
which houses the genomic sequences of thousands<br />
of clinical and environmental/food isolates<br />
at the National Center for Biotechnology Information<br />
(NCBI). This repository of genetic data<br />
facilitates both surveillance and response to<br />
foodborne outbreaks. Whole genome sequencing<br />
(WGS) allows clusters of clinical isolates<br />
to be grouped and linked to the environmental/<br />
food source using the science of molecular<br />
phylogenetics and has been successfully applied<br />
in resolving outbreaks. Source attribution<br />
is the ultimate goal of combining sequence and<br />
epidemiological data for outbreak resolution.<br />
One important pathogen, Salmonella enterica<br />
serotype Enteritidis (SE), which caused 19% of<br />
Salmonella illnesses in 2013, has been<br />
highlighted by epidemiologic insight as frequently<br />
coming from shell eggs. The Food<br />
and Drug Administration (FDA) published<br />
the 2010 Egg Rule to reduce Salmonella incidence<br />
rates caused by SE. However, other food<br />
and non-food sources such as beef, turkey,<br />
and chicken consumption, contact with live<br />
poultry, and international travel have been<br />
implicated in SE illness. Past methods have<br />
been unable to fully resolve some outbreaks<br />
because of the genetic similarity of SE, which<br />
WGS can resolve. Therefore, SE WGS data<br />
from GenomeTrakr were selected using the<br />
following criteria: (1) the isolate came from the<br />
United States; and (2) it had associated metadata<br />
indicating source, geographic location, clinical/environmental<br />
status, isolation source (e.g.,<br />
chicken), and collection date. From there, SE<br />
isolates were summarized according to isolation<br />
source (e.g. specific to egg products), and<br />
a cluster analysis was performed to identify the<br />
variability of SE strains among the different<br />
food commodities to see if further analysis can<br />
determine genetic loci that may be predictive<br />
of SE source attribution. These aims involved<br />
using phylogenetic methods based on WGS<br />
to link environmental/food isolates with each<br />
other due to their shared similarity (i.e. few<br />
single nucleotide polymorphisms (SNPs)<br />
between them). This work may ultimately provide<br />
information in which the sequence data<br />
of a single isolate derived from a clinical case<br />
of salmonellosis could help inform the epidemiologic<br />
analysis by suggesting a potential<br />
source of the illness. Furthermore, these results<br />
can help to assess the effect of the 2010 Egg<br />
Rule.<br />
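The selection criteria described in this abstract can be sketched as a simple metadata filter.<br />
This is a minimal illustration, not the GenomeTrakr or NCBI interface: the field names<br />
(`country`, `isolation_source`, and so on) and the example records are hypothetical.<br />

```python
from collections import Counter

# Hypothetical metadata field names; real GenomeTrakr/NCBI records may differ.
REQUIRED_FIELDS = ("geo_location", "status", "isolation_source", "collection_date")


def passes_criteria(record):
    """Keep US isolates that have every required metadata field filled in."""
    if record.get("country") != "USA":
        return False
    return all(record.get(field) for field in REQUIRED_FIELDS)


def summarize_by_source(records):
    """Count the retained isolates per isolation source."""
    kept = [r for r in records if passes_criteria(r)]
    return Counter(r["isolation_source"] for r in kept)


# Invented example records, for illustration only.
records = [
    {"id": "iso1", "country": "USA", "geo_location": "MD", "status": "clinical",
     "isolation_source": "egg", "collection_date": "2013-05-01"},
    {"id": "iso2", "country": "USA", "geo_location": "IA", "status": "environmental",
     "isolation_source": "chicken", "collection_date": "2012-08-14"},
    {"id": "iso3", "country": "USA", "geo_location": "", "status": "clinical",
     "isolation_source": "egg", "collection_date": "2014-01-20"},
    {"id": "iso4", "country": "Canada", "geo_location": "ON", "status": "clinical",
     "isolation_source": "egg", "collection_date": "2013-11-02"},
]

print(summarize_by_source(records))
```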
■ 43<br />
CORRELATING PROPORTIONAL<br />
ABUNDANCES OF SALMONELLA IN<br />
COMPLEX METAGENOMES WITH<br />
COVERAGE OF THE SALMONELLA GENOME<br />
K. G. Jarvis¹, N. Daquigan¹, J. R. White², C. J.<br />
Grim¹, D. E. Hanes¹;<br />
¹FDA/CFSAN, Laurel, MD, ²Resphera Biosciences,<br />
Baltimore, MD.<br />
Background: Culturing Salmonella from fresh<br />
produce requires resuscitation in nonselective<br />
pre-enrichment broths, followed by parallel<br />
selective enrichments in Tetrathionate (TT)<br />
and Rappaport-Vassiliadis (RV) broths, to inhibit<br />
background microflora. The cilantro microbiome<br />
consists mainly of Bacteroidetes and<br />
Proteobacteria. However, overnight nonselective<br />
pre-enrichment results in a shift to a lower<br />
proportional abundance of Proteobacteria and<br />
a dramatic increase in Firmicutes. Purpose:<br />
This study evaluates microbiome shifts over<br />
time in spiked cilantro cultures, and correlates<br />
the relative abundance of Salmonella to coverage<br />
of the 4.8 Mb genome. Methods: The<br />
microbiome of cilantro, spiked with S. enterica<br />
Newport at ~4 CFU/25 g, was sequenced after<br />
0, 6, and 24 hours in nonselective Tryptic Soy<br />
Broth, utilizing an Illumina MiSeq. High-coverage<br />
control microbiomes, which were transferred<br />
to selective TT and RV broths, in parallel,<br />
and incubated for 24 hours following nonselective<br />
pre-enrichment, were also sequenced.<br />
Most Probable Number (MPN) analysis was<br />
performed at 6 and 24 hours to estimate CFU/<br />
ml Salmonella. MetaPhlAn and Resphera Insight<br />
software were used to analyze shotgun<br />
metagenomes and 16S rRNA sequences, respectively.<br />
Metagenomic sequence reads were<br />
mapped to an S. enterica Newport genome<br />
by BLASTn to estimate Salmonella genome<br />
coverage. Results: The 0-hour cultures had<br />
9,511 BLASTn hits, with 579,864 (12%) bases<br />
covered, and an average read coverage of<br />
3.68. At 6 hours, BLASTn hits to Salmonella<br />
increased to 59,951 with an average read coverage<br />
of 16.96, accounting for 832,017 (17%)<br />
bases. BLASTn hits to Salmonella at 24 hours<br />
reached 175,532, and accounted for 1,081,888<br />
(22%) bases, with an average read coverage<br />
of 39.17. BLASTn hits in RV and TT cultures<br />
reached 18,110,119 and 21,096,773, representing<br />
average read coverages of 842 and 1033<br />
respectively, equating to 4,874,108 (99.94%)<br />
and 4,876,681 bases (99.99%) of the Newport<br />
genome represented in the RV and TT cultures.<br />
Proportional abundances of Salmonella 16S<br />
rRNA genes in the 0-, 6-, and 24-hour cultures<br />
were 0.004% (1 of 25,000 reads, or 4 CFU/<br />
ml), 0.00% (0 of 25,000 reads, or 9 CFU/ml),<br />
and 5.81% (1,453 of 25,000 reads, or 3.9 × 10⁷<br />
CFU/ml), respectively. Salmonella proportional<br />
abundances in shotgun metagenomes were<br />
0.02%, 0.01%, 0.04%, 92%, and 86% for the<br />
0-, 6-, and 24-hour, RV, and TT cultures, respectively.<br />
Conclusions: Correlating the proportional<br />
abundance of Salmonella in metagenomes with<br />
genome coverage and CFU/ml<br />
is essential for understanding detection limits<br />
in complex food matrices such as cilantro.<br />
Careful consideration of analysis software to<br />
account for discrepancies in metagenomic databases<br />
is also important.<br />
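The relationship between alignment hits, bases covered, and average read coverage can be<br />
illustrated with a small calculation. The sketch below is not the authors' pipeline. It assumes<br />
hits are already available as 1-based inclusive (start, end) intervals on the reference, and it<br />
defines average read coverage as aligned bases divided by covered bases, which is one possible<br />
definition (the abstract does not state which was used). The genome length of about<br />
4,877,000 bp is an approximation implied by the reported percentages.<br />

```python
def coverage_stats(hits, genome_length):
    """Return (covered_bases, breadth_fraction, mean_depth_over_covered_bases).

    hits: iterable of 1-based inclusive (start, end) alignment intervals.
    """
    aligned_bases = 0
    merged = []  # non-overlapping intervals, built from the sorted hits
    for start, end in sorted(hits):
        aligned_bases += end - start + 1
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    breadth = covered / genome_length
    mean_depth = aligned_bases / covered if covered else 0.0
    return covered, breadth, mean_depth


# Toy example: three hits on a 1,000 bp reference.
covered, breadth, depth = coverage_stats([(1, 100), (51, 150), (401, 500)], 1000)
print(covered, breadth, depth)

# Breadth for the reported 0-hour figure, assuming a ~4,877,000 bp genome.
print(round(579_864 / 4_877_000 * 100, 1))
```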
■ 44<br />
COMPARATIVE GENOMICS OF DRUG-<br />
RESISTANT SALMONELLA ENTERICA<br />
ISOLATED FROM DAIRY CATTLE AND<br />
HUMANS<br />
L. Carroll¹, M. Wiedmann¹, H. den Bakker²,<br />
J. Siler¹, M. Davis³, W. Sischo³, T. Besser³, L.<br />
Warnick¹, R. Pereira¹;<br />
¹Cornell University, Ithaca, NY, ²Texas Tech<br />
University, Lubbock, TX, ³Washington State<br />
University, Pullman, WA.<br />
Salmonella enterica is a pathogen of concern<br />
for both humans and cattle. It can be spread<br />
from cattle to human populations through direct<br />
contact with animals shedding Salmonella,<br />
as well as through the food chain. Infections<br />
caused by multidrug-resistant isolates can<br />
be challenging to treat, making multidrug-resistant<br />
Salmonella a relevant human health<br />
hazard. The objective of this study was to use<br />
whole genome sequencing to study the evolutionary<br />
relationship of antibiotic-resistant<br />
S. Typhimurium, S. Newport, and S. Dublin<br />
isolated from dairy cattle and humans in Washington<br />
state and New York state from 2008 to<br />
2012. A total of 90 drug-resistant Salmonella<br />
isolates were selected for this study, 45 of<br />
which were from Washington state (20 from<br />
dairy cattle and 25 from humans) and 45 from<br />
New York state (21 from dairy cattle and 24<br />
from humans). Isolates were selected based<br />
on location, source, and serotype stratified by<br />
year. All isolates were tested for phenotypic<br />
antimicrobial resistance to 12 drugs using Kirby-Bauer<br />
disk diffusion, and all isolates were<br />
resistant to at least one drug. Genomes of all<br />
isolates were sequenced at Cornell University<br />
using the Illumina HiSeq platform and assembled<br />
de novo using SPAdes. In silico MLST<br />
and serotyping were performed using SRST2<br />
and SeqSero, respectively. SRST2 was also<br />
used in conjunction with the ARG-ANNOT<br />
database to detect the presence of antibiotic<br />
resistance genes in the genome of each isolate.<br />
Maximum likelihood trees were constructed<br />
from the core genome of each serotype using<br />
kSNP, and the Cortex variation assembler<br />
was used to detect SNPs and indels. The most<br />
common drug classes for which resistance<br />
genes were detected were aminoglycoside,<br />
sulfonamide, and tetracycline. Aminoglycoside<br />
and tetracycline were the two most common<br />
drug classes for which two or more resistance<br />
genes for each class were identified within the<br />
same isolate. Phylogenetic analyses of SNPs<br />
in the core genome of each serotype showed<br />
evidence for geospatial clustering. Further<br />
analyses will evaluate evolutionary phylogeny<br />
within each serotype and compare genotypic<br />
and phenotypic antibiotic resistance for all<br />
isolates to gain further insight into the spread<br />
of drug-resistant Salmonella between dairy<br />
cattle and humans in New York and Washington<br />
state.<br />
■ 45<br />
NEXT GENERATION SEQUENCING AND<br />
CITRUS DISEASES: CURRENT STATUS AND<br />
FUTURE PROSPECTS<br />
M. Shafiq, F. Ali, M. Saleem Haider;<br />
University of the Punjab, Lahore, PAKISTAN.<br />
Next-generation sequencing (NGS) high<br />
throughput technologies became available at<br />
the onset of the 21st century. They provide<br />
a highly efficient, rapid, and low cost DNA<br />
sequencing platform beyond the reach of the<br />
standard and traditional DNA sequencing technologies<br />
developed in the late 1970s. They are<br />
continually improved to become faster, more<br />
efficient and cheaper. They have been used<br />
in many fields of biology since 2004. Citrus<br />
is a large genus that includes several major<br />
cultivated species, including Citrus sinensis (sweet<br />
orange), Citrus reticulata (tangerine and mandarin),<br />
Citrus limon (lemon), Citrus grandis<br />
(pummelo) and Citrus paradisi (grapefruit).<br />
The draft genome of sweet orange (Citrus<br />
sinensis) has been assembled using NGS technologies.<br />
NGS has also been used to study the genomes<br />
of several varieties of clementine mandarins<br />
and to sequence the full citrus chloroplast<br />
genome. NGS using RNAseq analysis<br />
has also been used to study the regulation of<br />
citrus gene expression during disease (citrus<br />
greening and CTV infection) development. It<br />
is expected that NGS will play a significant<br />
role in many research and non-research areas<br />
of citrus biology and that this technology will boost<br />
future citrus research in Pakistan.<br />
■ 46<br />
QUANTIFYING THE SWINE RESISTOME<br />
USING METAGENOMIC SEQUENCING<br />
P. Munk, V. D. Andersen, H. Vigre, F. M. Aarestrup;<br />
Technical University of Denmark, Kgs. Lyngby,<br />
DENMARK.<br />
Monitoring of agricultural antimicrobial resistance<br />
has traditionally depended on the cultivation<br />
and phenotypic analysis of indicator<br />
bacteria. However, resistance genes are widely<br />
distributed and present in zoonotic agents as<br />
well as commensal bacteria that can exchange<br />
them by horizontal gene transfer. By directly<br />
sequencing animal fecal DNA, the entire intestinal<br />
resistome can be characterized at once.<br />
In this study, we aimed to develop a workflow<br />
appropriate for quantifying agricultural resistance<br />
and use it to characterize the Danish<br />
swine herd resistome. Ten Danish swine herds<br />
with diverse antimicrobial usage were enrolled<br />
in the study. In each herd, a fecal floor<br />
sample was obtained from 30 random pens.<br />
Those random samples were pooled for each<br />
herd and DNA was extracted using a modified<br />
QIAamp stool mini kit protocol. DNA was<br />
fragmented to 300 bp and prepared with the<br />
NEXTflex PCR-Free DNA Library Prep kit.<br />
Libraries were sequenced on a HiSeq2500 (PE,<br />
2x100 bp), producing roughly 7 billion bp/<br />
sample. MGmapper, a BWA-based pipeline<br />
for metagenomics, was used to map quality-trimmed<br />
reads to the ResFinder database (2,130<br />
resistance genes). To ensure high specificity,<br />
we counted only read pairs in which both reads<br />
aligned with at least 50 bp to the same reference<br />
gene. To minimize nonspecific read mapping<br />
without excluding ambiguous alignments, read<br />
counts from gene variants were aggregated to<br />
gene and drug levels. The relationship between<br />
resistance gene abundances and herd-level<br />
antimicrobial consumption of more than seven<br />
drug classes was analyzed using correlation<br />
analysis. A total of 123 resistance genes were<br />
observed and 64 were detected in at least half<br />
of the samples. In all samples, tet(Q), mefA<br />
and tet(W) comprised over half the detected<br />
resistance genes. Besides tetracycline and<br />
macrolide resistance genes, beta-lactam and<br />
lincosamide resistance genes were also very<br />
prevalent. Positive correlations between drug<br />
use and gene abundances were frequently significant<br />
(p < 0.05). This was not the case for<br />
negative correlations, suggesting antimicrobial<br />
use in Danish swine herds is associated with<br />
an increase in resistance gene abundances.<br />
In conclusion, our metagenomic approach<br />
facilitates herd-level resistance monitoring in<br />
swine. The resolution is sufficient to observe<br />
antimicrobial-induced effects on the swine<br />
resistome and quantify more than 100 resistance<br />
genes. The data may furthermore be well<br />
suited to study other functional sequences like<br />
virulence genes and transposable elements,<br />
making metagenomics an attractive option for<br />
routine environmental monitoring.<br />
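The counting rules described in this abstract (both mates aligned to the same reference with<br />
at least 50 bp, then aggregation of variant counts to gene and drug-class level) can be<br />
sketched as follows. This is an illustrative outline rather than MGmapper itself: the tuple<br />
layout, the variant lookup table, and the example read pairs are assumptions made for this<br />
sketch.<br />

```python
from collections import Counter

MIN_ALIGNED_BP = 50  # threshold stated in the abstract

# Hypothetical lookup: reference variant -> (gene, drug class).
VARIANT_INFO = {
    "tet(Q)_1": ("tet(Q)", "tetracycline"),
    "tet(Q)_2": ("tet(Q)", "tetracycline"),
    "tet(W)_1": ("tet(W)", "tetracycline"),
    "mefA_1": ("mefA", "macrolide"),
}


def count_read_pairs(pairs):
    """Count pairs whose mates both align >= MIN_ALIGNED_BP to one reference.

    pairs: iterable of (ref_mate1, aligned_bp_mate1, ref_mate2, aligned_bp_mate2).
    """
    variant_counts = Counter()
    for ref1, bp1, ref2, bp2 in pairs:
        if ref1 == ref2 and min(bp1, bp2) >= MIN_ALIGNED_BP:
            variant_counts[ref1] += 1
    return variant_counts


def aggregate(variant_counts):
    """Sum variant-level counts to gene level and drug-class level."""
    gene_counts, class_counts = Counter(), Counter()
    for variant, n in variant_counts.items():
        gene, drug_class = VARIANT_INFO[variant]
        gene_counts[gene] += n
        class_counts[drug_class] += n
    return gene_counts, class_counts


# Invented read pairs, for illustration only.
pairs = [
    ("tet(Q)_1", 100, "tet(Q)_1", 98),   # counted
    ("tet(Q)_2", 100, "tet(Q)_2", 60),   # counted
    ("tet(W)_1", 100, "tet(W)_1", 45),   # rejected: one mate below 50 bp
    ("mefA_1", 100, "tet(W)_1", 100),    # rejected: mates hit different references
    ("mefA_1", 75, "mefA_1", 75),        # counted
]
gene_counts, class_counts = aggregate(count_read_pairs(pairs))
print(gene_counts, class_counts)
```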
■ 47<br />
MOLECULAR AND GENOMIC TYPING OF<br />
POULTRY ASSOCIATED SALMONELLA<br />
ENTERICA STRAINS FROM NIGERIA<br />
N. Useh¹, H. Suzuki², N. Akange¹, M. Thomas³,<br />
A. Foley³, M. Keena³, E. Nelson³, J. Christopher-Hennings³,<br />
J. Scaria³;<br />
¹University of Agriculture, Makurdi, NIGERIA,<br />
²Yamaguchi University, Yamaguchi, JAPAN,<br />
³South Dakota State University, Brookings, SD.<br />
Non-typhoidal salmonellosis is one of the<br />
most common causes of bacterial diarrhea worldwide.<br />
Globally, an estimated 94 million cases of gastroenteritis<br />
and 115,000 deaths each year are<br />
caused by non-typhoidal<br />
salmonellosis. Transmission of pathogenic<br />
Salmonella strains between different countries<br />
has increased due to global travel and food<br />
import. While North America and Europe have<br />
constituted active Salmonella surveillance<br />
programs, very limited epidemiologic data is<br />
available in developing countries, particularly<br />
in sub-Saharan Africa. Therefore, we have<br />
conducted an epidemiologic investigation of<br />
Salmonella prevalence in poultry samples from<br />
Nigeria. Salmonella was isolated by enrichment<br />
culture in tetrathionate broth followed by<br />
growth on XLT4 agar. Identity of the strains<br />
was further confirmed by Matrix Assisted<br />
Laser Desorption Ionization Time-of-Flight<br />
(MALDI-TOF). After positive identification,<br />
virulence of each strain was estimated using a<br />
human colorectal cell (Caco-2) invasion<br />
assay. A total of 40 isolates were typed using<br />
the Caco-2 invasion assay. These isolates were further<br />
typed using Next Generation Sequencing<br />
(NGS). Sequencing libraries were prepared using<br />
Illumina Nextera XT kit and the sequencing<br />
was performed using Illumina 2 x 250 base<br />
paired end sequencing chemistry. Genome<br />
assembly of the strains was performed using<br />
SPAdes assembler v3.5.0 and annotated using<br />
Prokka v1.11. Comparative genomic analysis<br />
of the isolates was performed using the ITEP<br />
toolkit. We further mapped the presence/absence<br />
of 200 Salmonella virulence associated genes<br />
in the sequenced genomes. Strain typing<br />
results revealed that the Salmonella isolates<br />
we collected belong to serotypes Anatum and<br />
Newington. There were several-fold differences<br />
in the Caco-2 invasiveness of the strains.<br />
We found that while a few strains were not invasive,<br />
most strains were highly invasive. There<br />
was no clear correlation between Caco-2 invasiveness<br />
of the strains and the presence or<br />
absence of virulence genes in their genomes.<br />
This might indicate the presence of unknown<br />
virulence genes in these strains or the condition<br />
specific expression of known virulence<br />
genes. Currently, we are performing a detailed<br />
genomic comparison of invasive and<br />
non-invasive isolates, which might give a better<br />
understanding of the virulence mechanisms<br />
in Salmonella enterica serotype Anatum<br />
and Salmonella enterica serotype Newington.<br />
■ 48<br />
16S RRNA SEQUENCING OF FOOD AND<br />
ENVIRONMENTAL SAMPLES ON THE<br />
ILLUMINA MISEQ<br />
N. Daquigan¹, J. White², C. Grim¹, D. Hanes¹,<br />
K. Jarvis¹;<br />
¹U.S. Food and Drug Administration, Laurel,<br />
MD, ²Resphera, Baltimore, MD.<br />
Purpose: Microbiome profiling of food and<br />
environmental samples can enhance our understanding<br />
of associated microbial community<br />
complexities with the potential for pathogen<br />
identification. Sequencing low diversity amplicon<br />
libraries on the Illumina MiSeq requires<br />
introduction of diversity into the primer set.<br />
Here, a 16S rRNA sequencing protocol was<br />
developed and optimized for enteric pathogen<br />
surveillance in environmental and food<br />
samples. Methods: Four pairs of 16S rRNA<br />
gene primers, with 0-3 additional nucleotides<br />
to increase sequence diversity, were designed<br />
to span the V1-V3 regions of the 16S rRNA<br />
gene and to include Illumina adapter and Nextera index<br />
sequences. Cucumber, naturally contaminated<br />
Masala spice mix, and cilantro samples were<br />
cultured following FDA BAM methods. Some<br />
cilantro samples were spiked with Salmonella<br />
enterica at ~4 CFU/25 g. DNA was extracted<br />
using the QIAcube for food and a standard<br />
CTAB protocol for hospital biofilm samples.<br />
16S rRNA amplicon libraries were normalized<br />
manually or with SequalPrep plates, pooled at<br />
8-11 pM, multiplexed up to 96 samples per run,<br />
spiked with 10% PhiX at 12.5 or 20 pM, and<br />
sequenced with Illumina 600-cycle v3 chemistry.<br />
Sequences were quality filtered using the<br />
Quantitative Insights Into Microbial Ecology (QIIME)<br />
package and analyzed for taxonomic composition<br />
using Resphera Insight. Results: Newly<br />
designed 16S rRNA primers demonstrated<br />
identity to S. enterica, Campylobacter jejuni,<br />
Listeria monocytogenes, Shigella dysenteriae,<br />
Shigella sonnei, and Vibrio cholerae using<br />
BLASTn analysis. In 96-sample MiSeq runs,<br />
SequalPrep plates reduced sample preparation<br />
time and provided consistent sample normalization<br />
(0.0005 - 2.0% reads passing filter per<br />
sample). Increasing library concentrations<br />
to 11 pM with 20 pM PhiX resulted in average<br />
data yields of 9.0G as compared to 7.7G<br />
with 8 pM libraries. Quality filtering identified<br />
higher levels of non-specific amplicon contaminants<br />
in Masala (28%) as compared to cilantro<br />
(2%), cucumber (0.6%), and biofilm (0.7%)<br />
samples. Microbiome proportional abundances<br />
included: Staphylococcus (15%), Enterobacter<br />
(15%), and Bacillus (10%) species for Masala<br />
(10,000 reads); Pseudomonas (40%), Flavobacterium<br />
(19%) and Janthinobacterium (8%)<br />
species in cilantro (25,000 reads); Acinetobacter<br />
(15%), Rhizobium (12%), and Pseudomonas<br />
(12%) species in cucumbers (10,000<br />
reads); and Elizabethkingia (25%), Pseudomonas<br />
(13%), and Serratia (8%) species in<br />
biofilm samples (15,000 reads). S. enterica<br />
averaged 11.3% (2835 of 25,000 reads, n=6)<br />
and 0.46% (46 of 10,000 reads, n=2) in spiked<br />
cilantro and culture-positive Masala, respectively.<br />
Conclusions: Pathogens associated with<br />
food and environmental contamination, such<br />
as S. enterica, are important disease outbreak<br />
sources. Optimizing all aspects of our 16S<br />
rRNA sequencing protocol enables sequencing<br />
of multiple sample types important for public<br />
health safety.<br />
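The primer design described in the Methods (four primer versions carrying 0-3 extra<br />
nucleotides so that amplicons start out of phase) can be illustrated as below. This is a<br />
sketch only: the adapter, spacer bases, and gene-specific sequence are placeholders invented<br />
for the example, not the primers used in the study.<br />

```python
# Placeholder sequences, invented for illustration; not the study's primers.
ADAPTER = "ACACGACGCT"          # stands in for the Illumina adapter tail
GENE_SPECIFIC = "AGTCAGTCAGTC"  # stands in for the 16S rRNA-binding region
SPACER_BASES = "TCA"            # hypothetical bases used to build 0-3 nt spacers


def phased_primers(adapter, gene_specific, spacer_bases):
    """Return primers with 0..len(spacer_bases) spacer nucleotides inserted."""
    return [adapter + spacer_bases[:n] + gene_specific
            for n in range(len(spacer_bases) + 1)]


def distinct_bases_per_cycle(primers, adapter_length, cycles):
    """Count distinct bases read at each cycle after the adapter."""
    reads = [p[adapter_length:] for p in primers]
    return [len({r[i] for r in reads}) for i in range(cycles)]


primers = phased_primers(ADAPTER, GENE_SPECIFIC, SPACER_BASES)
for p in primers:
    print(p)

# With phasing, several different bases are read at each early cycle.
print(distinct_bases_per_cycle(primers, len(ADAPTER), 8))
# Without phasing, every cluster reads the same base at every cycle.
print(distinct_bases_per_cycle([primers[0]] * 4, len(ADAPTER), 8))
```

The comparison in the last two lines shows why the extra nucleotides matter for low-diversity<br />
amplicon libraries: the unphased set presents a single base per cycle.<br />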
■ 49<br />
SALMONELLA ENTERICA SEROVAR<br />
KENTUCKY ISOLATES FROM DAIRY COWS<br />
AND POULTRY DEMONSTRATE DIFFERENT<br />
EVOLUTIONARY HISTORIES AND HOST-<br />
SPECIFIC POLYMORPHISMS<br />
B. J. Haley, J. S. Van Kessel, J. S. Karns;<br />
USDA, ARS, EMFSL, Beltsville, MD.<br />
Salmonella enterica subsp. enterica serovar<br />
Kentucky is commonly isolated from dairy<br />
cows and poultry in the United States. Although<br />
it is not among the most frequently<br />
isolated serovars from cases of human salmonellosis,<br />
its high prevalence in livestock and<br />
poultry indicates it is a potential public health<br />
threat, particularly in light of the global spread<br />
of multi-drug resistant S. Kentucky ST198. To<br />
investigate genomic differences between S.<br />
Kentucky isolated from dairy farms and poultry<br />
operations, the genomes of 41 S. Kentucky<br />
ST152 isolates recovered from dairy cows and<br />
four S. Kentucky ST152 isolates recovered<br />
from poultry were sequenced using an Illumina<br />
NextSeq 500. Publicly available S. Kentucky<br />
data were retrieved from the NCBI Sequence<br />
Read Archive and phylogenies were inferred<br />
after SNPs were detected using both ParSNP<br />
and the CFSAN SNPfinder. Phylogenetic inference<br />
demonstrated that S. Kentucky ST152<br />
isolates from poultry evolved from those<br />
frequently recovered from dairy cows. The S.<br />
Kentucky genomes in the clades dominated by<br />
cow isolates are differentiated from those in<br />
the clade dominated by poultry isolates by nine<br />
conserved single nucleotide polymorphisms.<br />
A significant number of the SNPs were located<br />
in open reading frames identified as iron-scavenging<br />
genes suggesting these differences are<br />
at least partly responsible for the host-specificity.<br />
Interestingly, among the isolates, there was<br />
no evidence of mixing of cow and poultry S.<br />
Kentucky. However, there was one isolate that<br />
appeared as intermediate and was rooted near<br />
the most recent common ancestor. Within the<br />
cow-specific clade, isolates were, in general,<br />
clustered by geography; however, some geographical<br />
mixing was observed. Results of this<br />
analysis indicate that geographical location and<br />
source (cow or poultry) could be inferred from<br />
the genome sequences of S. Kentucky.<br />
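The idea of conserved SNPs that differentiate the cow-dominated clades from the<br />
poultry-dominated clade can be sketched as a search for alignment positions that are fixed<br />
within each group but differ between groups. The code is an illustrative assumption, not the<br />
ParSNP or CFSAN workflow used by the authors; isolate names and sequences are invented.<br />

```python
def group_defining_snps(alignment, group_a, group_b):
    """Return 1-based positions fixed within each group but different between them.

    alignment: dict of isolate name -> aligned sequence (equal lengths).
    group_a, group_b: lists of isolate names.
    """
    length = len(next(iter(alignment.values())))
    positions = []
    for pos in range(length):
        bases_a = {alignment[name][pos] for name in group_a}
        bases_b = {alignment[name][pos] for name in group_b}
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            positions.append(pos + 1)
    return positions


# Invented toy alignment, for illustration only.
alignment = {
    "cow_1":     "ACGTACGT",
    "cow_2":     "ACGTACGA",
    "poultry_1": "ACCTACGT",
    "poultry_2": "ACCTATGT",
}
print(group_defining_snps(alignment, ["cow_1", "cow_2"], ["poultry_1", "poultry_2"]))
```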
■ 50<br />
GENOMIC EPIDEMIOLOGY OF SALMONELLA<br />
ENTERICA SEROTYPE CERRO<br />
A. Thachil¹, H. Suzuki², A. Glaser¹, M.<br />
Thomas³, S. Das³, G. Gopinath⁴, J. Jean-Gilles<br />
Beaubrun⁴, N. Addy⁴, H. Chase⁴, A. Jayaram⁴,<br />
Y. Yoo⁴, T. Chung⁴, D. Hanes⁴, J. Scaria³;<br />
¹Cornell University, Ithaca, NY, ²Yamaguchi<br />
University, Yamaguchi, JAPAN, ³South Dakota<br />
State University, Brookings, SD, ⁴FDA, Laurel,<br />
MD.<br />
Salmonella enterica is classified into more<br />
than 2500 serotypes. Salmonella enterica serotype<br />
Cerro is a strain type mainly associated<br />
with dairy cattle. In the 1990s this<br />
serotype was relatively rare in cattle. However,<br />
in the last decade Salmonella enterica serotype<br />
Cerro has emerged as a common serotype<br />
found in lactating cows in the northeastern United<br />
States. Although primarily adapted to cows,<br />
Salmonella enterica serotype Cerro has also<br />
caused human infections and outbreaks. These<br />
include the 1985 ‘Carne Seca’-associated<br />
outbreak in New Mexico and the 2012 outbreak<br />
in Arkansas prisons. To obtain better insights<br />
into the possible cause of the increased incidence<br />
of Salmonella enterica serotype Cerro,<br />
we performed a next-generation sequencing<br />
(NGS) based comparative genomic analysis of<br />
Cerro strains isolated from farms in New York,<br />
Pennsylvania, Vermont and South Dakota.<br />
Strains were isolated and were identified as<br />
Salmonella using standard protocols. Serotyping<br />
of the strains was performed at the National<br />
Veterinary Services Laboratories, Ames, Iowa.<br />
After serotyping over 200 isolates, 70 isolates<br />
were chosen for NGS. Sequencing libraries<br />
were prepared using Illumina Nextera XT<br />
kit and the sequencing was performed using<br />
Illumina 2 x 250 base paired end sequencing<br />
chemistry. The strains were assembled<br />
independently using CLC Genomics Workbench<br />
and SPAdes assembler v3.5.0, and annotated<br />
using RAST and Prokka v1.11. Comparative<br />
genomic analysis of the isolates from our collection<br />
and other Cerro genomes available<br />
in NCBI was performed using the ITEP toolkit.<br />
In addition, a dataset of 2,800 Salmonella core genes was<br />
used with in-house Perl scripts to identify<br />
whole genome core gene SNP profiles of Cerro<br />
isolates and representative strains from other<br />
serovars. Phylogenetic analysis and illustration<br />
were completed using the MEGA6 suite. To<br />
determine whether changes in virulence genes<br />
contributed to the higher incidence of this<br />
serotype, we mapped the presence/absence of<br />
nearly 200 Salmonella virulence associated<br />
genes in the sequenced genomes. Our preliminary<br />
results indicate that Salmonella enterica<br />
serotype Cerro isolates of bovine and swine origin have<br />
similar genome properties. We are currently<br />
refining the final results from the comparative<br />
genome analysis and the complete data will be<br />
presented at the conference.<br />
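The core-gene SNP profiling step can be illustrated with a toy alignment. The in-house Perl<br />
scripts are not described in the abstract, so the Python sketch below is an assumed,<br />
simplified stand-in: it takes pre-aligned, equal-length sequences and reports the columns<br />
where unambiguous bases differ. Strain names and sequences are invented.<br />

```python
def snp_profile(alignment):
    """Return {position: {strain: base}} for variable alignment columns.

    alignment: dict of strain name -> aligned sequence (equal lengths).
    Columns containing gaps or ambiguous bases are skipped, so only
    positions shared by every strain (the "core") are compared.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")
    profile = {}
    for pos in range(length):
        column = {name: alignment[name][pos].upper() for name in names}
        bases = set(column.values())
        if bases <= set("ACGT") and len(bases) > 1:
            profile[pos + 1] = column  # 1-based position
    return profile


def pairwise_snp_distance(profile, a, b):
    """Number of variable core positions at which strains a and b differ."""
    return sum(col[a] != col[b] for col in profile.values())


# Invented toy alignment, for illustration only.
alignment = {
    "cerro_A": "ATGCCGTA",
    "cerro_B": "ATGTCGTA",
    "other_C": "ATGTCG-C",
}
profile = snp_profile(alignment)
print(sorted(profile))
print(pairwise_snp_distance(profile, "cerro_A", "cerro_B"))
```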
■ 51<br />
USE OF NEXT GENERATION SEQUENCING<br />
TO EXPLORE THE DIVERSITY OF<br />
TOXINOTYPE V CLOSTRIDIUM DIFFICILE<br />
STRAINS ORIGINATING FROM A CLOSED<br />
POPULATION OF HUMANS AND SWINE<br />
K. N. Norman, H. M. Scott;<br />
Texas A&M University, College Station, TX.<br />
Clostridium difficile typically causes nosocomial<br />
infections; however, there have been<br />
increasing reports of community-acquired C.<br />
difficile infection (CA-CDI). These community-acquired<br />
cases have no recent history of<br />
hospitalization. The finding of similar strains<br />
in humans and animals, as well as retail meat,<br />
has raised concern that C. difficile is a potential<br />
foodborne pathogen. Previously we isolated<br />
C. difficile from a closed population of humans<br />
and swine to investigate the potential for<br />
C. difficile to transfer from swine to humans<br />
through occupational exposure. We found<br />
that there was not a significant difference in<br />
the prevalence of C. difficile in wastewater<br />
from humans who worked with swine versus<br />
humans who did not work with swine. Interestingly,<br />
the majority of strains isolated from both<br />
the human wastewater and swine fecal samples<br />
were classified as toxinotype V, North American<br />
Pulsed-field type 7 (NAP7). Pulsed-field<br />
gel electrophoresis (PFGE) and ribotyping<br />
are the standard methods used to differentiate<br />
between C. difficile strains; however, these<br />
typing methods may not be the best suited<br />
methods for C. difficile, particularly with regard<br />
to toxinotype V strains. Typing of C. difficile is<br />
further complicated because PFGE is generally<br />
used in the United States, while ribotyping is<br />
commonly used in Europe, making comparisons<br />
between strains and studies difficult. Next<br />
generation sequencing may provide a more<br />
discriminatory method to differentiate between<br />
strains and will also provide evolutionary<br />
information about the strains. We conducted<br />
whole genome sequencing on 36 swine and<br />
28 human epidemiologically related, toxinotype<br />
V, NAP7 strains isolated from the closed<br />
population on the Illumina MiSeq sequencing<br />
platform. Library preparation was performed<br />
using Nextera XT DNA sample prep kits and<br />
each strain was individually indexed. MiSeq<br />
Reporter and Geneious Pro Software were<br />
used to assemble and align sequences, identify<br />
potential nucleotide polymorphisms, and<br />
facilitate phylogenetic analyses. We are currently<br />
analyzing the data including the single<br />
nucleotide polymorphisms found in the 64<br />
sequenced strains. Whole genome sequencing<br />
data will facilitate the comparison between<br />
PFGE-derived and genome sequencing-derived<br />
diversity. The diversity and evolution of toxinotype<br />
V, NAP7 strains are especially important<br />
because these strains are commonly found<br />
in both food animals and humans and many<br />
questions still remain about the potential role<br />
of food animals in CA-CDI. Understanding the<br />
true diversity within C. difficile toxinotypes<br />
and North American Pulsed-field types is also<br />
essential for discussions regarding appropriate<br />
and standardized typing methods.<br />
■ 52<br />
A PATHOGENOMICS APPROACH TO<br />
IMPROVE THE ACCURACY OF VETERINARY<br />
CLINICAL DIAGNOSTICS<br />
S. Das¹, M. Thomas¹, A. Pillatzki¹, L. Holler¹,<br />
M. Keena¹, A. Foley¹, E. Nelson¹, J. Christopher-Hennings¹,<br />
H. Suzuki², J. Scaria¹;<br />
¹South Dakota State University, Brookings, SD,<br />
²Yamaguchi University, Yamaguchi, JAPAN.<br />
The recent advances in Next Generation Sequencing<br />
(NGS) technology promise cheap<br />
and fast whole genomic data and offer the<br />
possibility to revolutionize veterinary clinical<br />
diagnostics. It is now possible to improve the<br />
accuracy of clinical diagnostics when pathology<br />
data is combined with NGS based clinical<br />
specimen sequencing. This approach offers<br />
faster results and avoids the need for many<br />
different tests to obtain the final interpretation.<br />
We tested such an approach in the clinical<br />
evaluation of salmonellosis cases at the Animal<br />
Disease Research and Diagnostic Laboratory<br />
(ADRDL), South Dakota. All bovine<br />
and swine salmonellosis cases submitted to<br />
ADRDL from September 2014 to date were<br />
included in this study. Salmonella isolates were<br />
identified using standard microbiological protocols.<br />
Further characterization of Salmonella<br />
strains was then performed using NGS. Sequencing<br />
libraries for NGS were prepared using<br />
Illumina Nextera XT kit and the sequencing<br />
was performed using Illumina 2 x 250 base<br />
paired end sequencing chemistry. Genome<br />
assembly of the strains was performed using<br />
Spades Assembler v 3.5.0 and annotated<br />
using Prokka v1.11. Comparative genomic<br />
analysis of the isolates was performed using<br />
ITEP tool kit. Antibiotic resistance gene profile<br />
and virulence gene profile of the isolates was<br />
determined using a custom PERL script. These<br />
results were then combined with pathological<br />
data. This combined approach revealed that in<br />
most cases Salmonella infection was accompanied<br />
by the presence of other enteric pathogens<br />
such as rotavirus, Clostridium perfringens,<br />
hemolytic Escherichia coli and porcine epidemic<br />
diarrhea virus. Salmonella enterica serotype<br />
Dublin was found to be associated with<br />
septicemia and colitis in the absence of other<br />
pathogens. The antibiotic resistance gene profiles<br />
obtained from NGS data provided further<br />
insights into the possible antibiotic susceptibility/resistance<br />
patterns of the strains.<br />
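As a rough illustration of the gene-profiling step, the Python sketch below tabulates which genes pass a similarity filter for each isolate.<br />
It is not the authors' custom script: the hit tuples, isolate names, and the identity and coverage thresholds are illustrative assumptions.<br />

```python
from collections import defaultdict


def gene_profile(hits, min_identity=90.0, min_coverage=80.0):
    """Build an isolate -> set-of-genes profile.

    `hits` is an iterable of (isolate, gene, percent_identity, percent_coverage)
    tuples, e.g. parsed from a sequence-similarity search. The thresholds are
    assumed defaults, not values taken from the abstract.
    """
    profile = defaultdict(set)
    for isolate, gene, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            profile[isolate].add(gene)
    return dict(profile)


# Hypothetical hits for two isolates.
hits = [
    ("isolate_A", "blaCMY-2", 99.8, 100.0),
    ("isolate_A", "tetA", 85.0, 100.0),  # below the identity threshold, dropped
    ("isolate_B", "sul2", 97.5, 92.0),
]
profile = gene_profile(hits)
for isolate in sorted(profile):
    print(isolate, sorted(profile[isolate]))
```

A profile of this shape can then be joined to pathology records by isolate identifier.<br />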
n 53<br />
QUALITY ASSESSMENT OF SINGLE<br />
NUCLEOTIDE POLYMORPHISM IN THE<br />
HIGHLY CLONAL SALMONELLA ENTERITIDIS<br />
D. Ogunremi;<br />
Canadian Food Inspection Agency, Ottawa,<br />
ON, CANADA.<br />
Single nucleotide polymorphisms (SNPs) have<br />
emerged as the most informative genetic<br />
variation for the characterization of the highly<br />
clonal Salmonella Enteritidis (SE) to meet<br />
the goals of epidemiological and evolutionary<br />
investigations. Raw reads, contigs, draft<br />
or polished genomes are used to identify SNP<br />
variants typically by aligning the nucleotide<br />
reads of an isolate to a reference genome.<br />
Two relatively well-assembled genomes can<br />
be compared to determine the number of SNP<br />
variants between them. In addition, reference-free<br />
SNP detection from raw reads has also<br />
been proposed and used. There are limitations<br />
to detecting SNPs from the currently available<br />
software and bioinformatics pipelines<br />
and these include the use of a significantly<br />
disparate reference genome to align raw reads,<br />
sequencing errors, mis-assembly and repeats.<br />
To that end, large scale studies investigating<br />
SNPs typically involve Sanger sequencing of<br />
PCR amplicons representing the targets for the<br />
purpose of confirming any variants. Because of<br />
the responsibility imposed on food safety regulatory<br />
agencies to act promptly in forestalling<br />
human exposure to contaminated food, rapid<br />
but accurate SNP detection is required and<br />
this precludes lengthy testing procedures such<br />
as Sanger sequencing before acting to protect<br />
consumers. Bioinformatics pipelines for SNP<br />
detection will not only benefit from rigorous<br />
evaluation and routine quality assurance<br />
methods but may also require a systematic<br />
assessment using biologically relevant isolates<br />
as part of a pre-deployment procedure. To this<br />
end, polished genomes were developed for<br />
the chromosomes of three related SE isolates<br />
obtained from the same poultry facility on the<br />
same day, and were deposited in GenBank<br />
(Accession numbers CP009084.2, CP009085.2<br />
and CP011942). Using whole genome analysis<br />
tools, bi-directional Sanger sequencing<br />
of target amplicons, and pyrosequencing, the<br />
three isolates were shown to differ from each<br />
other by 3-9 SNPs. We investigated<br />
whether true SNPs can be accurately<br />
and consistently detected using the same bioinformatics<br />
platform applied at the different processing<br />
stages of each genome from raw reads<br />
to polished genomes. Variant calling from raw<br />
reads mapped to a reference genome led to the<br />
detection of a higher number of SNPs when<br />
compared to finished genomes, many of which<br />
could not be confirmed by Sanger sequencing<br />
(i.e., false SNPs). Progressive analysis of the<br />
nucleotide data refined the detection of SNPs<br />
by reducing the number of false positives arising<br />
from mis-assembly or repeats. The development<br />
of a wet chemistry, high-throughput,<br />
cost-efficient SNP-PCR was useful as a quality<br />
tool for confirming true SNPs and relatedness<br />
among isolates, providing confidence in testing<br />
results and averting the need to routinely<br />
develop polished genomes or carry out further<br />
lengthy testing procedures.<br />
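The idea of counting SNP differences between closely related isolates can be sketched in Python as follows.<br />
This is a minimal illustration that assumes the genomes have already been aligned to equal length; the function name, the toy sequences, and the choice to skip gaps and ambiguous bases are assumptions, not the pipeline evaluated in the abstract.<br />

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count mismatched positions between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base
    (any character in `ignore`) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


# Toy aligned sequences standing in for three related isolates.
aligned = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACGAACNTAT",
}
for name_a, name_b in combinations(sorted(aligned), 2):
    print(name_a, name_b, snp_distance(aligned[name_a], aligned[name_b]))
```

Skipping ambiguous positions is one simple way to avoid counting low-confidence calls as SNPs; it does not replace confirmation by an independent method.<br />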
n 54<br />
COMPARATIVE SEQUENCE ANALYSIS OF<br />
MULTI-DRUG RESISTANT INCA/C PLASMIDS<br />
FROM SALMONELLA ENTERICA<br />
M. Hoffmann 1 , J. Pettengill 1 , J. Miller 1 , N.<br />
Gonzalez-Escalona 1 , S. L. Ayers 2 , J. Payne 1 , S.<br />
Zhao 2 , J. Meng 3 , M. W. Allard 1 , P. F. Dermott 2 ,<br />
E. W. Brown 1 , S. R. Monday 1 ;<br />
1 US Food and Drug Administration, College<br />
Park, MD, 2 US Food and Drug Administration,<br />
Laurel, MD, 3 University of Maryland, College<br />
Park, MD.<br />
Multidrug-resistant (MDR) determinants are<br />
often encoded on mobile plasmids. Presently,<br />
there are minimal data regarding antibiotic<br />
resistance and dissemination, especially as it<br />
applies to plasmid evolution and maintenance<br />
in zoonotic bacterial pathogens. With next-generation<br />
sequencing it is quite difficult to<br />
separate large plasmids from chromosomal<br />
contigs, particularly if the plasmid has a<br />
low copy number approaching one, as with<br />
the chromosome. The Pacific Biosciences<br />
(PacBio) RS II Sequencer provides very long<br />
reads that greatly facilitate distinguishing<br />
plasmid sequences. We report here a very efficient<br />
plasmid isolation protocol for use with<br />
Salmonella enterica serovars that produces<br />
purified DNA that meets the criteria necessary<br />
for sequencing with PacBio technology.<br />
A total of six different Salmonella enterica<br />
isolates, representing six different serovars and<br />
containing the MDR-AmpC resistance profile,<br />
isolated from retail poultry meats, were used in<br />
this study. Salmonella plasmids were obtained<br />
using a modified mini preparation and transformed<br />
into Escherichia coli DH10Br. A Qiagen<br />
Large-Construct kit was used to recover<br />
highly concentrated and purified plasmid DNA<br />
that was sequenced using PacBio technology.<br />
The size of the closed IncA/C plasmids ranged<br />
from 104 kb to 191 kb and shared a stable,<br />
conserved backbone containing 98 core genes,<br />
with only six core gene differences. The six<br />
IncA/C plasmids encoded a number of antimicrobial<br />
resistance genes, including those for<br />
quaternary ammonium compounds and mercury.<br />
The numbers of resistance determinants<br />
varied from 8 to 17 (S. Newport (13), S. Typhimurium<br />
(13), S. Heidelberg (17), S. Infantis<br />
(14), S. Agona (8), and S. Kentucky (14))<br />
with some having two copies of blaCMY-2,<br />
qacEΔ1, sugE, merE, merD, and merA. Additionally,<br />
we performed a comparative sequence<br />
analysis that included 1) 14 additional IncA/C<br />
plasmids derived from Salmonella enterica and<br />
2) 38 additional IncA/C plasmids derived from<br />
different genera to provide an evolutionary<br />
picture of antimicrobial resistance mediated by<br />
this common plasmid backbone. These findings<br />
shed light on the variation of IncA/C plasmids<br />
in resistant bacteria from different sources.<br />
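The notion of a conserved backbone of core genes shared by every plasmid can be illustrated with a short Python sketch based on set intersection.<br />
The plasmid names and gene lists below are hypothetical placeholders, and real comparisons would first need orthologous genes to be clustered.<br />

```python
def core_and_accessory(gene_sets):
    """Split gene content into the core shared by all plasmids
    and the per-plasmid accessory genes."""
    if not gene_sets:
        return set(), {}
    core = set.intersection(*(set(genes) for genes in gene_sets.values()))
    accessory = {name: set(genes) - core for name, genes in gene_sets.items()}
    return core, accessory


# Hypothetical gene content for three plasmids.
plasmids = {
    "plasmid_1": {"repA", "traA", "blaCMY-2", "sugE"},
    "plasmid_2": {"repA", "traA", "merA"},
    "plasmid_3": {"repA", "traA", "blaCMY-2"},
}
core, accessory = core_and_accessory(plasmids)
print("core:", sorted(core))
for name in sorted(accessory):
    print(name, "accessory:", sorted(accessory[name]))
```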
n 55<br />
APPLICATION OF WHOLE-GENOME<br />
SEQUENCING APPROACHES TO THE<br />
REAL-TIME CHARACTERIZATION OF<br />
BACTERIAL PATHOGENS IN FOOD-TESTING<br />
LABORATORIES<br />
C. D. Carrillo, D. Lambert, A. Koziol, P. Manninger,<br />
M. Gauthier, B. W. Blais;<br />
Canadian Food Inspection Agency, Ottawa,<br />
ON, CANADA.<br />
The timely identification and characterization<br />
of foodborne bacteria for risk assessment<br />
purposes is a key operation in a food safety<br />
investigation. Current methods require several<br />
days and/or provide low-resolution characterization.<br />
We have implemented procedures<br />
for the rapid production of whole-genome sequence<br />
(WGS) data for the characterization of<br />
bacterial pathogens (Salmonella spp., Listeria<br />
monocytogenes and Shiga-toxigenic Escherichia<br />
coli (STEC)) isolated in food-testing<br />
laboratories at the Canadian Food Inspection<br />
Agency (CFIA). To demonstrate the feasibility<br />
and accuracy of WGS as an alternative<br />
to traditional procedures, an analysis of over<br />
500 historical and contemporary isolates was<br />
conducted to compare WGS approaches to<br />
the characterization achieved through current<br />
methods. Genomic DNA was isolated from<br />
single colonies or broth cultures and sequencing<br />
libraries were constructed using the Nextera<br />
XT DNA sample preparation kit (Illumina,<br />
Inc., San Diego, CA). Sequence was generated<br />
on the Illumina MiSeq instrument with raw<br />
data sampling from the instrument during the<br />
sequencing run (for urgent samples) and/or<br />
following completion of the run. Automated<br />
pipelines for bioinformatic analysis were generated<br />
to perform read corrections, de novo<br />
assembly, quality assessment, sequence vector<br />
screening and removal, and identification of<br />
pertinent genetic markers. The entire set of<br />
input parameters and software versions used<br />
to conduct these analyses was automatically<br />
captured in a traceability report. Characteristic<br />
genetic markers were accurately identified in<br />
all cases where WGS data met minimum quality<br />
standards and a report of genomic analysis<br />
(ROGA) could be generated within 1 to 3 days<br />
following reception of colony isolates. ROGAs<br />
were presented in a user-friendly format for<br />
ease of use by risk assessors and recall specialists.<br />
Real-time WGS produces high-resolution<br />
characterization of bacterial pathogens at a<br />
cost and timeframe similar to methods that are<br />
currently in use and has the potential to replace<br />
lengthy biochemical characterization and<br />
typing procedures used in contemporary food-testing<br />
laboratories.<br />
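One way to capture input parameters and software versions in a traceability report is sketched below in Python.<br />
The field names, sample identifier, and parameter values are illustrative assumptions and do not describe the CFIA pipeline itself.<br />

```python
import json
import platform
from datetime import datetime, timezone


def traceability_report(sample_id, parameters, tool_versions):
    """Collect the run settings needed to reproduce an analysis."""
    return {
        "sample_id": sample_id,
        "generated_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        # Sort keys so that two reports can be compared line by line.
        "parameters": dict(sorted(parameters.items())),
        "tool_versions": dict(sorted(tool_versions.items())),
    }


# Hypothetical run settings; the version strings are placeholders.
report = traceability_report(
    "example-isolate-001",
    {"min_contig_length": 500, "kmer_sizes": [21, 33, 55]},
    {"assembler": "x.y.z", "annotator": "x.y.z"},
)
print(json.dumps(report, indent=2))
```

Writing the report as JSON keeps it both human-readable and easy to archive alongside the sequence data.<br />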
n 56<br />
GENERATING FORENSIC INSIGHT THROUGH<br />
EVIDENCE-ASSOCIATED MICROBIOME<br />
ANALYSIS<br />
A. Materna, F. Strino, J. Johansen, P. Liboriussen,<br />
L. Schauser;<br />
QIAGEN, Aarhus, DENMARK.<br />
NGS has revolutionized the field of microbial<br />
ecology, by revealing insight into environmental<br />
as well as host-associated microbiomes.<br />
Techniques for monitoring microbiome<br />
composition are today being used for a wide<br />
variety of purposes, including forensics applications.<br />
Here we present a user-friendly<br />
and NGS-platform independent bioinformatics<br />
solution for microbiome profiling and<br />
demonstrate its utility for crime scene investigation.<br />
Within the EU-funded “MiSAFE”<br />
research project (FP 7) our analysis solution<br />
was validated through a mock crime scene<br />
investigation. Partners extracted DNA from<br />
soil samples obtained from suspects’ boots<br />
and crime scenes. Then 2 x 300 bp paired-end read<br />
libraries were generated from 16S rRNA amplicons<br />
and sequenced on an Illumina MiSeq<br />
instrument. The resulting NGS data were<br />
subjected to our analysis workflow consisting<br />
of i) preprocessing and quality control, ii) clustering<br />
of data into operational taxonomic units<br />
(OTUs) and taxonomic assignment of OTUs,<br />
iii) detection of PCR artifacts (chimeras), iv)<br />
annotation of results with sample metadata,<br />
v) rarefaction analysis, beta diversity estimation<br />
and principal coordinate analysis (PCoA).<br />
The OTUs can be clustered de novo, or using<br />
common 16S sequence databases as reference.<br />
A number of additional statistical tools for<br />
microbiome analysis and microbial ecology<br />
complete the analysis solution, including richness<br />
calculation, hierarchical clustering, or<br />
the Multiple Response Permutation Procedure<br />
(MRPP) PERMANOVA testing for significant<br />
differences between two or more groups. The<br />
obtained results are richly visualized and can<br />
be browsed in the context of investigation-relevant<br />
metadata, which resulted in supporting<br />
evidence for the suspect being involved in<br />
the criminal act.<br />
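The rarefaction step of such a workflow, in which each sample is subsampled to a common read depth before diversity is compared, can be sketched in Python as follows.<br />
This is a generic illustration rather than the vendor's implementation; the OTU names, counts, and fixed random seed are assumptions.<br />

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample an OTU count table to `depth` reads without replacement."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return Counter(rng.sample(pool, depth))


# Hypothetical OTU counts for one soil sample.
sample = {"otu_a": 50, "otu_b": 30, "otu_c": 20}
subsample = rarefy(sample, 40)
print(dict(subsample))
```

Rarefying all samples to the same depth keeps differences in sequencing effort from being mistaken for differences in community composition.<br />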
n 57<br />
RAPID DETECTION AND GENETIC<br />
CHARACTERIZATION OF BURKHOLDERIA<br />
PSEUDOMALLEI AND ITS NEAR-NEIGHBORS<br />
THROUGH MULTIPLEX AMPLICON<br />
SEQUENCING<br />
J. Delisle 1 , J. Schupp 1 , J. Sahl 1 , R. Colman 1 , Y.<br />
Hueftle 1 , H. Heaton 1 , J. Gillece 1 , A. Vazquez 2 ,<br />
C. Hall 2 , J. Busch 2 , M. Mayo 3 , B. Currie 3 , D.<br />
Engelthaler 1 , P. Keim 1 , D. M. Wagner 2 ;<br />
1 Translational Genomics Research Institute,<br />
Flagstaff, AZ, 2 Northern Arizona University,<br />
Flagstaff, AZ, 3 Menzies School of Health Research,<br />
Darwin, AUSTRALIA.<br />
Burkholderia pseudomallei, the causative agent<br />
of melioidosis, is a public health threat and potential<br />
bioterrorism agent endemic to Southeast<br />
Asia and Northern Australia. Current methodologies,<br />
such as real-time PCR, allow for rapid<br />
detection but only limited characterization. A<br />
rapid and robust assay for the detection and<br />
characterization of clinical and forensic materials<br />
suspected of containing Burkholderia<br />
pseudomallei would be of enormous benefit<br />
to epidemiological and forensic investigations.<br />
Next-generation sequencing of multiple<br />
informative genetic loci can provide efficient,<br />
rapid detection and differentiation from near<br />
neighbor species, as well as fine scale genetic<br />
characterization. We have developed a 68 locus<br />
amplicon sequencing assay that results in<br />
1) detection of B. pseudomallei; 2) differentiation<br />
from B. mallei and other near-neighbor<br />
species; 3) potential detection of strain mixtures;<br />
4) differentiation within B. pseudomallei;<br />
and 5) virulence gene characterization (11<br />
gene targets) within 24-48 hours, and from<br />
both culture and complex environmental or<br />
clinical samples. The system couples highly<br />
multiplexed amplification reactions with a<br />
universal amplicon indexing system, resulting<br />
in efficient multilocus amplicon sequencing<br />
from potentially hundreds of samples in a<br />
single Illumina MiSeq sequencing run. Utilizing<br />
redundant targets identified with BLAST<br />
Score Ratio analysis for species identification,<br />
we show virtually 100% specificity using a<br />
panel of B. pseudomallei, B. mallei and close<br />
near-neighbors, such as B. thailandensis, B.<br />
oklahomensis, and B. humptydooensis, among<br />
others. We demonstrate detection of B. pseudomallei<br />
from as little as 10 genome copies of<br />
moderately to highly degraded DNA as well as<br />
differentiation within B. pseudomallei strains,<br />
utilizing variation within the targeted species-specific,<br />
MLST and virulence loci.<br />
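The BLAST Score Ratio mentioned above is commonly computed as the bit score of a target against a query genome divided by the bit score of the target against itself.<br />
The Python sketch below shows that calculation; the bit scores and the 0.8 presence threshold are illustrative assumptions, not values from the assay.<br />

```python
def blast_score_ratio(query_bitscore, self_bitscore):
    """Normalize a BLAST bit score by the target's self-hit bit score."""
    if self_bitscore <= 0:
        raise ValueError("self bit score must be positive")
    return query_bitscore / self_bitscore


def is_present(query_bitscore, self_bitscore, threshold=0.8):
    """Call a target present when its score ratio reaches the threshold.

    The default threshold is an assumed example value.
    """
    return blast_score_ratio(query_bitscore, self_bitscore) >= threshold


# Hypothetical bit scores for one target locus in two genomes.
print(blast_score_ratio(450.0, 500.0), is_present(450.0, 500.0))
print(blast_score_ratio(150.0, 500.0), is_present(150.0, 500.0))
```

A ratio near 1 indicates a highly conserved target, while a ratio near 0 indicates that the target is absent or highly divergent.<br />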
n 58<br />
SHORT TANDEM REPEATS FOR MOLECULAR<br />
DETECTION OF PATHOGENS<br />
X. Guo 1 , Q. Yu 2 , P. Liu 1 , J. Watt 1 , Y. Li 3 , S.<br />
Sammons 3 ;<br />
1 Eagle Medical Service and Centers for Disease<br />
Control and Prevention, Atlanta, GA,<br />
2 Emory University, Atlanta, GA, 3 Centers for<br />
Disease Control and Prevention, Atlanta, GA.<br />
With recent developments in sequencing technology,<br />
the large number of available pathogen genome<br />
sequences makes possible the comparative<br />
analysis of short tandem repeats (sTRs) to<br />
understand their epidemiologic significance<br />
as genetic markers in gene expression regulation,<br />
host-pathogen interaction, molecular identification<br />
of new pathogens and disease diagnosis.<br />
Here, using bioinformatics methods including<br />
scientific programming and computational<br />
biological statistics, we sought to identify<br />
important sTRs in virus genomes and apply<br />
them to the rapid identification of new pathogens<br />
and their diagnostic grouping. Among<br />
36 NCBI-available West and<br />
Central Africa (groups A-D) monkeypox virus (MPV)<br />
genomes, the ten sTRs with the largest statistically<br />
significant differences were identified, along with their<br />
sequence patterns, copy number changes and<br />
genetic mutations. Clustering MPVs by their<br />
copy number changes at these 10 sTRs gave almost<br />
the same grouping<br />
as clustering them by alignment of their<br />
whole genome sequences. In addition,<br />
a 5-nt repeat rearrangement, designated<br />
TC-31k, was found to be representative of the different<br />
groups of MPV. TC-31k has a specific<br />
arrangement of ccatt/ccatc (T/C) within the<br />
West Africa, group C/D and group A/B<br />
Central Africa MPVs. Combined with the copy<br />
number change of the tandem repeat AGATT, the<br />
group of a newly found MPV can<br />
be identified by one or two PCR sequencing reactions<br />
or directly from NGS raw data. Because<br />
TC-31k encodes a Kelch-like<br />
protein motif that is restricted to poxviruses,<br />
the T/C rearrangement may reflect variation<br />
in binding interactions with host proteins.<br />
In summary, through systematic biological<br />
analysis of MPV genomes, several significant<br />
sTRs were found to be group-specific and useful for DNA<br />
sequence clustering and the molecular detection<br />
of newly found MPV pathogens.<br />
Keywords: short tandem repeat, pathogen<br />
identification, monkeypox virus, NGS<br />
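Counting the copy number of a tandem repeat such as AGATT directly from a sequence can be sketched in Python with a regular expression.<br />
The example sequence is made up for illustration, and the function name is hypothetical; this is not the authors' analysis code.<br />

```python
import re


def max_tandem_copies(sequence, motif):
    """Return the largest number of back-to-back copies of `motif`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+", re.IGNORECASE)
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Made-up sequence: three tandem AGATT copies, then a single copy.
seq = "GG" + "AGATT" * 3 + "CC" + "AGATT" + "GG"
print(max_tandem_copies(seq, "AGATT"))
```

Copy numbers computed this way for several repeat loci form a numeric profile per genome, which can then be clustered.<br />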
n 59<br />
A WGMLST APPROACH TO SUBSPECIES<br />
IDENTIFICATION AND CHARACTERIZATION<br />
OF BACTERIA IN A CULTURE COLLECTION<br />
A. Luquette 1 , K. Chase 1 , H. Pouseele 2 , K. De<br />
Bruyne 2 , M. Wolcott 1 ;<br />
1 USAMRIID, Frederick, MD, 2 Applied Maths<br />
NV, Sint-Martens-Latem, BELGIUM.<br />
Identification and characterization at the subspecies<br />
level is a challenge within bacterial<br />
species with high sequence similarity. There<br />
are a multitude of methods for subspecies<br />
identification, each with limitations in the ability<br />
to resolve close genetic neighbors. Next-generation<br />
sequencing allows for the entire<br />
genome to be used for sequence typing in a<br />
cost-effective manner. While traditional multilocus<br />
sequence typing (MLST) schemes use<br />
only 6-10 housekeeping loci for identification,<br />
whole genome multi-locus sequence typing<br />
(wgMLST) schemes typically use 2000-4000<br />
loci from across the genome, providing greater<br />
sequence typing resolution. wgMLST pipelines<br />
were developed using commercial software<br />
(BioNumerics 7.5 software; Applied Maths<br />
NV) for Bacillus anthracis and Francisella tularensis<br />
for use by the Department of Defense<br />
Unified Culture Collection (UCC). Following<br />
whole genome sequencing, alleles were identified<br />
with two approaches: de novo assembly<br />
followed by a BLAST search and assembly-free<br />
identification directly from the sequence<br />
reads. All calculations were done in the cloud<br />
so very limited local computational memory<br />
was needed. Alleles for different subspecies<br />
were generated and compared with current alleles<br />
in known sequence types. This approach<br />
also allows multiple schemes and subsets to be<br />
generated based on specific genome data, such<br />
as virulence factors. The development, application,<br />
and results of the typing schemes for<br />
Bacillus anthracis and Francisella tularensis<br />
will be presented, demonstrating the usefulness<br />
of the system. Next-generation sequencing<br />
and wgMLST will extend our capabilities to<br />
identify and characterize these and many other<br />
organisms from the UCC.<br />
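Comparing two isolates under a wgMLST scheme amounts to counting the loci at which their allele numbers differ.<br />
The Python sketch below shows one simple way to do this; the locus names, allele numbers, and the convention of skipping uncalled loci are assumptions and do not reflect the commercial software's internals.<br />

```python
def allele_distance(profile_a, profile_b):
    """Return (differing_loci, compared_loci) for two allele profiles.

    Each profile maps a locus name to an allele number, or to None
    when no allele was called. Only loci called in both profiles
    are compared.
    """
    shared = [
        locus
        for locus in profile_a.keys() & profile_b.keys()
        if profile_a[locus] is not None and profile_b[locus] is not None
    ]
    differing = sum(1 for locus in shared if profile_a[locus] != profile_b[locus])
    return differing, len(shared)


# Hypothetical allele profiles for two isolates.
isolate_1 = {"locus1": 1, "locus2": 5, "locus3": None, "locus4": 2}
isolate_2 = {"locus1": 1, "locus2": 7, "locus3": 3, "locus4": 2}
print(allele_distance(isolate_1, isolate_2))
```

Reporting the number of compared loci alongside the difference count shows how much of the scheme each comparison actually covered.<br />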
n 60<br />
EFFECT OF SINGLETON REMOVAL ON THE<br />
MICROBIAL COMMUNITY STRUCTURE<br />
RESULTING FROM ILLUMINA PAIRED-END<br />
MISEQ 16S AMPLICON DATA<br />
K. Wong 1 , T. Shaw 2 , M. Molina 3 ;<br />
1 Oak Ridge Institute for Science and Education,<br />
Oak Ridge, TN, 2 Institute of Bioinformatics,<br />
University of Georgia, Athens, GA,<br />
3 USEPA, Office of Research and Development,<br />
Ecosystems Research Division, Athens, GA.<br />
Application of next generation sequencing<br />
(NGS) to study the microbial ecology in<br />
diverse environments has been increasingly<br />
popular in the last several years. The technique,<br />
which has been applied in different research<br />
areas, has the ultimate goal of gaining a<br />
better understanding of human and ecosystem<br />
functioning, as well as environmental sustainability.<br />
Because the bioinformatics analysis of<br />
NGS data is still maturing, the quality<br />
control (QC) procedures involved in analyzing<br />
the data vary among published studies.<br />
Since singleton removal has recently been<br />
recommended in the QC step to remove artificial<br />
reads generated during Illumina sequencing,<br />
we investigated its effects on the community<br />
structure and diversity index from the<br />
16S amplicon data produced by the Illumina<br />
Paired-End technique. Our experimental samples<br />
were fresh and aged cow manure analyzed<br />
using the Quantitative Insights Into Microbial<br />
Ecology (QIIME) platform. Singleton removal<br />
did not have a significant effect on relative<br />
abundance results except for the removal of<br />
unknown sequences in most samples. However,<br />
while the alpha diversity of fresh manure<br />
was higher than that of aged manure before singleton<br />
removal, a decrease in the diversity of<br />
fresh manure samples was observed after<br />
removal. Overall, the results indicate<br />
that singleton removal does have a significant<br />
effect on microbial diversity results; therefore,<br />
we recommend adding a singleton-removal<br />
step to the standard QC procedure when<br />
analyzing Illumina sequencing data.<br />
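The effect of singleton removal on alpha diversity can be illustrated with a small Python sketch that filters an OTU table and recomputes the Shannon index.<br />
The counts are invented, and a singleton is assumed here to mean an OTU with a total count of one across all samples; this is not the QIIME implementation.<br />

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon diversity index (natural log) of one sample's OTU counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def remove_singletons(table):
    """Drop OTUs whose total count across all samples is exactly one."""
    totals = Counter()
    for counts in table.values():
        totals.update(counts)
    return {
        sample: {otu: n for otu, n in counts.items() if totals[otu] > 1}
        for sample, counts in table.items()
    }


# Invented OTU table with two samples.
table = {
    "fresh": {"otu1": 40, "otu2": 30, "otu3": 1, "otu4": 1},
    "aged": {"otu1": 50, "otu2": 20, "otu5": 1},
}
filtered = remove_singletons(table)
for sample in table:
    print(sample, round(shannon(table[sample]), 3), round(shannon(filtered[sample]), 3))
```

Samples that carry more singletons lose more of their measured diversity when the filter is applied, which is why the ranking of samples can change.<br />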
n 61<br />
COMPARISON OF BACTERIAL DIVERSITY<br />
BETWEEN ENDOSCOPIC MUCOSAL BIOPSY<br />
AND FECAL SAMPLE AMONG FECAL<br />
MICROBIOTA TRANSPLANT RECIPIENTS<br />
J. P. Haydek 1 , W. M. Tauxe 1 , E. Neish 2 , A.<br />
Ward 3 , T. Dhere 1 , C. S. Kraft 1 ;<br />
1 Emory University School of Medicine, Atlanta,<br />
GA, 2 Emory University, Atlanta, GA,<br />
3 Emory Healthcare, Atlanta, GA.<br />
Background: Endogenous bacteria on the colonic<br />
mucosal border play a fundamental role<br />
in nutritional absorption, epithelial stability<br />
and innate immunity. However, core questions<br />
regarding the natural microbiome, including<br />
diversity among different intestinal sites, usage<br />
of fecal bacteria as a proxy for gut flora,<br />
and variation between adherent and planktonic<br />
bacteria, remain unanswered. Clostridium difficile<br />
infection (CDI) is a diarrheal disease associated<br />
with 15-25% of antibiotic associated<br />
diarrhea cases, and causes between 500,000<br />
and 3 million new cases each year, with a total<br />
systematic cost estimated between $436 million<br />
and $3 billion. Fecal microbiota transplant<br />
(FMT), a procedure consisting of infusion<br />
of donor fecal material to a recipient with<br />
refractory CDI, has been increasingly shown<br />
to be a safe and effective therapy for chronic<br />
recurrent CDI, but questions still remain about<br />
its exact mechanisms of action and the microbial<br />
shifts associated with the procedure.<br />
Bacterial diversity of the colonic mucosa and<br />
feces has long been assumed to be equivalent,<br />
although this premise has increasingly<br />
been challenged. Methods: In a prospective<br />
study, stool samples and endoscopic mucosal<br />
biopsies were taken from 4 FMT recipients<br />
at the time of procedure, 2 weeks after the<br />
procedure, and 10 weeks after the procedure.<br />
Samples were lysed using the<br />
Mo Bio PowerSoil® kit, followed by DNA extraction<br />
on the Qiagen EZ1 Advanced XL platform.<br />
High-throughput sequencing (paired end dual<br />
indexing, 250 base pair bidirectional reads)<br />
was performed using an Illumina MiSeq. Data<br />
analysis was performed using the R software<br />
suite and Vegan R package. Results: Four<br />
samples taken from the distal colon at time of<br />
FMT were compared to stool samples prior to<br />
the FMT procedure. Bray-Curtis dissimilarity<br />
metrics were calculated, with an average value<br />
of 0.77. Shannon diversity was also calculated<br />
for each individual sample, with no significant<br />
difference between the stool and endoscopic<br />
mucosal samples (P = 0.15). Using a one-sample<br />
t-test, Bray-Curtis measures between samples<br />
were compared against the null hypothesis (no<br />
diversity difference), and the difference was determined to<br />
be significant (P = 0.003). Conclusions: High-throughput<br />
sequencing analysis comparing<br />
endoscopic mucosal biopsies and fecal samples<br />
shows significant differences in bacterial diversity.<br />
Larger sample sizes and further analysis<br />
are needed to characterize the bacterial differences<br />
among colonic mucosal and fecal sites.<br />
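The two calculations named in the Results, Bray-Curtis dissimilarity between paired samples and a one-sample t statistic against a null value of zero, can be sketched in plain Python.<br />
The study itself used R and the vegan package; the taxon counts and dissimilarity values below are invented for illustration and are not the study data.<br />

```python
import math
from statistics import mean, stdev


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon-count dictionaries."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


def one_sample_t(values, mu0=0.0):
    """t statistic for testing whether the mean of `values` equals mu0."""
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / math.sqrt(n))


# Invented counts for one biopsy and stool pair.
biopsy = {"Bacteroides": 6, "Escherichia": 4}
stool = {"Bacteroides": 4, "Escherichia": 6}
print(bray_curtis(biopsy, stool))

# Invented dissimilarities for four patient pairs.
t_stat = one_sample_t([0.6, 0.7, 0.8, 0.9])
print(t_stat)
```

The t statistic would then be compared with a t distribution on n - 1 degrees of freedom to obtain a P value.<br />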
n 62<br />
REVEALING HUMAN PATHOGENS IN<br />
LIVESTOCK FAECES BY METAGENOMES<br />
BASED ON HIGH-THROUGHPUT<br />
SEQUENCING<br />
A. Zhang, Y. Mao, T. Zhang;<br />
The University of Hong Kong, Hong Kong,<br />
HONG KONG.<br />
In recent years, human pathogens have drawn<br />
researchers' attention as pollutants<br />
in the environment. The diversity and<br />
abundance of pathogens from livestock faeces<br />
samples can be used to evaluate the environmental<br />
pollutant risk of these sources. In this<br />
study, we investigated 12 metagenomic DNA<br />
sequencing data sets from pig and chicken faeces<br />
and swine wastewater and treated water. The<br />
metagenomic data set of 2.5G for each sample<br />
was derived from Illumina HiSeq 2000 2 ×<br />
100 bp paired-end sequencing. Metagenomic<br />
Phylogenetic Analysis (MetaPhlAn) was conducted<br />
to classify microbial communities and<br />
reveal human pathogens at species level for the<br />
distribution, diversity and abundance analysis.<br />
In summary, 63 different bacterial pathogen<br />
species were detected with the maximum<br />
abundance of 61% among all classified species<br />
of the database. Both the principal coordinates<br />
analysis (PCoA) and network analysis demonstrate<br />
a clear clustering pattern that microbial<br />
structures of pathogens share little similarity<br />
among different domestic animals. Pathogenic<br />
bacteria detected in the same host species at<br />
different growth stages show varying distribution<br />
and abundance patterns. A relatively strong<br />
co-occurrence correlation (P-value < 0.01 and<br />
Spearman’s coefficient ρ > 0.6) was revealed<br />
between pathogens of Shigella sp. and Fusobacterium<br />
sp., which were clustered in the<br />
same module. The methodology demonstrated<br />
in this study may provide a more effective and<br />
accurate approach to evaluating pathogenic<br />
pollution, as well as suggestions for controlling<br />
potential pathogens in livestock farm management.<br />
Acknowledgement: This research is<br />
financed by the Research Grants Council of<br />
Hong Kong (GRF17209914E).<br />
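A co-occurrence correlation of the kind reported above is typically a Spearman rank correlation between the abundances of two taxa across samples.<br />
The Python sketch below computes it with average ranks for ties; the abundance vectors are invented, and a library routine would normally be used in practice.<br />

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Invented relative abundances of two taxa across five samples.
taxon_a = [0.10, 0.25, 0.05, 0.40, 0.30]
taxon_b = [0.02, 0.08, 0.01, 0.20, 0.07]
print(spearman(taxon_a, taxon_b))
```

Pairs whose coefficient and P value pass chosen cutoffs become edges in a co-occurrence network.<br />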
n 63<br />
EXPLORING PATHOGEN PLASMID DNA<br />
IN WASTEWATER AND SLUDGE USING<br />
METAGENOMICS APPROACH<br />
A. Li, T. Zhang;<br />
The University of Hong Kong, Hong Kong,<br />
HONG KONG.<br />
Plasmids act as mobile genetic elements for<br />
the exchange of genetic material between different<br />
microorganisms, including human pathogens.<br />
Virulence plasmids may turn the bacteria<br />
cells into pathogenic strains. Therefore, the<br />
abundance of pathogen plasmid DNA would<br />
demonstrate the potential threats of the environmental<br />
samples to the public health. Municipal<br />
wastewater treatment plants (WWTPs)<br />
are hotspots for horizontal plasmid transfer<br />
because of the high rate of exchange of biological<br />
components and chemical materials<br />
in activated sludge and anaerobic digestion<br />
sludge, while they also play significant roles<br />
in eliminating and digesting various human<br />
pathogens. In this study, we collected plasmid<br />
DNA samples from influent, activated sludge<br />
and digested sludge of two WWTPs. About<br />
3 Gb of sequence data was obtained for each sample<br />
by next-generation sequencing on the Illumina<br />
HiSeq 2000 platform using the PE101 strategy.<br />
All the datasets were uploaded to MG-RAST<br />
for functional annotation, and metagenomic phylogenetic<br />
analysis was conducted to identify<br />
human pathogens at the species level for taxonomy,<br />
diversity and abundance analysis. The<br />
plasmid metagenomic datasets were compared<br />
with four corresponding total DNA metagenomic<br />
datasets obtained from previous studies<br />
to reveal the difference between the plasmid<br />
and the total DNA metagenomes. Compared<br />
with the total DNA metagenomes, the plasmid<br />
metagenomes extracted from the same sectors<br />
of the WWTP had significantly higher annotation<br />
rates, indicating that the functional genes<br />
located on plasmids can be commonly shared<br />
by the known microorganisms. This study also<br />
showed the distribution pattern of the pathogen<br />
plasmid DNA in the plasmid metagenomes.<br />
The abundance of pathogen DNA in the influent<br />
plasmid metagenomes was much higher<br />
than that in the other metagenomes, and the<br />
digested sludge plasmid metagenomes had the<br />
lowest pathogen plasmid DNA abundance.<br />
Overall, the methodology used in this study offers<br />
a novel way to evaluate the potential risk<br />
of pathogen plasmid DNA to public health.<br />
Acknowledgement: This research is financed<br />
by the Research Grants Council of Hong Kong<br />
(GRF7190/12E).<br />
n 64<br />
A<br />
C. A. Gulvik, J. J. Avillan, E. Alyanak, M.<br />
Sjölund-Karlsson, B. M. Limbago;<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA.<br />
Clostridium difficile isolates were collected<br />
during 2010-2011 as part of the Emerging Infections<br />
program C. difficile Infection surveillance.<br />
Isolates (n = 53) were selected to represent<br />
the diversity of strain types as determined<br />
by geographic location of isolation, pulsed-field<br />
gel electrophoresis (PFGE) type, and PCR<br />
ribotype; other molecular data included PCR<br />
detection of toxin genes tcdA, tcdB, cdtA,<br />
and cdtB, and size of tcdC deletions. Epidemiological<br />
metadata associated with isolates<br />
includes patient age, U.S. state of residence,<br />
isolation year, and epidemiologic classification<br />
(healthcare- or community-associated).<br />
Paired-end Illumina sequencing was performed<br />
on isolate genomes to assess congruence of<br />
molecular typing methods commonly used for<br />
surveillance. Systematic comparison of nine<br />
genome assemblers widely used for C. difficile<br />
and other bacterial genomes revealed that<br />
iterative or multi-de Bruijn assembly with IDBA and<br />
SPAdes provided the fewest contigs, largest<br />
N50, most predicted genes, and largest contig.<br />
Maximum likelihood phylogeny inference<br />
was performed using aligned whole genomes,<br />
which averaged 60X coverage. Phylogenetic<br />
trees enabled us to classify five isolates with<br />
unclassifiable PFGE patterns. The overall<br />
concordance of genome extracted multi-locus<br />
sequence types (STs), PCR ribotypes (RT), and<br />
PFGE groups was very good when compared<br />
to whole genome phylogeny, and provides<br />
an illustration of how each group of U.S. C.<br />
difficile is related to others. This is useful because<br />
the nomenclature of various C. difficile<br />
typing methods (e.g., NAP01, ST-3, RT 027)<br />
lacks evolutionary context, whereas whole<br />
genome phylogeny provides a single illustrative<br />
comparator for contextualizing isolates<br />
regardless of the molecular typing method<br />
used. Two isolates occurred in clades with<br />
bootstraps of 65 and 100%, which otherwise<br />
contained single PCR RTs. Repeat sequencing<br />
and PCR ribotyping of these two isolates confirmed<br />
the unusual placement of a single RT<br />
014 isolate within the RT 020 clade, and vice<br />
versa. Fluoroquinolone resistance determinants<br />
were common (82%) among the hypervirulent<br />
epidemic RT 027 genomes; only one non-027<br />
isolate (RT 017) harbored a Thr82Ile mutation<br />
in GyrA, which confers fluoroquinolone resistance<br />
in other species. These molecular and<br />
epidemiological data will be publicly available<br />
through NCBI, and isolates have been deposited<br />
for distribution with BEI Resources.<br />
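The assembler comparison above ranks assemblies by contig count, N50, and largest contig. As a minimal sketch of how such summary statistics can be computed from a list of contig lengths (the function names and the example lengths are hypothetical, not taken from the study):<br />

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(contig_lengths):
    """Collect the assembly metrics named in the abstract (except gene counts)."""
    return {
        "contigs": len(contig_lengths),
        "largest": max(contig_lengths),
        "n50": n50(contig_lengths),
    }


if __name__ == "__main__":
    # Hypothetical contig lengths for two assemblies of the same genome.
    print(summarize([2, 3, 4, 5, 6, 7, 8, 9, 10]))
    print(summarize([30, 24]))
```

Fewer contigs with a larger N50 generally indicate a more contiguous assembly, although neither metric measures correctness.<br />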
n 65<br />
IMPLEMENTATION OF WHOLE GENOME<br />
SEQUENCING (WGS) FOR SURVEILLANCE<br />
AND OUTBREAK DETECTION OF SHIGA<br />
TOXIN-PRODUCING ESCHERICHIA COLI<br />
(STEC) IN THE UNITED STATES<br />
R. L. Lindsey 1 , H. Carleton 1 , K. Joensen 2 , F.<br />
Scheutz 3 , L. Garcia-Toledo 1 , D. Stripling 1 , H.<br />
Martin 1 , N. Strockbine 1 , L. S. Katz 1 , L. Gladney<br />
1 , T. Griswold 1 , S. Im 1 , E. M. Ribot 1 , E.<br />
Trees 1 , H. Pouseele 4 , P. Gerner-Smidt 1 ;<br />
1<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA, 2 Technical University of Denmark,<br />
Lyngby, DENMARK, 3 Statens Serum<br />
Institut, Copenhagen, DENMARK, 4 Applied<br />
Maths, Sint-Martens-Latem, BELGIUM.<br />
Introduction: Shiga toxin-producing Escherichia<br />
coli (STEC) is an important foodborne<br />
pathogen capable of causing severe disease in<br />
humans. Current methods for characterization<br />
of STEC are expensive and time-consuming.<br />
Work has begun to replace traditional methods<br />
with those using whole genome sequence<br />
(WGS) data by developing an allele database<br />
of individual Escherichia genes in BioNumerics<br />
7.5 (Applied Maths, Austin, TX). This<br />
will allow characterization of Escherichia<br />
in a single workflow using a multi-locus sequence<br />
typing (MLST) approach. Materials<br />
and Methods: The Escherichia allele database<br />
was built with 314 annotated reference<br />
genomes from a geographically diverse collection<br />
of human, animal and environmental<br />
strains as well as genes encoding virulence<br />
factors, antimicrobial resistance and O and<br />
H antigens from databases at the Center for<br />
Genomic Epidemiology (DTU, Lyngby, Denmark).<br />
The reference genomes represent 50<br />
E. coli serogroups, four Shigella species and<br />
four additional Escherichia species. Multiple<br />
subschema will be built within the database<br />
to perform identification, characterization and<br />
subtyping, including classical, extended, core<br />
and whole genome MLST. To test the ability of<br />
the BioNumerics-based whole genome MLST<br />
approach to correctly identify, characterize and<br />
cluster strains, we analyzed 500 Escherichia<br />
isolates from sporadic and outbreak-related<br />
infections and compared the findings to those<br />
obtained previously with phenotypic and<br />
molecular subtyping methods. Results and<br />
Discussion: The Escherichia allele database<br />
contains 18,883 loci. For the 500 Escherichia<br />
isolates analyzed, there was 95% concordance<br />
in the results generated by the traditional and<br />
wgMLST approaches. Conclusions: The<br />
BioNumerics-based wgMLST approach provides<br />
a single, cost-effective strategy to identify<br />
and characterize isolates for surveillance<br />
and outbreak investigations. The analysis tools<br />
in BioNumerics will enable end-users in public<br />
health laboratories to analyze WGS data they<br />
generate with little bioinformatics expertise,<br />
making the system equally efficient for local<br />
and central investigations. The system will be<br />
refined through continued collaboration with<br />
domestic and international partners.<br />
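The abstract does not say how wgMLST profiles are compared. One common approach, shown here only as an illustrative sketch and not as the BioNumerics implementation, is to count the loci at which two isolates carry different allele numbers, ignoring loci that were not called in either isolate. The locus names and allele numbers are hypothetical.<br />

```python
def allele_distance(profile_a, profile_b):
    """Count loci with different allele calls between two MLST-style profiles.

    Each profile maps a locus name to an allele number, or to None when
    the locus could not be called. Only loci called in both are compared.
    """
    distance = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue  # missing call in one isolate: not informative
        if allele_a != allele_b:
            distance += 1
    return distance


if __name__ == "__main__":
    # Hypothetical allele profiles for two isolates.
    isolate_1 = {"locus_1": 1, "locus_2": 5, "locus_3": None}
    isolate_2 = {"locus_1": 1, "locus_2": 7, "locus_3": 2, "locus_4": 9}
    print(allele_distance(isolate_1, isolate_2))
```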
n 66<br />
DELINEATING COMMUNITY OUTBREAKS<br />
OF SALMONELLA ENTERICA SEROVAR<br />
TYPHIMURIUM USING WHOLE GENOME<br />
SEQUENCING: INSIGHTS INTO GENOMIC<br />
VARIABILITY WITHIN AN OUTBREAK<br />
S. Octavia 1 , Q. Wang 2 , M. Tanaka 1 , S. Kaur 1 , V.<br />
Sintchenko 2 , R. Lan 1 ;<br />
1<br />
University of New South Wales, Sydney,<br />
AUSTRALIA, 2 Westmead Hospital, Sydney,<br />
AUSTRALIA.<br />
Whole genome next generation sequencing<br />
(NGS) was used to retrospectively examine 57<br />
isolates from five epidemiologically confirmed<br />
community outbreaks (designated as Outbreak<br />
1 to Outbreak 5) caused by Salmonella enterica<br />
serovar Typhimurium (S. Typhimurium) phage<br />
type DT170. Most of the human and environmental<br />
isolates confirmed epidemiologically<br />
to be involved in the outbreaks were either<br />
genomically identical or differed by one to two<br />
single nucleotide polymorphisms (SNPs) with<br />
the exception of Outbreak 1. The isolates from<br />
Outbreak 1 differed by up to 12 SNPs, which<br />
suggests that the food source of the outbreak<br />
was contaminated with more than one strain<br />
while the other four outbreaks were caused by<br />
a single strain. In addition, NGS analysis ruled<br />
in isolates that were initially not considered to<br />
be linked with the outbreak, which increased<br />
the total outbreak size by 107%. The mutation<br />
process was modelled using known mutation<br />
rates to derive a cut-off value for the number<br />
of SNP differences to rule in or rule out a case<br />
being part of an outbreak. For an outbreak with<br />
less than one month ex vivo/in vivo evolution<br />
time, the maximum number of SNP differences<br />
between isolates is two and four SNPs<br />
using the slowest and the fastest mutation rates<br />
respectively. NGS of S. Typhimurium significantly<br />
increases the resolution of investigations<br />
of community outbreaks. It can also inform<br />
more targeted public health response by providing<br />
important supplementary evidence to<br />
rule-in and rule-out cases of disease associated<br />
with foodborne outbreaks of S. Typhimurium.<br />
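The rule-in/rule-out idea can be sketched as a pairwise SNP count compared against a cut-off. The example below is a simplified illustration: it assumes the isolates are already aligned to equal length, uses toy sequences, and takes the four-SNP upper bound quoted above as the default cut-off. It does not reproduce the authors' mutation model.<br />

```python
def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a non-ACGT character (for example
    N or a gap) are skipped rather than counted as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def rule_in(isolates, index_id, cutoff=4):
    """Return {isolate: True/False} for being within `cutoff` SNPs of the index case."""
    index_seq = isolates[index_id]
    return {
        name: snp_distance(index_seq, seq) <= cutoff
        for name, seq in isolates.items()
        if name != index_id
    }


if __name__ == "__main__":
    # Toy aligned sequences; real analyses would use genome-wide SNP calls.
    isolates = {
        "index": "ACGTACGTAC",
        "case_1": "ACGTACGTAT",  # 1 SNP from the index case
        "case_2": "TGCATCGTAC",  # 5 SNPs from the index case
    }
    print(rule_in(isolates, "index"))
```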
n 67<br />
DEFINING CORE GENOME OF SALMONELLA<br />
ENTERICA SEROVAR TYPHIMURIUM<br />
FOR GENOMIC SURVEILLANCE AND<br />
EPIDEMIOLOGICAL TYPING<br />
S. Fu 1 , S. Octavia 1 , M. Tanaka 1 , V. Sintchenko 2 ,<br />
R. Lan 1 ;<br />
1<br />
University of New South Wales, Sydney,<br />
AUSTRALIA, 2 Westmead Hospital, Sydney,<br />
AUSTRALIA.<br />
Salmonella enterica serovar Typhimurium is<br />
the most common Salmonella serovar causing<br />
foodborne infections in Australia and many<br />
other countries. Twenty one S. Typhimurium<br />
strains from Salmonella reference collection A<br />
(SARA) were analyzed using Illumina high-throughput<br />
genome sequencing. SNP counts in the 21<br />
SARA strains ranged from 46 to 11,916,<br />
with an average of 1,577 SNPs per<br />
strain. Together with 47 genomes selected from publicly<br />
available S. Typhimurium genomes, these were used<br />
to determine the S. Typhimurium core genes (STCG).<br />
The STCG consists of 3,846 genes,<br />
which is much larger than the set of 2,882<br />
Salmonella core genes (SCG) found previously.<br />
The STCG together with 1,576 core<br />
intergenic regions (IGRs) was defined as the S.<br />
Typhimurium core genome. Using 93 S. Typhimurium<br />
genomes from 13 epidemiologically<br />
confirmed community outbreaks, we demonstrated<br />
that typing based on S. Typhimurium<br />
core genome (STCG+ core IGRs) provides<br />
superior resolution and higher discriminatory<br />
power than that based on SCG for outbreak<br />
investigation and molecular epidemiology<br />
of S. Typhimurium. Both STCG and STCG+<br />
core IGRs typing achieved 100% separation<br />
of all outbreaks, in comparison to SCG typing,<br />
which failed to separate isolates from two<br />
outbreaks from background isolates. Defining<br />
the S. Typhimurium core genome allows<br />
standardization of genes/regions to be used for<br />
high resolution epidemiological typing of S.<br />
Typhimurium for genomic surveillance.<br />
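A core gene set can be viewed as the intersection of the gene content of every genome considered. The sketch below uses hypothetical gene and strain names, not the authors' pipeline. It also shows why a serovar-level core (STCG) can be larger than a genus-level core (SCG): intersecting over fewer, more closely related genomes discards fewer genes.<br />

```python
def core_genes(gene_content):
    """Return the genes present in every genome.

    `gene_content` maps a genome name to an iterable of gene identifiers.
    """
    gene_sets = [set(genes) for genes in gene_content.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


if __name__ == "__main__":
    # Hypothetical gene content for three genomes.
    genomes = {
        "typhimurium_a": {"g1", "g2", "g3", "g4"},
        "typhimurium_b": {"g1", "g2", "g3", "g5"},
        "other_serovar": {"g1", "g2", "g6"},
    }
    serovar_only = {k: v for k, v in genomes.items() if k.startswith("typhimurium")}
    print(sorted(core_genes(serovar_only)))  # serovar-level core
    print(sorted(core_genes(genomes)))       # genus-level core
```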
n 68<br />
COMPARATIVE PHYLOGENOMIC ANALYSIS<br />
AND SEQUENCING OF M. TUBERCULOSIS<br />
DAYCARE OUTBREAK STRAINS IN CORK,<br />
IRELAND<br />
O. O. Ojo 1 , M. B. Prentice 2 ;<br />
1<br />
Southern University at New Orleans, New<br />
Orleans, LA, 2 University College Cork, Cork,<br />
IRELAND.<br />
Background and Hypothesis: In 2006, Cork-<br />
Kerry Health Region of Ireland had a tuberculosis<br />
(TB) notification rate of 16.3/100,000<br />
population, the highest nationwide. From 1999<br />
to 2006, TB rate in children
fore, vast amounts of sequence data must be<br />
generated and analyzed to identify rare pathogen<br />
sequences. SPIDR-WEB is a sample-to-result<br />
process that relies on efficient laboratory<br />
and in silico steps. Clinical samples mostly<br />
comprise non-informative host RNAs or abundant<br />
housekeeping gene transcripts. SPIDR-<br />
WEB incorporates removal of non-informative<br />
RNAs (RNR), thereby enriching all other<br />
RNAs, including those from pathogens. This<br />
step enables either higher sensitivity and specificity,<br />
or less expensive and faster sequencing.<br />
Our custom EDGE bioinformatics data analysis<br />
platform provides rapid read classification<br />
at all taxonomic levels, and reliably detects<br />
all organisms present in a sample. EDGE is<br />
an efficient process, as it uses databases with<br />
pre-computed signatures, instead of aligning<br />
sequencing reads to the entire Genbank. In<br />
addition to RNR and EDGE, SPIDR-WEB<br />
includes robust, inexpensive and rapid sample<br />
lysis, RNA extraction, and library preparation<br />
steps.<br />
n 70<br />
EPIGENOMIC CHARACTERIZATION OF<br />
NEISSERIA GONORRHOEAE ISOGENIC<br />
MUTANTS AND CLINICAL ISOLATES TO<br />
EXAMINE THE ROLE OF DNA METHYLATION<br />
IN ANTIMICROBIAL RESISTANCE<br />
D. Trees, A. Abrams;<br />
CDC, Atlanta, GA.<br />
The emergence of multidrug-resistant Neisseria<br />
gonorrhoeae has hampered the control and<br />
prevention of gonorrhea in the United States<br />
and globally. Historically, most antimicrobial<br />
resistance in N. gonorrhoeae has resulted from<br />
the accumulation of mutations in a variety of<br />
chromosomal genes. The presence of these<br />
mutations results in levels of resistance to antibiotics<br />
that reduce the likelihood of successful<br />
therapy, and it has resulted in the elimination<br />
of some antibiotics as therapeutic agents. With<br />
the appearance of the mosaic form of penA,<br />
ceftriaxone MIC values increased 4-10 fold<br />
above those previously noted. An apparent<br />
consequence of the mosaic form of penA is the<br />
occurrence of treatment failures with various<br />
cephalosporins, and strains that contain the<br />
mosaic form are able to mutate to still higher<br />
levels of resistance to cefixime, cefpodoxime,<br />
and ceftriaxone. To increase the number of<br />
penA mutants available for analysis we used<br />
an approach, replicative mutagenesis, which<br />
allowed us to isolate large numbers of mutants<br />
in the penA gene. We used this approach to<br />
isolate a set of nine mutants in gonococcal<br />
strain 3502 (ceftriaxone MIC 0.06 µg/mL)<br />
which contains a mosaic-type penA gene. The<br />
MIC values to ceftriaxone for the 3502APMx<br />
mutants are >1.0 µg/mL. These 3502APMx<br />
mutants can be manipulated to increase their<br />
ceftriaxone MIC values to 6.0-8.0 µg/mL<br />
(3502APMx-x strains). The effects of mutations<br />
in the mosaic penA can be enhanced by<br />
second-site mutations to make gonococcal infections<br />
essentially untreatable. Whole genome<br />
analyses of the isogenic mutants did not identify<br />
significant novel genomic mutations that<br />
were shared among 3502APMx or 3502APMx-x<br />
mutants. Therefore, genomic mutations<br />
alone did not fully explain the MIC patterns<br />
observed. To further elucidate the source<br />
of these increased MICs we looked at the<br />
methylation patterns of 3502 and the isogenic<br />
mutants. Initial results from the PacBio base<br />
modification detection analyses demonstrated<br />
that the 3502 reference, the nine APM mutants,<br />
and a clinical control sample contained several<br />
shared m6A motifs and one m4C motif. It was<br />
also observed that as the ceftriaxone MICs increased,<br />
the number of modified motifs detected<br />
expanded, including some novel motifs that<br />
were classified as “unknown” by the software<br />
program. Methylated sites found in mutant<br />
isolates were associated with genes involved in<br />
transcription, translation, putative restriction-modification<br />
systems, membranes, piliation,<br />
and phase variation. These results suggest that<br />
methylation might play a role in gonococcal<br />
antimicrobial resistance. Future studies will<br />
aim to characterize the methylation patterns<br />
of additional clinical and laboratory reference<br />
isolates with varying degrees of antimicrobial<br />
resistance. These results will help to shed light<br />
on the role of epigenetics in antimicrobial<br />
resistance.<br />
n 71<br />
DEVELOPMENT OF A BIONUMERICS<br />
DATABASE AND EVALUATION OF AVERAGE<br />
NUCLEOTIDE IDENTITY USING MUMMER<br />
(ANI-M), RIBOSOMAL MULTI-LOCUS<br />
SEQUENCE TYPING (RMLST), AND RPOB<br />
GENE PHYLOGENY FOR IDENTIFICATION OF<br />
ENTERIC BACTERIA BY WHOLE GENOME<br />
SEQUENCE ANALYSIS<br />
G. Williams 1 , J. Pruckler 1 , R. L. Lindsey 1 , L.<br />
Gladney 1 , A. Huang 1 , L. S. Katz 1 , L. Garcia-<br />
Toledo 1 , S. Im 1 , K. Roache 1 , M. Turnsek 1 ,<br />
Z. Kucerova 1 , D. Stripling 1 , H. Martin 1 , B.<br />
Dinsmore 1 , S. van Duyne 1 , H. Carleton 1 , H.<br />
Pouseele 2 , N. Strockbine 1 , C. Tarr 1 , P. Fields 1 ,<br />
P. Gerner-Smidt 1 , C. Fitzgerald 1 ;<br />
1<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA, 2 Applied Maths, Sint-Martens-<br />
Latem, BELGIUM.<br />
Background: Conventional phenotypic and<br />
genotypic methods employed for identification<br />
of enteric bacteria, including Campylobacter,<br />
Escherichia, Shigella, Salmonella and Listeria,<br />
are labor-intensive, expensive, and require<br />
multiple workflows. We have begun development<br />
of an Enteric Identification Whole<br />
Genome Sequence (WGS) database. The<br />
PulseNet infrastructure (BioNumerics v 7.5) is<br />
being used to build the database, with the goal<br />
of identifying these enteric bacteria in a single<br />
workflow using WGS. Materials and Methods:<br />
Three different methods are being evaluated<br />
for inclusion in the BioNumerics Enteric<br />
Identification database: 1) Average Nucleotide<br />
Identity using MUMmer (ANI-m), which<br />
describes a pairwise distance between two<br />
genomes, 2) Ribosomal Multi-Locus Sequence<br />
Typing (rMLST), which is a presence/absence<br />
binary phylogeny for ribosomal genes, and 3)<br />
rpoB gene phylogeny, which describes the sequencing<br />
of rpoB and comparing it in a larger<br />
phylogeny. A set of genome assemblies - 157<br />
Campylobacter, 126 Escherichia, 23 Shigella,<br />
and 73 Listeria genomes, representing 23, 5,<br />
4, and 15 species for each genus, respectively<br />
- generated at CDC, provided by external partners,<br />
or publicly available through NCBI were<br />
selected to evaluate the methods. Results:<br />
ANI-m showed identities of ≥95% for members<br />
within a species for the six most common<br />
clinically-relevant Campylobacter species and<br />
≤92% for inter-species comparisons; ≥95%<br />
for members within a species for Escherichia<br />
and Shigella, and ≤90% for inter-species<br />
comparisons; ≥95% for members within a species<br />
for Listeria and ≤91% for inter-species<br />
comparisons. Due to the diversity of its four<br />
lineages, L. monocytogenes had slightly lower<br />
intra-species identity values (≥92%) and inter-lineage<br />
identity values (≥92% and ≤95%).<br />
Intra-lineage identity values for L. monocytogenes<br />
were consistent with ANI values of other<br />
Listeria species (≥95%). Total allele assignments<br />
for the 53 rMLST loci ranged from 14<br />
to 53 across the validation set, with fewer loci<br />
called for species rarely received at CDC. An<br />
rMLST phylogeny appropriately clustered all<br />
genomes in this evaluation to the species level<br />
when two or more genomes were represented<br />
for a species. Where the full-length rpoB gene<br />
was annotated, phylogenies appropriately clustered<br />
each species, with intra-species similarity<br />
≥91% and ≥95% for subspecies. Conclusions:<br />
This Enteric WGS Identification BioNumerics<br />
database will provide a single, unified,<br />
cost-effective approach for accurate species<br />
identification. Through continued collaboration<br />
with domestic and international partners,<br />
we will continue to test and refine the database<br />
and CLIA validate the reference identification<br />
methods within the next year.<br />
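The ANI-m results suggest a simple decision rule: assign a query genome to the reference species with the highest ANI, provided that value reaches the within-species level of about 95% reported above. The sketch below assumes ANI values have already been computed by an external tool. The function name, the single cut-off, and the example values are illustrative assumptions, and the L. monocytogenes inter-lineage range (92% to 95%) would need separate handling.<br />

```python
def identify_species(ani_to_references, species_cutoff=95.0):
    """Return the best-matching species name, or None if no reference
    reaches the species-level ANI cut-off.

    `ani_to_references` maps a reference species name to the ANI (%)
    between the query genome and that reference.
    """
    if not ani_to_references:
        return None
    best_species, best_ani = max(ani_to_references.items(), key=lambda kv: kv[1])
    return best_species if best_ani >= species_cutoff else None


if __name__ == "__main__":
    # Hypothetical ANI values for two query genomes.
    print(identify_species({"C. jejuni": 97.8, "C. coli": 85.1}))
    print(identify_species({"C. jejuni": 91.0, "C. coli": 88.0}))
```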
n 72<br />
TRANSFORMING PUBLIC HEALTH<br />
MICROBIOLOGY FOR CAMPYLOBACTER<br />
WITH WHOLE GENOME SEQUENCING:<br />
PULSENET AND BEYOND<br />
J. Pruckler 1 , D. Wagner 1 , G. Williams 1 , H.<br />
Carleton 1 , C. Bennett 1 , L. Joseph 1 , E. Trees 1 ,<br />
A. Huang 1 , L. S. Katz 1 , L. Gladney 1 , M. C.<br />
Maiden 2 , W. Miller 3 , Y. Chen 4 , S. Zhao 4 , P.<br />
McDermott 4 , J. Whichard 1 , H. Pouseele 5 , E. M.<br />
Ribot 1 , C. Fitzgerald 1 , P. Gerner-Smidt 1 ;<br />
1<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA, 2 University of Oxford, Oxford,<br />
UNITED KINGDOM, 3 United States Department<br />
of Agriculture, Albany, CA, 4 United States<br />
Food and Drug Administration, Laurel, MD,<br />
5<br />
Applied Maths, Inc., Sint-Martens-Latem,<br />
BELGIUM.<br />
Background: Conventional phenotypic and<br />
genotypic methods employed for identification<br />
and subtyping of Campylobacter are labor<br />
intensive, expensive, and imprecise. We have<br />
begun development of Enteric Reference Identification<br />
and Campylobacter subtype characterization<br />
whole genome sequence (WGS)<br />
databases. The PulseNet infrastructure (BioNumerics)<br />
will be used in conjunction with the<br />
existing BIGSdb platform to build the databases,<br />
with the goal of characterizing Campylobacter<br />
in a single workflow using WGS.<br />
Methods: Reference genomes (n=103) provided<br />
by FDA and USDA and strains (n=100)<br />
sequenced at CDC were used to develop the<br />
database. Assemblies and annotations were<br />
performed using the Computational Genomics<br />
Pipeline v0.4. These genomes cover the known<br />
members of the species and genera within<br />
Campylobacteraceae and the known genetic<br />
diversity of C. jejuni. The data will be used<br />
to set criteria for the current PubMLST.org/<br />
campylobacter locus definitions. Multiple subschema<br />
are being set up within the databases to<br />
perform identification and scalable, hierarchical<br />
subtyping that will include seven locus,<br />
ribosomal, core genome and whole genome<br />
(MLST, rMLST, cgMLST and wgMLST).<br />
Results: To date 203 reference genomes<br />
have been sequenced, annotated and used for<br />
development of the BioNumerics databases<br />
including an additional 600 isolates that are<br />
being used to validate the prototype databases.<br />
Conclusions: These WGS BioNumerics<br />
databases will provide a single, unified, cost-effective<br />
approach for accurate species identification<br />
and subtyping to aid the surveillance of<br />
sporadic and outbreak related Campylobacter<br />
infections. Through continued collaboration<br />
with domestic and international partners, we<br />
will test and refine the nomenclature and databases,<br />
and CLIA-validate the reference identification<br />
subschema within the next year.<br />
n 73<br />
METAGENOMIC PATHOGEN DETECTION AND<br />
GUT MICROBIOME RESPONSE TO ACUTE<br />
SALMONELLA INFECTION<br />
A. D. Huang 1 , M. R. Weigand 1 , A. Pena-Gonzalez<br />
2 , K. T. Konstantinidis 2 , C. L. Tarr 1 ;<br />
1<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA, 2 Georgia Institute of Technology,<br />
Atlanta, GA.<br />
Background: Current diagnostic testing<br />
for bacterial foodborne pathogens relies on<br />
culture-based techniques even though many<br />
microorganisms, including known pathogens,<br />
cannot be cultured. Powerful sequence-based<br />
approaches such as metagenomics have potential<br />
to derive epidemiologically-relevant<br />
information directly from complex samples,<br />
bypassing the need to isolate individual organisms.<br />
However, such methods have not been<br />
systematically applied to foodborne pathogen<br />
detection because standardized bioinformatics<br />
techniques for analysis have not been established.<br />
Methods: We applied shotgun metagenomics<br />
to anonymized residual stool samples<br />
collected from foodborne outbreaks attributed<br />
to Salmonella to evaluate metagenomics as<br />
a diagnostic and disease surveillance tool, as<br />
well as to gain insight into the gut microbial<br />
community responses to foodborne bacterial<br />
infection. These outbreaks were geographically<br />
isolated and the etiologic agents were<br />
identified by culture methods as distinct strains<br />
of Salmonella enterica serovar Heidelberg.<br />
We performed shotgun sequencing on these<br />
samples using the Illumina MiSeq platform.<br />
Community and taxonomic analyses were<br />
performed using Parallel-META, Metaphlan,<br />
and GOTTCHA. Subspecies analysis was<br />
performed using BLAST recruitment analysis.<br />
Further phylogenetic analysis was performed<br />
on metagenomic assemblies of samples and<br />
resulting contigs matching S. enterica. Results:<br />
Sample consistency and human DNA<br />
sequence abundance varied greatly, often<br />
reducing the sequencing depth of the targeted<br />
microbial communities, yet reference-based<br />
detection of Salmonella serovar Heidelberg<br />
was possible by metagenomic read recruitment<br />
as well as metagenomic assembly, even in<br />
samples with high human DNA content (90-<br />
96%). Taxonomic profiling revealed similar<br />
microbial community structures between individual<br />
patients from each localized outbreak;<br />
samples from different outbreaks clustered<br />
separately and were distinct from a subset of<br />
‘healthy’ references selected from the Human<br />
Microbiome Project. Microbial gut communities<br />
consistently showed reduced species<br />
diversity in each foodborne outbreak compared<br />
to ‘healthy’ references. Conclusions: These<br />
results highlight the potential utility of metagenomic-based<br />
diagnostic tools for foodborne<br />
pathogen identification and epidemiologically<br />
relevant clustering, even in samples with high<br />
human DNA abundance. Furthermore, shotgun<br />
metagenomic approaches offer additional insight<br />
into gut microbial community responses<br />
to foodborne illness that may hold clues to<br />
pathogen ecology.<br />
n 74<br />
INTEGRATING WHOLE GENOME<br />
SEQUENCING OF SALMONELLA ENTERICA<br />
SEROVAR ENTERITIDIS INTO THE PUBLIC<br />
HEALTH LABORATORY FOR SURVEILLANCE<br />
AND OUTBREAK INVESTIGATIONS<br />
K. J. Levinson 1 , M. Dickinson 2 , S. Wirth 2 , M.<br />
Anand 3 , D. J. Baker 2 , D. Bopp 2 , L. Thompson 2 ,<br />
K. A. Musser 2 , P. Lapierre 2 , W. J. Wolfgang 2 ;<br />
1<br />
School of Public Health, SUNY Albany,<br />
Albany, NY, 2 Wadsworth Center/NYSDOH,<br />
Albany, NY, 3 Bureau of Communicable Disease<br />
Control/NYSDOH, Albany, NY.<br />
Salmonella enterica serovar Enteritidis is<br />
a leading cause of foodborne illness in the<br />
United States. Pulsed-field gel electrophoresis<br />
(PFGE) is the gold standard for outbreak detection<br />
of enteric pathogens. However, the low<br />
genetic diversity of S. Enteritidis and frequent<br />
exchanges of mobile genetic elements limit<br />
how well PFGE can discriminate between<br />
isolates and identify clusters that may be<br />
epidemiologically linked. Two-thirds of all S.<br />
Enteritidis isolates received at the Wadsworth<br />
Center have PFGE patterns that are considered<br />
“endemic” and over half of these are pattern<br />
JEGX01.0004. Consequently, these cases<br />
are not routinely investigated by epidemiologists.<br />
To improve discrimination between<br />
sporadic and outbreak associated isolates, the<br />
Wadsworth Center began performing whole<br />
genome sequencing (WGS) single nucleotide<br />
polymorphism (SNP) based phylogenetic typing<br />
on all S. Enteritidis isolates in addition to<br />
PFGE typing. The goal of this project was to<br />
explore the utility of incorporating WGS-based<br />
typing into routine public health laboratory<br />
surveillance and to establish a standard for reporting<br />
WGS data in a manner that was useful<br />
for both laboratorians and epidemiologists. Using<br />
a pipeline developed in-house, we created<br />
cumulative SNP based phylogenetic trees from<br />
514 S. Enteritidis isolates in real time over a<br />
period of 20 months. Based on retrospective<br />
studies, we used a SNP diversity of 0-5 to define<br />
a genomic cluster. Within the 80 genomic<br />
clusters identified, we found that 20% contained<br />
multiple PFGE patterns and conversely,<br />
we found 35 genomic clusters within PFGE<br />
pattern JEGX01.0004. We then analyzed these<br />
clusters in a time-dependent manner and present<br />
an intuitive non-tree based plot for rapid<br />
identification of “clusters of interest” (defined<br />
as clusters that contained at least 4 isolates<br />
collected within 60 days of each other). Importantly,<br />
these parameters can be modified in<br />
real time based on epidemiological feedback.<br />
By sequencing all S. Enteritidis isolates concurrently<br />
with PFGE typing, we show it is<br />
possible to cluster isolates both genomically<br />
and epidemiologically in a manner that is more<br />
discriminatory than PFGE typing. We also<br />
demonstrate the utility and feasibility of this<br />
method in the public health laboratory setting.<br />
As we anticipate that the increase of clusters<br />
detected will pose a challenge in conducting<br />
epidemiological follow-up, we are now focused<br />
on modifying the methods to better prioritize<br />
these clusters to create a more valuable<br />
tool for epidemiologic investigations.<br />
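The "cluster of interest" definition above (at least 4 isolates collected within 60 days of each other) can be sketched as a sliding window over sorted collection dates. This is an illustration only, not the in-house pipeline. It interprets "within 60 days of each other" as the earliest and latest isolate in the group being at most 60 days apart, and it exposes both parameters so they can be adjusted, as the abstract describes. The dates are hypothetical.<br />

```python
from datetime import date, timedelta


def is_cluster_of_interest(collection_dates, min_isolates=4, window_days=60):
    """Return True if any `min_isolates` isolates of a genomic cluster were
    collected within `window_days` days of each other."""
    dates = sorted(collection_dates)
    window = timedelta(days=window_days)
    start = 0
    for end in range(len(dates)):
        # Shrink the window from the left until it spans at most `window_days`.
        while dates[end] - dates[start] > window:
            start += 1
        if end - start + 1 >= min_isolates:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical collection dates for two genomic clusters.
    tight = [date(2015, 1, 1), date(2015, 1, 20), date(2015, 2, 10), date(2015, 2, 25)]
    spread = [date(2015, 1, 1), date(2015, 3, 15), date(2015, 6, 1), date(2015, 9, 1)]
    print(is_cluster_of_interest(tight))
    print(is_cluster_of_interest(spread))
```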
n 75<br />
DETECTION OF ANTIMICROBIAL<br />
RESISTANCE MARKERS IN NEISSERIA<br />
GONORRHOEAE AND MIXED GONOCOCCAL<br />
INFECTION DIRECTLY FROM CLINICAL<br />
SAMPLES USING NEXT GENERATION<br />
SEQUENCING<br />
R. Graham, A. Jennison;<br />
Queensland Department of Health, Coopers<br />
Plains, AUSTRALIA.<br />
Introduction: The rise in Neisseria gonorrhoeae<br />
strains with reduced susceptibility to<br />
antibiotics is a major public health concern,<br />
and the ongoing surveillance of antimicrobial<br />
resistance (AMR) in community strains of<br />
N. gonorrhoeae is essential. In many regions<br />
however, the majority of N. gonorrhoeae<br />
cases are now diagnosed by molecular assays<br />
only, meaning that no isolate is obtained for<br />
antimicrobial susceptibility testing. A number<br />
of molecular markers have been identified<br />
for reduced susceptibility to antimicrobials,<br />
however, screening for these markers requires<br />
multiple tests, and these may miss novel mutations.<br />
By sequencing the entire genome of N.<br />
gonorrhoeae directly from a clinical sample<br />
using next generation sequencing (NGS), both<br />
known and novel mutations associated with<br />
AMR can be identified. In addition, typing information<br />
such as NG-MAST and MLST types<br />
can be collected, and potential mixed gonococcal<br />
infections can be detected. Methods: DNA<br />
was extracted from eleven urine Cobas PCR<br />
media specimens positive for N. gonorrhoeae<br />
by the Roche Cobas 4800 CT/NG test. DNA<br />
was enriched for microbial DNA using the<br />
NEBNext microbiome kit and sequenced using<br />
the Ion Torrent PGM workflow. Sequences<br />
that aligned to the human genome were filtered<br />
out and the remaining sequences were de novo<br />
assembled into contigs and searched for the<br />
regions of interest using Ridom SeqSphere.<br />
MLST and NG-MAST alleles were assigned<br />
according to the schemes at PubMLST.net<br />
and NG-MAST.net respectively. Results: All<br />
eleven of the clinical samples tested generated<br />
a sufficient number of N. gonorrhoeae<br />
sequence reads to provide full coverage of the<br />
genome at a depth of 6-130x. Complete MLST<br />
and NG-MAST profiles could be generated<br />
for each of the samples. No sample contained<br />
more than one sequence type, which would<br />
have indicated a same-site mixed gonococcal<br />
infection. However, when mixed samples containing<br />
two different sequence types were created<br />
in vitro, the two distinct allele sequences could<br />
be detected, suggesting that naturally occurring<br />
mixed infections would be identified. The presence<br />
of ten different AMR markers was investigated,<br />
and mutations associated with reduced<br />
susceptibility to cephalosporins, quinolones<br />
and tetracycline were identified. Conclusions:<br />
We found that multiple levels of N. gonorrhoeae<br />
typing information could be generated<br />
directly from clinical samples using NGS. This<br />
study also demonstrated proof of principle<br />
for utilising NGS to identify same-site mixed<br />
gonococcal infections in a culture-independent<br />
manner. Ten AMR markers were examined<br />
here, and with the complete genome information<br />
provided by NGS there is the potential for<br />
many more to be investigated and for novel<br />
markers to be identified, highlighting the potential<br />
of this technology to enable continued<br />
AMR surveillance in areas where culture is no<br />
longer performed.<br />
■ 76<br />
SUBTYPING OF E. COLI STEC BY<br />
TRADITIONAL LABORATORY METHODS AND<br />
WGS, A COMPARISON!<br />
E. Litrup, K. Kiil;<br />
Statens Serum Institut, Copenhagen, DENMARK.<br />
The traditional subtyping of E. coli STEC<br />
strains consists of many different and somewhat<br />
laborious and slow methods. In Denmark,<br />
we began whole genome sequencing of all E.<br />
coli STEC strains isolated from humans in<br />
January 2015. From January through June,<br />
we received and typed more than 70 STEC<br />
strains by both the traditional methods and<br />
WGS. The methods performed in our laboratory<br />
are serotyping, PCR assays and dot blot<br />
hybridization, among others. Whole Genome<br />
Sequencing was performed in-house on an<br />
Illumina MiSeq with the Nextera Library<br />
Preparation kit and 250 bp paired-end reads. For the<br />
comparison of WGS and traditional subtyping,<br />
we focused on the serotype, the stx1 and stx2<br />
subtypes and the presence of several genes,<br />
e.g. the eae and ehxA genes. For detection of<br />
the serotype and virulence genes, we used the<br />
CGE finders (Serotype Finder and Virulence<br />
Finder, https://cge.cbs.dtu.dk/services/), but<br />
we also used reference-based mapping to the<br />
databases behind the finders with srst2<br />
(https://github.com/katholt/srst2). The serotype was<br />
the subtyping method with the most divergent<br />
results, as expected. The Serotype Finder was<br />
able to detect all but one serovar detected by<br />
the traditional serotyping. Additionally, almost<br />
all the strains typed as rough or smooth<br />
in the laboratory were assigned an O-type<br />
by the Serotype Finder tool. Further, all the<br />
non-motile strains with no phenotypic H-type<br />
were assigned an H-type by the Serotype<br />
Finder tool. Finally, there were some cross<br />
reactions between different sera in the laboratory,<br />
where the Serotype Finder was also able<br />
to give a result regarding which H type was<br />
more similar to the one sequenced. Regarding<br />
the detection of the stx1 and stx2 variants, eae<br />
and ehxA genes, we found an almost 100%<br />
correlation to the laboratory results achieved<br />
by PCR and dot blot hybridization. Only in a<br />
few cases did the laboratory achieve different<br />
results. We used the Virulence Finder on<br />
contigs assembled de novo by CLC and also<br />
used reference-based mapping to the databases<br />
behind the finder using SRST2; we saw<br />
no differences in the performance of the two<br />
approaches. It is believed that reference-based<br />
mapping is important for gene detection in E.<br />
coli, as it can be challenging to assemble the<br />
reads into acceptable contigs, but for<br />
these four genes and their variants this was not<br />
the case. This might be due to the location of<br />
these genes; they are probably located in parts<br />
of the genome that are easy to assemble.<br />
■ 77<br />
DR. JEKYLL AND MR. HYDE: THE CASE<br />
FOR MIXED NOCARDIA POPULATIONS<br />
IN CLINICAL ISOLATES AS REVEALED BY<br />
WHOLE GENOME SEQUENCE ANALYSIS<br />
A. C. Lauer, B. Lasker, J. R. McQuiston;<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA.<br />
Nocardia species are aerobic, GC-rich,<br />
partially acid-fast, opportunistic pathogenic<br />
actinomycetes. Every year, approximately 200<br />
nocardiae isolates are received by the Special<br />
Bacteriology Reference Laboratory (SBRL) at<br />
CDC; however, the true incidence of nocardiosis<br />
in the U.S. remains unknown. Despite reports<br />
of resistance to co-trimoxazole, the treatment<br />
of choice for nocardiosis, the molecular<br />
mechanisms of resistance in nocardiae are not<br />
well understood. To investigate the molecular<br />
mechanisms responsible for resistance in nocardiae,<br />
a total of 144 isogenic isolates from<br />
7 N. farcinica and 5 N. nova clinical isolates,<br />
with known genetic relationships and resistance<br />
profiles, were produced in our laboratory.<br />
The genomes of these isogenic susceptible and<br />
resistant isolates were compared to detect genetic<br />
changes that directly correlated with the<br />
emergence of resistance. Surprisingly, variant<br />
detection yielded large numbers of high-quality<br />
SNPs (~150,000) between isogenic isolates.<br />
A further obstacle, which prompted closer<br />
investigation into the nature of the nocardial<br />
genome, arose when only 49% to 84%<br />
of the sequence reads mapped to the available<br />
complete genomes. Therefore, genomes were<br />
also assembled de novo using SPAdes, Velvet,<br />
Mira, and CLC Genomics Workbench and all<br />
assemblies were evaluated using QUAST. No<br />
assembler, however, produced a single contig.<br />
Optical mapping data indicated that less than<br />
10% of the contigs from the best assemblies<br />
mapped to the optical map and revealed a 2<br />
Mb difference between our isolates and the<br />
published reference strain for N. nova. Mapping<br />
of reads to housekeeping genes (16S,<br />
gyrB, rpoB) revealed the presence of a high<br />
level of minority alleles at specific positions.<br />
Alignment of the consensus sequences for<br />
isogenic isolates at specific genes showed that<br />
the minority alleles of one isolate were the<br />
dominant alleles in other isolates such that<br />
nucleotides appeared to alternate at specific<br />
bases. After contamination by other bacteria<br />
was ruled out, k-mer-based trees showed that,<br />
despite the many differences between isolates,<br />
those which are isogenic group together and<br />
are phylogenetically closer to each other than<br />
to other members of the same species but from<br />
different parental lineages. The data suggest<br />
that clinical Nocardia isolates may occur naturally<br />
as mixed populations, which may explain<br />
why published nocardiae genomes often show<br />
multiple copies of genes that are normally<br />
single-copy housekeeping genes,<br />
such as rpoB and gyrB. Further investigations<br />
are needed to verify this hypothesis. A better<br />
understanding of the ecology of the organisms<br />
sequenced by SBRL is needed such that DNA<br />
extraction, genome sequencing, and genome<br />
analysis can be informative and representative<br />
of the true biology of the organisms before<br />
genomics can be solely relied upon for surveillance.<br />
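The minority-allele observation in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 20% minor-fraction cutoff, the minimum depth of 20 and the example pileup are all hypothetical, chosen only to show how positions carrying a substantial second allele might be flagged.<br />

```python
from collections import Counter


def minor_allele_fraction(bases):
    """Return (major_base, minor_base, minor_fraction) for one pileup column.

    `bases` is a string of base calls covering one reference position.
    Calls other than A, C, G, T are ignored. Returns None if nothing is left.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return None
    ranked = counts.most_common()
    major = ranked[0][0]
    if len(ranked) == 1:
        return major, None, 0.0
    minor, minor_count = ranked[1]
    return major, minor, minor_count / total


def flag_mixed_positions(pileup, min_fraction=0.2, min_depth=20):
    """Flag positions whose second most common base is unusually frequent.

    Both thresholds are illustrative assumptions, not values from the abstract.
    """
    flagged = {}
    for position, bases in pileup.items():
        depth = sum(1 for b in bases.upper() if b in "ACGT")
        result = minor_allele_fraction(bases)
        if result and depth >= min_depth and result[2] >= min_fraction:
            flagged[position] = result
    return flagged


# Hypothetical pileup: position -> base calls observed at that position.
example_pileup = {
    101: "A" * 30 + "G" * 10,  # 25% minor allele at adequate depth
    102: "C" * 40,             # no minor allele
    103: "T" * 5 + "C" * 3,    # minor allele present, but depth too low
}
print(flag_mixed_positions(example_pileup))
```

Comparing the flagged positions across isogenic isolates would then show whether the minor allele of one isolate is the major allele of another, which is the pattern the abstract describes.<br />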
■ 78<br />
EVALUATING THE POTENTIAL OF USING<br />
NEXT-GENERATION SEQUENCING FOR<br />
DIRECT CLINICAL DIAGNOSTICS OF FECAL<br />
SAMPLES FROM DIARRHEA PATIENTS<br />
O. Lukjancenko<sup>1</sup>, K. G. Joensen<sup>2</sup>, F. M. Aarestrup<sup>1</sup>;<br />
<sup>1</sup>Technical University of Denmark, Kongens<br />
Lyngby, DENMARK, <sup>2</sup>Statens Serum Institut,<br />
Copenhagen, DENMARK.<br />
Diarrhea is a major global disease burden and<br />
rapid, precise identification of the causative<br />
pathogens is important to initiate treatment as<br />
well as prevent further spread and potential<br />
outbreaks. The current routine diagnostics<br />
involve a number of different procedures,<br />
and often the causative agent is not identified<br />
in time to guide clinical care. In addition,<br />
in many clinical cases the causative agent<br />
is never identified. With next-generation sequencing<br />
(NGS) becoming cheaper, it has huge<br />
potential in routine diagnostic settings. The<br />
aim of this study was to conduct a pilot project<br />
to evaluate the potential of performing NGS-based<br />
diagnostics through direct sequencing of<br />
fecal samples. A total of 61 clinical diarrheal<br />
fecal samples, including 48 pathogen-positive<br />
and 13 pathogen-negative, were obtained<br />
from diarrhea patients as part of the routine<br />
diagnostics at the Department of Clinical Microbiology<br />
at Hvidovre University Hospital in<br />
Denmark. Ten control samples from healthy<br />
individuals were additionally included. Complete<br />
DNA content was extracted from fecal<br />
samples and sequencing was performed on the<br />
Illumina MiSeq system. Sequence data was<br />
analyzed by the MGmapper software<br />
(http://cge.cbs.dtu.dk/services/MGmapper/) to<br />
assess the species distribution. Sequence-based<br />
diagnostic prediction was performed for pathogenic<br />
bacteria and Giardia by evaluating the<br />
relative abundance of pathogens in the samples<br />
as well as the presence of pathogen-specific<br />
virulence genes. The results obtained from the<br />
sequence-based diagnostic were compared to<br />
the conventional findings for 51 (of the total<br />
61) diarrheal samples, 40 of which by conventional<br />
diagnostic methods were found positive<br />
for bacterial pathogens and 11 of which were<br />
found negative. The NGS-based diagnostic approach<br />
proved to enable detection of the same<br />
bacterial pathogens as the classical approach<br />
in 37 of the 40 clinically positive samples, as<br />
well as to predict responsible pathogens in eight of<br />
the eleven samples from clinically ill patients<br />
that had been found negative by conventional<br />
methods. Overall, the NGS-based diagnostic<br />
approach enabled pathogen-detection similar<br />
to the current routine diagnostics, and this<br />
analytical approach has the potential to be<br />
extended to be applicable for detection of<br />
other pathogens. With further expansion of<br />
the analysis and acquisition of more sequence<br />
data per sample, the method does hold promise<br />
for better diagnostic detection. At present,<br />
however, this type of metagenomic analysis is<br />
too expensive per sample, and the turnaround<br />
time too long for it to become part of routine<br />
diagnostics.<br />
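The counts reported in this abstract can be turned into proportions with a few lines of Python. This is only a worked arithmetic sketch using the figures stated above (37 of 40 conventionally positive samples, 8 of 11 conventionally negative samples); the variable names are illustrative.<br />

```python
def proportion(numerator, denominator):
    """Return numerator / denominator as a float."""
    return numerator / denominator


# Agreement with conventional diagnostics among the 40 positive samples.
positive_agreement = proportion(37, 40)

# Conventionally negative samples (11) in which NGS predicted a pathogen.
additional_detection = proportion(8, 11)

print(f"Agreement on positives: {positive_agreement:.1%}")
print(f"Pathogen predicted in negatives: {additional_detection:.1%}")
```

Note that these proportions cover the 51 samples compared with conventional findings, not all 61 diarrheal samples collected.<br />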
■ 79<br />
EXPLORATION OF WHOLE GENOME<br />
SEQUENCING OF SHIGA TOXIN PRODUCING<br />
E. COLI IN THE NEW YORK STATE PUBLIC<br />
HEALTH LABORATORY TO IDENTIFY<br />
SEROGROUP, VIRULENCE FACTORS, AND<br />
GENOMIC CLUSTERS<br />
S. E. Wirth, T. Quinlan, D. J. Baker, T. Halse,<br />
P. Lapierre, K. A. Musser, W. J. Wolfgang;<br />
Wadsworth Center, NYSDOH, Albany, NY.<br />
In the United States, it is estimated that Shiga<br />
toxin-producing E. coli (STEC) is responsible<br />
for at least 265,000 infections each year. STEC<br />
serogroup O157 causes about one quarter<br />
of these infections while the rest are due to<br />
non-O157 serogroups. Accurate and timely<br />
identification of STEC is crucial to patient care<br />
and epidemiological tracebacks that are carried<br />
out to identify sources. At the Wadsworth<br />
Center, some virulence factors and the more<br />
common serogroups are currently characterized<br />
for each isolate by using up to seven real-time<br />
PCR assays. Concurrently, pulsed-field<br />
gel electrophoresis (PFGE) is performed to<br />
identify the PFGE subtype to aid in epidemiological<br />
investigations. The use of multiple<br />
molecular tests for identification and typing is<br />
time-consuming and expensive. Studies have<br />
shown that Whole Genome Sequencing (WGS)<br />
can yield information regarding virulence factors,<br />
serogroup, and genomic subtyping from<br />
a single dataset, potentially leading to reduced<br />
costs and improved turnaround time. Furthermore,<br />
genomic subtyping can improve cluster<br />
resolution compared to the gold standard of<br />
PFGE. To begin to evaluate WGS as a “one-stop-shop”<br />
for STEC identification and typing,<br />
we have sequenced and analyzed sporadic and<br />
outbreak-associated STEC isolates. WGS was<br />
performed on an Illumina MiSeq™ using 2 x<br />
250 paired-end chemistry. After passing quality<br />
control metrics, sequence reads were analyzed<br />
using an in-house developed bioinformatic<br />
pipeline and the Virulence Finder database.<br />
Serogroup and virulence profiles were assigned<br />
from raw reads and assembled genomes. In<br />
addition, the pipeline mapped raw reads to<br />
a single reference genome, and a SNP-based<br />
phylogenetic tree was produced. Our analysis<br />
of STEC WGS data shows a high degree of<br />
concordance between SNP-based phylogenetic<br />
clusters, PFGE clusters, and epidemiologically<br />
defined outbreaks. Additionally, we can reliably<br />
ascertain certain isolate serogroups and<br />
virulence factors. Our next steps are to refine<br />
the pipeline, analyze isolates prospectively in<br />
real-time, integrate the process into our clinical<br />
laboratory information system, and seek New<br />
York State clinical laboratory certification to<br />
report serogroup and virulence data.<br />
■ 80<br />
NANOPORE + ION TORRENT SEQUENCING,<br />
ASSEMBLY, AND ANNOTATION OF CULTURE-<br />
FREE STREAMBED PLASMIDS REVEALS<br />
HITCHHIKING GENES FOR RESISTANCE TO<br />
MULTIPLE HUMAN CLINICAL ANTIBIOTICS<br />
S. D. Turner<sup>1</sup>, E. Gehr<sup>2</sup>, K. Libuit<sup>2</sup>, C. Kapsak<sup>2</sup>,<br />
J. Herrick<sup>2</sup>;<br />
<sup>1</sup>University of Virginia, Charlottesville, VA,<br />
<sup>2</sup>James Madison University, Harrisonburg, VA.<br />
Background: Transmissible plasmids affect<br />
environmental ecosystems by facilitating<br />
exchange and recombination of antibiotic resistance<br />
genes. This exchange occurs between<br />
native bacterial populations and introduced<br />
fecal pathogens selected for resistance in farm<br />
animals. As such, native bacteria in aquatic<br />
and soil habitats may act as incubators and<br />
sites for recombination of genes that are subsequently<br />
transferred to human pathogens.<br />
Methods: Transmissible plasmids were<br />
captured exogenously from stream sediment<br />
samples by releasing cells from sediment and<br />
conjugating with a rifampicin-resistant strain<br />
of E. coli. Transconjugants were selected on<br />
tetracycline- and rifampicin-amended medium.<br />
Plasmids were purified and electroporated<br />
into an electrocompetent E. coli strain and<br />
tested for decreased antibiotic susceptibility<br />
relative to the un-electroporated strain using<br />
a modified Stokes disk diffusion method. Antibiotics<br />
tested were tetracycline, gentamicin,<br />
ciprofloxacin, sulfamethoxazole/trimethoprim,<br />
imipenem, tobramycin, kanamycin, aztreonam,<br />
ticarcillin, piperacillin, tazobactam, and<br />
cefepime. Two plasmids were sequenced using<br />
both the Oxford Nanopore MinION and<br />
Ion Torrent PGM. Long-read nanopore data<br />
was assembled with the Celera assembler, the<br />
assembly was error-corrected using the more<br />
accurate short read data, and the resulting<br />
polished assemblies were annotated against<br />
UniProt, RefSeq, Pfam, and TIGRFAMs.<br />
Unassembled reads were further screened<br />
for genes encoding antibiotic resistance. Results:<br />
23 of 30 captured plasmids conferred<br />
decreased susceptibility to multiple antibiotics<br />
in addition to tetracycline. The most common<br />
phenotypes were tetR, kanR, ticR, pipR, tetR,<br />
kanR, ticR, pipR, and fepR. One plasmid conferred<br />
decreased susceptibility to seven of the<br />
tested antibiotics (tet, tob, kan, tic, tzp, pip,<br />
and fep). Nanopore sequencing and assembly<br />
of this plasmid resulted in two contigs, each<br />
approximately 15 kb. Contigs were screened<br />
for antibiotic resistance against ResFinder,<br />
containing 2203 genes across 13 drug classes.<br />
The following genes were identified with<br />
>93% identity: tetC, tetG, aadA9, aadA2, and<br />
sul1 (2X) - suggesting the presence of one or<br />
more Class 1 integrons - floR, aph(3’)-Ic, strB,<br />
and blaCARB-2. A second plasmid exhibiting<br />
decreased susceptibility to tet, cip, kan, tic, and<br />
pip was also sequenced, resulting in a single<br />
~90 kb contig. Annotation revealed a similar resistance<br />
profile. Conclusions: The presence of<br />
genes encoding resistance to multiple human<br />
clinical antibiotics on transmissible plasmids<br />
selected using tetracycline suggests that there<br />
may be a significant reservoir of antibiotic<br />
resistance genes in stream sediments and that<br />
these may be capable of transmission to pathogenic<br />
Enterobacteriaceae.<br />
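The identity cutoff used when screening contigs against a resistance gene database can be sketched as a simple filter over alignment hits. This is a minimal illustration, not the ResFinder implementation: the `Hit` record, the coverage cutoff of 0.6 and the example hits (with placeholder gene names) are assumptions; only the >93% identity threshold comes from the abstract.<br />

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment between a contig and a database gene (hypothetical record)."""
    contig: str
    gene: str
    percent_identity: float
    alignment_length: int
    gene_length: int


def filter_hits(hits, min_identity=93.0, min_coverage=0.6):
    """Keep hits above the identity cutoff that also cover enough of the gene.

    min_identity mirrors the >93% threshold reported in the abstract;
    min_coverage is an illustrative assumption.
    """
    kept = []
    for hit in hits:
        coverage = hit.alignment_length / hit.gene_length
        if hit.percent_identity > min_identity and coverage >= min_coverage:
            kept.append(hit)
    # Sort by contig, then by descending identity, for readable reporting.
    return sorted(kept, key=lambda h: (h.contig, -h.percent_identity))


# Made-up hits with placeholder gene names, for illustration only.
example_hits = [
    Hit("contig1", "geneA", 99.1, 1200, 1200),  # kept
    Hit("contig1", "geneB", 90.0, 800, 800),    # identity too low
    Hit("contig2", "geneC", 95.0, 300, 900),    # covers only a third of the gene
]
for hit in filter_hits(example_hits):
    print(hit.contig, hit.gene, hit.percent_identity)
```

Requiring a minimum coverage alongside identity avoids reporting short, highly similar fragments as full resistance genes.<br />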
■ 81<br />
AVERAGE NUCLEOTIDE IDENTITY<br />
(ANI) ANALYSIS OF A DIVERSE SET OF<br />
ESCHERICHIA<br />
S. B. Im<sup>1</sup>, L. Rishishwar<sup>2</sup>, A. D. Huang<sup>1</sup>, L. S. Katz<sup>1</sup>,<br />
H. A. Carleton-Romer<sup>1</sup>, E. Trees<sup>1</sup>, N. Strockbine<sup>1</sup>,<br />
R. L. Lindsey<sup>1</sup>;<br />
<sup>1</sup>Centers for Disease Control and Prevention,<br />
Atlanta, GA, <sup>2</sup>Georgia Institute of Technology,<br />
Atlanta, GA.<br />
The advent of high throughput sequencing<br />
technologies has provided the opportunity to<br />
explore new methods to characterize organisms,<br />
not only more efficiently, but also more<br />
discriminately than traditional phenotypic<br />
methods. Average nucleotide identity (ANI)<br />
is a value that can be used to compare the<br />
relatedness of two genomic sequences. Findings<br />
from ANI analysis have been shown to<br />
correlate strongly with those using DNA-DNA<br />
hybridization (DDH) methods, with ANI<br />
values of ≥ 94% being equivalent to ≥ 70%<br />
DDH for bacteria belonging to the same species.<br />
In previous studies with a limited number<br />
of strains, ANI was reported to successfully<br />
distinguish members of the genus Escherichia.<br />
To evaluate the limitations of ANI analysis to<br />
distinguish members of Escherichia/Shigella,<br />
we expanded our analysis to include more<br />
strains between species, as well as within and<br />
between common serogroups that are relevant<br />
to public health. One hundred seventy strains<br />
from nine different species and 20 different<br />
serogroups were tested; 13 diverse Salmonella<br />
spp. were included as an outgroup. To detect<br />
the ANI variability within a serogroup, at least<br />
seven strains were analyzed from each of the<br />
common Shiga-toxin producing E. coli serogroups:<br />
O26, O45, O103, O111, O121, O145<br />
and O157. ANI between two genomes was<br />
computed using the BLAST algorithm, where<br />
each genome in the dataset was compared in<br />
an all-against-all pairwise fashion. Each comparison<br />
yielded a two-way ANI value describing<br />
the genetic relatedness between organisms.<br />
We found ANI values of ≥ 98% between and<br />
among E. coli isolates belonging to serogroups<br />
O26, O69, O118, O146, O103, O121, O145,<br />
O45, O91 and O128; ≥ 97% between and<br />
among isolates belonging to serogroups O157,<br />
O127, and the four Shigella species; ≤ 89%<br />
between E. coli/Shigella strains and those of<br />
other Escherichia species; and ≤ 80% between<br />
isolates of the genera Escherichia and Salmonella.<br />
Pairwise ANI values showed higher<br />
percent similarity between closely related serogroups<br />
and indicated a clear distinction between<br />
Escherichia species and Salmonella.<br />
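The all-against-all comparison described in this abstract can be sketched in Python once one-way ANI values are available. This is a hedged illustration: the abstract does not say how the two-way value was derived, so averaging the two one-way values is an assumption, and the genome labels and ANI numbers below are placeholders rather than results. Only the 94% species-level threshold is taken from the text.<br />

```python
from itertools import combinations


def two_way_ani(one_way, a, b):
    """Average the two directional ANI values for genomes a and b.

    Averaging is one simple convention and is assumed here for illustration.
    """
    return (one_way[(a, b)] + one_way[(b, a)]) / 2


def pairwise_two_way(genomes, one_way):
    """Compute a two-way ANI value for every unordered pair of genomes."""
    return {
        (a, b): two_way_ani(one_way, a, b)
        for a, b in combinations(genomes, 2)
    }


def same_species(ani, threshold=94.0):
    """Apply the >= 94% ANI species-level cutoff quoted in the abstract."""
    return ani >= threshold


# Placeholder one-way ANI values (percent); not data from the study.
example_genomes = ["g1", "g2", "g3"]
example_one_way = {
    ("g1", "g2"): 98.0, ("g2", "g1"): 99.0,
    ("g1", "g3"): 79.0, ("g3", "g1"): 80.0,
    ("g2", "g3"): 79.0, ("g3", "g2"): 81.0,
}
for pair, ani in pairwise_two_way(example_genomes, example_one_way).items():
    print(pair, ani, same_species(ani))
```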
■ 82<br />
SEQUENCE AND STRUCTURAL VARIANCE<br />
IN RECENT BORDETELLA PERTUSSIS<br />
GENOMES<br />
M. M. Williams, M. R. Weigand, Y. Peng, K. E.<br />
Bowden, M. L. Tondella;<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA.<br />
Pertussis disease, caused by the bacterium Bordetella<br />
pertussis, has resurged in the U.S. in<br />
recent decades, despite high vaccine coverage.<br />
Genome analysis of 424 isolates was undertaken<br />
to investigate causes of the increase in pertussis.<br />
Of these isolates, 411 were obtained from 34<br />
US states for the period 2000-2014. Three vaccine<br />
strains were also included, and the remaining<br />
10 isolates were from other countries<br />
and historic collections, isolated between 1939<br />
and 1967. Sequence variants, including insertions,<br />
deletions, small replacements, SNPs and<br />
MNPs (single and multiple nucleotide polymorphisms,<br />
respectively) were determined by<br />
mapping Illumina sequencing reads against the<br />
Tohama I vaccine reference. A subset of 171<br />
complete genomes were assembled from long<br />
read sequence data obtained using the Pacific<br />
Biosciences RS II platform. Assemblies were<br />
validated by comparison to genome optical<br />
maps and sequence accuracy was confirmed by<br />
independent analysis using DNA cluster sequencing<br />
(Illumina MiSeq or HiSeq platform).<br />
Genome structure variation was determined by<br />
whole genome alignment with progressiveMauve.<br />
A total of 4,655 variants were detected<br />
in 424 genomes, the majority of which were<br />
SNPs (89%). Twenty-nine percent of variants<br />
occurred in non-coding regions, 26% were<br />
synonymous, and 45% were non-synonymous<br />
variants. The locus displaying the most variation<br />
was prn, the gene encoding pertactin,<br />
a component of acellular pertussis vaccines<br />
that has mutated rapidly in the last five years,<br />
resulting in pertactin non-expression in the<br />
majority of current US isolates. Variants were<br />
found in 951 other open reading frames, involved<br />
in a wide variety of cellular functions.<br />
Phylogenetic analysis of SNPs clustered isolates<br />
by time. By contrast, 35 unique genomic<br />
structural patterns were observed among the<br />
171 assembled genomes of two vaccine strains<br />
and 169 isolates obtained from 27 states in<br />
2000-2014. Pattern variation was due to inversion<br />
of genome sections, many with IS481, an<br />
insertion sequence element found in 240-256<br />
copies per genome, located at the predicted<br />
boundary sites. Although B. pertussis does<br />
not demonstrate great variety at the nucleotide<br />
level, several genomic rearrangements are<br />
circulating in current strains, even within the<br />
same epidemic. Implications for improving<br />
molecular epidemiology and understanding<br />
pertussis pathogenicity will be explored.<br />
■ 83<br />
NON-TRAVEL ASSOCIATED SALMONELLA<br />
TYPHIMURIUM ST313 IN THE UK<br />
P. Ashton, C. Lane, S. Nair, T. Peters, E. de<br />
Pinna, K. Grant, T. Dallman;<br />
Public Health England, London, UNITED<br />
KINGDOM.<br />
Salmonella enterica serovar Typhimurium<br />
typically causes self-limiting gastroenteritis.<br />
However, invasive non-typhoidal Salmonella<br />
disease has been recently documented in many<br />
sub-Saharan African countries, causing significant<br />
morbidity and mortality. The majority<br />
of these invasive isolates belong to one of two<br />
monomorphic lineages within a single<br />
Multi-Locus-Sequence-Type, ST313. There is also<br />
evidence that ST313 is more invasive than<br />
other closely related S. Typhimurium in both<br />
humans (clinical reports) and chickens (in vivo<br />
experiments). Here, we present data on 16 isolates<br />
of ST313 received by the Public Health<br />
England Salmonella Reference Service. Three<br />
of sixteen isolates were associated with travel<br />
to Africa. All of these isolates caused invasive<br />
disease and they cluster within the previously<br />
described ST313 phylogenetic lineages. The<br />
majority of isolates of ST313 in England<br />
(13/16) are non-travel associated. Only 1 of<br />
these 13 isolates was associated with invasive<br />
disease and they cluster into 4 new phylogenetic<br />
lineages, significantly increasing the<br />
diversity observed within ST313. Accessory<br />
genome analysis reveals that there are genomic<br />
elements exclusively associated with the<br />
sub-Saharan African lineages, including prophage<br />
and antibiotic resistance determinants. These<br />
results suggest that ST313 in England is epidemiologically,<br />
clinically and genomically<br />
heterogeneous, with a geographic relationship<br />
underlying this heterogeneity.<br />
■ 84<br />
EFFECT OF SEQUENCING READ ERROR<br />
CORRECTION METHOD ON HQSNPS<br />
DISCOVERY AND PHYLOGENETIC TREE<br />
TOPOLOGY IN OUTBREAKS<br />
D. D. Wagner, H. Carleton, E. Trees;<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA.<br />
Benchtop next-generation sequencing (NGS)<br />
machines offered by the Illumina platform<br />
allow public health laboratories to rapidly<br />
sequence multiple bacterial strains in multistate<br />
foodborne outbreaks. High-quality single<br />
nucleotide polymorphisms (hqSNPs) provide a<br />
means for inferring phylogenetic associations<br />
among closely-related outbreak strains. Yet,<br />
NGS technologies often introduce miscalled<br />
bases or other types of errors that diminish<br />
the number of phylogenetically-informative<br />
hqSNPs. The current study shows how bioinformatics<br />
methods based upon free software<br />
packages mitigate NGS sequencing errors<br />
and increase counts of phylogenetically-informative<br />
hqSNPs. NGS datasets from five<br />
foodborne pathogen outbreaks were cleaned<br />
in Quake, BayesHammer, and FastX. In<br />
three Salmonella enterica outbreak clusters<br />
representing serovars Enteritidis, Newport,<br />
and Baildon, reads corrected/trimmed through<br />
FastX, Quake, and BayesHammer all increased<br />
the counts of informative hqSNP positions, by<br />
between 9 and 195 positions,<br />
when compared with hqSNPs inferred on<br />
uncorrected reads. In a cluster of Salmonella<br />
Paratyphi B var. (L+) tartrate+, the<br />
BayesHammer-cleaned reads and FastX-trimmed<br />
reads yielded 314 and 311 informative hqSNP<br />
positions, respectively. In the same set, the<br />
uncorrected reads yielded 292 informative<br />
hqSNP positions, unexpectedly outperforming<br />
the Quake-cleaned reads with 241 hqSNP<br />
positions. In the fifth data set, a cluster of E.<br />
coli O157:H7, FastX-trimmed reads yielded<br />
66 hqSNP positions and Quake-cleaned reads<br />
yielded 65 hqSNP positions. BayesHammer-cleaned<br />
reads and uncorrected reads both performed<br />
worse for the O157:H7 set, with 61 and<br />
59 hqSNP positions, respectively. In the Enteritidis<br />
and Baildon sets, BayesHammer-corrected<br />
reads yielded hqSNP phylogenies with<br />
median bootstrap support values of 61% and<br />
24%, respectively, across all internal branches<br />
of each tree. By comparison, FastX-trimmed or<br />
untrimmed reads had a median bootstrap support<br />
of at least 49% for the Baildon cluster and<br />
0% bootstrap support for the Enteritidis cluster.<br />
For the Paratyphi B hqSNP phylogeny, median<br />
bootstrap values were 100% across all four<br />
read-trimming methods, but ranged from 1%<br />
support in the untrimmed tree up to 20% support<br />
in the FastX tree. Quake-corrected reads<br />
yielded hqSNP trees with the highest median<br />
bootstrap values for the Salmonella Newport<br />
cluster (median support = 51%) and the E. coli<br />
O157 cluster (median support = 27%). These<br />
results indicate that BayesHammer enables<br />
discovery of the largest numbers of hqSNPs<br />
and that BayesHammer and Quake both perform<br />
well for inferring hqSNP phylogenetic<br />
trees. Yet, as Quake may decrease the counts<br />
of phylogenetically-informative SNP positions,<br />
BayesHammer and FastX are likely the best<br />
first-pass cleaning/correction tools for hqSNP<br />
pipelines.<br />
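Counting informative SNP positions, the metric compared across read-cleaning tools in this abstract, can be sketched as follows. The abstract does not define "informative", so this sketch assumes the usual parsimony-informative criterion (at least two different bases, each present in at least two isolates). The function names and the toy alignment are hypothetical.<br />

```python
from collections import Counter


def is_informative(column):
    """Return True if at least two bases each occur in two or more isolates.

    Characters other than A, C, G, T (gaps, N) are ignored.
    """
    counts = Counter(base for base in column if base in "ACGT")
    return sum(1 for count in counts.values() if count >= 2) >= 2


def count_informative_positions(alignment):
    """Count informative columns in a dict of equal-length aligned sequences."""
    sequences = list(alignment.values())
    if not sequences:
        return 0
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    return sum(is_informative(column) for column in zip(*sequences))


# Toy SNP alignment for four hypothetical isolates.
example_alignment = {
    "isolate1": "ACGTA",
    "isolate2": "ACGTA",
    "isolate3": "ATGCA",
    "isolate4": "ATGCG",
}
print(count_informative_positions(example_alignment))
```

Running the same count on SNP alignments built from differently cleaned read sets gives the kind of per-tool comparison reported above.<br />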
■ 85<br />
EVALUATION OF WHOLE GENOME<br />
SEQUENCING TO CONFIRM OR REFUTE<br />
CLONALITY OF CONVENTIONAL GENOTYPE-<br />
DEFINED CLUSTERS OF MYCOBACTERIUM<br />
TUBERCULOSIS<br />
L. Cowan, J. Posey;<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA.<br />
Since 2004, the Division of Tuberculosis Elimination<br />
has conducted genotyping surveillance<br />
of Mycobacterium tuberculosis. Genotyping<br />
data is integrated with patient demographic<br />
and clinical data and routinely analyzed to<br />
identify suspected outbreaks. However, the<br />
discriminatory power of the current genotyping<br />
methods is sometimes insufficient, and some<br />
suspected outbreaks identified by genotyping<br />
include a mix of outbreak and sporadic cases<br />
or do not represent an outbreak. Genomic surveillance<br />
has the power to increase the accuracy<br />
of outbreak detection systems by analyzing<br />
the entire genome versus less than 1% using<br />
conventional methods. We conducted whole<br />
genome sequencing (WGS) for 20 suspected large<br />
outbreaks (10-100 isolates per cluster)<br />
identified by routine genotyping.<br />
Reference-guided assemblies of Illumina sequence<br />
read sets were created using LaserGene<br />
SeqMan NGen (DNAStar). Polymorphisms<br />
between the clustered isolates were identified<br />
by comparing assembled genomes, using<br />
coverage statistics, read depth and distribution<br />
of base calls to identify reliable differences. The final<br />
set of polymorphisms was confirmed in each<br />
assembly by visually checking each position in<br />
the mapped reads of the assembly. The number<br />
of single nucleotide polymorphisms (SNPs)<br />
identified for each cluster ranged from 8 to<br />
101. WGS exhibited a higher level of resolution<br />
for each cluster as compared to conventional<br />
genotyping methods and identified cases<br />
that were not involved in recent transmission<br />
among the samples analyzed. The preliminary<br />
results indicate that WGS data could result in<br />
more focused targeting of limited public health<br />
resources leading to more effective epidemiologic<br />
field investigations and an improved<br />
ability to identify where public health intervention<br />
will have the greatest impact. This curated<br />
data set can be used to evaluate bioinformatic<br />
pipelines for variant detection using whole<br />
genome SNP or multi-locus sequence typing<br />
(wgMLST) platforms.<br />
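The pairwise comparison of clustered isolates described above can be illustrated with a minimal Python sketch. It assumes each isolate has already been reduced to its base calls at a shared set of variable positions; the isolate names, the calls, and the use of `N` for an unreliable call are hypothetical and are not taken from the study.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where two equal-length call strings differ, skipping 'N'."""
    if len(a) != len(b):
        raise ValueError("call strings must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def pairwise_distances(calls):
    """Return {(name1, name2): SNP distance} for every pair of isolates."""
    return {
        (m, n): snp_distance(calls[m], calls[n])
        for m, n in combinations(sorted(calls), 2)
    }


# Hypothetical base calls at five variable positions.
calls = {"iso1": "ACGTA", "iso2": "ACGTA", "iso3": "ATGNC"}
distances = pairwise_distances(calls)
```

In this toy input, `iso1` and `iso2` are indistinguishable, while `iso3` differs from both at two reliably called positions, which is the kind of signal that could separate a sporadic case from a genotype-defined cluster.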
86<br />
RAPID WHOLE GENOME SEQUENCING<br />
AND DE NOVO ASSEMBLY PIPELINE FOR<br />
BORDETELLA PERTUSSIS USING MULTIPLE<br />
PLATFORMS<br />
Y. Peng, M. M. Williams, M. R. Weigand, K.<br />
Bowden, M. L. Tondella;<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA.<br />
In the U.S. and many other developed countries,<br />
pertussis is currently the least well controlled<br />
vaccine‐preventable bacterial disease<br />
despite excellent vaccination coverage. Whole<br />
genome sequences of Bordetella pertussis, the<br />
causative agent of pertussis disease, will help<br />
us better understand the epidemiologic and<br />
clinical relevance of current circulating strains,<br />
develop novel diagnostic assays, and elucidate<br />
the possible reasons for the current increase<br />
in pertussis in the U.S. and around the world.<br />
However, with hundreds of insertion sequences,<br />
repeat regions, and large rearrangements<br />
in the B. pertussis genome, whole-genome<br />
sequencing and assembly is challenging. There<br />
are currently over 400 incomplete B. pertussis<br />
genome assemblies publicly available,<br />
most of them composed of hundreds of<br />
contigs assembled from short-read sequencing.<br />
Our group has developed a rapid whole<br />
genome sequencing and assembly pipeline for<br />
B. pertussis by taking advantage of the latest<br />
long read PacBio RSII sequencing platform;<br />
highly accurate, high coverage and low cost<br />
Illumina sequencing platforms; and the OpGen<br />
genome optical mapping system. High quality<br />
genomic DNA was isolated with the Qiagen<br />
Puregene Yeast/Bact. Kit. Utilizing P6 PacBio<br />
chemistry and 240-minute movie recording,<br />
one SMRT cell produced over 50X coverage<br />
subreads with N50 as high as 13 kb, which could be<br />
assembled into one complete genome using<br />
HGAP 3.0. After removing the end overlap<br />
to close the circular genome, the structure of<br />
de novo assemblies was further tested and confirmed<br />
by optical mapping and finally polished<br />
by mapping with high quality short Illumina<br />
reads. So far, nearly 200 genomes have been<br />
completed using this multi-platform pipeline<br />
and sequencing of more than 200 additional<br />
isolates is currently underway. Deep genome<br />
variation analysis (SNPs, rearrangements and<br />
IS-elements, etc.) will direct future work with<br />
other ‘omics’ approaches, including RNASeq<br />
and peptide profiling, to determine causes for<br />
pertussis increase.<br />
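The two read statistics quoted above, subread N50 and fold coverage, can be computed as in the following minimal Python sketch. The read lengths and genome size are toy values chosen for illustration, not data from this work.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def fold_coverage(lengths, genome_size):
    """Return total sequenced bases divided by genome size."""
    return sum(lengths) / genome_size


# Toy subread lengths (bases) and a toy genome size.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
toy_cov = fold_coverage(toy_lengths, genome_size=4)
```

Here the total is 20 bases, and the two longest reads (6 and 5) already hold more than half of them, so the N50 is 5.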
87<br />
DEVELOPMENT OF A SUBTYPING ASSAY<br />
FOR DIRECT DETECTION AND TARGETED<br />
SEQUENCING OF SHIGA TOXIN-PRODUCING<br />
ESCHERICHIA COLI (STEC) FROM CLINICAL<br />
SAMPLES<br />
L. M. Gladney 1, D. Fasulo 2, R. L. Lindsey 1, A.<br />
Huang 1, E. Trees 1, N. Strockbine 1, E. R. Ribot 1,<br />
H. A. Carleton 1, J. Besser 1;<br />
1 Centers for Disease Control, Atlanta, GA,<br />
2 Pattern Genomics, LLC, Madison, CT.<br />
Shiga toxin-producing Escherichia coli<br />
(STEC) is an important foodborne pathogen<br />
that causes approximately 265,000 infections<br />
and nearly 30 outbreaks per year in the US.<br />
STEC has been traditionally diagnosed by<br />
culture and subtyped using a variety of methods<br />
to help detect outbreaks. Recently, culture-independent<br />
diagnostic tests (CIDTs) have<br />
been introduced in many clinical labs in place<br />
of culture-based methods. These methods are<br />
attractive because they are cost-effective, can<br />
be performed at point-of-care, and do not require<br />
culture of the pathogen. As a result, the<br />
current laboratory-based surveillance system<br />
in the US, PulseNet, is threatened because it<br />
relies on pure cultures to perform subtyping by<br />
pulsed-field gel electrophoresis (PFGE) and<br />
whole genome sequencing (WGS). To address<br />
this issue, we are developing a PCR-based<br />
subtyping assay that may be used directly with<br />
complex samples such as stool. First, we identified<br />
targets that are conserved in, and unique<br />
to, STEC using 342 genome sequences (104<br />
STEC, 201 other E. coli, and 37 other Enterobacteriaceae).<br />
Using Daydreamer™ software<br />
(Pattern Genomics), we screened for DNA<br />
signatures that are unique to STEC and not<br />
present in any other E. coli. We then filtered<br />
out any signatures that were not specific due<br />
to a BLAST hit to anything other than STEC in<br />
GenBank. We performed a similar analysis<br />
to find unique signatures for two additional<br />
Escherichia species that may be phenotypically<br />
similar to E. coli and present in stool. We<br />
identified five targets and designed primers<br />
for STEC (primarily the Stx1 and Stx2 genes),<br />
which capture all of the top 7 (100%) and 16<br />
(80%) of the top 20 serogroups, as well as ten possible<br />
targets for the closely related species E. fergusonii<br />
and E. albertii. We performed in silico<br />
PCR on our 342 genomes to confirm the presence<br />
and absence of the targets in our training<br />
dataset and to assess the overall success<br />
of the design. Among STEC serotypes included<br />
in our training set, the coverage of our<br />
primers was 98% and 77% for the Stx1 and<br />
Stx2 targets, respectively, while the accuracy<br />
was 100%. The low coverage of the Stx2 target<br />
may be due to fragmented assemblies with<br />
shorter Stx2 sequences in the reference genomes.<br />
Three additional targets capture one or<br />
more of the genomes not accounted for by the<br />
Stx1 or Stx2 target and are 100% accurate, but<br />
do not cover all STEC. The coverage and accuracy<br />
were 100% for the species targets. Next,<br />
we plan to develop targets to subtype lineages<br />
of STEC that we identify in our dataset and<br />
evaluate heterogeneity in adjacent regions to<br />
the STEC targets which may be used to differentiate<br />
strains in a targeted amplicon sequencing<br />
approach. This PCR assay should help<br />
public health labs bridge the gap between isolate<br />
characterization and shotgun metagenomic<br />
sequencing to control foodborne disease.<br />
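The in silico PCR check and the coverage and accuracy figures reported above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: primers must match exactly, only the forward strand is searched, coverage is taken to mean the fraction of target genomes that amplify, and accuracy is taken to mean the fraction of amplifying genomes that are targets. The abstract does not define either measure, and all sequences, names, and the `max_len` limit are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def amplifies(genome, fwd, rev, max_len=2000):
    """True if fwd, then the reverse complement of rev, occur within max_len bases."""
    rev_site = reverse_complement(rev)
    start = genome.find(fwd)
    while start != -1:
        end = genome.find(rev_site, start + len(fwd))
        if end != -1 and end + len(rev_site) - start <= max_len:
            return True
        start = genome.find(fwd, start + 1)
    return False


def coverage_and_accuracy(genomes, targets, fwd, rev):
    """Return (coverage, accuracy) under the assumed definitions in the lead-in."""
    hits = {name for name, seq in genomes.items() if amplifies(seq, fwd, rev)}
    coverage = len(hits & targets) / len(targets)
    accuracy = len(hits & targets) / len(hits) if hits else 0.0
    return coverage, accuracy


# Hypothetical toy genomes: two "target" genomes and one non-target.
toy_genomes = {
    "target1": "GGACGTCCCCTCAAGG",
    "target2": "GGACGTCCCC",
    "other": "TTTTTTTT",
}
cov, acc = coverage_and_accuracy(toy_genomes, {"target1", "target2"}, "ACGT", "TTGA")
```

In the toy data only `target1` carries both primer sites, so coverage is 0.5 while accuracy is 1.0. This mirrors how a fragmented assembly missing part of a gene could lower coverage without affecting accuracy.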
88<br />
THE APPLICATION OF WHOLE<br />
GENOME MULTI-LOCUS SEQUENCE<br />
TYPING TO CHARACTERIZE LISTERIA<br />
MONOCYTOGENES<br />
H. A. Carleton 1, L. S. Katz 1, S. Stroika 1, A.<br />
Sabol 1, K. Roache 1, Z. Kucerova 1, E. M. Ribot 1,<br />
P. Evans 2, U. Dessai 3, K. Kubota 4, H.<br />
Pouseele 5, J. Besser 1, C. Tarr 1, E. Trees 1, P.<br />
Gerner-Smidt 1;<br />
1 Centers for Disease Control, Atlanta, GA,<br />
2 Food and Drug Administration, Center for<br />
Food Safety and Applied Nutrition, College<br />
Park, MD, 3 United States Department of Agriculture,<br />
Food Safety and Inspection Service,<br />
Washington, DC, 4 Association of Public Health<br />
Laboratories, Washington, DC, 5 Applied<br />
Maths, Sint-Martens-Latem, BELGIUM.<br />
Background: The introduction of low cost<br />
rapid next generation sequencers (NGS) is<br />
revolutionizing public health microbiology<br />
because traditional phenotypic and genotypic<br />
characterization methods now can be replaced<br />
by whole genome sequencing (WGS). As part<br />
of the Advanced Molecular Detection (AMD)<br />
initiative at CDC, we focused on Listeria<br />
monocytogenes surveillance and transformation<br />
of PulseNet’s current pulsed-field gel<br />
electrophoresis (PFGE)-based surveillance<br />
into a WGS-based infrastructure for public<br />
health laboratories. In collaboration with FDA<br />
and USDA-FSIS, we developed a L. monocytogenes<br />
whole genome multi-locus sequence<br />
typing database (wgMLST) in BioNumerics<br />
7.5 and tested the utility of this approach in<br />
surveillance. Methods: Since September 2013,<br />
WGS has been performed on over 2000 clinical,<br />
food and environmental L. monocytogenes<br />
isolates. Nextera XT DNA libraries were sequenced<br />
on the Illumina MiSeq platform. After<br />
applying raw read quality controls, sequences<br />
with >20x coverage were uploaded in real-time<br />
to the Sequence Read Archive at NCBI and<br />
further analyzed using wgMLST. The sequences<br />
were cleaned and assembled using SPAdes,<br />
then alleles were identified from raw reads and<br />
assembled genomes. To compare performance<br />
of wgMLST and high quality single nucleotide<br />
polymorphisms (hqSNP) methods, a subset<br />
of the isolates was further characterized using<br />
the hqSNP approach (LYVE-SET: github.com/lskatz/lyve-SET).<br />
Results: The wgMLST<br />
database was built using 200 well characterized<br />
annotated reference genomes, representing<br />
plasmid and chromosomal diversity of L.<br />
monocytogenes. Originally, over 5800 unique<br />
loci were identified; these were further validated by<br />
removing redundant loci and improving locus<br />
definitions, so that there are presently 4814 loci in<br />
the wgMLST scheme. These loci represent<br />
on average 88.1% of the coding sequences in<br />
the reference data set and have 0.155% redundancy<br />
(2 loci defined at the same position on the genome).<br />
The number of loci identified per genome<br />
ranged from 2600 to 3100. For isolates identified<br />
as part of a cluster/outbreak, there was<br />
high correlation in the clustering of isolates by<br />
wgMLST and hqSNP analysis for epidemiologically<br />
confirmed outbreaks. Conclusion:<br />
This preliminary analysis suggests that the<br />
current Listeria wgMLST database adequately<br />
captures the diversity of Listeria monocytogenes<br />
isolates that are characterized as part of real-time<br />
surveillance. Additionally, wgMLST can be<br />
used to identify clusters of epidemiologically<br />
related isolates, and the wgMLST results correspond<br />
closely to hqSNP analyses.<br />
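The way wgMLST profiles are compared can be illustrated with a minimal Python sketch. It assumes each isolate is represented as a mapping from locus name to allele number, with `None` for a locus that was not called. The locus names and allele numbers are hypothetical, and the sketch is not the BioNumerics implementation.

```python
def allele_distance(p, q):
    """Count loci called in both profiles whose allele numbers differ."""
    shared = [
        locus for locus in p
        if locus in q and p[locus] is not None and q[locus] is not None
    ]
    return sum(1 for locus in shared if p[locus] != q[locus])


# Hypothetical profiles: locus name -> allele number (None = locus not called).
profile_a = {"locus1": 1, "locus2": 4, "locus3": None, "locus4": 9}
profile_b = {"locus1": 1, "locus2": 7, "locus3": 2, "locus4": 9}
dist = allele_distance(profile_a, profile_b)
```

Ignoring loci that are missing in either isolate is one common design choice; it avoids counting an uncalled locus as a difference when genomes yield different numbers of loci.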
89<br />
USE OF WHOLE GENOME SEQUENCE<br />
K-MER-BASED SNP ANALYSIS FOR<br />
CLOSTRIDIUM BOTULINUM SURVEILLANCE<br />
IN THE UNITED STATES<br />
J. L. Halpin, J. K. Dykes, B. H. Raphael, S. M.<br />
Maslanka, C. Lúquez;<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA.<br />
Clostridium botulinum produces botulinum<br />
neurotoxin, an extremely potent toxin that<br />
causes a rare but serious paralytic disease. Efficient<br />
subtyping methods aid in characterization<br />
of C. botulinum isolated during outbreaks,<br />
specifically in confirming or excluding a suspect<br />
source, linking patients, and distinguishing<br />
outbreak case patients from those with sporadic illness. There are<br />
seven serologically distinct BoNTs (designated<br />
serotypes A through G) defined by neutralization<br />
of toxicity by specific polyclonal antibodies.<br />
In the United States, 67% of foodborne<br />
botulism cases reported from 2001 to 2012<br />
and 99% of infant botulism cases were due to<br />
serotypes A and B. We completed whole genome<br />
sequencing on serotype A and B isolates<br />
that were sent to CDC in 2014 and 2015 for<br />
characterization or were isolated at CDC from<br />
specimens (e.g. stool, foods) sent by the state<br />
public health laboratories. After strains were<br />
sequenced with the Ion Torrent PGM platform,<br />
reads were filtered and assembled de novo<br />
(Torrent Suite v4.4.3, SPAdes plugin v4.4.0.1).<br />
Assemblies with >20x average coverage (n =<br />
57) as well as 13 reference strains from GenBank<br />
were compared using the program kSNP<br />
v3.0 (k=23, core SNPs) (http://sourceforge.net/projects/ksnp/)<br />
and the resulting SNP matrix<br />
file was imported into MEGA v6.0 to create a<br />
maximum parsimony tree with bootstrap values<br />
(n=500). Isolates from the same specimen<br />
as well as isolates from different specimens in<br />
a single botulism outbreak (n = 26) grouped together<br />
in the same clade and were phylogenetically<br />
distinct from epidemiologically unrelated<br />
isolates. kSNP can be a useful tool to quickly<br />
analyze small groups of isolates for relatedness<br />
in an outbreak situation, but analysis becomes<br />
cumbersome when large numbers of isolates<br />
are involved due to the required computing<br />
power.<br />
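The idea behind k-mer-based core SNP detection can be sketched in Python as follows. This is a simplified illustration and not the kSNP implementation: a SNP is taken to be an odd-length k-mer whose flanking bases are identical in every genome while the central base differs. Reverse complements and other refinements are ignored. The toy genomes and k=5 are hypothetical (the analysis above used k=23).

```python
def flank_alleles(seq, k):
    """Map (left flank, right flank) -> set of central bases for every k-mer in seq."""
    half = k // 2
    table = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        key = (kmer[:half], kmer[half + 1:])
        table.setdefault(key, set()).add(kmer[half])
    return table


def core_snps(genomes, k):
    """Return {flanks: {genome: base}} for flanks present in all genomes with differing centers."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    tables = {name: flank_alleles(seq, k) for name, seq in genomes.items()}
    shared = set.intersection(*(set(t) for t in tables.values()))
    snps = {}
    for key in shared:
        # Skip flanks that carry more than one central base within a single genome.
        if any(len(t[key]) != 1 for t in tables.values()):
            continue
        alleles = {name: next(iter(t[key])) for name, t in tables.items()}
        if len(set(alleles.values())) > 1:
            snps[key] = alleles
    return snps


# Two hypothetical genomes differing at one position (G vs A).
toy = {"a": "GGACGTTCC", "b": "GGACATTCC"}
found = core_snps(toy, k=5)
```

Because every k-mer of every genome is tabulated, memory and run time grow with the number and size of genomes, which is consistent with the computing burden noted above for large isolate sets.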
90<br />
INTERNATIONAL GENOMICS<br />
COLLABORATION FOR GLOBAL HEALTH<br />
SECURITY<br />
H. Cui, T. Erkkila, P. Chain;<br />
Los Alamos National Laboratory, Los Alamos,<br />
NM.<br />
Genomic science and next generation sequencing<br />
technologies have become a highly desirable<br />
area for international collaboration in<br />
support of strengthening global health security<br />
and many other application areas. Los Alamos<br />
National Laboratory is leveraging a long history<br />
of genomics research and technology development<br />
to assist multiple partner nations in<br />
advancing their genomics and bioinformatics<br />
capabilities, focusing on pathogen detection,<br />
characterization, and biosurveillance applications.<br />
Our current partner countries include Republic<br />
of Georgia, Kingdom of Jordan, Kenya,<br />
Yemen, Gabon, Uganda, Peru, Iraq, Egypt,<br />
Liberia, Republic of Korea, and Thailand. Collaborations<br />
with other countries and regions<br />
are also being initiated. Such partnerships allow<br />
us to collaborate with host nations and<br />
provide assistance in capability development<br />
to enhance public health surveillance<br />
and to develop infectious disease detection<br />
and characterization techniques that can be<br />
maintained and further developed by the host<br />
countries. We continue to develop and provide<br />
the partner countries with next generation<br />
sequencing protocols and bioinformatics pipelines<br />
that enable efficient sample preparation,<br />
instrument operation, and next generation<br />
sequencing data processing and analysis. Such<br />
collaboration efforts will not only benefit the<br />
host countries and regions with state-of-the-art<br />
genomics science methods and technologies,<br />
but also build a trusted international network in<br />
addressing global emerging infectious disease<br />
challenges. Here we detail our initial scientific<br />
collaboration efforts and highlight the research<br />
outcomes.<br />
91<br />
MLST + STRAIN TYPING OF MULTIDRUG-<br />
RESISTANT ORGANISMS (MDROS) USING<br />
ACUITAS® WHOLE GENOME SEQUENCE<br />
ANALYSIS<br />
W. Chang, R. Kersey, A. Saeed, V. Sapiro, T.<br />
Walker;<br />
OpGen, Inc., Gaithersburg, MD.<br />
Quick and efficient strain typing of MDRO<br />
clinical isolates is crucial for the prevention<br />
of outbreaks. In this study we developed<br />
MLST+ schemas to strain type closely related<br />
clinical isolates of eight Gram-negative species<br />
with the highest priority in healthcare<br />
facilities, by using Acuitas® Whole Genome<br />
Sequence Analysis (WGS) with next generation<br />
sequencing technology. We selected eight<br />
species of microbes based on their prominence<br />
and clinical relevance: Escherichia coli, Pseudomonas<br />
aeruginosa, Klebsiella pneumoniae,<br />
Acinetobacter baumannii, Enterobacter cloacae,<br />
Citrobacter freundii, Klebsiella oxytoca, and<br />
Serratia marcescens. To create a stable MLST+<br />
schema, a reference genome sequence and<br />
enough query genome sequences of that species<br />
are required. Among our eight selected<br />
species, the first four already had reference genomes<br />
suggested by Ridom. For the remaining<br />
four species, we selected reference genome<br />
sequences by following the criteria provided<br />
by Ridom: the candidate reference genome<br />
must be finished, annotated, and accessible;<br />
the reference should ideally be constructed<br />
using Sanger sequencing; the reference isolate<br />
should be available from culture collections<br />
and DNA for sequencing must be available;<br />
and the reference isolate should preferably be<br />
the type strain or another well characterized<br />
strain of the species. Since the schema we developed<br />
will be used for strain typing MDROs,<br />
we always selected the strains from among<br />
human pathogens. The query genome sequences<br />
were drawn from finished genome and<br />
scaffold sequences of each species from the<br />
National Center for Biotechnology Information<br />
(NCBI). However, there were not enough<br />
genome sequences for Enterobacter cloacae,<br />
Citrobacter freundii, Klebsiella oxytoca, and<br />
Serratia marcescens in NCBI to create stable<br />
MLST+ schemas; we supplemented those query<br />
sequences with our own assembled WGS to<br />
complete the query genome set. To test these<br />
schemas, Illumina MiSeq data were generated<br />
on a total of 74 clinical isolates of these eight<br />
species, and the whole genome sequences<br />
were then assembled and analyzed. The results<br />
demonstrated that Acuitas Whole Genome<br />
Sequence Analysis MLST+ can strain type<br />
these closely related clinical isolates of each<br />
species, evaluate evolutionary relationships<br />
among the isolates, and reveal the possibility<br />
of an outbreak occurrence. In conclusion, we<br />
successfully created MLST+ schemas for eight<br />
species: Escherichia coli, Pseudomonas aeruginosa,<br />
Klebsiella pneumoniae, Acinetobacter<br />
baumannii, Enterobacter cloacae, Citrobacter<br />
freundii, Klebsiella oxytoca, and Serratia<br />
marcescens. These schemas successfully strain<br />
typed closely related clinical isolates, demonstrating<br />
the utility of Acuitas Whole Genome<br />
Sequence Analysis for transmission investigations<br />
and outbreak prevention.<br />
92<br />
COMPREHENSIVE ANALYSIS OF ANTIBIOTIC<br />
RESISTANCE IN MULTIDRUG-RESISTANT<br />
ORGANISMS (MDROS) BY WHOLE GENOME<br />
SEQUENCING USING ACUITAS® WHOLE<br />
GENOME SEQUENCE ANALYSIS<br />
W. Chang, R. Kersey, A. Saeed, V. Sapiro, T.<br />
Walker;<br />
OpGen, Inc., Gaithersburg, MD.<br />
The timely and efficient determination of the<br />
antibiotic resistance genes in clinical isolates<br />
is crucial for the prevention of outbreaks and<br />
the treatment of patients. In this study, we developed<br />
pipelines to comprehensively analyze<br />
antibiotic resistance genes in carbapenem-resistant<br />
Enterobacteriaceae (CREs) and<br />
extended spectrum beta-lactamase (ESBL)<br />
producers using Acuitas® Whole Genome<br />
Sequence Analysis (WGS) with next generation<br />
sequencing (NGS) technology. To be able<br />
to comprehensively determine the resistance<br />
genes in clinical isolates of MDROs, we built<br />
a database consisting of all beta-lactamase<br />
variants with NCBI accession numbers from<br />
Lahey Clinic (http://www.lahey.org/Studies/).<br />
All beta-lactamase genes were manually<br />
curated so that each coding sequence includes its start<br />
codon and stop codon where those exist. The database<br />
was tested using WGS data assembled<br />
from Illumina MiSeq data generated on eight<br />
species of clinical isolates: Escherichia coli,<br />
Pseudomonas aeruginosa, Klebsiella pneumoniae,<br />
Acinetobacter baumannii, Enterobacter<br />
cloacae, Citrobacter freundii, Klebsiella<br />
oxytoca, and Serratia marcescens; all such<br />
isolates were reported to harbor CRE and<br />
ESBL antibiotic resistance genes based on<br />
results of Sanger sequencing technology and<br />
the Acuitas® Resistome Test. Using Acuitas<br />
Whole Genome Sequence Analysis, we resolved<br />
closely related gene variants across the<br />
antibiotic resistance gene families KPC, NDM,<br />
OXA, CTX-M, CMY, TEM, SHV, ACT, IMP,<br />
VIM, DHA, PER, and VEB in these clinical<br />
isolates. For example, WGS resolved single<br />
nucleotide differences between gene variants<br />
KPC-2 and KPC-3, or single nucleotide differences<br />
between NDM-1 and NDM-4. Similarly,<br />
WGS resolved closely related gene variants 2,<br />
3, 14, 15, and 79 of CTX-M. The depth of our<br />
database facilitated our discovery of antibiotic<br />
resistance genes which were not previously reported<br />
for these clinical isolates. Furthermore,<br />
our variant determination isn’t limited to the<br />
contents of the database we developed. If the<br />
homologous sequence of a resistance gene is<br />
identified but not identical to the gene variant<br />
in the database, the coding sequence will be<br />
retrieved and searched against the NCBI database<br />
to find the identical gene. If the identical variant<br />
still can’t be found, the gene is reported as<br />
a new variant of that family of beta-lactamases<br />
and then dynamically added to our database.<br />
In conclusion, we have created a database<br />
consisting of all beta-lactamase genes from the<br />
Lahey Clinic website. Using the database with<br />
the Acuitas Whole Genome Sequence Analysis<br />
pipeline, we can comprehensively determine<br />
antibiotic resistance in multidrug-resistant<br />
organisms (MDROs), providing tools to help<br />
the prevention of outbreaks and the treatment<br />
of patients.<br />
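The lookup order described above (curated database first, then a wider database, then report as a new variant and add it) can be sketched in Python as follows. This is an assumption-laden illustration and not the Acuitas pipeline: both databases are modeled as plain dictionaries, matching is by exact sequence identity, and the family name `blaX`, the variant names, and the sequences are all hypothetical.

```python
def classify_variant(cds, family, local_db, external_db):
    """Return (label, source) for a coding sequence; unseen variants are added to local_db."""
    for name, seq in local_db.items():
        if seq == cds:
            return name, "local"
    for name, seq in external_db.items():
        if seq == cds:
            return name, "external"
    # No identical sequence anywhere: register a new variant of the family.
    prefix = f"{family}-new"
    new_name = f"{prefix}{sum(n.startswith(prefix) for n in local_db) + 1}"
    local_db[new_name] = cds
    return new_name, "novel"


# Hypothetical toy databases; these are not real beta-lactamase sequences.
local_db = {"blaX-1": "ATGTCACTGTAA", "blaX-2": "ATGTCATTGTAA"}
external_db = {"blaX-3": "ATGTCGCTGTAA"}

first = classify_variant("ATGTCATTGTAA", "blaX", local_db, external_db)
second = classify_variant("ATGTCGCTGTAA", "blaX", local_db, external_db)
third = classify_variant("ATGAAACTGTAA", "blaX", local_db, external_db)
again = classify_variant("ATGAAACTGTAA", "blaX", local_db, external_db)
```

Exact matching is what makes single nucleotide differences between variants visible: `blaX-1` and `blaX-2` in the toy data differ at one base and are still reported separately.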
93<br />
NEXT-GENERATION SEQUENCING (NGS)<br />
TECHNOLOGY TO STUDY VIRAL-HOST<br />
INTERACTIONS<br />
M. S. Montasser;<br />
University of Kuwait, Kuwait, KUWAIT.<br />
Cucumber mosaic virus (CMV) is a pathogen<br />
causing diseases in a wide variety of economically<br />
important vegetable crops worldwide.<br />
One strain of the virus designated CMV-KU1<br />
(patent No. US 8,138,390 B2) was isolated<br />
from tomato plants grown in Kuwait. The viral<br />
genome of this new isolate was found to be<br />
associated with a benign viral satellite RNA<br />
mini-gene, which can be used as a biological<br />
control agent against plant viruses. The whole<br />
gene was sequenced using Next-Generation<br />
Sequencing (NGS) technology. The mini-gene<br />
sequence analysis revealed that the viral satellite<br />
consisted of a single-stranded RNA, ranging<br />
in size from about 350 to 400 nucleotides.<br />
Based on NGS analysis and bioassay experiments,<br />
the satellite nucleotide sequence was found to<br />
be unrelated to the viral genomic RNAs, with<br />
no significant sequence<br />
homology, and it is not required for viral replication<br />
and spread. However, this viral satellite<br />
RNA is totally dependent on the helper<br />
viral genomic RNAs for its own replication<br />
and spread. This mini-gene was found to be<br />
highly effective against severe necrotic CMV<br />
strains. The severity of the disease was greatly<br />
reduced by the presence of this benign mini-gene,<br />
and this reduction correlated with a decrease<br />
in virion production. This decrease<br />
ameliorated disease symptoms in<br />
tomato plants infected with severe viral strains.<br />
These findings should give a clearer<br />
understanding of severe viral infections affecting<br />
vegetable crops. In addition, detection and<br />
identification of viral satellites, mini-genes,<br />
and whole viral genomes is rapid and cost-effective<br />
at the whole-genome level using new<br />
sequencing technologies.<br />
94<br />
VIRUS SURVEILLANCE IN FOOD AND<br />
WATER USING NEXT GENERATION<br />
SEQUENCING<br />
T. AW, S. Wengert, Y. Kim, J. Rose;<br />
Michigan State University, East Lansing, MI.<br />
Background: Contaminated food and water<br />
are important transmission routes of many different<br />
viruses, associated with diseases ranging<br />
from mild gastroenteritis to severe neurological<br />
symptoms. In addition to unreported outbreaks,<br />
there is increasing evidence of unrecognized<br />
waterborne and foodborne illnesses.<br />
There is currently no established method to<br />
fully describe the broad diversity of human<br />
viral pathogens in environmental samples despite<br />
the increasing importance of environmental<br />
viral infections in global public health. The<br />
recent advent of next generation sequencing<br />
technology coupled with metagenomics offers<br />
an opportunity to identify human viral pathogens<br />
in various environmental samples without<br />
a priori knowledge of what viruses may be<br />
present. The main objective of this study was<br />
to evaluate Illumina sequencing for simultaneous<br />
detection and characterization of viral pathogens<br />
in untreated and treated wastewater for reuse<br />
as well as on the food (fresh produce) itself.<br />
Methods: Various wastewater samples were<br />
collected from full-scale wastewater treatment<br />
facilities. Viruses were concentrated from 20<br />
to 100 liters of treated effluent using an ultrafiltration<br />
method. Fresh produce items were<br />
collected from a produce distribution center.<br />
Viruses were recovered using Tris-glycine buffer<br />
followed by precipitation with polyethylene<br />
glycol. Viral nucleic acid was extracted and<br />
pooled for Illumina sequencing. Bioinformatics<br />
approaches were used to analyze the viral<br />
metagenomics fingerprints. Results: DNA<br />
sequences collected from wastewaters revealed<br />
viral communities that were dominated by<br />
bacteriophages with a subset of eukaryotic<br />
viruses. Most of the viral sequences (> 70%)<br />
were uncharacterized, indicating the presence<br />
of great viral diversity to be discovered. Fifteen<br />
different human pathogens including emerging<br />
viruses such as bocavirus, polyomaviruses,<br />
novel astroviruses (MLB), picobirnavirus,<br />
salivirus A and Aichi virus were identified.<br />
Rotaviruses (A, D, F and G) were identified<br />
in 8 batches of the lettuce sample with high<br />
protein sequence similarity to the GenBank<br />
reference genomes. This shows that the contamination<br />
occurred at some point from the<br />
field through to the distribution center prior to<br />
moving to the store and suggests a potential<br />
risk for foodborne transmission of rotavirus<br />
from lettuce. Conclusions: The application of<br />
metagenomics coupled with next generation<br />
sequencing to describe diversity of the viromes<br />
in food and wastewater systems provided a<br />
broader outlook of the viral composition of<br />
these communities. This approach has the<br />
potential to (i) provide an appropriate tool to<br />
discriminate viruses associated with different<br />
hosts (e.g. human, animal, plant and bacteria)<br />
for microbial source tracking and (ii) identify<br />
etiologic agents associated with waterborne or<br />
foodborne disease outbreaks.<br />
95<br />
CRIMEAN CONGO HEMORRHAGIC FEVER,<br />
2013 AND 2014 SUDAN<br />
C. Kohl 1, M. Eldegail 2, I. Mahmoud 2, P.<br />
Dabrowski 1, L. Schuenadel 1, A. Radonic 1, A.<br />
Nitsche 1, A. Osman 2;<br />
1 Robert Koch Institute, Berlin, GERMANY,<br />
2 National Public Health Laboratory, Khartoum,<br />
SUDAN.<br />
The German Partnership Program for Excellence<br />
in Biological and Health Security was<br />
launched in 2013 and is funded by the German<br />
Federal Foreign Office. Currently, the program<br />
funds projects in 18 countries in the fields of<br />
infectious disease surveillance, detection &<br />
diagnostics, biosafety & biosecurity, capacity<br />
building and networking. In Sudan one focus<br />
of the partnership is the detection of highly<br />
pathogenic viruses and identification of known<br />
and yet unknown etiological agents in outbreak<br />
situations. In 2014 an outbreak of hemorrhagic<br />
fever in humans was reported from different<br />
states of Sudan (South Darfur, West Kordofan,<br />
South Kordofan). The NPHL investigated the<br />
cases and forwarded 29 serum samples from<br />
patients suffering from hemorrhagic fever<br />
to the RKI. The sample-set included a panel<br />
of 10 sera collected during former hemorrhagic<br />
fever outbreaks in the same region in<br />
2013. All sera were tested with qPCR assays<br />
for Marburg virus, Ebola virus and CCHFV.<br />
Additionally all samples were subjected to<br />
metagenomic deep sequencing on an Illumina<br />
MiSeq sequencer. CCHFV was identified by<br />
two independent qPCR assays in one sample<br />
from November 2013 and one from November 2014.<br />
Deep sequencing confirmed these<br />
results. Based on the available sequences the<br />
novel CCHFV strain ‘Sudan 2014’ shares 96%<br />
nucleotide identity with its closest relative CCHFV<br />
SPU 187/90 from South Africa. CCHFV is<br />
reported to be transmitted by ticks in Europe,<br />
Asia and Africa and is known as the etiological agent<br />
of severe hemorrhagic fever in humans and<br />
livestock. Besides insect repellent, no preventive<br />
measures are available. The pathogenicity<br />
and characteristics of this novel strain have yet<br />
to be determined by cell-culture isolation and<br />
serology. Further molecular analysis will help<br />
to clarify the divergence of the CCHFV<br />
strains detected in 2013 and 2014. First results<br />
will be presented.<br />
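A percent identity figure such as the one quoted above is typically computed over an alignment. The following minimal Python sketch assumes two sequences that have already been aligned to equal length with `-` as the gap character, and it ignores gapped columns. The sequences are toy values, and the abstract does not state how its identity value was calculated.

```python
def percent_identity(a, b):
    """Percent of aligned, non-gap columns in which two aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


# Toy alignment: five comparable columns, four of which match.
identity = percent_identity("ACGT-A", "ACCTTA")
```

Whether gapped columns are excluded or counted as mismatches changes the result, so the convention should be stated alongside any reported identity.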
96<br />
TRACKING PLANT VIRUS POPULATION<br />
STRUCTURE CHANGES WITH DEEP<br />
SEQUENCING OF VIRUS DERIVED SMALL<br />
RNAS<br />
D. Kutnjak 1, M. Rupar 1, I. Gutierrez-Aguirre 1,<br />
T. Curk 2, J. F. Kreuze 3, M. Ravnikar 1;<br />
1 National Institute of Biology, Ljubljana, SLOVENIA,<br />
2 University of Ljubljana, Faculty of<br />
Computer and Information Science, Ljubljana,<br />
SLOVENIA, 3 International Potato Center<br />
(CIP), Lima, PERU.<br />
RNA viruses exist within a host as a cloud of<br />
mutant sequences, often referred to as quasispecies.<br />
The composition of the mutant cloud<br />
within a host is an important characteristic<br />
of the virus, since it represents a reservoir of<br />
genetic variants, which can be subjected to<br />
different evolutionary processes. The high mutation<br />
rate of RNA viruses and their quasispecies<br />
nature drive their rapid adaptation and evolution.<br />
Thus, new viruses can emerge as a<br />
result of different processes, such as host<br />
shifts (1). Next generation sequencing technologies,<br />
with their unprecedented depth, enable<br />
thorough investigation of the viral population<br />
structure within a host. This allows us to detect<br />
relevant mutations in the viral population<br />
even before the emergence of new pathogenic<br />
viruses or viral strains (2). Before being able to<br />
use such a tool to follow the emergence of new<br />
viral variants in a viral mutant cloud, efficient<br />
sample preparation and bioinformatics pipelines<br />
are required. Deep sequencing of virus-derived<br />
small interfering RNAs (vsiRNAs; the<br />
products of a plant RNA silencing mechanism)<br />
has been used efficiently for the reconstruction<br />
of consensus viral genome sequences from<br />
insects and plants (3). In our research, we are<br />
focused on potato virus Y, an important potato<br />
pathogen. Using this virus as a model, we first<br />
tested whether the variation observed in vsiRNAs<br />
reflects the full diversity of viral populations in<br />
plants. Using ultra-deep Illumina sequencing,<br />
we investigated the diversity of two coexisting Potato virus<br />
Y sequence pools present within a plant:<br />
RNA isolated from viral particles<br />
and vsiRNAs. Our analysis pipeline included<br />
state-of-the-art bioinformatic software for low-frequency<br />
variant detection (LoFreq) and recombination<br />
pattern detection (ViReMa), along with<br />
other commonly used NGS analysis tools. We<br />
showed that both sequence pools reflect a highly<br />
similar mutational spectrum (4). Currently<br />
we are employing deep sequencing of small<br />
RNAs to track the dynamics of virus adaptation<br />
to several potato cultivars, which differ in<br />
their response to the virus. In particular, we are<br />
interested in whether the structure of the virus population<br />
changes in different potato cultivars and whether<br />
there are convergent patterns of virus evolution<br />
emerging in the same cultivar.<br />
References:<br />
1. Domingo et al. 2012. Microbiol. Mol. Biol. Rev. 76, 159-216.<br />
2. Kreuze et al. 2009. Virology 388, 1-7.<br />
3. Kutnjak et al. 2015. J. Virol. 89, 4761-4769.<br />
4. Stapleford et al. 2014. Cell Host Microbe 15, 706-716.<br />
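The comparison of mutational spectra between two sequence pools described above can be illustrated with a minimal sketch. It is not the LoFreq or ViReMa method: it uses a naive frequency threshold on toy alignment columns, and the function names, the `min_freq` parameter and the example data are all assumptions made for illustration.<br />

```python
from collections import Counter


def allele_frequencies(column):
    """Return {base: frequency} for one alignment column given as a string of bases."""
    counts = Counter(column)
    depth = sum(counts.values())
    return {base: n / depth for base, n in counts.items()}


def minor_variants(pileup, min_freq=0.01):
    """Return {(position, base): frequency} for non-consensus bases at or above min_freq.

    Naive thresholding only; real callers model base qualities and error rates.
    """
    variants = {}
    for pos, column in enumerate(pileup, start=1):
        freqs = allele_frequencies(column)
        consensus = max(freqs, key=freqs.get)
        for base, freq in freqs.items():
            if base != consensus and freq >= min_freq:
                variants[(pos, base)] = freq
    return variants


def shared_fraction(variants_a, variants_b):
    """Jaccard index of two variant sets (1.0 means the same variants in both pools)."""
    keys_a, keys_b = set(variants_a), set(variants_b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0


# Toy columns (hypothetical, not study data): one string of observed bases per position.
particle_pool = ["A" * 95 + "G" * 5, "C" * 100, "T" * 90 + "C" * 10]
vsirna_pool = ["A" * 96 + "G" * 4, "C" * 99 + "T" * 1, "T" * 88 + "C" * 12]

particle_variants = minor_variants(particle_pool, min_freq=0.02)
vsirna_variants = minor_variants(vsirna_pool, min_freq=0.02)
print(shared_fraction(particle_variants, vsirna_variants))
```

Lowering `min_freq` in this sketch admits rarer variants, which is where sequencing error becomes hard to separate from true diversity.<br />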
n 97<br />
A SURVEY OF RNA VIRUSES INFECTING<br />
SOLANACEOUS CROPS IN THE PROVINCE<br />
OF ANTIOQUIA (COLOMBIA) USING NGS<br />
H. Jaramillo Mesa 1 , L. Muñoz 1 , J. F. Alzate 2 ,<br />
M. A. Marín Montoya 1 , P. A. Gutiérrez 1 ;<br />
1 Universidad Nacional de Colombia, Medellin,<br />
COLOMBIA, 2 Universidad de Antioquia, Medellin,<br />
COLOMBIA.<br />
Solanaceae is a family of flowering plants of<br />
great economic importance in the food and<br />
pharmaceutical industries. Some important<br />
members of this family include potato (Solanum<br />
tuberosum and S. phureja), eggplant (S.<br />
melongena), tomato (S. lycopersicum), bell<br />
pepper (Capsicum annuum), tobacco (Nicotiana<br />
tabacum) and many garden ornamentals<br />
such as angel's trumpet (Brugmansia candida).<br />
The Andean region of South America is<br />
considered to be the center of diversity of several<br />
of these species and, as such, it is expected<br />
to have a large diversity of viruses. In this<br />
work we investigate the presence of viruses<br />
infecting potato, cape gooseberry (Physalis<br />
peruviana), tamarillo (Solanum betaceum), bell<br />
pepper, tomato and angel's trumpet in Antioquia<br />
(Colombia) using NGS. From the analysis<br />
of 14 transcriptomes, evidence was found for<br />
the presence of six genera of RNA viruses infecting<br />
these plants: Potexvirus (Alphaflexiviridae),<br />
Carlavirus (Betaflexiviridae), Tospovirus<br />
(Bunyaviridae), Crinivirus (Closteroviridae),<br />
Potyvirus (Potyviridae) and Polerovirus (Luteoviridae).<br />
Potato virus X (PVX, Potexvirus)<br />
was detected at very high levels in P. peruviana<br />
(8.9% of reads) and in lower amounts in<br />
S. lycopersicum (0.74%). Potyviruses were<br />
detected in S. betaceum (Tamarillo leaf malformation virus,<br />
TaLMV; 0.57%), S. lycopersicum (Potato<br />
virus Y, PVY; 0.031%), S. phureja (Potato<br />
virus V, PVV; 1.15%) and S. tuberosum (PVY,<br />
1.33%). Potato yellow vein virus (PYVV)<br />
was detected in S. lycopersicum (0.033%), S.<br />
phureja (1.90%) and S. tuberosum (0.26%).<br />
Potato leaf roll virus (PLRV) was found infecting<br />
B. candida (0.01%), C. annuum (0.001%),<br />
P. peruviana (0.023%) and S. tuberosum<br />
(0.029%). A tospovirus closely related to Alstroemeria<br />
necrotic streak virus (ANSV) was<br />
found in C. annuum only (0.043%) and seems<br />
to be the most important virus affecting this<br />
crop in Antioquia. Finally, the virus with the<br />
lowest relative abundance was Potato virus S (PVS), found in the<br />
transcriptomes of S. lycopersicum (0.003%)<br />
and S. phureja (0.018%). All these viruses<br />
were detected and assembled using in-house<br />
Perl scripts and their phylogenetic relationships<br />
with related species are discussed. The<br />
information derived from this study was used<br />
to design specific molecular diagnostic tests<br />
using RT-PCR and RT-qPCR as a tool that will<br />
support seed certification and virus management<br />
programs in Colombia. This work was<br />
funded by Universidad Nacional de Colombia<br />
(Grants VRI: 28616, 26737) and International<br />
Foundation for Science (Sweden, Grant:<br />
C/4634-2).<br />
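The percentages quoted in this abstract are shares of sequencing reads assigned to each virus. A minimal sketch of that arithmetic follows; the function name and the read counts are hypothetical and are not taken from the study.<br />

```python
def relative_abundance(read_counts, total_reads):
    """Return [(name, percent_of_reads)] sorted from most to least abundant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    percents = {name: 100.0 * n / total_reads for name, n in read_counts.items()}
    return sorted(percents.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts for one transcriptome (illustrative only).
example_counts = {"virus_B": 400, "virus_A": 50_000}
for name, pct in relative_abundance(example_counts, total_reads=2_000_000):
    print(f"{name}: {pct:.3f}% of reads")
```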
n 98<br />
DECIPHERING VIRAL RNA GENOMES IN<br />
METAGENOMIC DATA FROM A WIDE RANGE<br />
OF CLINICAL SAMPLES<br />
T. Ng, A. Montmayeur, L. Magana, E. Ramos,<br />
J. Vinje, P. Rota, S. Oberste;<br />
Centers for Disease Control and Prevention,<br />
Atlanta, GA.<br />
A significant component of laboratory-based<br />
surveillance for viral diseases is genetic characterization<br />
of pathogens by sequencing. Many<br />
RNA viruses are genetically highly diverse,<br />
even within a single serotype, and therefore present<br />
a challenge in transforming the raw data<br />
into useful viral genomes. Furthermore, coinfections<br />
of multiple viral pathogens often<br />
occur in clinical samples, necessitating both<br />
laboratory procedures and a bioinformatic<br />
pipeline capable of simultaneous detection of<br />
different pathogen genomes. Here we demonstrate<br />
the laboratory procedures and bioinformatic<br />
analysis needed to routinely monitor<br />
viral pathogens from a wide range of samples,<br />
as our laboratories handle over 500 samples<br />
and generate 500M raw NGS reads annually.<br />
Initially, we hypothesized that NGS reads can<br />
be mapped to a list of reference genomes using<br />
a simple reference assembly algorithm. Our<br />
results indicated that reference assembly using a<br />
list of pre-selected references does not consistently<br />
generate full genomes, especially for<br />
highly divergent regions. To analyze these<br />
complex clinical samples, we therefore<br />
used a metagenomic approach. Our<br />
pipeline analyzed the NGS data using de novo<br />
assembly, as well as BLAST analysis comparing<br />
against the entire GenBank viral genome<br />
database. To select an appropriate assembler,<br />
we compared the output of different de novo<br />
algorithms, including de Bruijn assemblers,<br />
overlap-layout-consensus assemblers, and<br />
combinations of different assemblers. The<br />
combination approach showed significant<br />
improvement over single assemblers such as<br />
SOAPdenovo and ABySS, while the SPAdes<br />
assembler produced similar results in less time.<br />
We found, however, that adaptor and primer sequences<br />
need to be trimmed precisely before any assembler<br />
will work well. Under-trimming or<br />
over-trimming produces gaps or misalignments<br />
that significantly affect downstream de<br />
novo assembly. We will discuss the issues and<br />
solutions for analyzing datasets with multiple<br />
pathogens, low sequence coverage, sequence<br />
misalignment, and cross-sample / cross-run<br />
contamination. Manual inspection of pipeline<br />
output and communication between bioinformaticians<br />
and laboratorians are key to obtaining<br />
accurate output across different sample<br />
types and genome types. A graphical interface<br />
for viewing the NGS output has allowed<br />
scientists without programming skills to<br />
analyze the data. Our strategies to monitor<br />
viral pathogens such as poliovirus, measles<br />
virus and norovirus are applicable to the epidemiologic<br />
investigation of other emerging<br />
pathogens. We will also discuss how to use NGS<br />
to investigate emerging pathogens, exemplified<br />
by our recent findings of novel astrovirus,<br />
picornavirus and rotavirus.<br />
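The sensitivity to under-trimming and over-trimming noted above can be illustrated with a minimal 3' adapter trimmer. This is a sketch under stated assumptions, not the pipeline used by the authors: it matches exactly (no mismatches or quality handling), and the function name, the `min_overlap` parameter and the example adapter string are illustrative choices.<br />

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching.

    A full adapter occurrence is cut at its start. Otherwise the longest read
    suffix that equals an adapter prefix of at least min_overlap bases is cut.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    max_len = min(len(read), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


ADAPTER = "AGATCGGAAGAGC"  # example adapter sequence, assumed for illustration

print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # full adapter inside the read
print(trim_3prime_adapter("ACGTACGTAGATCG", ADAPTER))            # partial adapter at the 3' end
print(trim_3prime_adapter("ACGTACGT", ADAPTER))                  # no adapter present
```

In this sketch a small `min_overlap` risks over-trimming genuine bases that happen to match the adapter start, while a large value leaves short adapter remnants in place (under-trimming).<br />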
n 99<br />
METHODS, APPLICATIONS AND<br />
EXPERIENCES WITH NEXT GENERATION<br />
WHOLE GENOME SEQUENCING IN PUBLIC<br />
HEALTH VIROLOGY<br />
D. M. Lamson 1 , J. Laplante 1 , J. McGinnis 1 , M.<br />
Shudt 1 , A. Kajon 2 , K. St. George 1 ;<br />
1 Wadsworth Center, Albany, NY, 2 Lovelace Respiratory<br />
Research Institute, Albuquerque, NM.<br />
Potential uses for Next-Generation/Whole<br />
Genome Sequencing (NG/WGS) in public<br />
health virology include improved characterization<br />
for molecular epidemiology, viral evolution<br />
studies and genome-wide surveillance.<br />
The Wadsworth Center Virology Laboratory<br />
has initiated NG/WGS for several applications,<br />
initially focusing on adenoviruses (HAdV),<br />
influenza (InfA) and enterovirus D68 (EV-<br />
D68). Challenges include the separation of<br />
viral nucleic acid (NA) from high titer host<br />
NA in both primary samples and cultured<br />
isolates. For HAdV isolates cultured in A-549<br />
cells, high salt precipitation was used to preferentially<br />
remove host cell DNA prior to viral<br />
NA extraction. For InfA and EV-D68, primary<br />
samples or isolates were extracted, then<br />
RT-PCR amplified for either all 8 genomic<br />
fragments (InfA) or near whole genome (EV-<br />
D68). NA concentrations in extracted samples<br />
were determined using the standard Nextera<br />
protocol. Libraries were quantified, average<br />
fragment size determined and paired-end sequencing<br />
was performed on an Illumina MiSeq<br />
with MiSeq 500 cycle v2 kit. Sequences were<br />
de novo-assembled with SPAdes and remapped<br />
using Geneious Pro. Over 100 HAdV samples<br />
from each of the 6 species have been processed<br />
as well as 12 adeno-associated virus samples<br />
that co-purified with HAdV. Applications have<br />
included the investigation of multiple epidemic<br />
keratoconjunctivitis outbreaks, analysis of 47<br />
HAdV3 samples geographically and temporally<br />
distributed over 20 years, genetic characterization<br />
of intertypic recombinants, and<br />
the investigation of HAdV4 and 14 genomic<br />
variants in college infections. The analysis<br />
of 21 EV-D68 strains revealed from 1 to 104<br />
amino acid changes over the entire coding<br />
region during the 2014 outbreak, with greatest<br />
variability observed in the capsid gene.<br />
Fifteen influenza A/H1pdm09 specimens and<br />
44 A/H3N2 specimens from 2009-15 have<br />
been analyzed. Mutations were detected in all<br />
8 segments of both A/H1 and A/H3 viruses,<br />
with highest rates in the hemagglutinin and<br />
neuraminidase genes in both subtypes. Phylogenetic<br />
analysis showed all A/H1 specimens to<br />
cluster with the current vaccine strain, whereas<br />
recent A/H3 viruses were more similar to A/<br />
Switzerland/9715293/2013 than to the current<br />
vaccine strain. As technologies improve and<br />
processing time is reduced, NGS applications<br />
will become more routine. The potential for<br />
clarifying nomenclature, investigating viral<br />
evolution and performing more comprehensive<br />
surveillance is vast. Developing techniques<br />
for applications on primary samples and<br />
implementing more streamlined data analysis<br />
will yield better diagnostics and help resolve<br />
more outbreak investigations.<br />
Additionally, more extensive viral surveillance<br />
will enhance the potential for advance warning<br />
of the emergence of drug-resistant and novel<br />
strains with pandemic potential.<br />
n 100<br />
ASSEMBLING WHOLE GENOMES FROM<br />
MIXED MICROBIAL COMMUNITIES USING<br />
HI-C<br />
I. Liachko 1 , J. N. Burton 1 , L. Sycuro 2 , A. H.<br />
Wiser 2 , D. N. Fredricks 2 , M. J. Dunham 1 , J.<br />
Shendure 1 ;<br />
1 University of Washington, Seattle, WA, 2 Fred<br />
Hutchinson Cancer Research Center, Seattle,<br />
WA.<br />
Assembly of whole genomes from next-generation<br />
sequencing is inhibited by the lack of<br />
contiguity information in short-read sequencing.<br />
This limitation also impedes metagenome<br />
assembly, since one cannot tell which sequences<br />
originate from the same species within<br />
a population. We have overcome these bottlenecks<br />
by adapting a chromosome conformation<br />
capture technique (Hi-C) for the deconvolution<br />
of metagenomes and the scaffolding of de novo<br />
assemblies of individual genomes. In modeling<br />
the 3D structure of a genome, chromosome<br />
conformation capture techniques such as Hi-C<br />
are used to measure long-range interactions of<br />
DNA molecules in physical space. These tools<br />
employ crosslinking of chromatin in intact<br />
cells followed by intra-molecular ligation,<br />
joining DNA fragments that were physically<br />
nearby at the time of crosslinking. Subsequent<br />
deep sequencing of these DNA junctions generates<br />
a genome-wide contact probability map<br />
that allows the 3D modeling of genomic conformation<br />
within a cell. The strong enrichment<br />
in Hi-C signal between genetically neighboring<br />
loci allows the scaffolding of entire chromosomes<br />
from fragmented draft assemblies.<br />
Hi-C signal also preserves the cellular origin<br />
of each DNA fragment and its interacting partner,<br />
allowing for deconvolution and assembly<br />
of multi-chromosome genomes from a mixed<br />
population of organisms. We have used Hi-C<br />
to scaffold whole genomes of animals, plants,<br />
fungi, bacteria and archaea. We<br />
have also been able to use these data to annotate<br />
functional features of microbial genomes, such<br />
as centromeres in many fungal species. Additionally,<br />
we have applied our technology to<br />
diverse metagenomic populations such as craft<br />
beer, bacterial vaginosis infections, soil, and<br />
tree endophyte samples to discover and assemble<br />
the genomes of novel strains of known<br />
species as well as novel prokaryotes and<br />
eukaryotes. The high quality of Hi-C-based<br />
assemblies allows the simultaneous closing of<br />
numerous unculturable genomes, placement of<br />
plasmids within host genomes, and microbial<br />
strain deconvolution in a way not possible<br />
with other methods. Reference: Burton JN*,<br />
Liachko I*, Dunham MJ, Shendure J. Species-level<br />
deconvolution of metagenome assemblies<br />
with Hi-C-based contact probability maps. G3.<br />
2014, May 22;4(7):1339-46.<br />
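The idea that Hi-C contacts preserve the cellular origin of DNA fragments can be sketched as a grouping problem: contigs that share many Hi-C read pairs are placed in the same cluster. The code below is a deliberately simplified illustration using a length-normalized contact threshold and union-find. It is not the published algorithm, and the function name, the `min_density` threshold and the toy data are assumptions.<br />

```python
from collections import defaultdict


def cluster_contigs(contact_counts, contig_lengths, min_density=1e-6):
    """Group contigs whose length-normalized Hi-C contact density passes a threshold.

    contact_counts: {(contig_a, contig_b): number of Hi-C read pairs linking them}
    contig_lengths: {contig: length in bp}
    Returns a sorted list of sorted contig-name lists, one list per cluster.
    """
    parent = {contig: contig for contig in contig_lengths}

    def find(contig):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]
            contig = parent[contig]
        return contig

    for (a, b), pairs in contact_counts.items():
        density = pairs / (contig_lengths[a] * contig_lengths[b])
        if density >= min_density:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for contig in contig_lengths:
        clusters[find(contig)].add(contig)
    return sorted(sorted(members) for members in clusters.values())


# Toy metagenome (hypothetical): two organisms, two contigs each, a few spurious links.
lengths = {"c1": 10_000, "c2": 20_000, "c3": 15_000, "c4": 5_000}
contacts = {("c1", "c2"): 500, ("c3", "c4"): 200, ("c2", "c3"): 3}
print(cluster_contigs(contacts, lengths))
```

Normalizing by the product of contig lengths keeps long contigs from being merged merely because they offer more sites for random ligation.<br />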
n 101<br />
AN AUTOMATED WGS PIPELINE FOR<br />
ANALYSIS OF BACTERIAL GENOMES<br />
M. Thomsen 1 , H. Hasman 2,3 , A. Petersen 3 , R.<br />
Skov 3 , O. Lund 1 , A.R. Larsen 3 and F.M. Aarestrup<br />
2 ;<br />
1 Danish Technical University – Systems Biology,<br />
CBS, Lyngby, Denmark, 2 Danish Technical<br />
University – National Food Institute,<br />
Lyngby, Denmark, 3 Statens Serum Institut,<br />
Copenhagen, Denmark.<br />
Background: New approaches within diagnostics<br />
and surveillance for species identification,<br />
clonal clustering, and identification<br />
of resistance genes are based on whole genome<br />
sequencing (WGS). The Centre for Genomic<br />
Epidemiology (CGE) has been developing<br />
stand-alone web-tools for handling WGS information<br />
for outbreak investigation, epidemiological<br />
surveillance and diagnostics.<br />
Material and methods: Based on previously<br />
published web-based CGE tools, we developed<br />
a pipeline for automatic analysis of WGS data<br />
from bacterial isolate samples<br />
(https://cge.cbs.dtu.dk/services/CGEpipeline-2.0/). The bacterium<br />
analysis pipeline (BAP) automatically<br />
identifies the bacterial species and determines<br />
the multilocus sequence type (MLST), spa<br />
type (S. aureus only), serotype (E. coli only)<br />
and antimicrobial resistance genes. To test the<br />
BAP, a set of 101 S. aureus strains originating<br />
from bacterial infections at Danish hospitals and<br />
submitted to Statens Serum Institut (SSI) for<br />
genotyping by traditional Sanger sequencing<br />
were subjected to WGS on an Illumina HiSeq<br />
to a minimum coverage of 30x and assembled<br />
by Velvet prior to being uploaded to the<br />
BAP. Draft genomes, together with a mandatory<br />
metadata spreadsheet containing sequencing<br />
information as well as epidemiological<br />
data, were submitted to the BAP through the<br />
Batch Upload webpage (https://cge.cbs.dtu.dk/services/CGEpipeline-2.0/)<br />
and automatically<br />
analysed by kmerFinder to verify the bacterial<br />
species and by ResFinder to identify antimicrobial<br />
resistance genes. When the bacterial<br />
species was correctly identified as S. aureus,<br />
MLST and spa typing were performed automatically<br />
and stored in the Isolate Overview<br />
page to be extracted as an Excel sheet for<br />
further analysis. Results and discussion: The<br />
bacterial species was correctly identified as S.<br />
aureus for all isolates. Therefore, MLST and<br />
spa typing were performed automatically and<br />
could be compared to the previous data for the<br />
strains. An MLST type was assigned for 99%<br />
of the genomes and good concordance was<br />
found between the clonal complexes inferred<br />
from spa types and WGS analysis through the<br />
pipeline. For spa typing, 92 of the genomes<br />
showed the same spa type as found by Sanger<br />
sequencing previously. Seven genomes showed<br />
a different spa type and 2 genomes failed to<br />
give a spa type in the pipeline. This was most<br />
likely due to the short reads (2 × 100 bp)<br />
obtained from the HiSeq sequencer, which are<br />
not optimal for assembly of highly repetitive<br />
DNA sequences such as the variable region of<br />
the spa gene. ResFinder successfully identified<br />
the mecA gene in all MRSA genomes and<br />
not in any of the MSSA genomes. Conclusion:<br />
A combined bioinformatics platform<br />
was developed and made publicly available,<br />
providing easy-to-use automated analysis of<br />
bacterial whole genome sequencing data. The<br />
platform may be of immediate relevance for<br />
investigators using whole genome sequencing<br />
for clinical diagnostics and surveillance.<br />
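The comparison of pipeline spa types with earlier Sanger results amounts to a concordance tally. The following minimal sketch shows one way to compute it; the function name, isolate identifiers and type labels are hypothetical, and only the final percentage uses counts reported in the abstract (92 matching types among 101 genomes).<br />

```python
def typing_concordance(reference, predicted):
    """Tally agreement between reference types and pipeline calls.

    reference: {isolate: type from the reference method}
    predicted: {isolate: type from the pipeline, or None when no type was called}
    """
    summary = {"match": 0, "mismatch": 0, "no_call": 0}
    for isolate, ref_type in reference.items():
        call = predicted.get(isolate)
        if call is None:
            summary["no_call"] += 1
        elif call == ref_type:
            summary["match"] += 1
        else:
            summary["mismatch"] += 1
    return summary


# Hypothetical isolates and type labels, for illustration only.
sanger = {"iso1": "t002", "iso2": "t008", "iso3": "t127", "iso4": "t002"}
pipeline = {"iso1": "t002", "iso2": "t008", "iso3": "t084", "iso4": None}
print(typing_concordance(sanger, pipeline))

# Concordance from the counts reported in the abstract: 92 of 101 genomes.
print(round(100 * 92 / 101, 1))
```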
n 102<br />
ON THE EDGE: ROBUST GENERALIZED<br />
BIOINFORMATICS FOR NEXT-GEN<br />
SEQUENCING NOVICES<br />
Chien-Chi Lo 1 , Po-E Li 1 , Joseph Anderson 2,4 ,<br />
Karen W. Davenport 1 , Kimberly A. Bishop-<br />
Lilly 3,4 , Yan Xu 1 , Sanaa Ahmed 1 , Shihai Feng 1 ,<br />
Tracey Allen K. Freitas 1 , Vishwesh P. Mokashi<br />
4 , and Patrick Chain 1<br />
1 Los Alamos National Laboratory, 2 Defense<br />
Threat Reduction Agency, 3 Henry M. Jackson<br />
Foundation, 4 Naval Medical Research Center<br />
- Frederick<br />
With the continuing evolution of sequencing<br />
platforms and technologies, the so-called democratization<br />
of sequencing is in fact already<br />
in full swing. Despite the inherent challenges<br />
involved, moving sequencing technology into<br />
the field and closer to the source of diverse and<br />
interesting biological samples is an attractive<br />
idea to many agencies. This even extends to<br />
OCONUS laboratories that are being equipped<br />
with next generation sequencing (NGS) platforms<br />
to complement more traditional molecular,<br />
cell, and microbiology methods for infectious<br />
disease research. However, many groups<br />
new to NGS and/or laboratories in remote or<br />
austere locations may be ill-equipped to handle<br />
the bioinformatic requirements associated with<br />
rapid production of massive, complex datasets<br />
from sequencing clinical or environmental<br />
samples. A collaborative effort, named EDGE<br />
bioinformatics, has begun to research, prototype,<br />
and program bioinformatic pipelines<br />
that can be deployed to new NGS laboratories<br />
in order to enable successful adoption of<br />
sequencing technologies by allowing robust<br />
processing of NGS data. These pipelines were<br />
initially developed with specific use cases and<br />
sample types in mind, although the modular<br />
design provides utility beyond these initial<br />
use cases. The pipelines are being designed<br />
to make use of the most common file formats<br />
and run in a Linux environment on relatively<br />
inexpensive hardware so that barriers to adoption<br />
are minimal. A pilot program to install and<br />
utilize EDGE at an OCONUS DoD facility has<br />
already taken place, with successful processing<br />
of locally generated data using a recently<br />
installed on-site MiSeq, and demonstrated<br />
reachback capability using CONUS support.<br />
n 103<br />
SUPERPHY: PREDICTIVE GENOMICS FOR<br />
THE PATHOGEN ESCHERICHIA COLI<br />
M. D. Whiteside 1 , C. R. Laing 1 , A. Manji 1 , J.<br />
Masih 1 , P. Kruczkiewicz 1 , E. N. Taboada 1 , V. P.<br />
J. Gannon 1<br />
1 CANADA - Laboratory for Foodborne Zoonoses,<br />
Public Health Agency of Canada<br />
Introduction: Predictive genomics is the<br />
translation of raw genome sequence data into<br />
an assessment of the phenotypes exhibited by<br />
the organism. For bacterial pathogens, these<br />
phenotypes can range from environmental<br />
survivability to the severity of human disease<br />
associated with them. Significant progress has<br />
been made in the development of generic tools<br />
for genomic analyses that are broadly applicable<br />
to all microorganisms; however, a fundamental<br />
missing component is the ability to<br />
analyze genomic data in the context of organism-specific<br />
phenotypic knowledge, which has<br />
been accumulated from decades of research<br />
and can provide a meaningful interpretation<br />
of genome sequence data. Implementation:<br />
In this study, we present SuperPhy, an online<br />
predictive genomics platform (http://lfz.corefacility.ca/superphy/)<br />
for Escherichia coli.<br />
The platform integrates the analysis tools and<br />
genome sequence data for all publicly available<br />
E. coli genomes and facilitates the upload<br />
of new genome sequences from users under<br />
public or private settings. SuperPhy provides<br />
real-time analyses of thousands of genome<br />
sequences with results that are understandable<br />
and useful to a wide community, including<br />
those in the fields of clinical medicine, epidemiology,<br />
ecology, and evolution. SuperPhy<br />
includes identification of: 1) virulence and<br />
antimicrobial resistance determinants; 2) statistical<br />
associations between genotypes, biomarkers,<br />
geospatial distribution, host, source,<br />
and phylogenetic clade; 3) biomarkers<br />
for groups of genomes, based on the<br />
presence/absence of specific genomic<br />
regions and single-nucleotide polymorphisms;<br />
and 4) the in silico Shiga-toxin subtype. Conclusions:<br />
SuperPhy is a predictive genomics platform<br />
that attempts to provide an essential link<br />
between the vast amounts of genome information<br />
currently being generated and phenotypic<br />
knowledge in an organism-specific context.<br />
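A statistical association between the presence/absence of a genomic region and a group of genomes can be tested with a 2x2 contingency table. The sketch below uses a two-sided Fisher's exact test written with the standard library. The abstract does not state which test SuperPhy applies, so the choice of test, the function names and the toy data are assumptions.<br />

```python
from math import comb


def presence_absence_table(has_marker, in_group):
    """Build (a, b, c, d): group present/absent, then non-group present/absent."""
    a = b = c = d = 0
    for genome, present in has_marker.items():
        if in_group[genome]:
            if present:
                a += 1
            else:
                b += 1
        elif present:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x marker-positive genomes in the group.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one (small tolerance for ties).
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Toy data (hypothetical): marker present in 8 of 10 group genomes, 1 of 6 others.
genomes = [f"g{i}" for i in range(16)]
in_group = {g: i < 10 for i, g in enumerate(genomes)}
has_marker = {g: (i < 8 or i == 10) for i, g in enumerate(genomes)}

table = presence_absence_table(has_marker, in_group)
print(table, fisher_exact_two_sided(*table))
```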
n 104<br />
MOLECULAR SURVEILLANCE OF A.<br />
BAUMANNII IN A REGIONAL WASTE<br />
STABILISATION POND IN AUSTRALIA<br />
1,2 Maxim Sheludchenko, 2 Mohammad Katouli<br />
and 1 Helen Stratton<br />
1 Smart Water Research Centre, Griffith University,<br />
Southport, Queensland and 2 Genecology<br />
Research Centre, University of the Sunshine<br />
Coast, Sippy Downs, Queensland, Australia<br />
A. baumannii is an increasingly important<br />
pathogen responsible for many nosocomial<br />
infections. Survival of these bacteria in the<br />
environment, especially in waste stabilization<br />
ponds (WSPs), has not been investigated before,<br />
although some reports indicate isolation<br />
of A. baumannii from filaments of activated<br />
sludge of wastewaters and paleosol contaminated<br />
by waste leakage. Identification of A.<br />
baumannii from the environment is normally<br />
done by growing samples on selective culture<br />
media, but this requires the use of molecular<br />
techniques for confirmation. Furthermore,<br />
the ability of these bacteria to grow on some<br />
selective media may also cause misinterpretations<br />
of the data. In this study we report such<br />
an event and show that the use of molecular<br />
techniques, especially 16S rRNA sequencing,<br />
aided the identification and surveillance of these<br />
bacteria in a WSP. Between October 2013 and<br />
September 2014, we surveyed the die-off of a<br />
number of pathogens in the maturation pond of<br />
a WSP in a regional area of Australia. Samples<br />
cultivated on mCCDA agar containing selective<br />
and recommended supplements for growth<br />
of Campylobacter yielded a high number of<br />
colonies with characteristics of Campylobacter.<br />
However, subsequent PCR tests using species-specific<br />
primers proved that these bacteria<br />
were not Campylobacter. 16S rRNA sequencing, followed<br />
by additional confirmatory tests for<br />
VS1, ompA, blaOXA-51-like, blaOXA-23-like and the Class<br />
1 integrase gene, confirmed that these colonies<br />
were in fact A. baumannii. Using this methodology<br />
intensive sampling was carried out<br />
at the inlet and the outlet of this maturation<br />
pond, and samples were tested for A. baumannii. The results<br />
indicated that the number of these bacteria in most<br />
sampling rounds did not differ significantly<br />
between the inlet and outlet of the maturation<br />
pond, suggesting that these bacteria survive<br />
in such waste treatment systems. The implications<br />
of genomic techniques for the identification of<br />
bacteria in relation to public health studies will<br />
be discussed.<br />
Index<br />
Bold number indicates presenter.<br />
Aanensen, D. M. S7:5<br />
Aarestrup, F. M. 41<br />
Aarestrup, F. M. 46<br />
Aarestrup, F. M. 78<br />
Aarestrup, F. M. S10:6<br />
Abrams, A. 70<br />
Achtman, M. S7:8<br />
Adam, J. 7, 8, S7:10<br />
Adams, M. D. S9:4<br />
Addy, N. 50<br />
Agarwala, R. S3:3<br />
Akange, N. 47<br />
Albayrak, L. 27<br />
Ali, F. 45<br />
Alikhan, N.-F. S7:8<br />
Allard, M. W. 39<br />
Allard, M. W. 54, S3:5<br />
Al-Shahib, A. S7:9<br />
Alyanak, E. 64<br />
Alzate, J. F. 97<br />
Anand, M. 74<br />
Ancora, M. 40<br />
Andersen, V. D. 46<br />
Anderson, M. S9:3<br />
Andrusch, A. 9<br />
Appalla, L. 13, 14, 30, S9:5<br />
Argimon, S. S7:5<br />
Ashton, P. 83, S2:5, S7:9<br />
Au-Young, J. 29<br />
Avillan, J. J. 64<br />
AW, T. 94<br />
Ayers, S. L. 54<br />
Baert, L. S2:2<br />
Baker, D. J. 74<br />
Baker, D. J. 79<br />
Barrette, R. W. S2:3<br />
Barretto, C. S2:2<br />
Baugher, J. S4:6<br />
Beatson, S. A. 16<br />
Beiko, R. G. 8, S7:10<br />
Bell, R. L. 39<br />
Bennett, C. 72<br />
Ben Zakour, N. L. 16<br />
Bergmark, L. S10:6<br />
Berry, C. S7:10<br />
Besser, J. 87, 88<br />
Besser, T. 44<br />
Blais, B. W. 55<br />
Blosser, S. J. 12<br />
Boisvert, S. S4:5<br />
Bonomo, R. A. S9:4<br />
Bonten, M. 33<br />
Bopp, D. 74<br />
Bowden, K. E. 82<br />
Bowden, K. 86<br />
Bradshaw, J. 15<br />
Brettin, T. S4:5<br />
Brinkman, F. 7, S7:10<br />
Brinkman, F. S. 8<br />
Bristow, F. 7, 8, S7:10<br />
Bronstein, P. S3:3<br />
Brouwer, E. 33<br />
Brown, E. 39<br />
Brown, E. W. 54, S3:5<br />
Bulach, D. M. S7:7<br />
Burrows, E. 39<br />
Burton, J. N. 100, S4:7<br />
Busch, J. 57<br />
Cabral, J. S7:10<br />
Calistri, P. 40<br />
Cammà, C. 40<br />
Camp, P. 36<br />
Campbell, E. M. 12<br />
Carleton, H. 71, 72<br />
Carleton, H. 65, 84, S3:3, S3:6<br />
Carleton, H. A. 87, 88<br />
Carleton-Romer, H. A. 81<br />
Carrico, J. A. 7, 8<br />
Carriço, J. A. S7:10<br />
Carrillo, C. D. 55<br />
Carroll, L. 44<br />
Catalyurek, U. V. S3:5<br />
Catanzaro, A. S9:6<br />
Catanzaro, D. S9:6<br />
Chain, P. 90<br />
Chang, W. 34, 91, 92<br />
Chapman, E. L. 12<br />
Chase, H. 50<br />
Chase, K. 59<br />
Chen, Y. 72<br />
Chilton, C. S2:2<br />
Christopher-Hennings, J. 47, 52<br />
Chumakov, S. M. 27<br />
Chung, H. S9:3<br />
Chung, T. 50<br />
Clifford, R. 13, 14, S9:5<br />
Cohen, T. S9:6<br />
Colman, R. 57<br />
Colman, R. E. S9:6<br />
Comandatore, F. 32<br />
Conrad, C. 12<br />
Conrad, N. R. S4:5<br />
Cooper, A. 1<br />
Corander, J. 37<br />
Courtot, M. 7, 8<br />
Courtot, M. S7:10<br />
Cowan, L. 85<br />
Croughs, P. D. 24<br />
Crudu, V. S9:6<br />
Cui, H. 90<br />
Curk, T. 96<br />
Currie, B. 57<br />
Dabrowski, P. S10:4<br />
Dabrowski, P. 10, 21, 9, 95<br />
Dallman, T. 83, S2:5, S7:9<br />
Daquigan, N. 43, 48<br />
Das, S. 50, 52<br />
David, S. S3:3<br />
Davis, J. J. S4:5<br />
Davis, M. 44<br />
Davis, S. 2, S4:6<br />
Deatherage Kaiser, B. S4:4<br />
de Boer, R. F. 24<br />
De Bruyne, K. 59, S7:3<br />
Defibaugh-Chavez, S. S3:3<br />
Delisle, J. 57<br />
De Massis, F. 40<br />
den Bakker, H. 44<br />
Deng, X. S4:4<br />
de Pinna, E. 83, S2:5<br />
Dermott, P. F. 54<br />
Dessai, U. 88<br />
Deurenberg, R. H. 22<br />
Dhand, A. 20<br />
Dhere, T. 61<br />
Dhillon, B. 7, 8<br />
Dickinson, M. 74<br />
Di Giannatale, E. 40<br />
Dimitrova, N. 18, 19, 20, 28<br />
Dinsmore, B. 71<br />
Dinsmore, B. S4:4<br />
Disz, T. S4:5<br />
Dooley, D. 7, 8, S7:10<br />
Drees, K. 40<br />
Dunham, M. J. 100, S4:7<br />
Dunn, S. 1<br />
Duwve, J. W. 12<br />
Dykes, J. K. 89<br />
Edirisinghe, J. S4:5<br />
Edwards, R. A. S4:5<br />
Einer-Jensen, K. 6, S7:4<br />
Eldegail, M. 95<br />
Elson, R. S2:5<br />
Engelthaler, D. 57<br />
Engelthaler, D. M. S9:6<br />
Eppinger, M. 11<br />
Erkkila, T. 90<br />
Escuyer, V. 25<br />
Evans, P. 88<br />
Evans, P. S. S3:5<br />
Ezeoke, I. S3:4<br />
Fallon, J. 20<br />
Fallon, J. T. 18, 19, 28<br />
Fasulo, D. 87<br />
Federhen, S. S6:4<br />
Fedosejev, A. S7:5<br />
Felton, A. 29<br />
Ferdous, M. 23, 24<br />
Ferreira, C. 39<br />
Fields, P. 71<br />
Fields, P. S4:4<br />
Fitzgerald, C. 71, 72<br />
Fitzgerald, C. S4:4<br />
Fitzpatrick, M. A. 31<br />
Flett, K. B. S9:3<br />
Fofanov, Y. 27<br />
Foley, A. 47, 52<br />
Foster, J. T. 40<br />
Fournier, C. S2:2<br />
Fredricks, D. N. 100, S4:7<br />
Friedrich, A. W. 22<br />
Friedrich, A. W. 23, 24<br />
Friesema, I. H. 24<br />
Fu, S. 67<br />
Galac, M. R. S3:4<br />
Galang, R. R. 12<br />
Ganova-Raeva, L. 12<br />
Garcia, D. S7:5<br />
Garcia-Toledo, L. 71<br />
Garcia-Toledo, L. 65<br />
Garofolo, G. 40<br />
Gauthier, M. 55<br />
Gehr, E. 80<br />
Gentry, J. 12<br />
Gerner-Smidt, P. 71, 72<br />
Gerner-Smidt, P. 65, 88, S3:6<br />
Gibbons, H. S. S3:4<br />
Gillece, J. 57<br />
Gimonet, J. S2:2<br />
Gladney, L. 71, 72<br />
Gladney, L. 65<br />
Gladney, L. M. 87<br />
Glaser, A. 50<br />
Glasner, C. S7:5<br />
Goater, R. S7:5<br />
Golovko, G. 27<br />
Gonzalez-Escalona, N. 54<br />
Gopinath, G. 50<br />
Graham, M. 7, 8, S7:10<br />
Graham, R. 75<br />
Grant, K. 83, S2:5<br />
Grau, F. R. S2:3<br />
Green, K. Y. S10:3<br />
Griffiths, E. 8, S7:10<br />
Griffiths, E. J. 7<br />
Grim, C. 48<br />
Grim, C. J. 43<br />
Griswold, T. 65<br />
Grundmann, H. 22<br />
Gulvik, C. A. 64<br />
Guo, X. 58<br />
Gupta, A. 18, 20<br />
Gutiérrez, P. A. 17, 97<br />
Gutierrez-Aguirre, I. 96<br />
Gymoese, P. S1:3<br />
Habibi, N. S6:3<br />
Haley, B. J. 49<br />
Hall, C. 57<br />
Halpin, J. L. 89<br />
Halse, T. 25, 79<br />
Hanes, D. 48, 50<br />
Hanes, D. E. 43<br />
Hänninen, M.-L. 37<br />
Hauser, A. R. 31<br />
Haydek, J. P. 61<br />
Heaton, H. 57<br />
Hendriksen, R. S. 41, S10:6<br />
Heneine, W. 12<br />
Henry, C. S4:5<br />
Herrick, J. 80<br />
Highlander, S. 35<br />
Hillman, D. 12<br />
Hinkle, M. 13, 14<br />
Hinkle, M. K. S9:5<br />
Hjelmsø, M. H. S10:6<br />
Hoffmann, M. 54, S3:5<br />
Holler, L. 52<br />
Howden, B. P. S7:7<br />
Hsiao, W. 7, 8, S7:10<br />
Huang, A. 71, 72<br />
Huang, A. 87<br />
Huang, A. D. 73, 81<br />
Huang, W. 18, 19, 28<br />
Hueftle, Y. 57<br />
Im, S. 71<br />
Im, S. 65<br />
Im, S. B. 81<br />
Isaac-Renton, J. 8<br />
Ismail, A. 5<br />
Iwamoto, T. 3<br />
Jang, G. 38<br />
Janies, D. S3:5<br />
Janies, D. A. S3:4<br />
Janssens, K. S7:3<br />
Jaramillo Mesa, H. 17, 97<br />
Jarvis, K. 48<br />
Jarvis, K. G. 43<br />
Jayaram, A. 50<br />
Jean-Gilles Beaubrun, J. 50<br />
Jennison, A. 75<br />
Jia, H. 12<br />
Jironkin, A. S7:9<br />
Joensen, K. 65<br />
Joensen, K. G. 78<br />
Johansen, J. 56, 6, S7:4<br />
Jones, M. 35, S4:4<br />
Joseph, L. 72<br />
Julius, M. 14<br />
Kajon, A. 99<br />
Kapsak, C. 80<br />
Karangwa, C. K. S10:3<br />
Karns, J. S. 49<br />
Kasam, M. S2:2<br />
Kato, S. 3<br />
Katz, L. S. 71, 72<br />
Katz, L. S. 65<br />
Katz, L. S. 81, 88, S3:3, S7:10<br />
Kaur, S. 66<br />
Kearse, M. 1, S7:2<br />
Keddy, A. 8, S7:10<br />
Keddy, K. H. 5<br />
Keena, M. 47, 52<br />
Keim, P. 57, S9:6<br />
Kenyon, R. W. S4:5<br />
Kersey, R. 91, 92<br />
Kersey, R. K. 34<br />
Khan, M. W. S6:3<br />
Khanipov, K. 27<br />
Khudyakov, Y. 12<br />
Kiil, K. 76<br />
Kim, Y. 94<br />
Kishony, R. S9:3<br />
Kjeldsen, M. S1:3<br />
Klenner, J. 21<br />
Klimke, W. S3:3<br />
Knox, N. S7:10<br />
Koenig, S. S. 11<br />
Kohl, C. 21, 95<br />
Konstantinidis, K. T. 73<br />
Kooistra-Smid, A. M. 23, 24<br />
Koren, M. 30<br />
Kornblum, J. S3:4<br />
Koziol, A. 55<br />
Kraft, C. S. 61<br />
Krepps, M. D. S3:4<br />
Kreuze, J. F. 96<br />
Kruczkiewicz, P. 8, S2:4, S7:10<br />
Ku, Y.-C. 29<br />
Kubota, K. 88<br />
Kucerova, Z. 71<br />
Kucerova, Z. 88<br />
Kuhn, J. 1<br />
Kuroda, M. 3<br />
Kurth, A. S10:4<br />
Kutnjak, D. 96<br />
Kwak, Y. 14<br />
Kwak, Y. I. 30<br />
Kwong, J. S7:7<br />
Laird, M. 7, 8<br />
Laird, M. R. S7:10<br />
Lambert, D. 55<br />
Lamson, D. 29<br />
Lamson, D. M. 99<br />
Lan, R. 66, 67<br />
Lane, C. 83<br />
Lapierre, P. 25, 74, 79<br />
Laplante, J. 99<br />
Lasker, B. 77<br />
Lauer, A. C. 77<br />
Lee, H. 38<br />
Leekitcharoenphon, P. 41<br />
Lesho, E. 13, 14<br />
Lesho, E. P. 30, S9:5<br />
Levinson, K. J. 74<br />
Li, A.-D. 63<br />
Li, Y. 58<br />
Liachko, I. 100, S4:7<br />
Liboriussen, P. 56, 6, S7:4<br />
Libuit, K. 80<br />
Lim, S. 38<br />
Limbago, B. M. 64<br />
Lin, H. 19, 20, 28<br />
Lin, H. C. 18<br />
Lin, Y. S3:4<br />
Lindsey, R. L. 71<br />
Lindsey, R. L. 65<br />
Lindsey, R. L. 81, 87<br />
Litrup, E. 76<br />
Liu, P. 58<br />
Lokate, M. 22<br />
Lovchik, J. 12<br />
Lui, L. 12<br />
Lukjancenko, O. 78, S10:6<br />
Luo, Y. 2, S3:5, S4:6<br />
Luquette, A. 59<br />
Lúquez, C. 89<br />
Mabon, P. S7:10<br />
Machi, D. S4:5<br />
Magana, L. 98<br />
Mahmoud, I. 95<br />
Maiden, M. C. 72<br />
Mangone, I. 40<br />
Manninger, P. 55<br />
Mao, C. S4:5<br />
Mao, Y. 62<br />
Marcacci, M. 40<br />
Marín Montoya, M. A. 17, 97<br />
Markowitz, S. 1<br />
Martin, H. 71<br />
Martin, H. 65<br />
Maslanka, S. M. 89<br />
Materna, A. 56, S7:4<br />
Materna, A. C. 6, S7:4<br />
Matthews, T. 7, 8, S7:10<br />
Maybank, R. 13, S9:5<br />
Mayigowda, P. 18, 20<br />
Mayo, M. 57<br />
McDermott, P. 72<br />
Mc Gann, P. 30, S9:5<br />
McGann, P. 13<br />
McGinnis, J. 99<br />
McIntosh, M. T. S2:3<br />
McQuiston, J. R. 77<br />
Meng, J. 54, S3:5<br />
Mi, T. 18, 20<br />
Michot, L. S2:2<br />
Millard, A. S7:8<br />
Miller, J. 54<br />
Miller, W. 72<br />
Mitarai, S. 3<br />
Moine, D. S2:2<br />
Moir, R. 1, S7:2<br />
Molina, M. 15, 60<br />
Monday, S. R. 54, S3:5<br />
Montasser, M. S. 93<br />
Montmayeur, A. 98<br />
Moon, D. 38<br />
Munk, P. 46<br />
Muñoz, L. 97<br />
Muñoz Baena, L. 17<br />
Muñoz Escudero, D. 17<br />
Murase, Y. 3<br />
Murugesan, K. 18, 20<br />
Muruvanda, T. 39, S3:5<br />
Musser, K. 25<br />
Musser, K. A. 74<br />
Musser, K. A. 79<br />
Mustafa, A. S. S6:3<br />
Myers, R. S3:5<br />
Nair, S. 83, S2:5<br />
Nash, J. H. S2:4<br />
Neish, E. 61<br />
Nelson, E. 47, 52<br />
Ng, T. 98<br />
Ngeno, E. S10:6<br />
Ngom-Bru, C. S2:2<br />
Nitsche, A. 10, 21, 9, 95, S10:4<br />
Nolan, S. M. 19, 28<br />
Norman, K. N. 51<br />
NT, J. S7:5<br />
Oberste, S. 98<br />
Octavia, S. 66, 67<br />
Ogunremi, D. 53<br />
Ojo, O. O. 68<br />
Olsen, C. 1, S7:2<br />
Olsen, G. J. S4:5<br />
Olson, R. S4:5<br />
Ong, A. 13<br />
Ong, A. C. 30, S9:5<br />
Onmus-Leone, F. 13, 14, 30, S9:5<br />
Osman, A. 95<br />
Overbeek, R. S4:5<br />
Ozer, E. A. 31<br />
Padilla, J. 14<br />
Pallen, M. J. S7:8<br />
Parikh, C. 29<br />
Parra, G. I. S10:3<br />
Parrello, B. S4:5<br />
Partridge, S. 4<br />
Payne, J. 54<br />
Pena-Gonzalez, A. 73<br />
Peng, Y. 82, 86<br />
Pereira, R. 44<br />
Perez, A. 12<br />
Peters, P. J. 12<br />
Peters, T. 83, S2:5<br />
Petkau, A. 7, 8, S7:10<br />
Pettengill, J. 39, 54, S4:6<br />
Pettengill, J. B. 2<br />
Peyrani, P. 12<br />
Pillatzki, A. 52<br />
Pimenova, M. 27<br />
Platone, I. 40<br />
Pontones, P. 12<br />
Posey, J. 85<br />
Pot, B. S7:3<br />
Pouseele, H. 71, 72<br />
Pouseele, H. 59, 65, 88, S7:3<br />
Prarat, M. 36<br />
Prentice, M. B. 68<br />
Priebe, G. P. S9:3<br />
Pruckler, J. 71, 72<br />
Purucker, T. 15<br />
Pusch, G. D. S4:5<br />
Qaadri, K. 1, S7:2<br />
Quinlan, T. 79<br />
Raangs, G. C. 22<br />
Radonic, A. 95, S10:4<br />
Ramachandran, S. 12<br />
Ramos, E. 98<br />
Rand, H. 2, S3:3, S4:6<br />
Raphael, B. H. 89<br />
Ravnikar, M. 96<br />
Reed, E. 39<br />
Reimer, A. S7:10<br />
Renard, B. 10<br />
Ribot, E. M. 72<br />
Ribot, E. M. 65<br />
Ribot, E. R. 87<br />
Ribot, E. M. 88<br />
Rishishwar, L. 81<br />
Roache, K. 71<br />
Roache, K. 88<br />
Robbe-Austerman, S. 36<br />
Robinson, T. J. 16<br />
Rockweiler, T. 34<br />
Rodriguez, A. L. 11<br />
Rodwell, T. C. S9:6<br />
Rogers, M. 33<br />
Rojas, M. 27<br />
Rose, J. 94<br />
Roseberry, J. C. 12<br />
Rossen, J. W. 22<br />
Rossen, J. W. 23, 24<br />
Rossi, M. 37<br />
Rota, P. 98<br />
Rothgänger, J. S7:6<br />
Rupar, M. 96<br />
Rusconi, B. 11<br />
Sabol, A. 88<br />
Saeed, A. 91, 92<br />
Sahl, J. 57<br />
Saleem Haider, M. 45<br />
Sammons, S. 58<br />
Sandoval, M. 12<br />
Sapiro, V. 91, 92<br />
Scaria, J. 47, 50, 52<br />
Schauser, L. 56, 6, S7:4<br />
Scheutz, F. 65<br />
Schriml, L. 7, 8<br />
Schuenadel, L. 95, S10:4<br />
Schupp, J. 57<br />
Scott, H. M. 51<br />
Seemann, T. S7:7<br />
Sekizuka, T. 3<br />
Senturk, I. S3:5<br />
Sergeant, M. S7:8<br />
Shafiq, M. 45<br />
Shaheed, F. S6:3<br />
Shankar, A. 12<br />
Shaw, T. 60<br />
Shay, J. 7, 8<br />
Shea, J. 25<br />
Shearman, H. 1, S7:2<br />
Shendure, J. 100, S4:7<br />
Shudt, M. 25<br />
Shudt, M. 99<br />
Shukla, M. P. S4:5<br />
Shumway, M. S3:3<br />
Sieffert, C. S7:10<br />
Siler, J. 44<br />
Simmons, M. S3:3<br />
Sinnige, J. 33<br />
Sintchenko, V. 66, 67<br />
Sischo, W. 44<br />
Sjölund-Karlsson, M. 64<br />
Smith, A. M. 5<br />
Snesrud, E. 13, 14, 30, S9:5<br />
Sobral, B. W. S4:5<br />
Sosnovtsev, S. V. S10:3<br />
Sparks, M. 14<br />
Stanton-Cook, M. J. 16<br />
Stevens, E. L. 42<br />
Stevens, R. L. S4:5<br />
St. George, K. 29, 99<br />
Storey, D. B. S7:11<br />
Strain, E. 2, 39<br />
Strino, F. 56<br />
Stripling, D. 71<br />
Stripling, D. 65<br />
Strockbine, N. 71<br />
Strockbine, N. 65, 81, 87<br />
Stroika, S. 88<br />
Stuber, T. 36<br />
Sun, Y. 29<br />
Sutton, G. G. S9:4<br />
Suzuki, H. 47, 50, 52<br />
Switzer, W. H. 12<br />
Sycuro, L. 100, S4:7<br />
Taboada, E. 7, 8, S7:10<br />
Taboada, E. N. S2:4<br />
Tan, W. S10:5<br />
Tanaka, M. 66, 67<br />
Tang, P. 8<br />
Tarr, C. 71<br />
Tarr, C. 88<br />
Tarr, C. L. 73<br />
Tausch, S. 10<br />
Tauxe, W. M. 61<br />
Thachil, A. 50<br />
Thai, H. 12<br />
Thiessen, J. 8<br />
Thobela, M. 5<br />
Thomas, M. 47, 50, 52<br />
Thompson, L. 74<br />
Tillman, G. S3:3<br />
Timme, R. E. S3:3<br />
Tondella, M. L. 82<br />
Tondella, M. L. 86<br />
Top, J. 33<br />
Torpdahl, M. S1:3<br />
Trees, D. 70<br />
Trees, E. 72<br />
Trees, E. 65, 81, 84, 87, 88, S3:3<br />
Trees, E. K. S3:6<br />
Tsafnat, G. 4<br />
Turner, S. D. 80<br />
Turnsek, M. 71<br />
Underwood, A. S7:9<br />
Useh, N. 47<br />
Välimäki, N. 37<br />
Van Domselaar, G. 7, 8, S7:10<br />
van Duyne, S. 71<br />
Van Kessel, J. S. 49<br />
Van Roey, P. 25<br />
Vazquez, A. 57<br />
Vehkala, M. 37<br />
Vigre, H. 46<br />
Vinje, J. 98<br />
Vonstein, V. S4:5<br />
Vuyisich, M. 69<br />
Wagner, D. 72<br />
Wagner, D. D. 84<br />
Wagner, D. M. 57<br />
Waldram, A. S2:5<br />
Walker, G. T. 34<br />
Walker, T. 91, 92<br />
Wan, Q. 19<br />
Wang, C. 39<br />
Wang, G. 18, 19, 20, 28<br />
Wang, L. 36<br />
Wang, Q. 66<br />
Wang, Y. S10:5<br />
Ward, A. 61<br />
Ward, G. 14<br />
Warnick, L. 44<br />
Warren, A. S4:5<br />
Waterman, P. 14<br />
Waterman, P. E. 30, S9:5<br />
Watt, J. 58<br />
Wattam, A. R. S4:5<br />
Weigand, M. R. 73, 82<br />
Weigand, M. R. 86<br />
Weimer, B. C. S7:11<br />
Weiss, D. S3:4<br />
Wengert, S. 94<br />
Whichard, J. 72<br />
White, J. 48<br />
White, J. R. 43<br />
Wiedmann, M. 44<br />
Will, R. S4:5<br />
Willems, R. 33<br />
Williams, G. 72<br />
Williams, G. 71<br />
Williams, M. M. 82, 86<br />
Winsor, G. 7, 8, S7:10<br />
Wirth, S. 74<br />
Wirth, S. E. 79<br />
Wiser, A. H. 100, S4:7<br />
Wolcott, M. 59<br />
Wolfgang, W. J. 74<br />
Wolfgang, W. J. 79<br />
Wolfgang, W. J. S3:5<br />
Wong, K. 15, 60<br />
Wright, M. S. S9:4<br />
Xia, F. S4:5<br />
Xia, G. 12<br />
Yamashita, A. 3<br />
Yeats, C. A. S7:5<br />
Yin, Y. S4:4<br />
Yoo, H. S4:5<br />
Yoo, Y. 50<br />
Yoshida, C. S2:4<br />
Yu, Q. 58<br />
Zhang, A. 62<br />
Zhang, J. 37<br />
Zhang, S. S4:4<br />
Zhang, T. 62, 63<br />
Zhang, Y. 36<br />
Zhang, Z. S4:4<br />
Zhao, S. 72<br />
Zhao, S. 54<br />
Zheng, J. 39<br />
Zhou, K. 22, 23<br />
Zhou, Z. S7:8<br />
Zhu, Q. 35<br />
Zhuge, J. 19, 28<br />
Zilli, K. 40<br />
Zong, Z. 26<br />