Final Program and <strong>Abstracts</strong>

ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens

September 24 – 27, 2015

Washington, DC


© 2015 American Society for Microbiology
1752 N Street, N.W.
Washington, DC 20036-2904
Phone: 202-737-3600
World Wide Web: www.asm.org
All Rights Reserved
Printed in the United States of America


Table of Contents

ASM Conferences Information
Conference Organization
Acknowledgments
General Information
Travel Grants
Scientific Program
Oral Presentation <strong>Abstracts</strong>
Poster <strong>Abstracts</strong>
Index

ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens


ASM Conferences Committee

Sean Whelan, Chair
Harvard Medical School

Joanna Goldberg, Vice Chair
Emory University

Victor DiRita
University of Michigan

Lora Hooper
University of Texas Southwestern Medical Center

Petra Levin
Washington University in Saint Louis

Gary Procop*
Cleveland Clinic

Curtis Suttle
University of British Columbia

Theodore White
University of Missouri – Kansas City

*Indicates Committee Liaison for this Conference

ASM Conferences Mission

To identify emerging or underrepresented topics of broad scientific significance.
To facilitate interactive exchange in meetings of 100 to 500 people.
To encourage student and postdoctoral participation.
To recruit individuals in disciplines not already involved in ASM to ASM membership.
To foster interdisciplinary and international exchange and collaboration with other scientific organizations.


ASM Conferences


Program Committee

Marc Allard
U.S. Food and Drug Administration
Silver Spring, MD

Eric Brown
U.S. Food and Drug Administration
Silver Spring, MD

Dag Harmsen
University of Muenster
Muenster, Germany

Acknowledgments

The Conference Program Committee and the American Society for Microbiology acknowledge the following for their support of the ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens. On behalf of our leadership and members, we thank them for their financial support:

Platinum Supporters

Illumina
OpGen
ThermoFisher

Gold Supporters

Applied Maths
PathoNGenTrace
Qiagen
Ridom Bioinformatics

Silver Supporters

Geneious
Genologics

Bronze Supporters

Microbial Genomics – a journal from the Society for General Microbiology (SGM)
New England Biolabs

Sponsored Seminars

The Platinum sponsors are hosting additional educational opportunities during the conference. Please stop by the tabletop displays of Illumina, OpGen and ThermoFisher to learn more about how you can participate. Space is limited.



General Information

REGISTRATION AND NAME BADGES

ASM staff will be available at the registration desk in the Blue Room Pre-function at the Omni Shoreham Hotel during posted registration hours. Participants may collect name badges and program materials at the registration desk. A name badge is required for entry into all sessions and meals. Each participant may register one guest to attend the Welcome Reception at the National Zoo and/or the Conference Party. Guest tickets are $50 per event. Guests may not attend sessions, poster sessions, lunches or coffee breaks.

GENERAL SESSIONS

All general sessions will be held in the Blue Room in the Omni Shoreham Hotel.

POSTER SESSIONS

Poster boards are located in the Hampton Room at the Omni Shoreham Hotel. Posters will be available for informal viewing throughout the conference, with official poster sessions scheduled on Friday and Saturday.

All posters may be mounted starting Thursday after 3:00 pm and should be available to view no later than 10:00 am Friday morning. Posters should be removed after 6:30 pm on Saturday, September 26, and no later than 12:30 pm on Sunday, September 27. Posters not removed in time may be discarded.

Odd-numbered posters (1, 3, 5…) will be officially presented in Session A on Friday, September 25, and even-numbered posters (2, 4, 6…) will be officially presented in Session B on Saturday, September 26.

Please check your assigned number in the abstract index. The same number is used for both the presentation and the poster board.

EXHIBITS

Please be sure to visit the supporters' display tables in the Blue Room Pre-Function. Representatives of the supporting companies will be available to talk with you during coffee breaks.

NETWORKING MEALS AND SOCIAL EVENTS

Registration includes attendance at the Welcome Reception at the National Zoo on Thursday evening, Networking Lunches on Friday and Saturday, and the Conference Party with DJ and dancing on Saturday night. Ample time has also been scheduled for participants to network during coffee breaks.

CERTIFICATE OF ATTENDANCE

Certificates of Attendance can be found in the registration packet received at the registration desk.

Note: Certificates of Attendance do not list session information.

CAMERAS AND RECORDINGS POLICY

Audio/video recorders and cameras are not allowed in session rooms or in the poster areas. Taking photographs with any device is prohibited.

CHILD POLICY

Children are not permitted in session rooms, poster sessions, conference meals or social events. Please contact the hotel concierge to arrange for babysitting services in your hotel room.



Travel Grants

ASM STUDENT TRAVEL GRANTS

ASM encourages the participation of graduate students and new postdocs at ASM Conferences. To support the cost of attending the conference, ASM has awarded travel grants of $500 to each of the following individuals:

Levent Albayrak<br />

Nabil-Fareed Alikhan<br />

Philip Ashton<br />

Ellsworth Campbell<br />

Laura Carroll<br />

Hattie Chung<br />

Madeline Galac<br />

John Haydek<br />

Mathis Hjelmsø<br />

Sung Im<br />

Marianne Kjeldsen<br />

Denis Kutnjak<br />

Ana Lauer<br />

Kara Levinson<br />

An-Dong Li<br />

Helena Jaramillo Mesa<br />

Muhammad Shafiq<br />

Dylan Storey<br />

Anni Zhang<br />

ASM-LINK UNDERGRADUATE FACULTY RESEARCH INITIATIVE FELLOWSHIPS

The ASM-LINK Undergraduate Faculty Research Initiative (UFRI) Fellowship is a professional development resource that trains STEM faculty to initiate and sustain successful research partnerships. Through interactive training, structured mentoring, and deliberate networking at ASM-sponsored research conferences, UFRI fellows gain access to resources and networks to advance their undergraduate research programs.

We congratulate the UFRI Fellows at the 2015 ASM Conference on Rapid Next-Generation Sequencing:

Olga Calderon<br />

LaGuardia Community College, CUNY, Long Island City, NY

Robert Furler<br />

Florida SouthWestern State College, Ft. Myers, FL<br />

Olabisi Ojo<br />

Southern University at New Orleans, New Orleans, LA<br />



Scientific Program

Thursday, September 24, 2015

5:00 pm – 7:00 pm Opening Keynote Session
Blue Room
Session Chair: Steven Musser

5:00 – 5:15 pm Welcome Remarks
Joseph Campos; Secretary, American Society for Microbiology, Washington, DC
Gary Procop; ASM Conferences Committee, Washington, DC
Eric Brown; US Food and Drug Administration, Silver Spring, MD

5:15 – 6:00 pm Microbial Genomics and Beyond
George Weinstock; Jackson Laboratory, Farmington, CT

6:00 – 6:45 pm Translating Genomics from Research to Clinical Application
Julian Parkhill; Univ. of Cambridge, Cambridge, United Kingdom

7:15 – 9:00 pm Welcome Reception
National Zoo, Bird House (10 min. walk from Omni)
Hors d'oeuvres and an open bar will be provided. You must have your name badge to enter.

To get there: Exit the front entrance of the Omni Shoreham, turn right on Calvert Street (the street in front of the hotel), and at the major intersection with Connecticut Avenue turn left to walk up Connecticut Avenue. The National Zoo entrance will be on your right. The Bird House is the first major building you will reach as you enter the Zoo. Staff will be on hand to direct you once you arrive at the Zoo.

A small shuttle van will also run on a continuous loop between the hotel and the zoo for attendees who elect not to walk (note that the van is small, so wait times could be significant).


Friday, September 25, 2015

8:00 – 8:45 am Surveillance Keynote I
Blue Room
Session Chair: Eric Brown
Priming the Innovation Pump: FDA's Role in Advancing and Using NGS
Stephen Ostroff; US Food and Drug Administration, Silver Spring, MD

8:45 – 10:00 am Session 1: Genomics for Food and Veterinary Pathogen Surveillance
Blue Room
Session Chair: Eric Brown

8:45 – 9:15 am GenomeTrakr: A Pathogen Database to Build a Global Genomic Network for Pathogen Traceback and Outbreak Detection
Marc Allard; US Food and Drug Administration, Silver Spring, MD

9:15 – 9:45 am Global Microbial Identifier – is Global Harmonization and Comparison of WGS Data Feasible?
Frank Aarestrup; Technical Univ. of Denmark, Lyngby, Denmark

9:45 – 10:00 am Three Months of Surveillance of S. Typhimurium and S. 1,4,[5],12:i:- in Denmark Based on Whole-genome Sequencing and MLVA Typing
Marianne Kjeldsen; Statens Serum Institut, Copenhagen, DENMARK

10:00 – 10:30 am Coffee Break
Blue Prefunction Room and Patio

10:30 am – 12:00 pm Session 2: One Health, Regulation and Assuring the Quality of NGS for Pathogen Surveillance
Blue Room
Session Chair: Eric Brown

10:30 – 11:00 am Assuring the Quality of Next-Generation Sequencing in Clinical and Public Health Laboratories
Amy Gargis; Centers for Disease Control and Prevention


11:00 – 11:15 am Long Reads Sequencing for Better Short Reads SNP Analysis
Deborah Moine; Nestlé Institute of Health Sciences, Lausanne, SWITZERLAND

11:15 – 11:30 am Subtractive-hybridization for Enrichment of Non-host Nucleic Acid for Improvement of Sequence-based Detection of Pathogens
Roger Barrette; Plum Island Animal Disease Center (USDA/APHIS), Greenport, NY

11:30 – 11:45 am The Salmonella in silico Typing Resource (SISTR): Rapid Analysis of Salmonella Draft Genome Sequence Data
Catherine Yoshida; Public Health Agency of Canada, Guelph, ON, CANADA

11:45 am – 12:00 pm Revolutionising Public Health Reference Microbiology Using Whole Genome Sequencing: A Case Study with Salmonella
Philip Ashton; Public Health England, London, UNITED KINGDOM

12:00 – 1:15 pm Lunch
Palladian Ballroom

1:15 – 3:15 pm Session 3: Genomics for Public Health Pathogen Surveillance
Blue Room
Session Chair: Amy Gargis

1:15 – 1:45 pm The Application of Genomics to Public Health – an Epidemiologist's Point of View
Greg Armstrong; Centers for Disease Control and Prevention, Atlanta, GA

1:45 – 2:15 pm Tracing Evolution and Spread of Mycobacterium tuberculosis Strains in Times of Antibiotic Treatment
Stefan Niemann; Research Center Borstel, Borstel, Germany

2:15 – 2:30 pm Benchmark Datasets for Validating Foodborne Outbreak Investigations: Integrating WGS and Phylogenomic Analyses
Ruth Timme; US Food and Drug Administration, College Park, MD



2:30 – 2:45 pm Integrating Core Genome Phylogenetic Relationships and Isolate Geographic Data to Trace the 2012 Neisseria meningitidis Outbreak in New York City
Madeline Galac; Univ. of North Carolina at Charlotte, Charlotte, NC

2:45 – 3:00 pm Whole Genome Sequencing Provides Rapid Traceback of Clinical to Food Sources During a Foodborne Outbreak of Salmonellosis
Maria Hoffmann; US Food and Drug Administration, College Park, MD

3:00 – 3:15 pm Transforming Public Health Microbiology in the United States with Whole Genome Sequencing (WGS) – PulseNet and Beyond
Eija Trees; CDC, Atlanta, GA

3:15 – 3:30 pm Coffee Break
Blue Prefunction Room and Patio

3:30 – 6:00 pm Session 4: Bioinformatic Pipelines and Tools for Genomic Pathogen Surveillance
Blue Room
Session Chair: Bill Klimke

3:30 – 4:00 pm Pathogen Genomics at NCBI
David Lipman; National Center for Biotechnology Information, Bethesda, MD

4:00 – 4:30 pm Overview of Tools for Microbial NGS Data Analysis
Dag Harmsen; Univ. of Münster, Münster, Germany

4:30 – 5:00 pm Community and Social Data / Applications for Genomic Pathogen Surveillance
David Aanensen; Imperial College, London, United Kingdom

5:00 – 5:15 pm Salmonella Serotype Determination Utilizing High-throughput Genome Sequencing Data
Xiangyu Deng; Univ. of Georgia, Griffin, GA

5:15 – 5:30 pm PATRIC Pipeline
Fangfang Xia; Univ. of Chicago, Chicago, IL


5:30 – 5:45 pm CFSAN SNP Pipeline: A Whole Genome Sequence Data Analysis Pipeline for Food-borne Pathogens
Yan Luo; FDA/CFSAN, College Park, MD

5:45 – 6:00 pm Assembling Whole Genomes from Mixed Microbial Communities Using Hi-C
Ivan Liachko; Univ. of Washington, Seattle, WA

6:00 – 7:30 pm Poster Session A
Hampton Room
Odd-numbered posters will be officially presented.

Saturday, September 26, 2015

8:00 – 8:45 am Surveillance Keynote II
Blue Room
Session Chair: Greg Armstrong
Integrating Advanced Molecular Technologies into Public Health
Rima Khabbaz; Centers for Disease Control and Prevention, Atlanta, GA

8:45 – 9:45 am Session 5: Omni-Omics for Pathogen Surveillance
Blue Room
Session Chair: Martin Maiden

8:45 – 9:15 am SURPI: A Deep Sequencing Clinical Analysis Tool for Infectious Disease
Charles Y. Chiu; Univ. of California, San Francisco, San Francisco, CA

9:15 – 9:45 am Genomics & Transcriptomics in the Clinical Microbiology Laboratory
Randall J. Olsen; Houston Methodist, Houston, TX

9:45 – 10:15 am Coffee Break
Blue Prefunction Room and Patio

10:15 – 11:45 am Session 6: Genomics for Microbial Taxonomy
Blue Room
Session Chair: Dag Harmsen

10:15 – 10:45 am A New Genomics Driven Taxonomy. Are We There, Yet?
George Garrity; Michigan State Univ., East Lansing, MI


10:45 – 11:15 am Beyond Typing and Phylogeny: the Population and Functional Genomics of the Neisseria
Martin Maiden; Univ. of Oxford, Oxford, United Kingdom

11:15 – 11:30 am Next Generation Sequencing of Brucella melitensis Isolates from Kuwait and Comparative Genome Analyses
Abu Mustafa; Kuwait Univ., Jabriya, KUWAIT

11:30 – 11:45 am Microbial Genomic Taxonomy at GenBank
Scott Federhen; NCBI, Bethesda, MD

11:45 am – 1:15 pm Lunch
Empire Ballroom

1:15 – 3:45 pm Session 7: Pathogen Surveillance Software Demonstration
Blue Room
Session Chairs: Kathryn Holt, Bill Klimke, Errol Strain

1:15 – 1:30 pm What Do We Need from Microbial Genomics Surveillance Software?
Kathryn Holt; Univ. of Melbourne, Melbourne, AUSTRALIA

1:30 – 1:39 pm A Biosurveillance Analysis Pipeline for Genomic Sequence Data
Christian Olsen; Biomatters, Inc., Newark, NJ

1:39 – 1:48 pm A Universal Whole Genome Sequencing Approach Using Whole Genome MLST and Whole Genome SNP Analysis in the Cloud
Hannes Pouseele; Applied Maths NV, Sint-Martens-Latem, BELGIUM

1:48 – 1:57 pm Typing and Epidemiological Clustering of Common Pathogens Based on Whole Genome NGS Data
Arne Materna; QIAGEN, Aarhus, DENMARK

1:57 – 2:06 pm wgsa.net: Whole Genome Sequence Analysis
David Aanensen; Imperial College London, London, UNITED KINGDOM

2:06 – 2:15 pm SeqSphere+ Software for Prospective Bacterial Genomic Surveillance and Resistome or Virulome Analysis
Jörg Rothgänger; Ridom GmbH, Münster, GERMANY


2:15 – 2:24 pm Nullarbor: Rapid Analysis of Bacterial Outbreak Sequence Data
Torsten Seemann; Univ. of Melbourne, Melbourne, AUSTRALIA

2:24 – 2:33 pm EnteroBase: A Powerful, User-friendly Online Resource for Analysing Genomic Variation
Nabil-Fareed Alikhan; Univ. of Warwick, Coventry, UNITED KINGDOM

2:33 – 2:42 pm SnapperDB: A Scalable Database for Routine Sequencing of Bacterial Isolates
Philip Ashton; Public Health England, London, UNITED KINGDOM

2:42 – 2:51 pm Phylogenetic Reconstruction and Outbreak Investigation Using IRIDA and SNVPhyl
Aaron Petkau; Public Health Agency of Canada, Winnipeg, MB, CANADA

2:51 – 3:00 pm PanCore: A Flexible Workflow for the Comparison and Assignment of Genomes to Outcomes
Dylan Storey; Univ. of California, Davis, CA

3:00 – 3:09 pm Reference-free Pan-genomic Epidemiology using Cortex
Zamin Iqbal; Wellcome Trust Centre for Human Genetics, Univ. of Oxford, Oxford, UNITED KINGDOM

3:10 – 3:15 pm Quality Issues and Standards for Next Generation Sequencing
Bill Klimke; NCBI/NLM/NIH, Bethesda, MD

3:15 – 3:35 pm Retrospective WGS of Two Foodborne Outbreaks
Errol Strain; US Food and Drug Administration, College Park, MD

3:35 – 4:00 pm Coffee Break
Blue Prefunction Room and Patio

4:00 – 5:00 pm Session 8: Genomics for Microbial Forensics
Blue Room
Session Chair: Marc Allard


4:00 – 4:30 pm Microbial Forensics and Its Needs for Standards and Standardization
Bruce Budowle; Univ. of Texas, Austin, TX

4:30 – 5:00 pm Anthrax – Molecular Epidemiology and Forensics from Whole Genome Sequencing and Metagenomic Sampling of Complex Specimens
Paul Keim; Northern Arizona Univ., Flagstaff, AZ

5:00 – 6:30 pm Poster Session B
Hampton Room
Even-numbered posters will be officially presented.

7:00 – 10:00 pm Conference Party
Blue Room
Enjoy a relaxed and fun evening with your colleagues. A DJ will play a range of music, and dancing is encouraged. Heavy hors d'oeuvres, a carving station and a ticket/cash bar will be provided.

Sunday, September 27, 2015

8:00 – 10:00 am Session 9: Genomics for Clinical Microbiology Pathogen Surveillance
Blue Room
Session Chair: Stefan Niemann

8:00 – 8:30 am Full Genome Sequencing to Track Carbapenem Producing Organisms within a Hospital
Julie Segre; National Institutes of Health, Bethesda, MD

8:30 – 9:00 am Prospective Genome Sequencing for Real-time Surveillance of Resistant Bacteria in a Univ. Hospital
Alexander Mellmann; Univ. of Münster, Münster, Germany

9:00 – 9:15 am Whole-genome Sequence Analysis of Pseudomonas aeruginosa in Acute Infection Reveals Widespread within-population Diversity and Rapid Transmission within the Body
Hattie Chung; Harvard Medical School, Boston, MA

9:15 – 9:30 am Beyond the SNV: Integrating Multiple Data Types into Genomic Epidemiology
Meredith Wright; J. Craig Venter Institute, La Jolla, CA


9:30 – 9:45 am Defining Clonality in Acinetobacter baumannii Using Whole Genome Sequencing of Outbreak Strains Associated with the Conflict in Iraq
Erik Snesrud; Walter Reed Army Institute of Research, Silver Spring, MD

9:45 – 10:00 am Direct from Sputum: Next Gen Analysis of Mycobacterium tuberculosis in Clinical Samples
David Engelthaler; TGen North, Flagstaff, AZ

10:00 – 10:30 am Coffee Break
Blue Prefunction Room and Patio

10:30 am – 12:30 pm Session 10: Genomics for Virus Surveillance
Blue Room
Session Chair: Charles Y. Chiu

10:30 – 11:00 am Genomic Surveillance of the Ebola Epidemic in Western Africa
Shirlee Wohl; Harvard Univ., Cambridge, MA

11:00 – 11:30 am Small Game Hunting
W. Ian Lipkin; Columbia Univ., New York, NY

11:30 – 11:45 am Development of an Efficient Next-generation Sequencing Platform for Charting the Evolution of Norovirus Strains
Gabriel Parra; National Institutes of Health, Bethesda, MD

11:45 am – 12:00 pm Genome-wide Comparison of Cowpox Viruses Reveals a New Clade Related to Variola Virus
Livia Schuenadel; Robert Koch Institute, Berlin, GERMANY

12:00 – 12:15 pm Virome Analyses Among Children with Acute Respiratory Infection in China
Wenjie Tan; National Institute for Viral Disease Control and Prevention, China CDC, Beijing, CHINA

12:15 – 12:30 pm Using Wastewater to Monitor Viral Pathogens in the Slum City of Kibera
Mathis Hjelmsø; Technical Univ. of Denmark, Kgs. Lyngby, DENMARK

12:30 – 12:35 pm Concluding Remarks and Plans for ASMNGS 2017
Blue Room
Dag Harmsen



Oral Presentation <strong>Abstracts</strong>

S1:3

THREE MONTHS OF SURVEILLANCE OF S. TYPHIMURIUM AND S. 1,4,[5],12:i:- IN DENMARK BASED ON WHOLE-GENOME SEQUENCING AND MLVA TYPING

M. Kjeldsen, P. Gymoese, M. Torpdahl; Statens Serum Institut, Copenhagen, DENMARK.

Introduction: Salmonella enterica subsp. enterica serovar Typhimurium (S. Typhimurium) and its monophasic variant 1,4,[5],12:i:- are zoonotic pathogens of significance in both humans and animals worldwide. In Europe, Salmonella causes the majority of food-borne outbreaks. Currently, several laboratories primarily use pulsed-field gel electrophoresis (PFGE) and multiple-locus variable-number tandem repeat analysis (MLVA) for surveillance and outbreak investigations of Salmonella. Surveillance studies based on whole-genome sequencing (WGS) show good results, and WGS is a promising alternative to conventional methods. In this study, we evaluate SNP analysis in comparison to MLVA for surveillance of S. Typhimurium and S. 1,4,[5],12:i:-.

Materials and Methods: We analyzed all S. Typhimurium and S. 1,4,[5],12:i:- human clinical isolates from the Danish surveillance program from January to March 2015. This collection comprises 40 monophasic S. Typhimurium and 66 S. Typhimurium isolates, including three outbreaks defined by MLVA typing and epidemiological findings. The relatedness of the strains was examined by core genome SNP analysis, and the results were compared with those of MLVA and multi-locus sequence typing (MLST).

Results: WGS analysis of the collection of 106 strains identified close to 5900 SNPs. A clear correlation between SNP and MLST analysis was observed. S. Typhimurium ST36 was separated by a deep branch from ST19 and ST34. Isolates of ST34 mainly comprised monophasic variants and were separated by 440 SNPs, indicating a close relationship within this group. Consistent with the MLVA-defined S. Typhimurium outbreaks, the SNP-based tree revealed three clusters of closely related strains differing by only a few SNPs. In one of the outbreaks, MLVA included 35 isolates, while the SNP analysis added two potential outbreak isolates to this cluster. In the ST34 group, SNP analysis dispersed all MLVA clusters, including the outbreak cluster (of eight isolates) located within this group; SNP analysis also questioned whether one of the eight defined outbreak isolates should be included.

Conclusion: Our results show that strains with identical MLVA profiles can be either unrelated or closely related based on SNP distance determined from WGS. Using WGS analysis for outbreak detection seems reliable, and it also provides a higher resolution of strain relationships. At present, defining an outbreak solely on SNP differences is problematic, since the number of SNP differences allowed within a cluster has to be considered. This study highlights the challenges of both SNP- and MLVA-based cluster detection and emphasizes the importance of combining molecular methods with epidemiological data.
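The abstract notes that SNP-based cluster definitions depend on how many SNP differences are allowed within a cluster. As a minimal sketch of why that choice matters, assuming a core-genome SNP alignment is already available as equal-length strings per isolate, the Python below computes pairwise SNP distances and groups isolates by single linkage at a chosen threshold. The function names, isolate names, sequences and thresholds are hypothetical illustrations, not the authors' pipeline or data.

```python
from itertools import combinations

# Characters treated as missing data (alignment gap or ambiguous base).
MISSING = {"-", "N"}


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where two isolates differ, skipping missing data."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in MISSING and y not in MISSING
    )


def single_linkage_clusters(alignment: dict[str, str], threshold: int) -> list[list[str]]:
    """Join isolates into one cluster if any linking pair is <= threshold SNPs apart."""
    parent = {name: name for name in alignment}

    def find(name: str) -> str:
        # Walk up to the cluster representative (union-find with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for name in alignment:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical core-genome SNP alignment (illustrative only, not study data).
alignment = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",  # 1 SNP from iso1
    "iso3": "ACGTACGAAT",  # 1 SNP from iso2, 2 SNPs from iso1
    "iso4": "TGCAACGTAC",  # 4 SNPs from iso1
}

print(single_linkage_clusters(alignment, threshold=1))  # [['iso1', 'iso2', 'iso3'], ['iso4']]
print(single_linkage_clusters(alignment, threshold=0))  # [['iso1'], ['iso2'], ['iso3'], ['iso4']]
```

With a threshold of 1 SNP, iso1 and iso3 fall into the same cluster even though they differ by 2 SNPs, because single linkage chains them through iso2; with a threshold of 0, every isolate is its own cluster. Other linkage rules or thresholds would give different groupings, which is the sensitivity the abstract points to.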

S2:2

LONG READS SEQUENCING FOR BETTER SHORT READS SNP ANALYSIS

D. Moine¹, L. Baert², C. Barretto², C. Ngom-Bru², M. Kasam¹, C. Fournier¹, L. Michot², J. Gimonet², C. Chilton²; ¹Nestlé Institute of Health Sciences, Lausanne, SWITZERLAND, ²Nestlé Research Center, Lausanne, SWITZERLAND.

Whole genome sequencing (WGS) is an emerging tool for foodborne pathogen characterization. It can help to identify and type bacteria for investigative purposes (source attribution), factory ecology and trend analysis in the food industry. This novel approach is more precise and faster than earlier methods such as ribotyping or 16S rRNA gene sequencing. The isolates (e.g., Salmonella, Listeria, Cronobacter) were sequenced using Illumina short-read sequencing technology. For bioinformatic analysis, calling single nucleotide polymorphisms (SNPs) against a reference genome is a good way to obtain precise results. One key requirement for SNP analysis is a high-quality reference genome that is closely related to the target strain. Often, such complete reference genomes are not available: only a few strains of Salmonella, Listeria or Cronobacter have high-quality complete genomes that are publicly available. To create the reference genomes, we used Pacific Biosciences long-read sequencing technology followed by the hierarchical genome assembly process (HGAP) for de novo genome assembly. The assemblies were then validated using a quality-checking pipeline (MegaBLAST and dot plot comparison).

S2:3

SUBTRACTIVE-HYBRIDIZATION FOR ENRICHMENT OF NON-HOST NUCLEIC ACID FOR IMPROVEMENT OF SEQUENCE-BASED DETECTION OF PATHOGENS

R. W. Barrette, F. R. Grau, M. T. McIntosh; Plum Island Animal Disease Center (USDA/APHIS), Greenport, NY.

Next generation sequencing is a powerful tool<br />

for detection and characterization of pathogens.<br />

This technology is rapidly reducing the<br />

sequencing cost per base while simultaneously<br />

increasing the amount of sequence data<br />

produced. Next generation sequencing is<br />

particularly well suited to screening complex<br />

mixtures of nucleic acids, which can be exploited<br />

for pathogen identification. However,<br />

such nucleic acid mixtures are often biased to<br />

host-derived, rather than pathogen-derived,<br />

nucleic acid. This can greatly reduce the<br />

number of sequence reads that are relevant to<br />

infectious agents. By hybridization of random<br />

cDNA from samples to host RNA immobilized<br />

on paramagnetic beads, we have developed a<br />

subtracted-hybridization method that may be<br />

used to enrich microbial or viral cDNA from<br />

diagnostic specimens. To test this method,<br />

samples from cattle exhibiting clinical signs<br />

similar to foot-and-mouth disease (FMD) were<br />

cultured in primary lamb kidney cells (LK) and<br />

a monkey kidney cell line (Vero) for isolation<br />

of potential viruses. FMD virus was rapidly<br />

ruled out by conventional diagnostic methods<br />

including real-time RT-PCR of the original and<br />

cultured samples; however, electron microscopy<br />

of LK and Vero cell cultures revealed the<br />

presence of picornavirus-like particles. LK cultured<br />

material was subsequently subjected to<br />

random cDNA amplification, host subtraction<br />

and next-generation sequencing. Total cDNA<br />

was simultaneously sequenced to compare the<br />

effectiveness of the new method. Results from<br />

the host subtracted cDNA library resolved 74%<br />

of the genome of a novel isolate of Enterovirus<br />

Type 1A. This was compared to 34% virus<br />

genome coverage by direct NGS. Phylogenetic<br />

analysis revealed that only 84% nucleotide sequence<br />

identity was shown between the newly<br />

identified enterovirus and the next most closely<br />

related virus. This work illustrates the utility<br />

of next generation sequencing and host nucleic<br />

acid subtraction in sequence-based detection<br />

and characterization of new viruses.<br />
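
The 74% and 34% figures compare how much of the viral genome each library covered. The abstract does not state how coverage was computed; a minimal sketch, assuming reads (or contigs) have already been aligned to a related reference and are given as hypothetical 0-based, end-exclusive (start, end) intervals, merges overlapping alignments and reports the fraction of reference positions covered at least once.

```python
def breadth_of_coverage(intervals, genome_length):
    """Fraction of genome positions covered by at least one aligned read.

    intervals: iterable of (start, end) alignment coordinates, 0-based, end-exclusive.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    run_start = run_end = None  # current merged run of overlapping intervals
    for start, end in sorted(intervals):
        start, end = max(start, 0), min(end, genome_length)  # clip to the genome
        if start >= end:
            continue
        if run_end is None or start > run_end:
            if run_end is not None:
                covered += run_end - run_start  # close the previous run
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # overlapping or adjacent: extend the run
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length
```

Merging intervals before summing avoids counting positions twice where reads overlap, which is what distinguishes breadth of coverage from depth.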

S2:4

THE SALMONELLA IN SILICO TYPING RESOURCE (SISTR): RAPID ANALYSIS OF SALMONELLA DRAFT GENOME SEQUENCE DATA

C. Yoshida¹, P. Kruczkiewicz², J. H. Nash¹, E. N. Taboada²; ¹Public Health Agency of Canada, Guelph, ON, CANADA, ²Public Health Agency of Canada, Lethbridge, AB, CANADA.

Salmonella is an important food safety and public health concern worldwide. Serotyping has historically been essential to human disease surveillance and to outbreak prevention and control. Although laboratories around the world rely on serotyping as a primary means of isolate characterization, there is broad consensus that current Salmonella subtyping methods often lack the specificity and discriminatory power required for epidemiologic investigations. Whole-genome sequencing (WGS) is increasingly being adopted as a front-line laboratory tool, promising to revolutionize our ability to perform advanced pathogen characterization in support of enhanced public health surveillance and epidemiologic investigations. To overcome the significant barrier faced by laboratories without the necessary bioinformatics infrastructure, we have developed the Salmonella In Silico Typing Resource (SISTR), an open, web-accessible bioinformatics platform for rapidly analyzing draft Salmonella genome sequence data. In addition to serovar prediction by genoserotyping, multilocus sequence typing (MLST) and ribosomal MLST (rMLST), the SISTR platform incorporates a novel core genome MLST (cgMLST) scheme that we developed, the first such scheme described for Salmonella. The platform provides metadata-driven visualizations of the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. The resource also includes a database of over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. Using a dataset of 4,188 finished genomes and WGS draft assemblies, we examined the correlation between an isolate’s cgMLST cluster and its serovar, and we show how phylogenetic context from cgMLST analysis can supplement genoserotyping to increase the accuracy of in silico serovar prediction to over 94.6%. As sequencing of Salmonella isolates at public health laboratories becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful replacement for current methods of isolate characterization. The SISTR platform can generate the serovar information and subtyping data currently used by epidemiologists and public health practitioners, while also providing some of the genome-based analyses that are the primary motivation for a move towards WGS. This type of integrated analysis maintains continuity with historical serotyping and subtyping data during the transition to genomic epidemiology in public health. The SISTR web application is accessible online at https://lfz.corefacility.ca/sistr-app/.
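
SISTR uses cgMLST cluster context to supplement genoserotyping. The abstract does not give the scheme's loci or clustering rule, so the following is only a sketch: it assumes allele profiles are equal-length lists of integer allele IDs (None for a missing locus) and a hypothetical distance cutoff, computes the proportion of differing loci, and takes a majority serovar vote among sufficiently close reference isolates. It is not SISTR's actual algorithm.

```python
from collections import Counter


def cgmlst_distance(a, b):
    """Proportion of loci with different allele calls, ignoring loci missing (None) in either profile."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 1.0  # no comparable loci: treat as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def predict_serovar(query, reference, max_distance=0.1):
    """Majority serovar among reference isolates within max_distance of the query.

    reference: list of (allele profile, serovar) pairs.
    max_distance: hypothetical cutoff, chosen for illustration only.
    Returns None when no reference isolate is close enough.
    """
    votes = Counter(serovar for profile, serovar in reference
                    if cgmlst_distance(query, profile) <= max_distance)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Ignoring missing loci rather than counting them as differences keeps incomplete draft assemblies from looking artificially distant; whether SISTR makes the same choice is not stated in the abstract.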

S2:5

REVOLUTIONISING PUBLIC HEALTH REFERENCE MICROBIOLOGY USING WHOLE GENOME SEQUENCING: A CASE STUDY WITH SALMONELLA

P. Ashton, S. Nair, T. Peters, A. Waldram, E. de Pinna, R. Elson, K. Grant, T. Dallman; Public Health England, London, UNITED KINGDOM.

Background: Salmonella is a major human pathogen and a global public health issue. Currently, presumptive Salmonella isolates received by Public Health England (PHE) are typed by slide agglutination with specific antisera to determine their serotype, based on their lipopolysaccharide and flagellar antigens. The microbiological typing data generated in the laboratory feed into the PHE public health workflow, which uses a sophisticated statistical algorithm to detect pathogen outbreaks. A more discriminatory typing method would lead to more accurate and focussed public health investigations. Here, we address whether whole genome sequencing (WGS) can improve on the results obtained with this workflow. Methods: Public Health England have adopted WGS for the routine identification and epidemiological investigation of Salmonella. The sequence type (ST) of each isolate is determined by a reference mapping approach. For the more common STs, the whole genome is analysed for single nucleotide polymorphisms (SNPs), which are used to construct phylogenetic trees, giving the highest possible resolution. To date, more than 11,000 isolates have been analysed in this way. Findings: There are three primary results to report. First, because serotype is linked to underlying genetics, the serotype can be determined from the sequence type with 98.5% concordance, which allows backwards compatibility with existing outbreak detection methods. Second, clusters of genetically related cases can be identified: between 01/04/2014 and 11/08/2014 there were seven clusters of more than 10 Salmonella Enteritidis isolates within a 10-SNP cluster, only one of which was identified as an outbreak by traditional means. These clusters were retrospectively investigated for common exposures. Finally, a WGS analysis of clusters identified by traditional typing will be presented. Interpretation: Although WGS is being embraced and adopted by PHE, it is not being treated as a panacea for public health microbiology, but rather as a paradigm shift in microbiological typing that has the potential to transform, rather than replace, traditional epidemiological investigation.
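
The Enteritidis clusters above are defined by a 10-SNP threshold, but the abstract does not state the linkage rule. A minimal sketch, assuming single linkage (an isolate joins a cluster if it is within the threshold of any member) and a precomputed pairwise SNP distance matrix, groups isolates with a union-find structure; PHE's production method may differ.

```python
def snp_clusters(names, dist, threshold=10):
    """Single-linkage clusters: isolates are joined if their pairwise SNP distance <= threshold.

    names: list of isolate IDs; dist: symmetric matrix (list of lists) of SNP counts.
    Returns clusters as lists of names, largest first.
    """
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values(), key=len, reverse=True)
```

Clusters of more than 10 isolates, as counted in the abstract, would then be `[c for c in snp_clusters(names, dist) if len(c) > 10]`. Single linkage can chain together isolates that are individually more than 10 SNPs apart, which is one reason a production system might prefer a different rule.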

S3:3

BENCHMARK DATASETS FOR VALIDATING FOODBORNE OUTBREAK INVESTIGATIONS: INTEGRATING WGS AND PHYLOGENOMIC ANALYSES

R. E. Timme¹, H. Rand¹, E. Trees², R. Agarwala³, S. David¹, M. Shumway³, M. Simmons⁴, G. Tillman⁴, P. Bronstein⁴, H. Carleton², S. Defibaugh-Chavez⁴, W. Klimke³, L. S. Katz²; ¹US Food and Drug Administration, College Park, MD, ²Centers for Disease Control and Prevention, Decatur, GA, ³National Center for Biotechnology Information, Bethesda, MD, ⁴USDA-FSIS, Athens, GA.

Background: As US regulatory agencies begin to rely on whole genome sequencing (WGS) data and analyses for foodborne pathogen surveillance, and as public health applications of WGS expand, a critical need has emerged for benchmark datasets of well-curated outbreaks. Such datasets can serve as validation standards for training, ensure analytical consistency among participating labs, and allow researchers to compare parameter choices and understand the inherent biases of different methods and workflows. Analytical consistency, together with the flexibility to use a variety of methods, will give epidemiologists and regulators the best possible support for their efforts. What we have done: We identified one retrospective outbreak dataset for each of the most common foodborne pathogens (Salmonella, Listeria, and E. coli) that met criteria for well-vetted epidemiological data, sequence quality, and completeness of metadata. The chosen outbreaks represent the sizes (8-30 isolates) and degrees of sequence divergence seen in recent years. The isolates from which these genomes came are vouchered in culture collections at the FDA and CDC, allowing further validation of identified variants. Our standard format provides BioSample, SRA, and GenBank accession numbers, a classification of each isolate as in or out of the outbreak, a suggested reference genome, and a phylogenetic tree in Newick format. To ensure that labs can use the same materials to validate current or future analysis tools, we created detailed instructions for replicating these datasets exactly. Results: Following our detailed instructions, we downloaded each standardized dataset on a wide variety of Unix-like platforms (CentOS 6, Ubuntu 14.04, Mac OS, etc.) and verified that the downloads were both correct (using sha256sum values) and compatible with many current methods for SNP finding and phylogeny reconstruction (CFSAN SNP Pipeline, Lyve-SET, BioNumerics, wgMLST, Wombac, kSNP, and SNVPhyl). We then created a strategy for comparing the results of different analysis methods. Conclusion: Government, university, and industry labs now have a shared baseline against which to assess new methods, enabling researchers to work together more effectively and with greater confidence in each other’s results. Building this first benchmark dataset has helped four US government agencies (CDC, FDA, USDA, NCBI) work together and refine their methodologies. Additional curated datasets can and should be incorporated into this schema. Because phylogenetic data are context-dependent, we will need to revise our analyses as more sequence data accumulate. In this way, our work can ensure a sound footing for WGS-based regulatory decision-making.
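
The download check described above relies on sha256sum values. A Python equivalent is sketched below, assuming the expected digests are stored in the standard sha256sum output format (one '<digest>  <filename>' line per file). The function names are illustrative and are not part of the benchmark's published instructions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so large FASTQ files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest, base_dir="."):
    """Return the files that are missing or whose SHA-256 digest does not match.

    manifest: text in sha256sum output format, one '<hex digest>  <filename>' per line;
    a leading '*' on the filename (sha256sum's binary-mode marker) is ignored.
    """
    failures = []
    for line in manifest.strip().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")
        path = Path(base_dir) / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            failures.append(name)
    return failures
```

On the command line, `sha256sum -c` given the same manifest performs the equivalent comparison.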

S3:4

INTEGRATING CORE GENOME PHYLOGENETIC RELATIONSHIPS AND ISOLATE GEOGRAPHIC DATA TO TRACE THE 2012 NEISSERIA MENINGITIDIS OUTBREAK IN NEW YORK CITY

M. R. Galac¹, I. Ezeoke², M. D. Krepps³, Y. Lin², J. Kornblum², H. S. Gibbons³, D. Weiss², D. A. Janies¹; ¹University of North Carolina at Charlotte, Charlotte, NC, ²New York City Department of Health and Mental Hygiene, Queens, NY, ³Edgewood Chemical and Biological Center, Aberdeen, MD.

Between 2012 and 2014, an outbreak of serogroup C Neisseria meningitidis (Nm) was detected among men who have sex with men (MSM) in New York City (NYC). To determine whether the geographic paths of isolates from this outbreak were distinct, we compared them with previous and concurrent Nm isolates from NYC. We sequenced 102 isolates (79 serogroup C) covering an 11-year period (2003-2013). Bacterial whole genomes were sequenced on the Illumina platform at ~250x coverage, assembled de novo, and annotated with xBASE. OrthoMCL identified orthologous clusters in our 102 genomes plus 14 complete Nm genomes from GenBank. Sequences from clusters containing a gene from every isolate were individually aligned with MAFFT and concatenated to produce a single core genome alignment. From this alignment, we used RAxML to infer a phylogenetic tree under the maximum likelihood optimality criterion, and the tree was then integrated with geocoded patient addresses. We then focused on 73 NYC genomes that had geographic data and were not repeat samples from the same patient. To build a network of transmission events, we used PAUP* to infer ancestor-descendant changes in geographic location on the tree and to count the number and direction of changes in isolation location at ancestral nodes. This produced a network of bacterial movement. At borough resolution, Brooklyn was integral to the movement of all Nm city-wide over the 11 years examined. We then subdivided the boroughs into neighborhoods and found that almost all transmission events were unique movements between neighborhoods rather than repeated or reciprocal movement of the bacteria from one neighborhood to another. The non-outbreak isolate most closely related to the monophyletic MSM outbreak isolates originated in Brooklyn-neighborhood-A and then spread to Manhattan-neighborhood-B. From Manhattan-neighborhood-B, the infection moved to different neighborhoods in Manhattan, Brooklyn and the Bronx, including Manhattan-neighborhood-C, from which 6 isolates with nearly identical core genomes radiated outward to additional neighborhoods in 4 boroughs. We used GeoGenes, which assigns higher network betweenness to locations that repeatedly lie on the shortest path between all possible pairs of locations, and found that Manhattan-neighborhood-B and Manhattan-neighborhood-C were hubs for the spread of the MSM outbreak. In summary, whole genome sequencing, phylogenetic and betweenness analyses allowed us to determine that the outbreak resulted from a single, highly similar group of Nm and enabled us to track the spread of the bacteria throughout NYC. These analyses could be an integral component of the public health response to meningococcal transmission. The results can further elucidate the relationship between cases and phylogeographic networks, allowing for more targeted and efficient interventions.
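
The study used PAUP* to count ancestor-descendant location changes and GeoGenes to score locations by betweenness. Neither tool is reproduced here. Instead, a minimal Python sketch, assuming ancestral locations have already been reconstructed for every tree node (a hypothetical input), counts directed location changes along branches and scores each location with standard Brandes betweenness centrality on the resulting network. GeoGenes' exact scoring (weighting, normalization) may differ.

```python
from collections import Counter, deque


def count_transitions(parent, location):
    """Count directed location changes along tree branches.

    parent: child node -> parent node (the root has no entry).
    location: node -> location label; tips are observed, internal nodes are
    assumed to be already reconstructed (e.g., by parsimony).
    """
    moves = Counter()
    for child, par in parent.items():
        src, dst = location[par], location[child]
        if src != dst:
            moves[(src, dst)] += 1
    return moves


def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an unweighted directed graph."""
    edges = set(edges)
    nodes = {n for edge in edges for n in edge}
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order, preds = [], {n: [] for n in nodes}
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from s
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score
```

In a toy tree where lineages pass from one Brooklyn neighborhood through a single Manhattan neighborhood to two others, the Manhattan neighborhood receives the only non-zero score, mirroring the "hub" interpretation in the abstract.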


S3:5

WHOLE GENOME SEQUENCING PROVIDES RAPID TRACEBACK OF CLINICAL TO FOOD SOURCES DURING A FOODBORNE OUTBREAK OF SALMONELLOSIS

M. Hoffmann¹, Y. Luo¹, S. R. Monday¹, T. Muruvanda¹, D. Janies², I. Senturk³, U. V. Catalyurek³, W. J. Wolfgang⁴, R. Myers⁵, P. S. Evans¹, J. Meng⁶, M. W. Allard¹, E. W. Brown¹; ¹US Food and Drug Administration, College Park, MD, ²University of North Carolina, Charlotte, NC, ³Ohio State University, Columbus, OH, ⁴New York State Department of Health, Albany, NY, ⁵Department of Health and Mental Hygiene, Baltimore, MD, ⁶University of Maryland, College Park, MD.

Salmonella serovar Bareilly is responsible for numerous outbreaks among humans. In 2012, a widespread foodborne outbreak associated with scraped tuna imported from India occurred in the United States. We performed a comparative genomic analysis within the serovar to explore, on a global scale, how effectively whole-genome sequencing (WGS) can differentiate outbreak isolates of S. Bareilly from non-outbreak isolates sharing the same XbaI PFGE pattern. We sequenced 100 S. Bareilly isolates on different platforms, including 41 isolates from the 2012 outbreak. A single isolate was sequenced on the Pacific Biosciences RS II sequencer to determine the first complete genome sequence of S. Bareilly, which served as the reference genome. Raw reads were subsequently mapped to this reference genome to build a single-nucleotide polymorphism (SNP) matrix and construct a phylogenetic tree. Pathogen genomes were linked with geography by projecting the phylogeny onto a virtual globe. Using the phylogenetic tree and the pathogen metadata, we reconstructed a transmission network for S. Bareilly. SNP analyses allowed us to distinguish and separate highly clonal S. Bareilly strains sharing the same XbaI PFGE pattern. The isolates from the 2012 outbreak clustered together, differing from one another by only a few SNPs. Our results revealed a common origin for the outbreak strains, indicating that the patients in the U.S. were infected from sources originating at the facility in India. In addition, we identified a unique arsenic resistance operon carried by many of these strains. Our data strongly suggest that WGS, combined with geographic mapping and the novel use of transmission networks for genetic data, vastly improves the rapid source tracking and surveillance of bacterial outbreaks, capabilities that are critically important for characterizing outbreak dynamics and, ultimately, for protecting public health.
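
Once reads are mapped to the closed PacBio reference and a SNP matrix is built, statements such as "differing by only a few SNPs" rest on pairwise SNP distances. A minimal sketch, assuming the matrix is stored as one string of bases per isolate with 'N' or '-' for uncalled sites (a convention chosen here for illustration, not taken from the abstract):

```python
from itertools import combinations

MISSING = frozenset("N-")  # symbols treated as "no call" (an assumed convention)


def snp_distance(a, b):
    """Count SNP-matrix columns where both isolates have a call and the calls differ."""
    if len(a) != len(b):
        raise ValueError("rows of a SNP matrix must have equal length")
    return sum(1 for x, y in zip(a, b)
               if x not in MISSING and y not in MISSING and x != y)


def distance_matrix(snp_matrix):
    """snp_matrix: isolate ID -> bases at the variant sites, in a shared column order.

    Returns (sorted isolate IDs, {(id_a, id_b): SNP distance}) with both orderings filled in.
    """
    names = sorted(snp_matrix)
    dist = {(n, n): 0 for n in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = snp_distance(snp_matrix[a], snp_matrix[b])
    return names, dist
```

Skipping uncalled sites means two isolates are only compared where both have data, so low-coverage isolates can appear closer than they are; the matrix's missing-data rate is worth reporting alongside the distances.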

S3:6

TRANSFORMING PUBLIC HEALTH MICROBIOLOGY IN THE UNITED STATES WITH WHOLE GENOME SEQUENCING (WGS) - PULSENET AND BEYOND

E. K. Trees, H. Carleton, P. Gerner-Smidt; CDC, Atlanta, GA.

Background: Public health laboratories use a number of different methods and technologies to characterize foodborne bacterial pathogens, ranging from phenotypic tests (e.g., growth characteristics, serotyping and antimicrobial susceptibility testing) to PCR and other molecular methods for detecting, e.g., virulence genes, and for subtyping in outbreak investigations (e.g., pulsed-field gel electrophoresis, PFGE). This traditional strain characterization is labor- and resource-intensive and has a turnaround time (TAT) of up to several months. Most of this information can be extracted from the genome sequence of an organism. With the introduction of next-generation sequencing technologies and advances in bioinformatics, it is now possible to characterize foodborne pathogens in a single, cost-efficient workflow with a TAT of ≤ four working days. Methods: The Enteric Diseases Laboratory Branch at CDC is working with partners in federal agencies and in state and international public health laboratories to build applications that extract information about genus, species, lineage, serotype, pathotype, virulence profile and antimicrobial resistance markers from genome sequences, and that perform subtyping with gene-by-gene approaches (whole genome multilocus sequence typing, wgMLST). The PulseNet foodborne disease surveillance network infrastructure, analytical platform (BioNumerics, Applied Maths) and database will be used because of their versatility and because public health laboratories have worked with them for more than 15 years. Additionally, this analytic software does not require extensive bioinformatics skills or local high-performance computing capacity. The method will also be validated and will replace conventional CLIA-related reference identification activities in the branch and in public health laboratories. Results: WGS has been successfully tested for US national surveillance of listeriosis since September 2013, and wgMLST analysis has been implemented in the daily routine. The technology has proven its power in supplementing traditional methods in outbreak investigations by adding phylogenetic relevance and increased resolution compared with current methods, thereby enhancing case definitions and source tracking. The wgMLST method is currently being piloted in-house for surveillance of Campylobacteraceae, Shiga toxin-producing E. coli, and Salmonella, with the remaining foodborne bacteria to be added over the next three years. A beta-testing pilot with PulseNet public health laboratory partners will start in summer 2015. Conclusion: WGS will revolutionize the surveillance of both sporadic and outbreak-related foodborne infections by replacing and enhancing the laboratory methods currently used in public health laboratories.
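
wgMLST is a gene-by-gene approach: each locus sequence found in a genome is assigned an allele number, and isolates are then compared by their allele profiles. The sketch below illustrates only the nomenclature step, assuming exact-match allele identity and that locus sequences have already been extracted from the assembly; the class, function and locus names are hypothetical, and this is not how BioNumerics performs allele calling.

```python
import hashlib


class AlleleDatabase:
    """Toy gene-by-gene nomenclature: each distinct locus sequence gets an integer allele ID."""

    def __init__(self):
        self._alleles = {}  # locus -> {SHA-1 of allele sequence: allele ID}

    def call(self, locus, sequence):
        """Return the allele ID for an exact sequence, assigning the next ID if it is new."""
        key = hashlib.sha1(sequence.upper().encode("ascii")).hexdigest()
        known = self._alleles.setdefault(locus, {})
        if key not in known:
            known[key] = len(known) + 1
        return known[key]


def allele_profile(db, locus_sequences):
    """Map locus -> allele ID; loci not found in the assembly (None) stay None."""
    return {locus: db.call(locus, seq) if seq else None
            for locus, seq in locus_sequences.items()}
```

Storing a hash instead of the full sequence keeps the lookup table compact; a shared, centrally curated database is what makes allele numbers comparable between laboratories.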

S4:4

SALMONELLA SEROTYPE DETERMINATION UTILIZING HIGH-THROUGHPUT GENOME SEQUENCING DATA

S. Zhang¹, Y. Yin², M. Jones³, Z. Zhang⁴, B. Deatherage Kaiser⁵, B. Dinsmore⁶, C. Fitzgerald⁶, P. Fields⁶, X. Deng¹; ¹University of Georgia, Griffin, GA, ²Illinois Institute of Technology, Chicago, IL, ³J. Craig Venter Institute, Rockville, MD, ⁴University of Michigan, Ann Arbor, MI, ⁵Pacific Northwest National Laboratory, Richland, WA, ⁶Centers for Disease Control and Prevention, Atlanta, GA.

Serotyping forms the basis of national and international surveillance networks for Salmonella, one of the most prevalent foodborne pathogens worldwide. Public health microbiology is currently being transformed by whole genome sequencing (WGS), which opens the door to serotype determination from WGS data. SeqSero (www.denglab.info/SeqSero) is a novel web-based tool for determining Salmonella serotypes from high-throughput genome sequencing data. SeqSero is based on curated databases of Salmonella serotype determinants (the rfb gene cluster and fliC and fljB alleles) and is expected to determine serotype rapidly and accurately for nearly the full spectrum of Salmonella serotypes (more than 2,300), from both raw sequencing reads and genome assemblies. We evaluated the performance of SeqSero by testing: 1) raw reads from genomes of 308 Salmonella isolates of known serotype; 2) raw reads from genomes of 3,306 Salmonella isolates sequenced and made publicly available by GenomeTrakr, a U.S. national monitoring network operated by the Food and Drug Administration; and 3) 354 other publicly available draft or complete Salmonella genomes. We also demonstrated Salmonella serotype determination from raw sequencing reads of fecal metagenomes from mice orally infected with this pathogen. SeqSero can help maintain the well-established utility of Salmonella serotyping when integrated into a platform of WGS-based pathogen subtyping and characterization.
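
SeqSero identifies rfb, fliC and fljB determinants and translates them into a serotype. The sketch below swaps in a simple k-mer containment test to pick the best-supported allele (an assumption; the abstract does not describe SeqSero's matching method), followed by a lookup of the resulting antigenic formula in a three-entry excerpt of the White-Kauffmann-Le Minor scheme. Allele names, sequences and the min_fraction cutoff are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_allele(read_kmers, alleles, k, min_fraction=0.8):
    """Reference allele whose k-mers are best represented in the reads, or None.

    alleles: allele name -> reference sequence (hypothetical database entries).
    """
    best, best_frac = None, 0.0
    for name, seq in alleles.items():
        ref = kmers(seq, k)
        frac = len(ref & read_kmers) / len(ref) if ref else 0.0
        if frac > best_frac:
            best, best_frac = name, frac
    return best if best_frac >= min_fraction else None


# Three entries from the White-Kauffmann-Le Minor scheme: (O antigens, phase 1 H, phase 2 H).
FORMULA_TO_SEROTYPE = {
    ("1,4,[5],12", "i", "1,2"): "Typhimurium",
    ("1,4,[5],12", "r", "1,2"): "Heidelberg",
    ("1,9,12", "g,m", "-"): "Enteritidis",
}


def predict_serotype(o_antigen, h1, h2):
    """Look up an antigenic formula; an absent phase 2 antigen is written '-'."""
    return FORMULA_TO_SEROTYPE.get((o_antigen, h1, h2 or "-"), "unknown")
```

A full implementation needs the complete scheme and must handle formulas shared by several serotypes, which is why the lookup here returns "unknown" for anything outside its small table.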

S4:5

PATRIC PIPELINE

F. Xia¹, T. Brettin², S. Boisvert², N. R. Conrad², J. J. Davis¹, T. Disz¹, J. Edirisinghe², R. A. Edwards³, C. Henry¹, R. W. Kenyon⁴, D. Machi⁴, C. Mao⁴, G. J. Olsen⁵, R. Olson², R. Overbeek⁶, B. Parrello⁶, G. D. Pusch⁶, M. P. Shukla², B. W. Sobral⁴, R. L. Stevens¹, V. Vonstein⁶, A. Warren⁴, R. Will⁴, H. Yoo⁴, A. R. Wattam⁴; ¹University of Chicago, Chicago, IL, ²Argonne National Laboratory, Lemont, IL, ³San Diego State University, San Diego, CA, ⁴Virginia Tech, Blacksburg, VA, ⁵University of Illinois at Urbana-Champaign, Urbana, IL, ⁶Fellowship for Interpretation of Genomes, Burr Ridge, IL.

Recent advances in DNA sequencing technology, accompanied by plummeting per-base costs, are making sequence-based applications more accessible. While a plethora of bioinformatics databases and workflows exist, their capabilities are often hampered by inconsistent use of analysis tools. PATRIC, the NIAID-funded comprehensive bacterial bioinformatics resource, has integrated more than 30,000 consistently annotated prokaryotic genomes, with a focus on human pathogenic species. Here we present PATRIC’s new computational services, which support the assembly, annotation and metabolic modeling of user-supplied genomes in the same consistent fashion. These services, integrated with PATRIC’s collections of specialty genes such as antibiotic resistance determinants and virulence factors, will enable users to rapidly process newly sequenced pathogens and investigate key pathogenic determinants in foodborne outbreaks using the powerful visualization and comparative analysis tools in PATRIC. We have implemented the new services with three principles in mind. (1) Controlled vocabulary. At the heart of PATRIC’s annotation service is a controlled vocabulary for functional annotation derived from the curated subsystems and protein families in the RAST and SEED systems [2]. Similarly, the new model reconstruction service relies on our curated biochemistry data [5]. These curation efforts ensure that newly sequenced genomes can be automatically annotated and modeled and readily compared with existing reference data. (2) Modular design. In genome assembly, as in other bioinformatic analyses, there is often no single tool best suited for all occasions [4]. We have added support for more than 30 tools for error correction, contig assembly, scaffolding, contig evaluation, consensus building, gene calling and overlap removal, as well as many custom algorithms [3]. These modules are condensed into a few curated workflows to ensure convenient and efficient execution as well as consistent quality control. (3) Integrated analysis. The new workspace allows users to upload their own data for analysis; upon completion, the private results are immediately integrated into PATRIC. This enables users to take advantage of PATRIC’s data (drug targets, omics, AMR and other clinical metadata) and comparative tools (protein family sorter, phylogeny, heat maps, etc.). In addition to these services, we are actively building support for batch analysis and SNP-level comparative analysis of closely related genomes. URL: https://www.patricbrc.org.

References:
[1] Gillespie, et al. “PATRIC: ... (2011).
[2] Overbeek, et al. “The SEED ... (RAST).” Nucleic Acids Research (2014): D206-D214.
[3] Brettin, et al. “RASTtk ...” Scientific Reports 5 (2015).
[4] Earl, et al. “Assemblathon ...” Genome Research (2011).
[5] Henry, et al. “High-throughput ... models.” Nature Biotechnology (2010).
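
Contig evaluation is one of the module categories listed under principle (2). The abstract does not say which metrics PATRIC reports; as an example of a common contig-evaluation statistic (an assumption about what such a module might compute), N50 is the contig length at which contigs of that length or longer together contain at least half of the assembled bases.

```python
def n50(contig_lengths):
    """Largest length L such that contigs of length >= L hold at least half the assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # cumulative bases in the longest contigs so far
        if running >= half:
            return length
    return 0  # empty assembly
```

For contigs of 100, 200, 300 and 400 bp (1,000 bp total), the two longest contigs reach 700 bp, so N50 is 300 bp.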


n S4:6<br />

CFSAN SNP PIPELINE: A WHOLE GENOME<br />

SEQUENCE DATA ANALYSIS PIPELINE FOR<br />

FOOD-BORNE PATHOGENS<br />

Y. Luo, J. Pettengill, J. Baugher, H. Rand, S.<br />

Davis;<br />

FDA/CFSAN, College Park, MD.<br />

In support of the analysis of whole genome sequence<br />

data (WGS) for closely related pathogens<br />

in food-borne outbreaks, the Center for<br />

Food Safety and Applied Nutrition (CFSAN)<br />

at the FDA has developed a reference-based<br />

software pipeline for high quality SNP identification<br />

and analysis. This software pipeline<br />

combines into a single package the mapping of<br />

WGS reads to a reference genome, processing<br />

of those mapping files, identification of variant<br />

sites, and production of a SNP matrix. Additional<br />

features include a summary table of the<br />

results, soft-links to minimize data storage, and<br />

the ability to switch between workstations and<br />

computer clusters with minimal effort. The CF-<br />

SAN SNP Pipeline is currently used in production<br />

mode to analyze WGS data from isolates<br />

related to food-borne illnesses. The pipeline is<br />

used when outbreak investigations are ongoing<br />

to link samples and to provide information for<br />

decision-makers. It is also used retrospectively<br />

to aid in the analysis of closed outbreaks. The<br />

CFSAN SNP Pipeline is reference-based,<br />

and so a reference must be provided. Isolate<br />

sequence data must be in fastq format but can<br />

either be paired-end or single-read data. All<br />

analysis steps are run automatically, and only<br />

depend on the proper organization of the input<br />

files and identification of a suitable reference.<br />

Additionally, each of the analysis steps can be<br />

run using individual shell scripts. The addition<br />

of new samples is very straightforward, and result<br />

files from previous portions of the analysis<br />

that do not need to be regenerated are reused.<br />

This greatly reduces the computational time<br />

when adding new samples as the mapping and<br />

pileup steps are not redone. The pipeline will<br />

run without problems on current workstations,<br />

and will run on high performance computing<br />

clusters with either Torque or Grid Engine job<br />

schedulers. The CFSAN SNP Pipeline is written<br />

in a combination of Bash and Python. The<br />

code is designed to run on Linux platforms<br />

with bash and python. BioPython must be<br />

installed in tandem with three executable software<br />

dependencies, Bowtie2, SAMtools, and<br />

VarScan. Substantial effort has been devoted to<br />

making the software robust, well-documented,<br />

and easy to use. The source code, documentation,<br />

Python package, and reference publication are<br />

available as follows. Source code:<br />

https://github.com/CFSAN-Biostatistics/snppipeline.<br />

Documentation: http://snp-pipeline.rtfd.org.<br />

PyPI package: https://pypi.python.org/pypi/snp-pipeline.<br />

Reference publication:<br />

Pettengill JB, Luo Y, Davis S, Chen Y,<br />

Gonzalez-Escalona N, Ottesen A, Rand H, Allard<br />

MW, Strain E. An evaluation of alternative<br />

methods for constructing phylogenies from<br />

whole genome sequence data: a case study<br />

with Salmonella.<br />
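
The reuse of existing result files described above follows the same freshness rule as the Unix make utility. The sketch below is a hypothetical Python illustration, not code from the CFSAN SNP Pipeline: the per-sample directory layout and the output name `reads.pileup` are assumptions. A sample is re-mapped only when its pileup is missing or older than its reads or the reference.<br />

```python
import os


def needs_rebuild(output_path, input_paths):
    """Make-style check: True if the output is missing or older than any input."""
    if not os.path.exists(output_path):
        return True
    out_mtime = os.path.getmtime(output_path)
    return any(os.path.getmtime(p) > out_mtime for p in input_paths)


def stale_samples(sample_dirs, reference):
    """Return the sample directories whose pileup must be regenerated.

    Hypothetical layout: each directory holds reads as *.fastq / *.fastq.gz
    plus a previously generated 'reads.pileup'. Samples whose pileup is newer
    than both their reads and the reference are skipped, so mapping and
    pileup are not redone for them.
    """
    stale = []
    for d in sample_dirs:
        reads = [os.path.join(d, f) for f in sorted(os.listdir(d))
                 if f.endswith((".fastq", ".fastq.gz"))]
        pileup = os.path.join(d, "reads.pileup")
        if needs_rebuild(pileup, reads + [reference]):
            stale.append(d)
    return stale
```

Touching the reference makes every sample stale, while adding a new sample directory only schedules that one sample.<br />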

S4:7<br />

ASSEMBLING WHOLE GENOMES FROM<br />

MIXED MICROBIAL COMMUNITIES USING<br />

HI-C<br />

I. Liachko¹, J. N. Burton¹, L. Sycuro², A. H.<br />

Wiser², D. N. Fredricks², M. J. Dunham¹, J.<br />

Shendure¹;<br />

¹University of Washington, Seattle, WA, ²Fred<br />

Hutchinson Cancer Research Center, Seattle,<br />

WA.<br />

Assembly of whole genomes from next-generation<br />

sequencing is hindered by the lack of long-range<br />

contiguity information in short-read data.<br />

This limitation also impedes metagenome<br />

assembly, since one cannot tell which sequences<br />

originate from the same species within<br />

a population. We have overcome these bottlenecks<br />

by adapting a chromosome conformation<br />

capture technique (Hi-C) for the deconvolution<br />

of metagenomes and the scaffolding of de novo<br />

assemblies of individual genomes. In modeling<br />

the 3D structure of a genome, chromosome<br />

conformation capture techniques such as Hi-C<br />

are used to measure long-range interactions of<br />

DNA molecules in physical space. These tools<br />

employ crosslinking of chromatin in intact<br />

cells followed by intra-molecular ligation,<br />

joining DNA fragments that were physically<br />

nearby at the time of crosslinking. Subsequent<br />

deep sequencing of these DNA junctions generates<br />

a genome-wide contact probability map<br />

that allows the 3D modeling of genomic conformation<br />

within a cell. The strong enrichment<br />

in Hi-C signal between genetically neighboring<br />

loci allows the scaffolding of entire chromosomes<br />

from fragmented draft assemblies.<br />

Hi-C signal also preserves the cellular origin<br />

of each DNA fragment and its interacting partner,<br />

allowing for deconvolution and assembly<br />

of multi-chromosome genomes from a mixed<br />

population of organisms. We have used Hi-C<br />

to scaffold whole genomes of animals, plants,<br />

and fungi, as well as bacteria and archaea. We<br />

have also been able to use this data to annotate<br />

functional features of microbial genomes, such<br />

as centromeres in many fungal species. Additionally,<br />

we have applied our technology to<br />

diverse metagenomic populations such as craft<br />

beer, bacterial vaginosis infections, soil, and<br />

tree endophyte samples to discover and assemble<br />

the genomes of novel strains of known<br />

species as well as novel prokaryotes and<br />

eukaryotes. The high quality of Hi-C-based<br />

assemblies allows the simultaneous closing of<br />

numerous unculturable genomes, placement of<br />

plasmids within host genomes, and microbial<br />

strain deconvolution in a way not possible<br />

with other methods. Reference: Burton JN*,<br />

Liachko I*, Dunham MJ, Shendure J. Species-level<br />

deconvolution of metagenome assemblies<br />

with Hi-C-based contact probability maps. G3.<br />

2014 May 22;4(7):1339-46.<br />
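
The deconvolution idea, that contigs from the same cell share many more Hi-C links than contigs from different cells, can be sketched as follows. This is a simplified stand-in, assuming links are normalized by contig length and grouped by single linkage above a hypothetical density cutoff (`min_density`); the published method uses its own clustering procedure.<br />

```python
def cluster_contigs(lengths, links, min_density):
    """Group contigs into putative genomes by Hi-C link density.

    lengths: {contig: length in bp}
    links: {(contig_a, contig_b): Hi-C read pairs joining the two contigs}
    min_density: hypothetical cutoff in links per (kb x kb) of contig length
    """
    parent = {c: c for c in lengths}

    def find(c):
        # Union-find root lookup with path halving
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), n in links.items():
        # Normalize by contig sizes so long contigs do not dominate
        density = n / ((lengths[a] / 1000) * (lengths[b] / 1000))
        if density >= min_density:
            parent[find(a)] = find(b)  # single-linkage merge

    clusters = {}
    for c in lengths:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(group) for group in clusters.values())
```

Lowering `min_density` merges more contigs into each group, so in practice the cutoff trades purity of each putative genome against completeness.<br />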

S6:3<br />

NEXT GENERATION SEQUENCING OF<br />

BRUCELLA MELITENSIS ISOLATES FROM<br />

KUWAIT AND COMPARATIVE GENOME<br />

ANALYSES<br />

A. S. Mustafa, F. Shaheed, N. Habibi, M. W.<br />

Khan;<br />

Kuwait University, Jabriya, KUWAIT.<br />

Human brucellosis is a zoonotic disease of<br />

worldwide occurrence. In Kuwait, almost all<br />

cases are caused by Brucella melitensis. Three<br />

different strains (biovars) of B. melitensis have<br />

been identified using classical techniques, but<br />

the very limited variation across strains<br />

makes it difficult to identify a particular<br />

strain using classical molecular methods such<br />

as PCR and Sanger sequencing. The aim of<br />

this study was to exploit the potential of next<br />

generation sequencing to identify the strain(s)<br />

of B. melitensis present in Kuwait, and also<br />

to determine the extent of differences among<br />

isolates of the same strain by comparative<br />

genome analysis. B. melitensis was isolated<br />

from 15 patients with suspected brucellosis.<br />

The bacterial colonies from culture<br />

plates were suspended in saline and heated at<br />

95°C for 10 minutes. DNA released from the<br />

bacterial cells was purified using the QIAamp<br />

DNA Mini Kit (Qiagen). The isolated DNA<br />

was checked for purity and quantified using<br />

a spectrophotometer (Epoch) and a fluorometer<br />

(Qubit), respectively. DNA libraries were<br />

prepared using the Nextera XT DNA Sample<br />

Preparation Kit (Illumina) and sequenced on<br />

the MiSeq next-generation sequencing platform<br />

(Illumina) following standard procedures.<br />

The obtained sequence files were aligned to<br />

the sequences of three known biovars of B.<br />

melitensis available in the NCBI database,<br />

i.e. biovar 1 str. 16M, biovar 2 str. 63/9, and<br />

biovar 3 str. Ether. The alignment and variant<br />

calling were performed using ‘bwa mem’<br />

and SAMtools/VCFtools, respectively. The<br />

results showed that the genome size of all<br />

isolates was approximately 3.3 Mbp, and<br />

all of them belonged to B. melitensis biovar<br />

2 str. 63/9. A neighbor-joining tree analysis<br />

identified one of the isolates as an outlier.<br />

Variants (SNPs and indels) were distributed<br />

across the genome, but 138 SNPs were shared<br />

by the 14 isolates, supporting a common<br />

ancestral origin. In addition, 2 to 478 SNPs<br />

unique to each isolate were identified, and<br />

the SNP data divided B. melitensis biovar 2<br />

into two major variant groups. In conclusion,<br />

this study suggests that biovar 2 is<br />

the most prevalent biovar of B. melitensis in<br />

Kuwait. Furthermore, at least two major variant<br />

groups exist within biovar 2. Supported<br />

by Kuwait University Research Sector grant<br />

SRUL02/13.<br />
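
Once the per-isolate variant calls have been parsed into sets, the shared-versus-unique comparison reported above reduces to set operations. A minimal Python sketch, assuming each SNP is keyed as (reference position, alternate base); the VCF parsing step is omitted and the isolate names and positions in the check are toy values.<br />

```python
def shared_and_unique(snps_by_isolate):
    """Split SNPs into those carried by every isolate and those private to one isolate.

    snps_by_isolate: {isolate: set of (position, alt_base)} relative to the reference.
    Returns (shared set, {isolate: SNPs found in that isolate only}).
    """
    isolates = list(snps_by_isolate)
    # SNPs common to all isolates point to a shared ancestral origin
    shared = set.intersection(*snps_by_isolate.values())
    unique = {}
    for name in isolates:
        # Everything seen in any other isolate is excluded from this isolate's private set
        others = set().union(*(snps_by_isolate[o] for o in isolates if o != name))
        unique[name] = snps_by_isolate[name] - others
    return shared, unique
```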

S6:4<br />

MICROBIAL GENOMIC TAXONOMY AT<br />

GENBANK<br />

S. Federhen;<br />

NCBI, Bethesda, MD.<br />

Incorrectly identified genomes at GenBank<br />

are a problem for users of the data. Some<br />

genomes are submitted with incorrect species<br />

identifications. Others were correctly identified<br />

when they were submitted but should now<br />

be updated based on a subsequent taxonomic<br />

publication, for example the description of a<br />

new species. GenBank has traditionally relied<br />

on the submitters to provide the correct<br />

taxonomic identifications for their sequence<br />

submissions. Two developments have combined<br />

to change this situation in the domain<br />

of microbial genomes. First, the curation of<br />

type material in the NCBI taxonomy database<br />

allows us to flag sequences from type in the<br />

nucleotide and genome domains of Entrez.<br />

Second, current sequencing technology makes<br />

it fast and easy to generate microbial genomes.<br />

It has been clear for some time that the current<br />

paradigm of species delimitation by 16S rRNA<br />

sequence and DNA-DNA hybridization (DDH)<br />

would eventually be replaced with a model<br />

based on whole genome analysis. We present<br />

a proposal to find and correct misidentified<br />

genomes based on average nucleotide identity<br />

(ANI) from type and proxytype. Sequences<br />

from type are reliably identified (by definition)<br />

once we have verified that they are free from<br />

contamination and are actually from the strain<br />

with which they are annotated. All other identifications<br />

are a matter of opinion, and will be<br />

subject to verification. We have genomes from<br />

type (both finished and WGS) for 4000 species,<br />

including 3500 bacteria. This represents<br />

25% of bacterial species with validly published<br />

names. The other 75% of bacterial species<br />

generally have an assortment of short sequences<br />

from type in GenBank: at least a 16S<br />

sequence, and often more. These sequences are<br />

used to probe our existing genomes and predict<br />

where the genome from type will appear once<br />

we do get one. In many cases we can designate<br />

a proxy for the missing type from among<br />

the genomes that we do have; we call these<br />

‘proxytype’ genomes. Taken together, these<br />

genomes from type and proxytype represent a<br />

scaffold of reliably identified sequences that<br />

we can use in conjunction with some simple<br />

genome-wide comparison measures to validate<br />

the identifications in our other genomes.<br />

Once we have identified genomes that need<br />

taxonomic updates, we plan to correct the entries,<br />

add a structured comment detailing the<br />

evidence for the update, and notify the submitters<br />

of the change. This represents a significant<br />

change in policy for GenBank: a new genomic<br />

paradigm for validating taxonomic identifications,<br />

some new types of analysis, as well as<br />

a shift in the boundary for database-driven<br />

source feature updates. We convened a workshop<br />

to present the proposal, with representation<br />

from a broad spectrum of the bacterial<br />

taxonomic community (GenBank genomic<br />

taxonomy workshop, 12-13 May 2015). This<br />

group unanimously endorsed our genomic approach<br />

to validating taxonomic identifications<br />

in genomes at GenBank.<br />
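
ANI-based validation against type and proxytype genomes can be sketched in two steps: average the identities of aligned genome fragments, then flag genomes whose closest type genome belongs to a different species. The filtering defaults follow the widely used ANIb convention (fragments with more than 30% identity over at least 70% of their length), and the 95% species cutoff is an assumption; neither value is stated in this abstract.<br />

```python
from statistics import mean


def ani(fragment_hits, min_identity=30.0, min_aligned=0.70):
    """Average nucleotide identity from per-fragment best hits.

    fragment_hits: iterable of (percent_identity, aligned_fraction) pairs, one per
    query fragment. Only fragments passing both filters contribute.
    """
    kept = [pid for pid, frac in fragment_hits
            if pid > min_identity and frac >= min_aligned]
    return mean(kept) if kept else 0.0


def flag_misidentified(assigned, ani_to_type, threshold=95.0):
    """Return {genome: suggested species} where the label disagrees with the best type match.

    assigned: {genome: submitted species name}
    ani_to_type: {genome: {species of a type/proxytype genome: ANI percent}}
    threshold: assumed species boundary in percent ANI
    """
    flagged = {}
    for genome, species in assigned.items():
        best_species, best_ani = max(ani_to_type[genome].items(), key=lambda kv: kv[1])
        # Only confident matches are flagged; below the cutoff nothing is reported
        if best_ani >= threshold and best_species != species:
            flagged[genome] = best_species
    return flagged
```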

S7:2<br />

A BIOSURVEILLANCE ANALYSIS PIPELINE<br />

FOR GENOMIC SEQUENCE DATA<br />

C. Olsen¹, K. Qaadri¹, H. Shearman², R. Moir²,<br />

M. Kearse²;<br />

¹Biomatters, Inc., Newark, NJ, ²Biomatters,<br />

Ltd., Auckland, NEW ZEALAND.<br />

Next-generation sequencing (NGS) approaches<br />

have numerous applications for biosurveillance<br />

programs and outbreak investigation.<br />

However, analyzing the data accurately and in<br />

a timely fashion without high-performance<br />

computing resources is challenging. Often, the<br />

users are not bioinformaticians comfortable<br />

running sequence analysis pipelines. Biomatters’<br />

Geneious R9 is a bioinformatics software<br />

platform that gives researchers access to<br />

industry-leading algorithms for their genomic<br />

and protein sequence analyses. Geneious offers<br />

a comprehensive suite of functions, including<br />

a robust collection of peer-reviewed tools,<br />

that enable researchers to be more efficient<br />

with their sequence analysis workflows. The<br />

recent addition of the 16S Biodiversity tool<br />

and Sequence Classifier plugin provides tools<br />

that can be incorporated into an easy-to-use<br />

pathogen identification workflow. The 16S<br />

Biodiversity tool identifies high-throughput<br />

16S rRNA amplicons from environmental<br />

samples using the RDP database, and visualizes<br />

biodiversity as an interactive chart using<br />

a secure web viewer. The Sequence Classifier<br />

plugin classifies a sample taxonomically by<br />

the similarity of its DNA to a user-supplied<br />

database of known sequences, using a<br />

BLAST-like algorithm with multiple loci and<br />

trees to assist identification. By using<br />

Geneious R9, biologists can easily streamline<br />

their sequence analysis workflows for mixed<br />

sample analysis. This demonstration is for a<br />

sequence analysis pipeline for identification of<br />

bacterial pathogens from mixed metagenomic<br />

data generated from outbreaks. The pipeline<br />

can also be extended to include eukaryotic and<br />

fungal pathogens.<br />
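
The general idea of classifying a sample against a user-supplied reference database can be illustrated with shared k-mers. This swaps in a simple k-mer containment score for the plugin's BLAST-like algorithm, so it is a conceptual sketch only; `k`, `min_score`, and the toy sequences are arbitrary assumptions.<br />

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, reference_db, k=8, min_score=0.5):
    """Assign a query to the reference taxon sharing the largest fraction of its k-mers.

    reference_db: {taxon: reference sequence}.
    Returns (taxon, score), or (None, best_score) when no taxon reaches min_score.
    """
    q = kmers(query, k)
    if not q:
        return None, 0.0
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in reference_db.items():
        # Containment: fraction of the query's k-mers also present in the reference
        score = len(q & kmers(ref_seq, k)) / len(q)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return (best_taxon, best_score) if best_score >= min_score else (None, best_score)
```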

S7:3<br />

A UNIVERSAL WHOLE GENOME<br />

SEQUENCING APPROACH USING WHOLE<br />

GENOME MLST AND WHOLE GENOME SNP<br />

ANALYSIS IN THE CLOUD<br />

H. Pouseele, K. De Bruyne, B. Pot, K. Janssens;<br />

Applied Maths NV, Sint-Martens-Latem, BELGIUM.<br />

Introduction: While whole genome sequencing<br />

(WGS) is becoming very attractive for<br />

research as well as for routine analyses, the current<br />

challenge is to extract whole genome typing<br />

information relevant to, e.g., outbreak surveillance<br />

without the need for cluster facilities or<br />

highly specialized staff. Methodology: Here<br />

we present a cloud-based, high throughput<br />

(HT) environment for WGS data processing.<br />

Raw sequences are processed with standardized<br />

and validated pipelines to produce whole-genome<br />

multilocus sequence typing (wgMLST) and<br />

whole-genome single nucleotide polymorphism<br />

(wgSNP) results. Whereas wgMLST is typically<br />

used to identify potential outbreak clusters,<br />

wgSNP analysis is considered useful for final<br />

subtyping. The wgMLST pipeline uses WGS<br />

data (assembled or not) to perform MLST on<br />

a genome-wide scale. For each sample, locus<br />

presence is analyzed and allelic variants are<br />

determined. Per locus, new sequences are<br />

submitted, curated and assigned new allele<br />

numbers. wgMLST makes it possible to<br />

obtain traditional typing results such as MLST,<br />

rMLST, etc., ensuring full compatibility with<br />

former typing schemes, without additional<br />

cost or time delay. A ‘pan-genome’ wgMLST<br />

scheme has the advantage over a core-genome<br />

scheme of maximizing resolution, and it is stable<br />

because adding new samples does not alter<br />

existing information. The core scheme is<br />

implemented as a subscheme of the pan-genome scheme<br />

and was shown to have a high epidemiological<br />

relevance and stability, necessary for<br />

long-term surveillance and outbreak investigation.<br />

For the wgSNP approach, a dual reference-based<br />

approach is used: reads are mapped<br />

onto organism- and outbreak-specific references,<br />

yielding maximum resolution for detecting<br />

outbreak isolates. Once allele assignments and/or<br />

SNPs are calculated, traditional BioNumerics®<br />

analysis tools are used for phylogenetic,<br />

statistical and comparative follow-up analyses.<br />

Access to the Amazon cloud is integrated into<br />

BioNumerics®, allowing easy point-and-click<br />

analysis. Confidential metadata remain in the<br />

local BioNumerics® database at all times,<br />

while anonymous WGS data are processed<br />

in the cloud. Results: We will evaluate the<br />

wgMLST and wgSNP approaches for L. monocytogenes<br />

and Salmonella using the Amazon<br />

cloud solution. wgMLST combined with wgSNP<br />

analysis reliably identified strains belonging<br />

to documented outbreaks. Public NCBI<br />

SRA data will provide additional sequences<br />

that add context to more precisely understand<br />

and position the outbreak. Conclusion: NGS<br />

combined with automated data analysis and<br />

interpretation tools holds great promise for<br />

rapid, accurate and comprehensive identification<br />

of outbreaks. BioNumerics® 7.6*, together<br />

with its scalable HT calculation environment,<br />

offers a powerful, user-friendly, easy-access<br />

environment where both wgMLST and wgSNP<br />

analyses can be performed and interpreted,<br />

lowering the barriers to the use of WGS in<br />

routine applications.<br />
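
The per-locus allele numbering described above (a new sequence receives the next allele number, existing numbers never change) and the allelic distance between two profiles can be sketched in Python. This is a hypothetical illustration, not BioNumerics code: curation is omitted, alleles are keyed by a SHA-1 hash of the sequence, and loci absent from either profile are skipped when counting differences.<br />

```python
import hashlib


class AlleleDatabase:
    """Per-locus allele numbering: an unseen sequence gets the next free number."""

    def __init__(self):
        self._alleles = {}  # locus -> {sequence hash: allele number}

    def call(self, locus, sequence):
        table = self._alleles.setdefault(locus, {})
        key = hashlib.sha1(sequence.upper().encode()).hexdigest()
        if key not in table:
            table[key] = len(table) + 1  # curation step omitted in this sketch
        return table[key]

    def profile(self, loci_sequences):
        """{locus: sequence, or None if absent} -> {locus: allele number or None}."""
        return {locus: (self.call(locus, s) if s else None)
                for locus, s in loci_sequences.items()}


def allelic_distance(p1, p2):
    """Count loci with different alleles, skipping loci missing in either profile."""
    shared = [locus for locus in p1.keys() & p2.keys()
              if p1[locus] is not None and p2[locus] is not None]
    return sum(p1[locus] != p2[locus] for locus in shared)
```

Because allele numbers are only ever appended, adding samples leaves earlier profiles untouched, in line with the stability argument made for the pan-genome scheme.<br />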

S7:4<br />

TYPING AND EPIDEMIOLOGICAL<br />

CLUSTERING OF COMMON PATHOGENS<br />

BASED ON WHOLE GENOME NGS DATA<br />

A. Materna, K. Einer-Jensen, P. Liboriussen,<br />

J. Johansen, L. Schauser, A. C. Materna;<br />

QIAGEN, Aarhus, DENMARK.<br />

Next generation sequencing (NGS) data from<br />

whole pathogen genomes is frequently used for<br />

enhanced surveillance and outbreak detection<br />

of common pathogens. Version 1.5 of the CLC<br />

Microbial Genomics Module for CLC Genomics<br />

Workbench and CLC Genomics Server<br />

introduces new functionality for molecular<br />

typing and epidemiological analysis of bacterial<br />

isolates. The latest update to the module<br />

enables the user to perform stepwise (tool-by-tool)<br />

analysis or to take advantage of included<br />

multistep workflows. With a few clicks, workflows<br />

can be optimized for routine analysis of<br />

a specific pathogen or outbreak. New features<br />

include, for instance, streamlined tools for NGS-based<br />

Multilocus Sequence Typing (MLST),<br />

resistance typing, and genus- and species-level<br />

identification. New tools for phylogenetic<br />

tree reconstruction generate trees based<br />

on single nucleotide polymorphisms (SNPs) or<br />

infer K-mer trees from NGS reads or genomes.<br />

A new table format, acting as a database, collects<br />

typing results and associates these results<br />

with metadata such as sample information,<br />

geographic origin, treatment outcome, etc. Results<br />

generated using e.g. MLST and resistance<br />

typing can furthermore be associated with the<br />

original sample metadata. Users can filter<br />

results and metadata to find and select relevant<br />

subsets of samples for downstream analysis.<br />

Results and metadata available during tree<br />

generation can further be used to explore phylogeny<br />

in the context of this epidemiologically<br />

relevant information. Version 1.5 of the CLC<br />

Microbial Genomics Module aims to facilitate<br />

tasks commonly carried out during outbreak<br />

investigation, such as typing or source tracking<br />

based on whole genome data. Preconfigured<br />

workflows and simple deployment via integration<br />

into the widely used CLC Genomics<br />

Workbench and CLC Genomics Server ecosystem<br />

offer a user-friendly platform for scientists<br />

engaged in outbreak prevention and control.<br />
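
Associating typing results with sample metadata and filtering for subsets, as described above, can be pictured as a small in-memory table. The record fields and the `select` helper below are hypothetical and do not reflect the CLC table format or any CLC API.<br />

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TypingRecord:
    """One row of a hypothetical table joining typing results with sample metadata."""
    sample_id: str
    sequence_type: Optional[int]      # MLST sequence type; None if untypeable
    resistance_genes: FrozenSet[str]  # genes reported by resistance typing
    origin: str                       # geographic origin
    outcome: str                      # treatment outcome


def select(records, sequence_type=None, gene=None, origin=None):
    """IDs of records matching every filter supplied; a filter left as None matches anything."""
    return [r.sample_id for r in records
            if (sequence_type is None or r.sequence_type == sequence_type)
            and (gene is None or gene in r.resistance_genes)
            and (origin is None or r.origin == origin)]
```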

S7:5<br />

WGSA.NET: WHOLE GENOME SEQUENCE<br />

ANALYSIS<br />

D. M. Aanensen¹, S. Argimon², C. A. Yeats¹, A.<br />

Fedosejev¹, C. Glasner², R. Goater², D. Garcia²,<br />

J. NT²;<br />

¹Imperial College London, London, UNITED<br />

KINGDOM, ²Centre for Genomic Pathogen<br />

Surveillance, Wellcome Genome Campus,<br />

Cambridgeshire, UNITED KINGDOM.<br />

WGSA.net [1] provides an intuitive interface<br />

for uploading, processing, clustering and<br />

visualization of microbial genomic assemblies.<br />

Uploaded data are run through a number of<br />

analysis modules, including MLST, detection<br />

of genes and variants for identifying<br />

potential antibiotic resistance and virulence,<br />

and profiling of gene family membership for<br />

clustering and for identification of strict<br />

core and non-core genes. Metadata supplied<br />

during upload include required (e.g., location<br />

and date) and optional data types.<br />

Once processed, assemblies are presented to<br />

users first in a population context and then,<br />

through selection of specific subsets of the<br />

population, in terms of their relatedness to other<br />

very closely related genomes, allowing further<br />

investigation by the user (e.g., outbreak detection).<br />

All results can be downloaded for further<br />

investigation and use. An intuitive<br />

user interface presenting inferred clustering<br />

(using PhyloCanvas), geographic location (using<br />

Google Maps) and data tables allows both user<br />

metadata and analysis module results to be<br />

overlaid on the clustering; it is based on<br />

the visualization tool microreact.org [2]. WGSA.net<br />

is available via the web and runs in any<br />

modern browser.<br />

[1] http://www.wgsa.net<br />

[2] http://microreact.org<br />
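
Identifying strict core and non-core genes from gene family membership amounts to set operations over a presence/absence table. A minimal sketch, assuming gene family assignments per genome are already available; the gene names in the check are illustrative.<br />

```python
def core_and_noncore(presence):
    """Split gene families into strict core (present in every genome) and non-core.

    presence: {genome: set of gene family IDs found in that genome}
    """
    families_per_genome = list(presence.values())
    if not families_per_genome:
        return set(), set()
    all_families = set().union(*families_per_genome)
    # Strict core: no genome may lack the family
    core = set.intersection(*families_per_genome)
    return core, all_families - core
```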

S7:6<br />

SEQSPHERE+ SOFTWARE FOR<br />

PROSPECTIVE BACTERIAL GENOMIC<br />

SURVEILLANCE AND RESISTOME OR<br />

VIRULOME ANALYSIS<br />

J. Rothgänger;<br />

Ridom GmbH, Münster, GERMANY.<br />

SeqSphere+ was introduced in 2013<br />

(Nat Biotechnol. 31: 294, 2013) and supports<br />

genome-wide allele and single nucleotide<br />

variant (SNV) calling from whole genome<br />

sequence (WGS) data at the core genome<br />

and/or accessory genome level. However,<br />

the recommended (initial) analysis is core<br />

genome MLST (cgMLST) allele typing, because a<br />

global and uniform nomenclature service is<br />

maintained to ensure a ‘molecular typing<br />

Esperanto’. A number of cgMLST schemes using<br />

the software have been published recently;<br />

e.g., for M. tuberculosis (JCM 52: 2479, 2014)<br />

or L. monocytogenes (JCM 54: Jul 1. pii:<br />

JCM.01193-15, 2015 [ahead of print]). These<br />

schemes are available for download within<br />

the software. In addition, users can define their<br />

own ‘ad hoc’ schemes on the fly with the<br />

included cgMLST Target Definer. Furthermore,<br />

the software supports setup of resistome and<br />

virulome schemes. Place, time, ‘person’, and<br />

type dimensions can be visualized with built-in<br />

geographic information system (GIS), epicurve,<br />

coloring, and phylogenetic tree (among<br />

others the minimum spanning tree algorithm is<br />

supported) functionality. All dimension views<br />

are interlinked and exportable in publication-quality<br />

SVG format. Finally, the software<br />

can generate sample reports for senders with<br />

a summary of the analytical results (e.g.,<br />

MLST, cgMLST), QC/QA data, and extensive<br />

documentation of the analytical procedure.<br />

SeqSphere+ is designed for distributed workgroups<br />

(client/server model with encryption of<br />

all data in transmission) and requires no scripting<br />

or bioinformatics skills. It allows automated<br />

processing and analysis of next generation<br />

sequencing (NGS) and Sanger data for prospective<br />

bacterial genomic surveillance. De novo<br />

assembly or reference mapping of NGS read<br />

data is achieved with the incorporated Velvet<br />

or BWA tools, respectively. Once defined and<br />

started, a pipeline down-samples, assembles, and<br />

analyzes NGS data fully automatically,<br />

e.g., by fetching the raw reads from a<br />

benchtop sequencer as soon as data are generated.<br />

To speed up and scale up the analysis, an<br />

additional computer can simply be added<br />

to process data in parallel. Experiment<br />

and epidemiologic meta-data are stored in an<br />

integrated searchable SQL database together<br />

with the DNA data. New sequence entries can<br />

be compared against stored data and automatic<br />

cluster alerts of possible outbreaks can be triggered.<br />

Metadata and sequence data can be submitted<br />

(semi-)automatically to the EBI ENA archive.<br />

A backup plan for all data can be defined and<br />

automatically executed. An audit trail of all<br />

user actions including the execution of analysis<br />

and (manual) data editing is also maintained.<br />

SeqSphere+ is commercially available from<br />

Ridom GmbH (Münster, Germany) for Windows<br />

and Linux operating systems. Further<br />

information and a request for a fully functional<br />

trial version can be found at http://www.ridom.de/seqsphere/.<br />
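
Two of the analyses described above, the minimum spanning tree over cgMLST profiles and automatic cluster alerts for new entries, can be sketched with a pairwise allelic distance. This is not SeqSphere+ code; profiles are assumed to be tuples of allele numbers in scheme order with `None` for missing loci, and the alert `threshold` is a hypothetical parameter.<br />

```python
def allelic_distance(p1, p2):
    """Number of loci with differing alleles, ignoring loci missing (None) in either profile."""
    return sum(1 for a, b in zip(p1, p2)
               if a is not None and b is not None and a != b)


def minimum_spanning_tree(names, dist):
    """Prim's algorithm over the complete graph of isolates.

    names: isolate IDs; dist(a, b): allelic distance. Returns (parent, child, distance) edges.
    """
    if not names:
        return []
    root = names[0]
    # For each node not yet in the tree: (distance to nearest tree node, that tree node)
    best = {n: (dist(root, n), root) for n in names[1:]}
    edges = []
    while best:
        node = min(best, key=lambda n: best[n][0])
        d, parent = best.pop(node)
        edges.append((parent, node, d))
        for n in best:  # the newly added node may offer a shorter connection
            d_new = dist(node, n)
            if d_new < best[n][0]:
                best[n] = (d_new, node)
    return edges


def cluster_alert(new_profile, stored, threshold=10):
    """IDs of stored isolates within `threshold` alleles of a new isolate."""
    return [iso for iso, prof in stored.items()
            if allelic_distance(new_profile, prof) <= threshold]
```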

S7:7<br />

NULLARBOR: RAPID ANALYSIS OF<br />

BACTERIAL OUTBREAK SEQUENCE DATA<br />

T. Seemann, J. Kwong, D. M. Bulach, B. P.<br />

Howden;<br />

University of Melbourne, Melbourne, AUSTRALIA.<br />

The modern public health microbiology laboratory<br />

has embraced whole genome sequencing<br />

as the primary assay for pathogen surveillance<br />

and outbreak analysis. Here we present Nullarbor,<br />

a software pipeline for turning a set<br />

of isolate sequence data into a single report<br />

summarizing the key information about each<br />

isolate and the relationship between isolates.<br />

This report is then used by epidemiologists<br />

and laboratory staff to make a final recommendation<br />

or actionable decision. Nullarbor<br />

first performs quality control on each isolate.<br />

The reads are adaptor- and quality-trimmed,<br />

then measured for yield, and low-coverage<br />

isolates are quarantined. The cleaned reads are<br />

scanned with Kraken to identify the likely species<br />

and the level of contamination, and off-target<br />

or mixed samples are quarantined. The<br />

reads are then de novo assembled into contigs<br />

using MegaHit and annotated using Prokka<br />

(*). The contigs are also used to calculate the<br />

MLST with the mlst tool (*) and the resistome<br />

profile using ABRicate (*). Next, the relationships<br />

between the isolates are determined. Each<br />

isolate is aligned against a reference genome,<br />

variants are called using Snippy (*), and a<br />

core genome SNP alignment is produced. The<br />

reference can be supplied, or the isolate assemblies<br />

can be used instead, and recombination may<br />

optionally be filtered from the SNP alignment<br />

using ClonalFrameML. A phylogenetic tree is<br />

created using FastTree and various statistics<br />

including SNP distances are calculated. The<br />

annotated genomes are used to calculate the<br />

pan-genome with Roary and visualized using<br />

FriPan (*). The use of the pan-genome augments<br />

the investigation with data on mobile<br />

genetic elements otherwise missed by core<br />

SNP analysis. Nullarbor then generates a report<br />

which can be rendered in multiple formats<br />

using Pandoc. The Nullarbor pipeline follows<br />

the Unix philosophy of using and combining<br />

existing standalone tools in an efficient manner.<br />

The input is a spreadsheet-like text file,<br />

and the default output is a clean HTML report.<br />

The pipeline utilises the Unix make dependency<br />

system to enable highly parallel analyses<br />

on a single machine, to maximize efficiency<br />

for the typical laboratory having only a single<br />

bioinformatics workstation. It is currently used<br />

routinely for all investigations by the<br />

Microbiological Diagnostics Unit Public Health<br />

Laboratory in Australia. Nullarbor is open-source software<br />

released under a GPL licence and runs<br />

on Unix systems. It is available from<br />

https://github.com/tseemann/nullarbor. Simple installation<br />

of Nullarbor and its dependencies may<br />

be achieved via Homebrew Science<br />

(https://github.com/Homebrew/homebrew-science).<br />

Future plans include a Docker image based on<br />

CoreOS, and a virtual machine image for use<br />

with Amazon, OpenStack and VirtualBox. (*)<br />

denotes software written by the first author.<br />
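
Pairwise SNP distances, one of the statistics computed from the core genome alignment, can be counted directly from aligned sequences. The sketch below is illustrative Python rather than Nullarbor's own implementation, and it assumes gap (`-`) and `N` columns are skipped for a given pair.<br />

```python
def snp_distance(seq_a, seq_b, ignore="-N"):
    """Count alignment columns where both isolates have a called base and they differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper())
               if a not in ignore and b not in ignore and a != b)


def distance_matrix(alignment):
    """alignment: {isolate: aligned core sequence} -> {(a, b): SNP distance} for each pair."""
    names = sorted(alignment)
    return {(a, b): snp_distance(alignment[a], alignment[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```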

S7:8<br />

ENTEROBASE: A POWERFUL, USER-<br />

FRIENDLY ONLINE RESOURCE FOR<br />

ANALYSING GENOMIC VARIATION<br />

N. Alikhan, M. Sergeant, Z. Zhou, A. Millard,<br />

M. J. Pallen, M. Achtman;<br />

University of Warwick, Coventry, UNITED<br />

KINGDOM.<br />

The decreasing cost of next-generation sequencing<br />

promises to revolutionise molecular<br />

epidemiology. For example, Salmonella enterica<br />

has over 40,000 sets of short reads available<br />

within GenBank. Sequencing at this scale<br />

can potentially encompass sufficient genomic<br />

variation across a species to elucidate evolutionary<br />

lineages and traits such as virulence<br />

and antimicrobial resistance. These data could<br />

provide a basis for a global perspective on<br />

microbial pathogens that identifies not only<br />

outbreak-associated strains but also long-term<br />

transmission trends and novel environmental<br />

reservoirs. However, the paucity of analytical<br />

tools to handle data at such scales remains a<br />

key limitation. Here we present EnteroBase,<br />

which addresses such logistical challenges and<br />

facilitates access to data for a general audience<br />

of clinicians and epidemiologists. EnteroBase<br />

is a user-friendly online resource, where users<br />

can upload their own sequencing data for<br />

de novo assembly by a streamlined pipeline.<br />

The assemblies are used for calling MLST and<br />

wgMLST patterns, allowing users to compare<br />

their strains to publicly available genotyping<br />

data from other EnteroBase users, GenBank<br />

and classical MLST databases. EnteroBase<br />

was designed to exclude low quality data. Assemblies<br />

are screened for contamination, poor<br />

sequencing quality and misassemblies, and<br />

linked to standardised and curated strain metadata,<br />

including geographic, host and temporal<br />

information. Curation will be performed by experts from<br />

the microbial research community. Curated<br />

metadata will support exploring associations<br />

between strain genotype and phenotypic or<br />

geographic features. EnteroBase will also<br />

include SNP and pan-genome based genome<br />

comparisons, including virulence factors and<br />

antimicrobial resistance. Visualisation approaches<br />

will be integrated, including minimum<br />

spanning trees and geographic mapping. Many<br />

approaches implemented in EnteroBase will<br />

also be applicable to metagenomic data, which<br />

we intend to implement. We will provide<br />

integration with existing databases, such as<br />

BIGSdb, and existing analytical tools, such as<br />

BioNumerics, in addition to providing a standardized<br />

ontology implemented through APIs<br />

allowing for easy interoperability with other<br />

similar resources. EnteroBase is accessible<br />

through all modern web browsers, and is available<br />

at http://enterobase.warwick.ac.uk.<br />
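
Screening assemblies for poor sequencing quality typically starts from simple contiguity statistics such as N50. The thresholds in `passes_qc` are hypothetical placeholders, not EnteroBase's actual criteria, and the sketch does not cover the contamination or misassembly checks mentioned above.<br />

```python
def n50(contig_lengths):
    """Length of the contig at which the running total (longest first) reaches half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def passes_qc(contig_lengths, expected_size, min_n50=20_000,
              max_contigs=600, size_tolerance=0.2):
    """Hypothetical contiguity gate; every threshold here is an illustrative placeholder."""
    total = sum(contig_lengths)
    return (n50(contig_lengths) >= min_n50
            and len(contig_lengths) <= max_contigs
            and abs(total - expected_size) <= size_tolerance * expected_size)
```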

S7:9<br />

SNAPPERDB: A SCALABLE DATABASE FOR<br />

ROUTINE SEQUENCING OF BACTERIAL<br />

ISOLATES<br />

P. Ashton, A. Al-Shahib, A. Jironkin, A. Underwood,<br />

T. Dallman;<br />

Public Health England, London, UNITED<br />

KINGDOM.<br />

As routine sequencing of bacterial isolates becomes<br />

a reality, scalable data storage solutions<br />

are required. Analysis of bacterial populations<br />

often requires re-computing the likely variants<br />

across all isolates in a dataset, and this is<br />

not feasible in rapidly growing, large datasets.<br />

Public Health England has embarked on the<br />

implementation of high throughput sequencing<br />

for the surveillance of several important<br />

human pathogens and aims to leverage the<br />

high discriminatory power of single nucleotide<br />

polymorphisms (SNPs) to detect linked<br />

cases and outbreaks of infectious disease. For<br />

this software demonstration we will present<br />

SnapperDB, a set of tools to store and query<br />

bacterial variant data to facilitate reproducible<br />

and scalable analysis of bacterial populations.<br />

The use of a relational database enables highly<br />

efficient queries that can generate SNPs in the<br />

core genome for phylogenetic analysis, or the<br />

whole genome consensus sequence for output<br />

into e.g. recombination detection tools. As part<br />

of SnapperDB, a pairwise distance matrix is<br />

maintained from which hierarchical clustering<br />

is performed. This allows the assignment of a<br />

‘SNP address’, which locates the isolate within<br />

‘SNP space’ and enables the rapid identification<br />

of closely related isolates. SnapperDB is<br />

a stable application which is easily installed<br />

from the Github repository by Unix power<br />

users. For those who are less familiar with the<br />

command line, there are pre-configured instances<br />

on both the MRC CLIMB (http://www.<br />

climb.ac.uk/) and Amazon Web Services cloud<br />

computing infrastructures.<br />
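One way to derive a hierarchical "SNP address" from a pairwise distance matrix is sketched below. It assumes single-linkage clustering cut at a fixed, descending list of SNP thresholds. The threshold values, the cluster numbering and the helper names are illustrative assumptions, not SnapperDB's own code. At each threshold, isolates linked by any chain of pairwise distances at or below the threshold share a cluster number. The address joins the cluster numbers across thresholds, from coarsest to finest.

```python
from typing import Dict, List, Sequence

# Assumed SNP thresholds, coarsest first (illustrative values only).
THRESHOLDS = (250, 100, 50, 25, 10, 5, 0)


def single_linkage_clusters(dist: Sequence[Sequence[int]], threshold: int) -> List[int]:
    """Cluster numbers from single-linkage clustering cut at `threshold` SNPs."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Any pair within the threshold joins the same cluster (single linkage).
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)
    # Renumber clusters 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(n)]


def snp_addresses(names: Sequence[str], dist: Sequence[Sequence[int]],
                  thresholds: Sequence[int] = THRESHOLDS) -> Dict[str, str]:
    """Dot-separated cluster numbers, one level per threshold, for each isolate."""
    levels = [single_linkage_clusters(dist, t) for t in thresholds]
    return {name: ".".join(str(level[i]) for level in levels)
            for i, name in enumerate(names)}


# Hypothetical pairwise SNP distances between four isolates.
names = ["iso1", "iso2", "iso3", "iso4"]
dist = [[0, 3, 40, 300],
        [3, 0, 38, 299],
        [40, 38, 0, 280],
        [300, 299, 280, 0]]
for name, address in snp_addresses(names, dist).items():
    print(name, address)
```

Isolates whose addresses share a longer prefix are closer in SNP space. Here iso1 and iso2 differ only at the finest (0-SNP) level, while iso4 separates from the others at the coarsest level.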



Oral Presentation Abstracts

S7:10
PHYLOGENETIC RECONSTRUCTION AND OUTBREAK INVESTIGATION USING IRIDA AND SNVPHYL
A. Petkau¹, P. Mabon¹, L. S. Katz², F. Bristow¹, T. Matthews¹, J. Adam¹, J. Cabral³, C. Sieffert¹, N. Knox¹, D. Dooley⁴, E. Griffiths⁵, G. Winsor⁵, M. R. Laird⁵, M. Courtot⁵, P. Kruczkiewicz⁶, E. Taboada⁶, J. A. Carriço⁷, A. Keddy⁸, R. G. Beiko⁸, C. Berry¹, A. Reimer¹, M. Graham¹, W. Hsiao⁴, F. Brinkman⁵, G. Van Domselaar¹;
¹Public Health Agency of Canada, Winnipeg, MB, CANADA, ²Centers for Disease Control and Prevention, Atlanta, GA, ³University of Manitoba, Winnipeg, MB, CANADA, ⁴BC Public Health Microbiology and Reference Laboratory, Vancouver, BC, CANADA, ⁵Simon Fraser University, Burnaby, BC, CANADA, ⁶Laboratory for Foodborne Zoonoses, Lethbridge, AB, CANADA, ⁷University of Lisbon, Lisbon, PORTUGAL, ⁸Dalhousie University, Halifax, NS, CANADA.

Whole-genome sequencing (WGS) based methods for disease surveillance and outbreak investigation are poised to replace existing typing methods such as pulsed-field gel electrophoresis (PFGE) and multi-locus sequence typing (MLST). The wealth of information obtained from WGS provides a typing method with significantly greater resolution between outbreak-associated and non-outbreak concurrent isolates. However, routine use of WGS data for outbreak investigation has been hindered by the complexity of managing and quality-assessing WGS data, executing analysis pipelines, and visualizing and interpreting results. SNVPhyl is a pipeline for building whole-genome phylogenies from single nucleotide variants (SNVs). SNVPhyl accepts a set of pathogen WGS reads, an assembled reference genome, and a collection of QA/QC parameters. The sequence reads are mapped to the reference genome, high-quality variants are identified within the core genome, and a table of all variants is generated together with a quality report of the data. The identified variants are used to generate a multiple sequence alignment of variant sites along with a maximum likelihood phylogeny. SNVPhyl is an integrated suite of tools implemented within a Galaxy workflow. Galaxy, a web-based bioinformatics analysis platform, supports execution of workflows on a variety of high-performance computing environments, which, together with parallelization of the pipeline tools, enables rapid analysis of large datasets. SNVPhyl is also integrated into IRIDA (Integrated Rapid Infectious Disease Analysis), a genomic epidemiology platform. IRIDA provides a web interface for the storage and management of WGS data and epidemiological metadata, a simplified interface for executing pipelines such as SNVPhyl, and storage and visualization of analysis results. In addition, IRIDA supports integration with external tools through a REST API. In particular, a plugin has been developed for GenGIS, a phylogeographic analysis and visualization tool, allowing in-depth exploration of SNVPhyl pipeline results. SNVPhyl is the culmination of nearly five years of development of a pipeline to construct whole-genome phylogenies at the National Microbiology Laboratory (NML) in Canada, with contributions from the US Centers for Disease Control and Prevention. Both IRIDA and SNVPhyl are actively used at the NML for outbreak response and are available as free and open-source software. More information can be found at http://irida.ca.
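The step that turns per-isolate variant calls into an alignment of variant sites can be illustrated in isolation. The sketch below is not SNVPhyl's code. It assumes that read mapping and variant calling have already produced, for each isolate, a hypothetical mapping from reference position to called base, plus a set of positions that failed QC in that isolate. A position is treated as core only if it passed QC in every isolate. A passing position with no call is assumed to match the reference. Only sites where the isolates differ are kept.

```python
from typing import Dict, List, Set, Tuple


def variant_site_alignment(
    reference: str,
    calls: Dict[str, Dict[int, str]],
    filtered: Dict[str, Set[int]],
) -> Tuple[List[int], Dict[str, str]]:
    """Return kept positions and one aligned string of variant sites per isolate.

    reference: reference genome sequence (0-based positions).
    calls: isolate -> {position: base} for high-quality SNVs vs. the reference.
    filtered: isolate -> positions that failed QC in that isolate.
    """
    # A position is "core" only if no isolate failed QC there.
    excluded = set().union(*filtered.values())
    candidates = sorted({pos for c in calls.values() for pos in c} - excluded)

    def base(isolate: str, pos: int) -> str:
        # No SNV call at a passing position: assume the isolate matches the reference.
        return calls[isolate].get(pos, reference[pos])

    # Keep only sites where the isolates do not all carry the same base.
    kept = [pos for pos in candidates if len({base(i, pos) for i in calls}) > 1]
    alignment = {i: "".join(base(i, pos) for pos in kept) for i in sorted(calls)}
    return kept, alignment


def pairwise_snv_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count differing columns for every pair of isolates."""
    names = sorted(alignment)
    return {(a, b): sum(x != y for x, y in zip(alignment[a], alignment[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}


# Hypothetical toy data: a 10 bp reference and three isolates.
reference = "ACGTACGTAC"
calls = {"s1": {2: "T"}, "s2": {2: "T", 5: "A"}, "s3": {7: "C"}}
filtered = {"s1": set(), "s2": set(), "s3": {5}}  # s3 failed QC at position 5
kept, aln = variant_site_alignment(reference, calls, filtered)
print(kept, aln)
print(pairwise_snv_distances(aln))
```

Position 5 is dropped because one isolate failed QC there, even though another isolate carries a variant at that site. This is the conservative behaviour a core-genome restriction implies.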

S7:11
PANCORE: A FLEXIBLE WORKFLOW FOR THE COMPARISON AND ASSIGNMENT OF GENOMES TO OUTCOMES
D. B. Storey, B. C. Weimer;
University of California, Davis, CA.

The identification and assignment of bacterial pathogens to specific outbreaks is important for the continued security and safety of our food supply. As high-throughput sequencing technologies continue to increase in throughput, decrease in cost, and increase in mobility, it is clear that they will play an increasing role in our ability to identify and characterize bacteria in our food supply proactively. By applying robust data science techniques to these problems, we can begin to regulate our food supply in a prescriptive rather than a reactive manner. With these goals in mind, we present PanCore, a workflow that uses high-throughput sequencing data, reference-free assembly, global annotations, machine learning techniques, and predictive algorithms to classify and assign bacterial genomes to a phenotype and to identify outlier isolates quickly and robustly. The workflow uses publicly available sequencing data from the SRA/ENA and data generated as part of the 100K Pathogen Genome Project as the underlying data for building global models of the potential genetic landscape of an organism of interest. We incorporate a number of measures into this database, including genomic distance, k-mer distributions, genetic content, gene polymorphisms, and allele frequencies. These expansive databases are subjected to dimensionality reduction techniques, and informative features are extracted. Using these reduced data representations, user input data (e.g. serotype, presence in an outbreak, geographic location) and a training set, the program performs a second round of clustering and feature extraction. These refined features are then used to train a naive Bayesian classifier, which can be applied directly to new incoming sequencing data. This two-step approach allows isolates to be classified directly into classes and outliers to be identified in a data-dependent manner. As a result, unknown and/or misclassified isolates can be quickly identified, subjected to more directed analyses, and efficiently re-included in the underlying database. By front-loading computation and applying dimensionality reduction before training the classifier, we also make it possible to provide fast, scalable classification that does not require large supporting infrastructure. We have successfully applied these methods to identify Campylobacter isolates that had been misclassified by biochemical methods and to identify potential markers of isolate host range.
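The final classification stage can be illustrated with a naive Bayes classifier over k-mer counts. The sketch below is not PanCore's implementation. It assumes k-mer counts as the only feature, omits the dimensionality reduction and the second round of clustering described above, and uses a multinomial naive Bayes model with Laplace smoothing. The sequences, labels, `k` and `alpha` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts, with Laplace smoothing."""

    def __init__(self, k: int = 3, alpha: float = 1.0):
        self.k, self.alpha = k, alpha

    def fit(self, seqs: List[str], labels: List[str]) -> "KmerNaiveBayes":
        self.class_counts = Counter(labels)
        self.kmer_totals: Dict[str, Counter] = {c: Counter() for c in self.class_counts}
        for seq, label in zip(seqs, labels):
            self.kmer_totals[label].update(kmer_counts(seq, self.k))
        self.vocab = set().union(*self.kmer_totals.values())
        return self

    def _log_likelihood(self, counts: Counter, label: str) -> float:
        totals = self.kmer_totals[label]
        # The +1 reserves probability mass for k-mers never seen in training.
        denom = sum(totals.values()) + self.alpha * (len(self.vocab) + 1)
        return sum(n * math.log((totals[kmer] + self.alpha) / denom)
                   for kmer, n in counts.items())

    def predict(self, seq: str) -> Tuple[str, float]:
        """Return (best class, its log posterior up to an additive constant)."""
        counts = kmer_counts(seq, self.k)
        n = sum(self.class_counts.values())
        scores = {c: math.log(m / n) + self._log_likelihood(counts, c)
                  for c, m in self.class_counts.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]


# Hypothetical toy training data: two host-associated groups.
train_seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
train_labels = ["host_A", "host_A", "host_B", "host_B"]
clf = KmerNaiveBayes(k=3).fit(train_seqs, train_labels)
print(clf.predict("AAAAAAAA"))
```

The returned score could be compared against scores of known training isolates to flag a possible outlier. That use is an assumption here and does not describe how PanCore detects outliers.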

S7:12
REFERENCE-FREE PAN-GENOMIC EPIDEMIOLOGY USING CORTEX
Zamin Iqbal¹, Henk den Bakker², Phelim Bradley¹, Rachel Norris¹, Jennifer Gardy³, Sarah Walker⁴, Tim Peto⁴, Derrick Crook⁴;
¹Wellcome Trust Centre for Human Genetics, Univ. of Oxford, UK, ²Department of Animal and Food Sciences, Texas Tech University, Lubbock, Texas, ³Communicable Disease Prevention and Control Services, British Columbia Centre for Disease Control, Vancouver, BC, Canada, ⁴Nuffield Department of Medicine, University of Oxford, UK

Bacterial outbreak studies based on genetic epidemiology alone are fundamentally focussed on genetic similarities and differences within sets of samples, both in the core genome and in shared accessory genome elements (e.g. due to plasmid transfer). Standard approaches require comparing all samples against a reference genome, which can add artefacts and noise and costs compute time. We therefore developed a reference-free, multi-sample approach called Cortex [1]. It allows partial assembly of many samples into a joint de Bruijn graph, followed by rapid and accurate determination of SNP, indel and structural variants segregating within the samples, and extremely simple assaying of accessory genome content. Among the species involved in this challenge, Cortex has been used for outbreak analysis of Salmonella [2] and sequence presence/absence testing in Listeria [3]. In our experience, this approach leads to highly accurate and sensitive call sets for bacteria (where coverage is not generally limiting) without the need for manual curation or parameter tuning, allowing the user to focus on subsequent analyses.

The software runs on Linux or Mac OS X and is freely available under the GPLv3 license at http://github.com/iqbal-lab/cortex. All analyses done for this bioinformatics challenge will be available in a secondary GitHub repository.

[1] Iqbal et al. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics (2012).
[2] den Bakker et al. Rapid whole-genome sequencing for surveillance of Salmonella enterica serovar Enteritidis. Emerging Infectious Diseases (2014).
[3] Stasiewicz et al. Whole-genome sequencing allows for improved identification of persistent Listeria monocytogenes in food-associated environments. Applied and Environmental Microbiology (2015).
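The idea behind a multi-sample ("colored") de Bruijn graph can be shown in a few lines. Each k-mer records which samples (colors) contain it, so presence or absence of a sequence in each sample can be read directly from the graph. The sketch below is not the Cortex implementation. It assumes error-free toy reads, a small k and no reverse-complement handling. Its presence rule ("at least 90% of the gene's k-mers carry the sample's color") is a hypothetical choice. Cortex's error cleaning and bubble-based variant calling are omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

Colors = Dict[str, Set[str]]  # k-mer -> names of samples (colors) containing it


def build_colored_kmers(samples: Dict[str, Iterable[str]], k: int) -> Colors:
    """Record, for every k-mer in every read, which samples it was seen in."""
    colors: Colors = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                colors[read[i:i + k]].add(name)
    return colors


def successors(colors: Colors, kmer: str) -> List[str]:
    """De Bruijn edges: k-mers whose (k-1)-prefix equals this k-mer's (k-1)-suffix."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in colors]


def gene_presence(colors: Colors, gene: str, k: int,
                  min_fraction: float = 0.9) -> Dict[str, bool]:
    """Call a gene present in a sample if enough of its k-mers carry that sample's color."""
    kmers = [gene[i:i + k] for i in range(len(gene) - k + 1)]
    if not kmers:
        raise ValueError("gene must be at least k bases long")
    samples = set().union(*colors.values())
    return {s: sum(s in colors.get(km, ()) for km in kmers) / len(kmers) >= min_fraction
            for s in sorted(samples)}


# Hypothetical reads: s2 carries an extra accessory segment ("TGCAGGAT").
samples = {"s1": ["ACGTTGCA"], "s2": ["ACGTTGCAGGAT"]}
colors = build_colored_kmers(samples, k=4)
print(gene_presence(colors, "TGCAGGAT", k=4))
```

No reference genome is involved at any step. Both samples are placed in one graph, and the shared and sample-specific k-mers are visible directly.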

S9:3
WHOLE-GENOME SEQUENCE ANALYSIS OF PSEUDOMONAS AERUGINOSA IN ACUTE INFECTION REVEALS WIDESPREAD WITHIN-POPULATION DIVERSITY AND RAPID TRANSMISSION WITHIN THE BODY
H. Chung¹, K. B. Flett², M. Anderson², R. Kishony³, G. P. Priebe²;
¹Harvard Medical School, Boston, MA, ²Boston Children’s Hospital, Boston, MA, ³Technion - Israel Institute of Technology, Haifa, ISRAEL.

Bacterial pathogen populations mutate and adapt during the course of an infection. Strong selective pressures such as host adaptation and antibiotic treatment lead to genetic diversification within a population. Examining this diversity in pathogen populations from chronic infections such as cystic fibrosis has revealed that many mutations are polymorphic rather than fixed. Theoretical work has shown that such within-population diversity can hinder accurate reconstruction of epidemiological transmission networks. Here we show that even in an acute infection, within-population diversity arises both from pre-existing polymorphisms in the infecting population and from de novo mutations that arise rapidly during treatment. We describe a pipeline for rapidly collecting and sequencing populations of a bacterial pathogen from patients over multiple time points. Focusing on ventilator-associated tracheitis in mechanically ventilated children, we sampled Pseudomonas aeruginosa populations from the respiratory tract (and in some cases the gut) of eight patients before and after antibiotic treatment. Using a low-cost library preparation method we developed, we prepared whole-genome libraries of 636 Pseudomonas isolates in just 4 days for next-generation sequencing on the Illumina HiSeq platform. Comparing the diversity of pathogen populations between the airways and the gut reveals fast transmission of newly generated genotypic variants across the body. While analyzing de novo mutations is useful for inferring genes that confer a selective advantage in adapting to the human host, uncovering the pre-existing diversity of a pathogen across the body before treatment could be key to tailoring patient-specific treatment strategies. Furthermore, incorporating within-population diversity into current epidemiological models will improve the accuracy of reconstructing transmission events, especially in rapidly occurring outbreaks.
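The distinction between fixed and polymorphic mutations within a sampled population can be made concrete with the sketch below. It assumes hypothetical genotypes: each isolate from one patient sample is a mapping from site name to allele, and the ancestral allele at each site is known. A mutation is labelled fixed when every isolate carries it and polymorphic when only some do. Its frequency is the fraction of isolates carrying it. Site names and alleles are placeholders.

```python
from typing import Dict, List, Tuple


def classify_mutations(isolates: List[Dict[str, str]],
                       ancestral: Dict[str, str]) -> Dict[str, Tuple[str, float]]:
    """Label each mutated site as fixed or polymorphic, with its frequency.

    isolates: one {site: allele} mapping per isolate; a missing site means
              the isolate carries the ancestral allele there.
    ancestral: {site: ancestral allele}.
    """
    n = len(isolates)
    result: Dict[str, Tuple[str, float]] = {}
    for site, anc in ancestral.items():
        carriers = sum(1 for iso in isolates if iso.get(site, anc) != anc)
        if carriers == 0:
            continue  # no isolate is mutated at this site
        label = "fixed" if carriers == n else "polymorphic"
        result[site] = (label, carriers / n)
    return result


# Hypothetical population of four isolates from one sample.
ancestral = {"site1": "C", "site2": "T", "site3": "G"}
isolates = [{"site1": "A", "site2": "G"}, {"site1": "A"},
            {"site1": "A", "site2": "G"}, {"site1": "A"}]
print(classify_mutations(isolates, ancestral))
```

Sequencing a single colony per patient would report site2 as either present or absent. Only population sampling reveals it as a 50% polymorphism, which is the kind of diversity the abstract argues transmission models should account for.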

S9:4
BEYOND THE SNV: INTEGRATING MULTIPLE DATA TYPES INTO GENOMIC EPIDEMIOLOGY
M. S. Wright¹, G. G. Sutton², R. A. Bonomo³, M. D. Adams¹;
¹J. Craig Venter Institute, La Jolla, CA, ²J. Craig Venter Institute, Rockville, MD, ³University Hospitals Case Medical Center, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, OH.

Single nucleotide variant (SNV) analyses can be useful for identifying transmission routes during outbreaks, characterizing pathogen population structure, and providing taxonomic discrimination for strain identification. Adding gene content analysis further resolves evolutionary relationships and yields phenotypically significant information during investigations that use genomic epidemiology. To demonstrate this, we sequenced >50 Klebsiella pneumoniae (Kp) and >200 Acinetobacter baumannii (Ab) isolates from the Midwestern US. Core SNV phylogenies for both species indicated population mixing across hospital locations, with maintenance of lineages distinct from those in other geographical locations. Gene content analysis additionally revealed population-specific plasmids in both species. A significant founder effect was observed in Kp, where all ST258b strains likely originated from a common ancestor that had an entS deletion unique to this lineage. Mapping of insertion sequence (IS) locations identified lineage- and strain-specific IS events. Longitudinal sequence analysis of Ab isolates from the same patient over time highlighted gene content variability mediated by isolate-specific IS events. These included the loss of phenotypically relevant antibiotic resistance genes, as well as antibiotic resistance plasmid gain events that would be missed by conventional SNV analysis. Using patient-specific SNVs, IS events, and gene content analyses, we were able to determine whether persistent Ab infections were the result of treatment failure or reinfection by new strains. Thus, the combination of SNV, gene content, and IS mapping data leads to a more complete picture of transmission dynamics, pathogen populations, and evolution. Challenges remain in integrating these data types in clinically relevant settings, given the requirements for rapid assessments and minimal data curation effort. Computational tools are under development so that new strains can be rapidly placed in the genotypic and phenotypic context of local and global strains.
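A simple way to surface the events that "would be missed by conventional SNV analysis" is to compute two distances for each pair of isolates and flag pairs where they disagree. The first is an SNV distance; the second is a gene presence/absence distance. The sketch below is an illustration, not the authors' method. It assumes each isolate has a hypothetical core-SNV string and a set of accessory genes or IS insertion sites. Gene content is compared with Jaccard distance, and the cut-offs are arbitrary placeholders.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def snv_distance(a: str, b: str) -> int:
    """Differences between two equal-length core-SNV strings."""
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 minus (size of intersection / size of union) of two gene sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def content_changes_missed_by_snvs(
    snvs: Dict[str, str], content: Dict[str, Set[str]],
    max_snvs: int = 2, min_jaccard: float = 0.1,
) -> List[Tuple[str, str, int, float]]:
    """Pairs that look near-identical by SNVs but differ in gene content."""
    flagged = []
    for x, y in combinations(sorted(snvs), 2):
        d_snv = snv_distance(snvs[x], snvs[y])
        d_gene = jaccard_distance(content[x], content[y])
        if d_snv <= max_snvs and d_gene >= min_jaccard:
            flagged.append((x, y, d_snv, round(d_gene, 3)))
    return flagged


# Hypothetical isolates: two from one patient over time, one from another patient.
snvs = {"p1_t1": "ACGTA", "p1_t2": "ACGTA", "p2": "TTGCA"}
content = {"p1_t1": {"geneA", "geneB", "plasmid_R"},
           "p1_t2": {"geneA", "geneB"},
           "p2": {"geneA", "geneC"}}
print(content_changes_missed_by_snvs(snvs, content))
```

The two isolates from patient 1 are SNV-identical, yet one has lost a plasmid. That difference only appears once gene content is considered alongside SNVs.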

S9:5
DEFINING CLONALITY IN ACINETOBACTER BAUMANNII USING WHOLE GENOME SEQUENCING OF OUTBREAK STRAINS ASSOCIATED WITH THE CONFLICT IN IRAQ
E. Snesrud, P. McGann, L. Appalla, F. Onmus-Leone, A. C. Ong, R. Maybank, R. Clifford, M. K. Hinkle, P. E. Waterman, E. P. Lesho;
Walter Reed Army Institute of Research, Silver Spring, MD.

Multidrug-resistant (MDR) A. baumannii emerged as a significant source of infection during the conflict in Iraq. Unravelling the epidemiology of these strains using conventional typing methods, such as multi-locus sequence typing (MLST), is difficult, as these methods lack the resolution to detect small genetic changes such as single nucleotide polymorphisms (SNPs). Whole-genome sequencing (WGS) offers the prospect of definitive data on strain relatedness, but studies defining what constitutes clonality are lacking. Here, WGS was applied to a large collection of A. baumannii cultured from patients treated at the Walter Reed Army Medical Center (WRAMC) from 2003 to 2011, in an effort to establish a baseline for clonality in this species. Over this period, carbapenem resistance among clinical isolates of A. baumannii rose from 12% to >95%. WGS was performed using the Illumina MiSeq and NextSeq benchtop sequencers on every deduplicated carbapenem-resistant A. baumannii (CRAB) isolate archived during this period (N=394). An additional 142 carbapenem-susceptible A. baumannii isolates from the same period were sequenced in tandem. Comparative genomics was performed on all isolates to determine clonality. Carbapenem resistance was mediated by class D oxacillinases in all isolates, with blaOXA-23 identified in 312 strains (79.2%). In silico MLST revealed 12 different sequence types (STs) carrying blaOXA-23, with ST-1, 2, 20, 25, 81 and 94 the most prevalent. SNP-based analysis revealed that ST-1 was composed of 4 temporally separated clonal clusters. Within each clonal group, SNPs accumulated over time, with an average of 23 SNPs separating the first and last isolates identified in each cluster. In contrast, each of ST-2, 20, 25, 81 and 94 consisted of a single clone that persisted and spread throughout the healthcare facility over 1 to 6 years. SNP accumulation in these strains was consistent with that observed for ST-1: strains varied by 0 to 25 SNPs, with the number of SNPs increasing over time. At the height of the Iraq conflict (2004-2006), a new patient with a CRAB infection was being identified almost daily. Remarkably, despite the large number of patients involved, WGS revealed that the majority of infections over the 9 years were caused by just 9 different strains. These strains appear to have entered, persisted in and disseminated within the facility over periods ranging from 1 to 6 years. SNP-based phylogeny demonstrated that every ST accumulated SNPs at a comparable rate, with an average of 8 SNPs accumulating every year. Thus, we define a baseline for clonality in A. baumannii as two isolates differing by ≤ 8 (± 3) SNPs over the course of a year.
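The proposed baseline can be turned into a simple screening rule. The sketch below rests on one reading of "≤ 8 (± 3) SNPs over the course of a year": the tolerated divergence grows linearly with the time separating two isolates, at the upper bound of 8 + 3 = 11 SNPs per year. Isolates less than a year apart are held to the one-year bound. The function names and the linear scaling are illustrative assumptions, not the authors' published rule. The `percent` helper reproduces the reported blaOXA-23 proportion (312 of 394 CRAB isolates).

```python
import math

RATE_PER_YEAR = 8  # average SNPs accumulated per year (from the abstract)
TOLERANCE = 3      # the "± 3" margin (from the abstract)


def clonal_threshold(years_apart: float) -> int:
    """Maximum SNP distance tolerated between clonal isolates (assumed linear scaling)."""
    years = max(1.0, years_apart)  # assumption: intervals under a year use the 1-year bound
    return math.floor((RATE_PER_YEAR + TOLERANCE) * years)


def likely_clonal(snp_distance: int, years_apart: float) -> bool:
    """True if two isolates fall within the assumed clonality bound."""
    return snp_distance <= clonal_threshold(years_apart)


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(clonal_threshold(1), clonal_threshold(3))  # 11 33
print(percent(312, 394))  # 79.2: blaOXA-23 carriers among the 394 CRAB isolates
```

Whether the bound should scale with time, and whether the lower or upper end of the ± 3 margin should apply, are open choices. The abstract reports only the per-year baseline.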

S9:6
DIRECT FROM SPUTUM: NEXT GEN ANALYSIS OF MYCOBACTERIUM TUBERCULOSIS IN CLINICAL SAMPLES
D. M. Engelthaler¹, R. E. Colman¹, V. Crudu², D. Catanzaro³, A. Catanzaro⁴, P. Keim⁵, T. Cohen⁶, T. C. Rodwell⁴;
¹TGen North, Flagstaff, AZ, ²Phthisiopneumology Institute, Chișinău, MOLDOVA, REPUBLIC OF, ³University of Arkansas, Little Rock, AR, ⁴University of California San Diego, San Diego, CA, ⁵Northern Arizona University, Flagstaff, AZ, ⁶Yale University, New Haven, CT.

The incidence of drug-resistant (DR) tuberculosis (TB) continues to increase worldwide. With the presence of multidrug resistance, it has become critical to quickly identify the appropriate treatment regimen in order to effectively treat disease and prevent further transmission of DR-TB. Tabletop DNA sequencers now allow rapid and robust sequencing of pathogen isolates as well as direct analysis of clinical samples. However, applying this technology directly to complex samples such as sputum currently has limitations due to the complexity of these biological samples. We have developed accessible tools and methodologies for direct sequencing of clinical sputum samples. These enable us to detect Mycobacterium tuberculosis (Mtb), produce a rapid drug susceptibility profile, detect heteroresistance, and conduct additional analyses related to the nature of TB infection and transmission. Targeted sequencing enables Next Gen Drug Susceptibility Testing (Next Gen-DST), an amplicon sequencing method for generating a rapid and inexpensive DST profile directly from positive sputum samples. Additionally, we have devised Single Molecule Overlapping Read (SMOR) analysis, an advanced amplicon sequencing approach for detecting and measuring heteroresistance (i.e., mixtures of resistance alleles) down to a 0.1% minor resistance component. Lastly, a more detailed analysis of the data generated with these techniques allows haplotype analysis, leading to an understanding of the nature of an infection (e.g., whether a patient has a superinfection of multiple strains or is infected with multiple lineages of the same strain). We have employed these techniques on DNA extracted from >150 remnant clinical Mtb sputum samples from the Republic of Moldova. The Next Gen-DST assay provided drug susceptibility profiles comparable to those of culture-based DST on 36 well-established target loci; the SMOR analysis identified the presence of mixtures in samples at



S10:3
DEVELOPMENT OF AN EFFICIENT NEXT-GENERATION SEQUENCING PLATFORM FOR CHARTING THE EVOLUTION OF NOROVIRUS STRAINS
G. I. Parra, C. K. Karangwa, S. V. Sosnovtsev, K. Y. Green;
National Institutes of Health, Bethesda, MD.

Noroviruses (NoV) are important causes of acute gastroenteritis. An effective vaccine could save thousands of lives each year, but the number of antigenic components needed for efficacious vaccines, as well as the immune correlates of protection against NoV infection, are still unknown. Like many other RNA viruses, NoV are genetically diverse, with seven major genogroups containing over 30 different genotypes. To gain insight into the rules that govern intra- and inter-host NoV evolution and antigenic diversity, we developed an efficient platform for analyzing complete NoV genomes by next-generation sequencing (NGS). We then analyzed differences in the population dynamics of NoV infecting healthy individuals. The first set of samples was collected from patients with GII.3, GII.4, or GII.6 infection who, despite symptoms resolving within days, shed NoV for up to 4 weeks. The GII.6 and GII.3 strains were stable and showed no evidence of adaptive changes during the prolonged shedding phase, while the GII.4 viruses showed a number of nucleotide changes as infection progressed. A second set of samples was obtained from cases of person-to-person transmission of non-GII.4 strains. These showed that although minor changes could be detected during transmission (acute phase), the virus often reverted to the original sequence during the recovery phase of the infection. Finally, the new amplification and genome sequencing method allowed us not only to amplify archival samples but also to describe and characterize a new GII.17 norovirus (Hu/GII.17/GaithersburgD1/2014/USA) that is currently causing large outbreaks of gastroenteritis in Asian countries. Taken together, our data suggest different patterns of evolution among NoV strains, with some viruses (like GII.4) more prone to change while others remain static over time, limiting their antigenic diversity and prevalence. Population dynamics offers a new tool for the development of NoV vaccines.
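Tracking nucleotide changes and reversions across sampling time points can be sketched from consensus sequences. The sketch below assumes hypothetical, already-aligned, equal-length consensus genomes from one patient in chronological order. It lists the positions that differ from the first (baseline) sample at each later time point. It then flags positions that changed at some point but match the baseline again in the final sample, the reversion pattern described above. This is an illustration, not the authors' platform.

```python
from typing import List, Set


def changes_from_baseline(consensus: List[str]) -> List[Set[int]]:
    """Positions differing from the first sample, for each later time point."""
    base = consensus[0]
    return [{i for i, (a, b) in enumerate(zip(base, seq)) if a != b}
            for seq in consensus[1:]]


def reverted_positions(consensus: List[str]) -> Set[int]:
    """Positions that changed at some point but match the baseline at the last time point."""
    per_time = changes_from_baseline(consensus)
    if not per_time:
        return set()
    ever_changed = set().union(*per_time)
    return ever_changed - per_time[-1]


# Hypothetical aligned consensus sequences from one patient, in time order.
series = ["ACGTACGT", "ACGAACGT", "ACGAACTT", "ACGTACTT"]
print(changes_from_baseline(series))
print(reverted_positions(series))
```

Working from consensus sequences hides minor variants. Detecting changes below the consensus level would require per-position base frequencies from the reads instead.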

S10:4
GENOME-WIDE COMPARISON OF COWPOX VIRUSES REVEALS A NEW CLADE RELATED TO VARIOLA VIRUS
P. Dabrowski, A. Radonic, A. Kurth, L. Schuenadel, A. Nitsche;
Robert Koch Institute, Berlin, GERMANY.

Zoonotic infections caused by several orthopoxviruses (OPV), such as Monkeypox virus or Vaccinia virus, have a significant impact on human health. In Europe, the number of diagnosed infections with cowpox viruses (CPXV) is increasing in animals as well as in humans. CPXV used to be enzootic in cattle; however, such infections have not been diagnosed in recent decades. Instead, individual cases of cowpox are being found in cats or exotic zoo animals that transmit the infection to humans. Both animals and humans develop local exanthema on the arms and legs or on the face. Although cowpox is generally regarded as a self-limiting disease, immunosuppressed patients can develop a lethal systemic disease resembling smallpox. To date, only limited information is available on CPXV genomes, which are complex and, compared with other OPV, sparsely conserved. Since CPXV displays the widest host range of all known OPV, it seems important to understand its genetic repertoire. This in turn may help elucidate specific mechanisms of CPXV pathogenesis and origin. Therefore, about 50 genomes of independent CPXV strains from clinical cases involving several humans, rats, cats, jaguarundis, beaver, elephant, marah and mongoose were sequenced. All samples were collected as part of routine diagnostics at the German Consultant Laboratory for Poxviruses over the last decade. The first genomes were obtained by massively parallel pyrosequencing (GS FLX), while Illumina sequencing in combination with Nextera library generation was used later on. Extensive phylogenetic analysis showed that the sequenced CPXV strains clearly cluster into several distinct clades. Some of these are closely related to Vaccinia viruses, while others represent different clades in a CPXV cluster. In particular, one CPXV clade is more closely related to Camelpox virus, Taterapox virus and Variola virus than to any other known OPV. These results support and extend recent data from other groups who postulate that CPXV does not form a monophyletic clade and should be divided into multiple lineages.

S10:5
VIROME ANALYSES AMONG CHILDREN WITH ACUTE RESPIRATORY INFECTION IN CHINA
W. Tan, Y. Wang;
National Institute for Viral Disease Control and Prevention, China CDC, Beijing, CHINA.

Acute respiratory infection (ARI) in children is known to be caused by several recognized respiratory viruses. Although infection by individual viruses is well characterized, understanding of the overall respiratory tract virome in children with ARI is limited, especially in Beijing, China. To define the respiratory tract virome in children with ARI, we carried out next-generation sequencing (NGS) of nasopharyngeal swabs on an Illumina HiSeq 2500, followed by phylogenetic analysis. A total of 42,951,290 reads were obtained, with 25% of all generated reads assigned to recognized respiratory viruses. The respiratory tract of children with ARI contains complex viral populations dominated mainly by seven families: Paramyxoviridae, Coronaviridae, Parvoviridae, Orthomyxoviridae, Picornaviridae, Adenoviridae and Anelloviridae. Various viruses of different genotypes were detected in respiratory samples. Specifically, HRSV, HCoVs (HCoV-229E, HCoV-OC43, HCoV-HKU1), HBoV1, influenza A/B, HRVs, HAdVs, anelloviruses (TTVs and TTMVs) and HPIVs were the most abundant and common viruses in the respiratory tracts of children in Beijing. In contrast, HMPV, HCoV-NL63 and measles virus were only occasionally detected. Contig sequence analysis indicated that the HRSV detected in this study mainly belonged to the BA and GA2 subtypes. At least three genotypes of HCoV-OC43 circulating in Beijing were identified, including subgroups B, C/D and UNT, with genotype UNT predominant in recent years. Influenza A, B and C viruses were all detected in this study and mainly included H1N1 and H5N1. Most of the reads related to HRVs belonged to HRV-A and HRV-C. Diverse types of anelloviruses (TTVs and TTMVs) were found in respiratory samples, including TTV5, TTV7, TTV10, TTV21, TTMV5, TTMV7, TTMV8 and TTV-like mini virus. The viral populations and dominant genotypes detected differed significantly between our study and previous reports. This research provides the first comprehensive understanding of the virome of children with ARI in China and indicates a high heterogeneity of known viruses in the respiratory tract of children, which may benefit the detection and prevention of respiratory disease in China.

■ S10:6

USING WASTEWATER TO MONITOR VIRAL PATHOGENS IN THE SLUM CITY OF KIBERA

M. H. Hjelmsø¹, O. Lukjancenko¹, L. Bergmark¹, E. Ngeno², F. M. Aarestrup¹, R. S. Hendriksen¹;

¹Technical University of Denmark, Kgs. Lyngby, DENMARK, ²Center for Disease Control, Nairobi, KENYA.

Pathogenic viruses are a huge burden to mankind. Enteric viruses, the major cause of gastroenteritis, sicken millions of people each year and kill hundreds of thousands in developing countries. Other important diseases caused by viruses include MERS, SARS, AIDS and Ebola. To combat these diseases, health authorities need to understand the spread and epidemiology of the viruses in question. Classic surveillance relies on data from local health centers and diagnostic labs. This is problematic because many people in the world either do not contact or have no access to such facilities, which leads to under-reporting of the problem. Alternative monitoring methods are needed to limit the spread of viral pathogens in the future. The objective of this study was to establish whether deep sequencing of wastewater can be used as a surveillance tool for viral pathogens. Infected individuals shed large numbers of virus particles in their feces, which end up contaminating the wastewater. Wastewater is usually seen as a problem, but in this project we use it to monitor the health of the population it originates from. To test this novel approach, a proof-of-concept study was carried out in the slum city of Kibera, Kenya, where very poor sanitary conditions and limited medical facilities lead to a high incidence of infectious diseases. Wastewater was sampled daily at two central locations for three months. Samples were frozen at -80 °C immediately after sampling and shipped frozen to our laboratory in Denmark. The virus particles were then separated from the rest of the wastewater content, concentrated by PEG8000 precipitation and treated with nucleases to degrade extracellular DNA and RNA. Nucleic acids were extracted from the purified viral concentrate with the NucleoSpin RNA XS kit and the High Pure Viral Nucleic Acid Kit, selecting for RNA and DNA viruses, respectively. The nucleic acids were then amplified with 40 cycles of PCR using a random primer before library preparation with the Nextera XT kit and sequencing on the Illumina MiSeq, generating 250-bp paired-end reads. The reads were mapped to a custom database containing all viral sequences and genomes from the NCBI and ViPR databases. We detected 456 different virus species, including the human pathogens Enterovirus, Rotavirus, Norovirus, Astrovirus, hepatitis C virus and human herpesvirus. Several of the human viral pathogens showed a clear rise and fall in abundance during the study period. Similar epicurves were seen at both sampling points, suggesting that our results reflected an outbreak in the city; unfortunately, no clinical data exist to confirm this. With this approach, the health of large populations can be monitored cost-effectively. Better monitoring could be instrumental in combating diseases locally and in preventing global outbreaks of existing and emerging viral pathogens.
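The comparison of epicurves between the two sampling points can be illustrated with a minimal Python sketch, assuming that per-day read counts for one virus and per-day total read counts are available for each site. The helper names, the reads-per-million normalization and the use of Pearson correlation are illustrative choices, not the authors' stated method, and the counts below are invented.

```python
from math import sqrt


def reads_per_million(virus_reads, total_reads):
    """Normalize one virus's daily read counts by each day's sequencing depth."""
    return [1e6 * v / t for v, t in zip(virus_reads, total_reads)]


def pearson(x, y):
    """Pearson correlation between two equally long daily series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Invented daily counts for one virus at the two sites (illustration only).
site_a = reads_per_million([10, 40, 120, 90, 30], [1_000_000] * 5)
site_b = reads_per_million([5, 30, 100, 80, 20], [500_000] * 5)
r = pearson(site_a, site_b)
print(f"correlation between site epicurves: {r:.3f}")
```

Normalizing by depth matters because daily sequencing output varies between runs; a correlation close to 1 indicates that a virus rises and falls together at both sites, which is the pattern the abstract describes qualitatively.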



Poster <strong>Abstracts</strong><br />

■ 1

A BIOSURVEILLANCE ANALYSIS PIPELINE FOR GENOMIC SEQUENCE DATA

C. Olsen¹, K. Qaadri¹, H. Shearman², R. Moir², M. Kearse², S. Markowitz², J. Kuhn², S. Dunn², A. Cooper²;

¹Biomatters, Inc., Newark, NJ, ²Biomatters, Ltd., Auckland, NEW ZEALAND.

Next-generation sequencing (NGS) approaches have numerous applications in biosurveillance programs and outbreak investigations. However, analyzing the data accurately and in a timely fashion is a significant challenge without high-performance computing resources. Often the users are not bioinformaticians comfortable with running sequence analysis pipelines. Biomatters’ Geneious R9 is a bioinformatics software platform that gives researchers access to industry-leading algorithms for genomic and protein sequence analysis. Geneious offers a comprehensive suite of functions, including a robust collection of peer-reviewed tools, that help researchers work more efficiently in their sequence analysis workflows. The recently added 16S Biodiversity tool and Sequence Classifier plugin provide tools that can be incorporated into an easy-to-use pathogen identification workflow. The 16S Biodiversity tool identifies high-throughput 16S rRNA amplicons from environmental samples using the RDP database and visualizes biodiversity as an interactive chart in a secure web viewer. The Sequence Classifier plugin taxonomically classifies a biological sample by how similar its DNA is to the sequences in a user-supplied database of known sequences, using a BLAST-like algorithm with multiple loci and trees to assist with identification. With Geneious R9, biologists can streamline their sequence analysis workflows for mixed-sample analysis. This poster describes a sequence analysis pipeline for identifying bacterial pathogens from mixed metagenomic data generated during outbreaks. The pipeline can also be extended to include eukaryotic and fungal pathogens.
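The Sequence Classifier idea of assigning a sample to the closest entry in a user-supplied reference database can be sketched in Python. This is not the Geneious algorithm, which the abstract describes only as BLAST-like: as a simpler stand-in, the sketch assumes a shared-k-mer similarity score, and the function names, the k-mer size and the minimum-similarity cutoff are hypothetical.

```python
def kmers(seq, k=8):
    """Set of all overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, reference_db, k=8, min_similarity=0.2):
    """Return (best_taxon, score), where score is the fraction of query k-mers
    found in a reference; best_taxon is None if no score reaches the cutoff."""
    q = kmers(query, k)
    if not q:
        return None, 0.0
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in reference_db.items():
        score = len(q & kmers(ref_seq, k)) / len(q)
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_similarity:
        return None, best_score
    return best_taxon, best_score


# Toy reference database of two invented sequences.
db = {
    "taxon_A": "ACGTACGTTGCAAGGCTTACG",
    "taxon_B": "TTTTGGGGCCCCAAAATTTTGGGG",
}
print(classify("ACGTACGTTGCAAGG", db))  # ('taxon_A', 1.0)
```

For a real database the reference k-mer sets would be computed once and indexed rather than rebuilt for every query.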

■ 2

THE IMPORTANCE OF ASSESSING THE PERFORMANCE OF METHODS FOR VARIANT DETECTION AND CONSTRUCTING SNP MATRICES: INSIGHTS FROM A VALIDATION EXPERIMENT OF THE CFSAN SNP PIPELINE

J. B. Pettengill, S. Davis, Y. Luo, H. Rand, E. Strain;

FDA, College Park, MD.

The CFSAN SNP Pipeline combines into a single package the steps needed to generate a SNP matrix from which a phylogenetic tree can be inferred to assist in traceback and foodborne outbreak investigations. The pipeline works with next-generation sequencing reads from a group of samples and uses a reference-based approach to identify variant sites. Given the importance of decisions that may be based on the topology inferred from the SNP matrix, it is paramount that the method be validated. With this objective in mind, we developed a simple Python package that, given a reference genome, generates variants at known positions against which we validate our pipeline. We created 1000 simulated Salmonella enterica subsp. enterica serovar Agona genomes at 100x and 20x coverage, each containing 500 SNPs, 20 single-base insertions and 20 single-base deletions. For the 100x dataset, the CFSAN SNP Pipeline recovered 98.9% of the introduced SNPs with a false positive rate of 1.04 × 10⁻⁶; for the 20x dataset, 98.8% of SNPs were recovered and the false positive rate was 8.34 × 10⁻⁷. Interestingly, failure to meet the consensus frequency threshold, rather than the coverage threshold, was the primary explanation for false negatives. Additionally, false negatives were not randomly distributed across the genome, which suggests that there may be hotspots within the genome where it will be difficult to accurately detect differences between two samples. These results provide critical metrics showing the CFSAN SNP Pipeline to be a robust method for constructing a SNP matrix and further reinforce the utility and importance of validation exercises.
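The validation design, in which variants are planted at known positions and the pipeline's calls are scored against them, can be sketched as follows. This is a minimal Python illustration, not the authors' package: it simulates substitutions only (the study also introduced single-base insertions and deletions), the helper names are hypothetical, and it assumes the false positive rate is computed over all non-variant positions in the genome, which may differ from the exact definition used for the CFSAN SNP Pipeline.

```python
import random

BASES = "ACGT"


def simulate_snps(reference, n_snps, seed=0):
    """Plant n_snps substitutions at distinct random positions.

    Returns the mutated sequence and the set of 0-based positions changed.
    """
    rng = random.Random(seed)
    positions = set(rng.sample(range(len(reference)), n_snps))
    seq = list(reference)
    for pos in positions:
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq), positions


def validation_metrics(true_positions, called_positions, genome_length):
    """Recall and false positive rate of called SNP positions vs. the truth set."""
    tp = len(true_positions & called_positions)
    fp = len(called_positions - true_positions)
    recall = tp / len(true_positions)
    fpr = fp / (genome_length - len(true_positions))
    return recall, fpr


rng = random.Random(1)
reference = "".join(rng.choice(BASES) for _ in range(10_000))
mutated, truth = simulate_snps(reference, 50)
# Stand-in for pipeline output: miss one true SNP and add one spurious call.
called = set(sorted(truth)[1:]) | {next(p for p in range(10_000) if p not in truth)}
recall, fpr = validation_metrics(truth, called, len(reference))
print(f"recall={recall:.3f}, false positive rate={fpr:.2e}")
```

In a full experiment, reads would be simulated from `mutated`, passed through the pipeline, and the resulting SNP positions used as `called`.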

■ 3

TGS-TB: TOTAL GENOTYPING SOLUTION FOR MYCOBACTERIUM TUBERCULOSIS USING SHORT-READ WHOLE-GENOME SEQUENCING

T. Sekizuka¹, A. Yamashita¹, Y. Murase², T. Iwamoto³, S. Mitarai², S. Kato², M. Kuroda¹;

¹National Institute of Infectious Diseases, Shinjyuku-ku, JAPAN, ²Japan Anti-Tuberculosis Association, Kiyose-shi, JAPAN, ³Kobe Institute of Health, Kobe-shi, JAPAN.

Background: Whole-genome sequencing (WGS) with next-generation DNA sequencing (NGS) is an increasingly accessible and affordable method for genotyping hundreds of Mycobacterium tuberculosis (Mtb) isolates, enabling more effective epidemiological studies based on single nucleotide variations (SNVs) in the core genome. Methods: We developed an all-in-one web-based tool for genotyping Mtb, referred to as Total Genotyping Solution for TB (TGS-TB), which uses NGS data to support multiple genotyping approaches: spoligotyping and phylogenetic analysis of core genomic SNVs, IS6110 insertion sites and VNTRs (a customized set of 43 short TR loci), all through a simple, user-friendly click interface. In addition, the tool runs a KvarQ script to predict MTBC lineages/sublineages and potential antimicrobial resistance. Findings: The results of in silico analyses using TGS-TB were completely consistent with those obtained by conventional molecular genotyping methods, suggesting that MiSeq NGS short reads can provide multiple genotypes to discriminate among Mtb strains. Indeed, seven Mtb isolates sharing the same VNTR profile were accurately discriminated by median-joining network analysis using SNVs unique to those isolates. Furthermore, an additional IS6110 insertion was detected in one of these isolates, providing supportive genetic information in addition to the core genomic SNVs. The results of all in silico analyses can be downloaded from the website. Interpretation: TGS-TB provides more accurate and discriminative strain typing for clinical and epidemiological investigations; NGS-based strain typing offers a total genotyping solution for Mtb outbreak investigation and surveillance. The genotype information obtained for all Mtb isolates can be deposited in an integrated database for the surveillance of future outbreaks and global infections. TGS-TB website: http://gph.niid.go.jp/tgs-tb. Funding: This research was funded through a Grant-in-Aid for Research on Emerging and Re-emerging Infectious Diseases (H25-Shinko-Ippan-015) from the Ministry of Health, Labour and Welfare Programs of Japan.
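Spoligotyping from WGS reads can be illustrated by checking each of the 43 direct-repeat spacers for supporting reads and writing the resulting presence/absence pattern as the conventional 15-digit octal code (14 octal digits for spacers 1 to 42 taken in groups of three, plus one digit for spacer 43). This Python sketch is not the TGS-TB implementation, which the abstract does not describe: it assumes exact substring matching of each spacer (or its reverse complement) in the reads and a hypothetical minimum read count, and the real spacer sequences are not included.

```python
def spacer_pattern(reads, spacers, min_hits=5):
    """1/0 presence call per spacer: present if at least min_hits reads contain
    the spacer or its reverse complement as an exact substring."""
    complement = str.maketrans("ACGT", "TGCA")
    pattern = []
    for spacer in spacers:
        rev_comp = spacer.translate(complement)[::-1]
        hits = sum(1 for read in reads if spacer in read or rev_comp in read)
        pattern.append(1 if hits >= min_hits else 0)
    return pattern


def octal_code(pattern):
    """Convert a 43-spacer presence/absence pattern to the 15-digit octal code."""
    if len(pattern) != 43:
        raise ValueError("expected 43 spacer calls")
    digits = [str(4 * a + 2 * b + c)
              for a, b, c in (pattern[i:i + 3] for i in range(0, 42, 3))]
    digits.append(str(pattern[42]))  # spacer 43 forms the last digit on its own
    return "".join(digits)


# Spacers 1-34 absent and 35-43 present (the Beijing family signature).
beijing = [0] * 34 + [1] * 9
print(octal_code(beijing))  # 000000000003771
```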

■ 4

MARA: THE MULTI-ANTIBIOTIC RESISTANCE ANNOTATOR

S. Partridge¹, G. Tsafnat²;

¹Westmead Millennium Institute, Sydney, AUSTRALIA, ²Centre for Health Informatics, Macquarie University, Sydney, AUSTRALIA.

Much of the increasingly problematic multiresistance in Gram-negative bacteria is due to resistance genes associated with different mobile elements (mainly gene cassettes/integrons, insertion sequences and transposons) that tend to cluster together in complex multiresistance regions (MRR). MRR in turn are found on plasmids that can spread between cells, including cells of different species, or sometimes in islands integrated into the chromosome. Increasing numbers of MRR sequences are becoming available from large projects using next-generation methods, enabling comparative analysis to better understand how resistance genes are spreading. However, many such sequences are poorly and inconsistently annotated, as available software often focuses only on identifying genes and the putative functions of the proteins they encode. In the resistance domain, gene functions are often well known, but resistance gene nomenclature is confusing: different naming systems exist for the same genes, and relationships between genes are often not evident from their names alone. The most useful annotations require consistent gene naming, identification of minor variations that have important effects on resistance phenotype and, simultaneously, identification of the boundaries of mobile genetic elements. We previously developed an automated tool, Attacca, which uses a database of ‘features’ and computational grammars to accurately and consistently annotate gene cassettes and integrons, and the Repository of Antibiotic-resistance Cassettes (RAC) website (http://www.rac.aihi.mq.edu.au/), which provides a database of resistance and other gene cassettes and allows sequences to be submitted for annotation by Attacca. We have now extended Attacca to annotate other resistance genes and associated mobile elements, using an expanded database (compiled from existing resistance gene repositories where possible) and additional grammar rules. Attacca annotations were compared with manually annotated sequences to refine the grammars. Attacca can accurately and consistently annotate resistance genes, the boundaries of complete mobile elements and their fragments, insertions of one mobile element inside another, and direct repeats indicative of insertion, and it provides diagrammatic representations of the annotations. The Multi-Antibiotic Resistance Annotator (MARA) website (linked to RAC) will provide access to the extended resistance gene database as browsable lists, including information such as relationships within different resistance gene families, and a service for annotating MRR sequences.
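One of the features MARA annotates, direct repeats indicative of insertion, can be illustrated with a small Python sketch. When a transposon or insertion sequence inserts, it typically duplicates a short target site, so the same few bases appear immediately before and after the element. The function below assumes the element boundaries are already known and simply looks for the longest identical pair of flanks; it is a hypothetical helper, not part of Attacca (whose grammar-based approach the abstract does not detail), and the length limits are illustrative.

```python
def find_direct_repeat(seq, start, end, min_len=3, max_len=12):
    """Longest identical pair of flanks around seq[start:end], or None.

    Compares the n bases just before `start` with the n bases just after
    `end` (0-based, end-exclusive), trying n from max_len down to min_len.
    """
    for n in range(max_len, min_len - 1, -1):
        if start - n < 0 or end + n > len(seq):
            continue
        left, right = seq[start - n:start], seq[end:end + n]
        if left == right:
            return left
    return None


# Toy example: a 10-bp "element" flanked by a 5-bp duplicated target site.
seq = "TTTT" + "GCTAG" + "C" * 10 + "GCTAG" + "AAAA"
print(find_direct_repeat(seq, start=9, end=19))  # GCTAG
```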

■ 5

MOLECULAR EPIDEMIOLOGY OF CAMPYLOBACTER ISOLATES FROM SOUTH AFRICA INVESTIGATED USING WEB-BASED BIOINFORMATICS PIPELINES DEVELOPED AT THE CENTER FOR GENOMIC EPIDEMIOLOGY OF THE TECHNICAL UNIVERSITY OF DENMARK

A. M. Smith, M. Thobela, A. Ismail, K. H. Keddy;

National Institute for Communicable Diseases, Johannesburg, SOUTH AFRICA.

Background: Campylobacter is a leading cause of bacterial gastroenteritis globally. The molecular epidemiology of Campylobacter is well described in developed countries; in developing countries, however, very little data exist. Objectives: A proof-of-concept study was conducted to determine the feasibility of investigating the molecular epidemiology of Campylobacter by using web-based bioinformatics pipelines to analyze whole-genome sequence (WGS) data. Methods: Sixteen human isolates of Campylobacter jejuni, recovered in 2014 from the Gauteng and Western Cape provinces of South Africa, were investigated. Genomic DNA was isolated using the QIAGEN QIAamp DNA Mini Kit and sequenced using Illumina MiSeq next-generation sequencing technology. Raw MiSeq data were assembled into contigs using CLC Genomics Workbench software. Assembled data were analyzed using the ‘multi-locus sequence typing (MLST)’ and ‘acquired antimicrobial resistance gene finder’ pipelines developed at the Center for Genomic Epidemiology (CGE) of the Technical University of Denmark. Results: Genomic data were successfully uploaded to CGE servers, and the results were returned via e-mail messages containing links to the analyzed data. Analysis of a single data file was typically completed within 5-10 minutes. Our 16 isolates showed the following MLST sequence types: ST-4063 (n=2), ST-1737 (n=1), ST-4624 (n=1), ST-6091 (n=1), ST-5809 (n=1), ST-658 (n=1), ST-356 (n=1), ST-257 (n=1), ST-883 (n=1), ST-474 (n=1), ST-829 (n=1), ST-51 (n=1) and ST-unknown (n=3). Analysis for acquired antimicrobial resistance genes showed the following: 3 isolates had no evidence of known resistance genes; 9 isolates contained blaOXA-61; 1 isolate contained tet(O); 2 isolates contained both blaOXA-61 and tet(O); and 1 isolate contained blaOXA-61, tet(O), tet(B), sul2, strA and strB. Conclusions: Our proof-of-concept study was successful. Analysis of WGS data using the web-based bioinformatics pipelines at the CGE provided a single, rapid and cost-effective approach to investigate the molecular epidemiology of Campylobacter. We showed a wide diversity of MLST sequence types among 16 Campylobacter isolates, suggesting that a heterogeneous population of Campylobacter may exist in South Africa. A diversity of acquired antimicrobial resistance genes was found in the population, with blaOXA-61 (coding for β-lactam resistance) the most common.
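Two steps behind these Results can be sketched in Python. The first shows how an MLST sequence type follows from the allele numbers at the seven C. jejuni housekeeping loci (aspA, glnA, gltA, glyA, pgm, tkt, uncA); the profile table is a toy one with invented allele numbers and ST labels, whereas real assignments come from the curated scheme used by the CGE MLST service. The second tallies the resistance results exactly as reported above, making concrete that blaOXA-61 was carried by 12 of the 16 isolates and tet(O) by 4.

```python
from collections import Counter

LOCI = ("aspA", "glnA", "gltA", "glyA", "pgm", "tkt", "uncA")

# Toy profile table: allele numbers and ST labels are invented.
PROFILES = {
    (1, 2, 3, 4, 5, 6, 7): "toy-ST-1",
    (1, 2, 3, 4, 5, 6, 8): "toy-ST-2",
}


def assign_st(allele_calls, profiles=PROFILES):
    """Map {locus: allele number} to an ST label.

    A missing or novel allele (None) or an unlisted combination gives
    "ST-unknown".
    """
    profile = tuple(allele_calls.get(locus) for locus in LOCI)
    if None in profile:
        return "ST-unknown"
    return profiles.get(profile, "ST-unknown")


# Resistance results as reported: (number of isolates, genes detected).
reported = [
    (3, set()),
    (9, {"blaOXA-61"}),
    (1, {"tet(O)"}),
    (2, {"blaOXA-61", "tet(O)"}),
    (1, {"blaOXA-61", "tet(O)", "tet(B)", "sul2", "strA", "strB"}),
]
gene_counts = Counter()
for n_isolates, genes in reported:
    for gene in genes:
        gene_counts[gene] += n_isolates

total = sum(n for n, _ in reported)
print(total, gene_counts.most_common(2))  # 16 [('blaOXA-61', 12), ('tet(O)', 4)]
```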

■ 6

GENETIC TYPING AND EPIDEMIOLOGICAL CLUSTERING OF COMMON PATHOGENS BASED ON WHOLE GENOME NGS DATA

K. Einer-Jensen, P. Liboriussen, J. Johansen, L. Schauser, A. C. Materna;

QIAGEN, Aarhus, DENMARK.

Next-generation sequencing (NGS) data from whole pathogen genomes are frequently used for enhanced surveillance and outbreak detection of common pathogens. Version 1.5 of the CLC Microbial Genomics Module for CLC Genomics Workbench and CLC Genomics Server introduces new functionality for molecular typing and epidemiological analysis of bacterial isolates. The latest update to the module enables the user to perform stepwise (tool-by-tool) analysis or to take advantage of included multistep workflows. With a few clicks, workflows can be optimized for routine analysis of a specific pathogen or outbreak. New features include, for instance, streamlined tools for NGS-based multilocus sequence typing (MLST) and resistance typing, as well as detection of genus and species information. New tools for phylogenetic tree reconstruction generate trees based on single nucleotide polymorphisms (SNPs) or infer k-mer trees from NGS reads or genomes. A new table format, acting as a database, collects typing results and associates them with metadata such as sample information, geographic origin and treatment outcome. Results generated by, e.g., MLST and resistance typing can also be associated with the original sample metadata. Users can filter results and metadata to find and select relevant subsets of samples for downstream analysis. Results and metadata available during tree generation can further be used to explore phylogeny in the context of this epidemiologically relevant information. Version 1.5 of the CLC Microbial Genomics Module aims to facilitate tasks commonly carried out during outbreak investigation, such as typing or source tracking based on whole genome data. Preconfigured workflows and simple deployment through integration into the widely used CLC Genomics Workbench and CLC Genomics Server ecosystem offer a user-friendly platform for scientists engaged in outbreak prevention and control.
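The idea behind inferring k-mer trees from reads or genomes can be sketched by computing a pairwise distance matrix from shared k-mers, which a tree-building method such as neighbor joining or UPGMA would then take as input. This minimal Python sketch uses Jaccard distance between k-mer sets; that choice is an assumption made for illustration, not the CLC Microbial Genomics Module's algorithm, and the helper names, k and the toy sequences are hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k=5):
    """All overlapping k-mers of a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - shared/union; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(genomes, k=5):
    """Symmetric pairwise k-mer distances, keyed by (name, name)."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    names = sorted(sets)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = jaccard_distance(sets[x], sets[y])
    return names, dist


genomes = {"a": "ACGTACGTAA", "b": "ACGTACGTAA", "c": "TTTTTTTTTT"}
names, dist = distance_matrix(genomes)
for x in names:
    print(x, [round(dist[(x, y)], 3) for y in names])
```

Working on k-mer sets avoids alignment entirely, which is why this family of methods can start from raw reads as well as assembled genomes.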

■ 7

NEEDS ASSESSMENT AND ONTOLOGY DEVELOPMENT FOR INTEGRATING WHOLE GENOME SEQUENCING INTO ROUTINE OUTBREAK INVESTIGATIONS

E. J. Griffiths¹, M. Courtot¹, D. Dooley², J. Adam³, F. Bristow³, J. A. Carrico⁴, B. Dhillon¹, M. Graham³, M. Laird¹, T. Matthews³, A. Petkau³, L. Schriml⁵, J. Shay¹, E. Taboada⁶, G. Winsor¹, G. Van Domselaar³, F. Brinkman¹, W. Hsiao²;

¹Simon Fraser University, Burnaby, BC, CANADA, ²BC Public Health Microbiology and Reference Laboratory, Vancouver, BC, CANADA, ³National Microbiology Laboratory, Winnipeg, MB, CANADA, ⁴University of Lisbon, Lisbon, PORTUGAL, ⁵University of Maryland School of Medicine, Baltimore, MD, ⁶Laboratory for Foodborne Zoonoses, Lethbridge, AB, CANADA.

Whole-genome sequencing (WGS) can provide increased discriminatory power for pathogen typing, as well as additional genomic information through comparative genomics tools. Canada’s Integrated Rapid Infectious Disease Analysis (IRIDA) platform will equip public health workers with user-friendly tools for incorporating WGS into isolate typing and epidemiological pipelines to support real-time infectious disease investigation. To understand the practical requirements for implementing this platform within the Canadian health care network, we conducted a needs assessment by interviewing public health stakeholders and domain experts. User activities, lab management software, and pathogen-tracking and reporting systems were profiled to better characterize users’ levels of computational expertise, required functionalities and information flow. Key gaps were identified in personnel training, data processing and sharing, data integration, and governance. Consistent capture of structured metadata is crucial for integrated data analyses, human health risk assessments, source attribution and ecosystem modelling and, in the simplest terms, for making sense of the genome data. In addition to the needs assessment, we reviewed ontology resources to assess how well different community standards fulfil the needs of a genomic epidemiology program. No single ontology covers all attributes required for genomic epidemiology, and the very breadth of many ontologies hinders their practical, real-time use by people with little bioinformatics expertise. User profiles and data requirements were harmonized with different sources to produce an OWL file containing metadata fields and terms describing isolate source attribution, lab analytics, sequencing/assembly/annotation processes and quality metrics, and patient demographics (or environmental descriptors). By adhering to the best practices of the Open Biomedical and Biological Ontology (OBO) Consortium, the application ontology we are developing consolidates various existing ontological efforts into a resource directly compatible with IRIDA. The data integration capabilities of the IRIDA ontology are currently being tested in a Canadian government Genome Research and Development Initiative and will shortly be expanded to include antimicrobial resistance. This research is a key component of the IRIDA platform, enabling automated data integration that alleviates the burden of manual analyses. Furthermore, outbreak response can be expedited by automatically detecting deviations above pre-defined biosurveillance thresholds. The lessons learned from the needs assessment and ontology development reported here should be informative for other countries with autonomous health regions.
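Consistent capture of structured metadata ultimately means checking each record against required fields and controlled terms before it enters the analysis. The Python sketch below illustrates that check; the field names and permitted terms are invented placeholders, not terms from the IRIDA application ontology, and a real implementation would draw them from the OWL file rather than from a hard-coded dictionary.

```python
# Placeholder vocabulary: these field names and terms are invented for
# illustration and are not taken from the IRIDA application ontology.
REQUIRED_FIELDS = ("isolate_id", "isolation_source", "host", "collection_date")
ALLOWED_TERMS = {
    "isolation_source": {"stool", "blood", "food", "environmental swab"},
    "host": {"human", "bovine", "poultry", "not applicable"},
}


def validate_record(record):
    """Return a list of problems in one metadata record (empty if it passes)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    for field, allowed in ALLOWED_TERMS.items():
        value = record.get(field)
        if value and value.strip().lower() not in allowed:
            problems.append(f"{field}: {value!r} is not a controlled term")
    return problems


record = {
    "isolate_id": "ISO-001",
    "isolation_source": "Stool",
    "host": "chicken",
    "collection_date": "2015-06-01",
}
for problem in validate_record(record):
    print(problem)
```

Mapping a free-text value such as "chicken" onto a controlled term such as "poultry" is the kind of harmonization an ontology with synonyms can automate.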

■ 8

IRIDA: A FEDERATED PLATFORM ENABLING RICHER GENOMIC EPIDEMIOLOGY ANALYSIS IN A PUBLIC HEALTH ENVIRONMENT

F. Bristow¹, J. Adam¹, J. A. Carrico², M. Courtot³, B. Dhillon³, D. Dooley⁴, E. Griffiths³, J. Isaac-Renton⁵, A. Keddy⁶, P. Kruczkiewicz⁷, M. Laird³, T. Matthews¹, A. Petkau¹, L. Schriml⁸, J. Shay³, E. Taboada⁷, P. Tang⁴, J. Thiessen¹, G. Winsor³, R. G. Beiko⁶, M. Graham¹, G. Van Domselaar¹, W. Hsiao⁴, F. S. Brinkman³;

¹National Microbiology Laboratory, Winnipeg, MB, CANADA, ²University of Lisbon, Lisbon, PORTUGAL, ³Simon Fraser University, Burnaby, BC, CANADA, ⁴BC Public Health Microbiology and Reference Laboratory, Vancouver, BC, CANADA, ⁵BC Public Health Microbiology and Reference Laboratory, Burnaby, BC, CANADA, ⁶Dalhousie University, Halifax, NS, CANADA, ⁷Laboratory for Foodborne Zoonoses, Lethbridge, AB, CANADA, ⁸University of Maryland School of Medicine, Baltimore, MD.

Most public data analysis tools/pipelines for genomic epidemiology either require considerable technical knowledge to execute or cannot easily integrate rich epidemiologic data, creating a barrier to wider use by public health workers. Canada’s Integrated Rapid Infectious Disease Analysis (IRIDA) web-based bioinformatics platform provides a model approach that aims to address these issues and complement other international bioinformatics pipelines for genomic epidemiology. The IRIDA development team comprises five interconnected working groups: 1) Ontology and Database; 2) Microbial Typing; 3) Architecture and API; 4) Tools Development; 5) User Experience. Teams are embedded in Canadian national and provincial public health agencies and in academia to engage end users and stakeholders during the design and implementation phases of the project. IRIDA implements secure storage of WGS data and of epidemiological and application metadata, data analysis pipelines, visualization of results, and a federated data sharing model intended to facilitate secure communication within and between provincial and federal public health institutions in Canada. Metadata are encoded in an application ontology that follows community-recognized standards and extends existing OBO domain ontologies (http://www.obofoundry.org/) to promote interoperability. Data analysis pipelines and execution provenance are transparently implemented using Galaxy, and federated data sharing and analysis are realized through a common REST API across platform instances. Data analysis tools under further development include SNVPhyl for phylogeographic analysis, in silico microbial typing, and IslandViewer/visualization tools for antimicrobial resistance, virulence factor and genomic island analysis. Linkage with other international genomic epidemiology initiatives, involving public genomic data release with more limited metadata, is also envisaged. A publicly available academic version of IRIDA that does not provide access to potentially sensitive epidemiologic metadata will make IRIDA’s analysis tools available for wider research use. An initial IRIDA version is being tested in the public health environment using current outbreak data, enabling further refinement of the ontology and tools. IRIDA is free, open-source software that may be a useful platform for other countries with autonomous health regions that wish to strengthen the genomic analysis capabilities of their public health workers. See http://www.irida.ca for more information.
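Phylogenomic tools such as SNVPhyl work from variant sites shared across isolates, and outbreak analyses often summarize these as pairwise SNV distances. The following Python sketch computes such distances from an alignment of variant sites, skipping positions where either isolate has an N or a gap; it is a generic illustration under that assumption, not SNVPhyl's implementation, and the isolate names and sequences are invented.

```python
from itertools import combinations

MISSING = set("N-")  # calls treated as missing (assumption: N or gap only)


def snv_distances(alignment):
    """Pairwise SNV counts between isolates in an alignment of variant sites.

    Positions where either isolate has a missing call are skipped.
    """
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all sequences must have the same length")
    dist = {}
    for a, b in combinations(sorted(alignment), 2):
        d = sum(
            1
            for x, y in zip(alignment[a], alignment[b])
            if x != y and x not in MISSING and y not in MISSING
        )
        dist[(a, b)] = dist[(b, a)] = d
    return dist


alignment = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "TCNTT"}
for (a, b), d in sorted(snv_distances(alignment).items()):
    if a < b:
        print(a, b, d)
```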

■ 9

PIPELINE FOR THE AUTOMATIC IDENTIFICATION OF PATHOGENS

A. Andrusch, P. Dabrowski, A. Nitsche;

Robert Koch Institute, Berlin, GERMANY.

Special diagnostics of infectious diseases<br />

nowadays rely on the gold standard of nucleic<br />

acid detection that is the polymerase chain<br />

reaction (PCR). The specific detection by PCR<br />

has its limit in that every pathogen has to be<br />

tested for with an own assay. This limitation<br />

can be overcome by adopting the molecular<br />

open view capacities that next-generation<br />

sequencing (NGS) can provide. NGS-based<br />

methods allow for the representative sequencing<br />

of all nucleic acids contained in clinical<br />

samples, enabling the downstream analysis<br />

of all generated reads for various known<br />

pathogens at once. This comes at the price of<br />

necessary filtering steps for the removal of<br />

background reads originating from the patient.<br />

Beyond that, NGS cannot only extend the<br />

diagnostic possibilities provided by PCR, but<br />

also serve as a stepping stone in the detection<br />

of hitherto unknown and novel pathogens. The<br />

‘Pipeline for the Automatic Identification of<br />

Pathogens’ (PAIPline) presented here, is a new<br />

complete workflow for the pathogen search in<br />

NGS datasets. It includes several steps for the<br />

preprocessing and quality control of the raw<br />

data to ensure that only information-rich reads<br />

are evaluated. It furthermore includes steps<br />

for the assignment of reads to their respective<br />

taxons based on reliable, established referencebased<br />

algorithms. Filtering of background<br />

reads, contaminants and organisms of low<br />

interest as well as the evaluation of ambiguous<br />

read information is automatically done before<br />

44<br />

ASM Conferences


Poster <strong>Abstracts</strong><br />

the results are presented. Analysis results are<br />

shown in a highly accessible manner, allowing<br />

the user to gain a quick overview as well<br />

as permitting deep analysis. The performance<br />

of the PAIPline was benchmarked on real and<br />

artificial datasets of known compositions and<br />

compared to competing tools. The results and<br />

discussed features show that the presented approach<br />

is a viable strategy for the identification<br />

of pathogen sequences in NGS datasets.<br />
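The filtering of background reads, contaminants and low-interest organisms described above can be illustrated with a short sketch. It assumes that a reference-based classifier has already reported a set of candidate taxa for every read; the function name, the exclusion list and the handling of ambiguous reads are hypothetical choices for illustration, not the PAIPline's implementation.

```python
from collections import Counter

# Hypothetical exclusion list: the host and a common sequencing spike-in.
# A real pipeline would make this configurable per sample type.
EXCLUDED_TAXA = {"Homo sapiens", "phiX174"}


def summarize_assignments(assignments, excluded=EXCLUDED_TAXA):
    """Filter per-read taxon assignments and tally what remains.

    assignments maps read_id -> set of candidate taxa reported by a
    reference-based classifier. Excluded taxa are removed first; reads with
    nothing left are treated as background, reads with exactly one taxon are
    counted, and reads with several taxa are set aside as ambiguous.
    Returns (Counter of taxon -> read count, number of ambiguous reads).
    """
    counts = Counter()
    ambiguous = 0
    for taxa in assignments.values():
        remaining = set(taxa) - excluded
        if not remaining:
            continue  # background, spike-in or unclassified read
        if len(remaining) == 1:
            counts[next(iter(remaining))] += 1
        else:
            ambiguous += 1  # candidate for closer evaluation
    return counts, ambiguous
```

In this sketch a read that matches both the host and a pathogen is credited to the pathogen once the host hit is removed; a stricter policy could discard such reads as background instead.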

■ 10

SEPARATION OF FOREGROUND AND BACKGROUND READS IN MIXED NGS DATASETS

S. Tausch, A. Nitsche, B. Renard, P. Dabrowski;

Robert Koch Institute, Berlin, GERMANY.

NGS is a valuable technology for rapid, in-depth analysis of clinical samples, as it allows a pathogen’s whole genome to be sequenced directly from patient material within as little as 26 hours. However, follow-up analysis is severely slowed by the abundance of reads originating from the host. To exploit the full potential of the technology for rapid diagnostics, a method for rapid in silico removal of host reads is therefore necessary. Commonly, a mapping-based approach is used to separate reads: either reads mapping to a background reference or reads not mapping to a foreground reference are discarded. However, while the former approach is highly specific in discarding only true background reads and the latter is highly sensitive in ensuring that only foreground reads are kept, neither offers a good balance. We therefore aimed to develop a novel tool geared specifically toward both specific and sensitive separation of foreground and background reads. To determine whether a read belongs to the foreground or the background, we train Markov chains of order k, with k from 4 to 12, on user-provided sets of foreground and background reference sequences, where each state is a k-mer of length k and each transition is one of the four possible bases A, C, G and T. We then calculate the difference between the log-likelihoods of the transitions observed within a read under the foreground and the background Markov chains. This difference is used as a score for separating reads, with scores below 0 indicating a background read and scores above 0 indicating a foreground read. We have tested our tool on several datasets, including cowpox virus sequenced from a human host. In all cases, our tool was faster than any competing tool, including Kraken and mapping via bowtie2, achieving speeds of up to 10 megabases per second using 4 CPUs. At the same time, we consistently achieved the best F-score of all tested tools. Our tool is developed in Python and Java and is available for download from http://sourceforge.net/projects/rambok/. We have developed a freely available, easy-to-use, rapid, and both highly sensitive and specific tool for the separation of foreground and background reads in mixed NGS datasets. We believe that it will be highly useful as an initial filtering step for anyone analyzing viral sequences via NGS.
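The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names are hypothetical, and the pseudocount smoothing for transitions never seen in training is an assumption added so that unseen k-mers do not produce log(0).

```python
import math

BASES = "ACGT"


def train_markov_chain(sequences, k, pseudocount=1.0):
    """Count transitions (k-mer state -> next base) in reference sequences."""
    counts = {}
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k):
            state, nxt = seq[i:i + k], seq[i + k]
            if nxt in BASES and set(state) <= set(BASES):
                row = counts.setdefault(state, dict.fromkeys(BASES, 0))
                row[nxt] += 1
    return {"k": k, "counts": counts, "pseudocount": pseudocount}


def transition_log_prob(model, state, nxt):
    """log P(nxt | state) with pseudocount smoothing (an assumption here)."""
    row = model["counts"].get(state, {})
    a = model["pseudocount"]
    total = sum(row.values()) + len(BASES) * a
    return math.log((row.get(nxt, 0) + a) / total)


def score_read(read, foreground, background):
    """Sum of per-transition log-likelihood differences (foreground - background).

    Scores > 0 suggest a foreground read, scores < 0 a background read.
    """
    k = foreground["k"]
    if background["k"] != k:
        raise ValueError("both chains must have the same order k")
    read = read.upper()
    score = 0.0
    for i in range(len(read) - k):
        state, nxt = read[i:i + k], read[i + k]
        if nxt not in BASES or not set(state) <= set(BASES):
            continue  # skip transitions containing N or other ambiguity codes
        score += (transition_log_prob(foreground, state, nxt)
                  - transition_log_prob(background, state, nxt))
    return score
```

With k between 4 and 12, each chain has up to 4^k states (about 16.8 million for k = 12), so the dictionary-of-counts layout used here is only practical as an illustration; a production tool would likely use array-backed tables.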

■ 11

A RAPID AND SCALABLE SINGLE NUCLEOTIDE POLYMORPHISM DISCOVERY AND VALIDATION PIPELINE FOR OUTBREAK INVESTIGATION OF BACTERIAL PATHOGENS

B. Rusconi¹, A. L. Rodriguez², S. S. Koenig¹, M. Eppinger¹;

¹University of Texas at San Antonio - South Texas Center for Emerging Infectious Diseases (STCEID), San Antonio, TX; ²University of Texas at San Antonio - Computational Biology Initiative, San Antonio, TX.

Background: Ensuring a timely and effective response in the control of bacterial outbreaks is challenging, because high discriminatory power is needed to distinguish outbreak isolates that form tight clonal complexes with only a few genetic polymorphisms. The increased throughput and concomitant cost reduction of next-generation sequencing have allowed researchers to move from analyzing distinct loci in only a few prototypical isolates to whole genome sequence typing (WGST) of large bacterial collections. The growing sequence databases, enriched with strain-associated metadata, require efficient and well-integrated bioinformatics tools that explore and harvest the information content of WGS data to enhance the traceability of microbes in support of informed and timely countermeasures. In particular, single nucleotide polymorphism (SNP) discovery and typing often surpass labor-intensive molecular typing schemes by offering robust long- and short-term high-resolution variant markers. Methods: Our group has developed a bioinformatics SNP Discovery and Validation Pipeline (SNPDV), coded in Python and implemented on the open-source Galaxy platform, that makes use of the wealth of sequence data for whole genome sequence typing of bacterial pathogens. Its modular architecture provides high flexibility and scalability for SNP discovery and validation, and it can be run in serial, multiprocessor or threaded mode depending on available server capabilities. The pipeline allows strains of unknown provenance to be typed rapidly, either by testing allelic states in already established SNP panels or by unbiased de novo discovery. Results: Given the clonal nature of outbreaks, we have successfully deployed this pipeline in the investigation of the 2010 Haiti cholera and 2006 spinach outbreaks and of diverse other enteric and emerging pathogens, such as Yersinia pestis, Bacillus spp., Vibrio spp., and Acinetobacter baumannii, achieving improved phylogenetic accuracy and resolution. Discussion: Following the concept of genomic epidemiology, the gathered phylogenomic data have major public health relevance, as sequence-based information can be used for improved surveillance, strain attribution, and source identification to contain outbreaks. Canonical SNPs can be implemented in efficient typing assays that offer robust phylogenetic signals for outbreak inclusion and exclusion, surpassing classical technologies. The phylogenomic framework is a further critical resource for precisely clustering pathogens into subgroups distinguished by different genotypic, epidemiological, and phenotypic traits, such as whether particular genotypes reflect highly pathogenic subpopulations. Future development is directed toward a cloud implementation of this pipeline in collaboration with the UTSA Cloud and Big Data Laboratory (NSF Cloud).
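Typing an isolate against an established SNP panel, as described above, amounts to reading the allele at each canonical position and comparing the resulting profile with known genotypes. The sketch below assumes the query has already been reduced to a consensus sequence in reference coordinates (for example, after read mapping); the panel format, genotype profiles and function names are hypothetical and not part of SNPDV.

```python
def call_panel(consensus, panel):
    """Report the allelic state at each canonical SNP.

    panel maps snp_name -> (0-based reference position, ancestral base, derived base).
    Positions beyond the consensus, or bases matching neither allele, are 'unresolved'.
    """
    states = {}
    for name, (pos, ancestral, derived) in panel.items():
        base = consensus[pos].upper() if pos < len(consensus) else "N"
        if base == derived:
            states[name] = "derived"
        elif base == ancestral:
            states[name] = "ancestral"
        else:
            states[name] = "unresolved"
    return states


def closest_genotype(states, profiles):
    """Return the known genotype with the fewest disagreements at resolved sites.

    profiles maps genotype name -> {snp_name: 'ancestral' or 'derived'}.
    """
    def mismatches(profile):
        return sum(1 for snp, state in states.items()
                   if state != "unresolved" and profile.get(snp) != state)

    return min(profiles, key=lambda genotype: mismatches(profiles[genotype]))
```

Ties in `closest_genotype` are resolved by dictionary order, and unresolved sites (no coverage or an unexpected base) are ignored rather than counted as mismatches.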

■ 12

COMBINING LABORATORY AND EPIDEMIOLOGICAL DATA FOR OUTBREAK INVESTIGATION AND PUBLIC HEALTH SURVEILLANCE: LESSONS FROM AN OUTBREAK OF HIV-1 INFECTION LINKED TO INJECTION DRUG USE OF OXYMORPHONE - INDIANA, 2015

E. M. Campbell¹, R. R. Galang¹, J. Gentry², P. J. Peters¹, S. J. Blosser², E. L. Chapman², C. Conrad², J. W. Duwve², L. Ganova-Raeva³, W. Heneine¹, D. Hillman², H. Jia¹, L. Lui², J. Lovchik², A. Perez², P. Peyrani⁴, P. Pontones², S. Ramachandran³, J. C. Roseberry², M. Sandoval², A. Shankar¹, H. Thai³, G. Xia³, Y. Khudyakov³, W. H. Switzer¹;

¹Division of HIV/AIDS Prevention, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, CDC, Atlanta, GA; ²Indiana State Department of Health, Indianapolis, IN; ³Division of Viral Hepatitis, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, CDC, Atlanta, GA; ⁴Division of Infectious Diseases, University of Louisville, Louisville, KY.

In January 2015, a cluster of HIV-1 infections was detected in rural Indiana among persons who reported injecting the prescription opioid oxymorphone. As of May, HIV-1 infection had been diagnosed in 153 individuals. Molecular analyses of HIV-1 and HCV sequences were combined with epidemiological data via a novel bioinformatics pipeline to infer the timing of HIV transmission relative to HCV and to explore risk factors associated with the inferred transmission network. Interviews were conducted with HIV-1-positive patients to capture high-risk contacts with respect to needle-sharing events and sexual encounters. HIV polymerase (pol) and HCV NS5B gene sequences obtained from blood specimens from newly diagnosed infections were analyzed phylogenetically. Clusters were defined when HIV-1 pol sequences were highly genetically related (0.99). Genetic distances < 1.5% were used to infer the HIV-1 transmission network. Molecular, demographic, behavioral, and high-risk contact data were combined to discern transmission and haplotype networks. Interactive exploration of these networks through open-source tools informed the public health response and helped to prioritize resources. One large cluster of HIV-1 subtype B infections was identified (n = 55), comprising three primary haplotypes. Of 36 HIV-infected specimens with HCV antibody results, 34 (94%) were HCV co-infected. Among all HCV infections, genotype 1a (n = 82) was most common, followed by 3a (n = 29), 2b (n = 5), and 1b (n = 3). Three unique clusters of HCV strains were identified (Cluster 1, n = 45; Cluster 2, n = 9; Cluster 3, n = 7). Of 118 HCV-infected specimens with HIV antibody results, 38 (32.2%) were HIV co-infected. The overwhelming majority of HIV transmission events (98.3%) occurred during needle-sharing rather than sexual encounters (1.7%). Positive assortativity (r = 0.17) among needle-sharing encounters was observed between individuals who identified as commercial sex workers. In this outbreak, a single strain of HIV-1 was introduced into a population infected with multiple HCV strains. Because of the lack of molecular diversity, transmission network reconstruction was uninformative until contextualized with epidemiological data obtained from patient interviews. The heterogeneity of HCV strains (clustering and non-clustering) suggests that HCV was introduced earlier than HIV. These data demonstrate the outbreak potential when HIV-1 is introduced into a community where HCV prevalence is high among persons who inject prescription opioids. These results also highlight the utility of combining multiple bioinformatics methods into a single pipeline to rapidly synthesize data from local, state, and federal public health institutions, both for characterizing transmission networks and for providing a rapid public health response.
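The distance-threshold rule described above (pairs of sequences closer than 1.5% are linked, and linked components form transmission clusters) can be sketched as follows. The sketch assumes pre-aligned sequences of equal length and uses a plain p-distance that ignores gaps and N; the abstract does not state which distance model was used, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites among positions where neither sequence has a gap or N."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def transmission_network(seqs, threshold=0.015):
    """Link pairs with distance < threshold; return (edges, clusters of size > 1)."""
    names = list(seqs)
    edges = [(a, b) for a, b in combinations(names, 2)
             if p_distance(seqs[a], seqs[b]) < threshold]  # NaN never links

    # Union-find to collect connected components of the linked pairs.
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return edges, [c for c in clusters.values() if len(c) > 1]
```

Because clusters are connected components, two sequences can share a cluster without being within 1.5% of each other, provided a chain of closer pairs links them.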

■ 13

AN AUTOMATED PIPELINE FOR MICROBIAL GENOME SEQUENCE ASSEMBLY AND CHARACTERIZATION

F. Onmus-Leone, E. Snesrud, L. Appalla, P. McGann, A. Ong, R. Maybank, M. Hinkle, E. Lesho, R. Clifford;

Walter Reed Army Institute of Research, Silver Spring, MD.

Background: The Multidrug-resistant Organism Repository & Surveillance Network (MRSN) collects and characterizes multidrug-resistant bacteria isolated throughout the US military healthcare system. To complement species identification, antibiotic susceptibility testing and molecular assays, the MRSN has begun to employ whole genome sequencing (WGS) for routine bacterial characterization. Objective: To perform WGS on the hundreds of isolates received each month, the MRSN has developed an automated analysis pipeline built from commercial, open-source and custom software. Methods: The sequencing platforms used by the MRSN are the Illumina MiSeq, Illumina NextSeq and PacBio RSII. Assembly begins with merging overlapping sequencing reads using FLASh and trimming low-quality portions of reads with Btrim. Filtered reads are then assembled with GS Assembler software. Assembly quality is measured by read coverage, the number of contigs and contig N50. Species identification is performed by searching against the SILVA 16S rDNA sequence database; contamination is indicated by the presence of 16S sequences from multiple species or by multiple contigs carrying a complete 16S gene from one species. Assemblies that pass quality control continue through the pipeline to the next phase of analysis, which includes identification of resistance genes, identification of virulence genes, plasmid Inc typing and multilocus sequence typing. These analyses are BLAST-based searches against nucleotide and protein databases from public sources that are quality checked, expanded with novel genes reported in the literature, and curated by the MRSN. The RAST annotation server and Prokka are used for “working” whole genome annotation. Final genome annotation is performed by NCBI upon sequence submission. Determining strain relatedness is critical for hospital infection control and disease epidemiology. For whole-genome phylogenetic analysis, sequencing reads from isolates are aligned to a reference genome using Bowtie. High-quality SNPs and indels in the core genome are used to build evolutionary trees. Overall genome similarity (chromosomes and plasmids) among strains is measured using BLAST-based methods, and the resulting similarity matrices are used for UPGMA clustering. Results: Our pipeline allows the automated analysis of 50 genomes per week and can be readily scaled up. To date, we have processed ~1800 bacterial genomes sequenced on Illumina platforms. Custom scripts control the pipeline, handle file management, produce reports and prepare results for loading into the MRSN database. Conclusion: The value of WGS data to infectious disease surveillance is unequaled, allowing comprehensive identification of resistance loci and definitive determination of strain relatedness. The analytic pipeline developed by the MRSN will be used to characterize all bacteria in our repository.
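The assembly quality metrics and the 16S-based contamination rule described above could be expressed as in the sketch below. The numeric cutoffs in `passes_qc` are hypothetical placeholders, not MRSN's values, and the 16S hit tuples stand in for parsed search results against SILVA.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total of contig lengths,
    sorted from longest to shortest, first reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def passes_qc(contig_lengths, mean_coverage,
              min_coverage=30.0, max_contigs=500, min_n50=10_000):
    """Coverage, contig count and N50 gate; all thresholds are hypothetical."""
    return (mean_coverage >= min_coverage
            and 0 < len(contig_lengths) <= max_contigs
            and n50(contig_lengths) >= min_n50)


def contamination_suspected(hits_16s):
    """hits_16s: list of (contig_id, species, is_complete_16s) tuples.

    Flags 16S hits from more than one species, or complete 16S genes
    on more than one contig, following the rule stated in the abstract.
    """
    species = {sp for _contig, sp, _complete in hits_16s}
    if len(species) > 1:
        return True
    complete_contigs = {contig for contig, _sp, complete in hits_16s if complete}
    return len(complete_contigs) > 1
```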

■ 14

ARMOR-D: A DATABASE LINKING DEMOGRAPHIC, PHENOTYPIC AND GENOMIC DATA FOR MULTIDRUG-RESISTANT BACTERIA

R. Clifford¹, M. Julius¹, Y. Kwak¹, F. Onmus-Leone¹, L. Appalla¹, E. Snesrud¹, J. Padilla¹, G. Ward¹, M. Sparks¹, P. Waterman², M. Hinkle¹, E. Lesho¹;

¹Walter Reed Army Institute of Research, Silver Spring, MD; ²Armed Forces Health Surveillance Center, Silver Spring, MD.

Background: In response to Executive Order #13676, Combating Antibiotic-Resistant Bacteria, interagency efforts have recently begun to link phenotypic, demographic and sequence data in a central database similar to GenBank. Objective: To assist these efforts, we describe the Antimicrobial Resistance Monitoring and Research Database (ARMoR-D), developed for the Department of Defense (DoD) healthcare system. Methods: ARMoR-D is a relational database built on SQL Server®, C#, .NET® and other enterprise technologies. Data objects are highly normalized and modeled after those that arise naturally in this domain, primarily isolates and results. Related business rules are not part of the data model but are implemented in the application code. User-managed “dictionary” database tables enhance data quality control. ARMoR-D has a web-based user interface to support characterization and archiving of isolates. It imports data from submitting facilities and test results from the central referral laboratory, and it manages repository inventory. ARMoR-D stores high-level whole genome sequencing data from a semi-automated pipeline, including 16S-based species identification, MLST typing, antibiotic resistance and virulence genes, quality control metrics, and the locations of sequence assembly files on the system. ARMoR-D has a generic data model that can easily be extended to support new organisms, sample types, laboratory test results and other data pertaining to the characterization of biological isolates and/or isolate repositories. Because of its unique mission, the ARMoR Program typically collects multiple results for the same test per isolate; there is therefore no limit on the number of data records associated with a given data object. Special features in ARMoR-D flag and resolve discrepancies among replicate test results. ARMoR-D also allows users to work effectively with de-identified data while retaining restricted access to the personally identifiable information necessary for epidemiology and transmission analyses. Results: ARMoR-D currently contains 2 million test results from the centralized analyses of more than 30,000 clinical isolates, along with their associated demographic information and repository locations. The database also contains sequence-based results for more than 1000 of these bacterial samples. Conclusion: ARMoR-D provides a useful model for those seeking to link demographic, phenotypic and genomic data as they respond to the national strategy to combat antibiotic-resistant bacteria. Future versions of ARMoR-D will include a public-facing webpage with decision support tools and calculators for optimized antibiotic dosing, as well as machine learning algorithms for identifying genetic determinants of antimicrobial resistance and predicting drug-resistant phenotypes from the genome sequences of isolates that have not yet undergone automated susceptibility testing.
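A minimal sketch of the replicate-result design described above: because there is no limit on the number of result records per isolate and test, discrepancies can be flagged with a grouped query. SQLite (Python's built-in `sqlite3` module) stands in for SQL Server here, and the table and column names are hypothetical, not ARMoR-D's schema.

```python
import sqlite3

# Hypothetical two-table schema: one row per isolate, and any number of
# result rows per isolate and test (replicates are simply extra rows).
SCHEMA = """
CREATE TABLE isolate (
    isolate_id INTEGER PRIMARY KEY,
    species    TEXT NOT NULL
);
CREATE TABLE result (
    result_id  INTEGER PRIMARY KEY,
    isolate_id INTEGER NOT NULL REFERENCES isolate(isolate_id),
    test_name  TEXT NOT NULL,
    value      TEXT NOT NULL
);
"""

# An isolate/test pair is discrepant when its replicates disagree.
DISCREPANCY_QUERY = """
SELECT isolate_id, test_name, GROUP_CONCAT(DISTINCT value) AS observed
FROM result
GROUP BY isolate_id, test_name
HAVING COUNT(DISTINCT value) > 1
ORDER BY isolate_id, test_name
"""


def create_database():
    """Return an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def find_discrepancies(conn):
    """List (isolate_id, test_name, comma-joined distinct values) needing review."""
    return conn.execute(DISCREPANCY_QUERY).fetchall()
```

The order of values inside `observed` is not guaranteed by SQLite, so a reviewer or downstream rule should treat it as an unordered set.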

■ 15

A WEB-HOSTED R WORKFLOW TO SIMPLIFY AND AUTOMATE THE ANALYSIS OF 16S NGS DATA

J. Bradshaw¹, T. Purucker², K. Wong¹, M. Molina²;

¹ORISE, Athens, GA; ²U.S. EPA, Athens, GA.

Next-Generation Sequencing (NGS) produces large data sets that include tens of thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow containing best-of-breed bioinformatics packages that may use multiple programming languages (e.g., Python, R, Java). The process of transforming raw NGS data into usable operational taxonomic units (OTUs) can be tedious because of the number of quality control (QC) steps used in QIIME and other software packages for sample processing. The purpose of this work was therefore to simplify the analysis of 16S NGS data from a large number of samples by integrating QC, demultiplexing, and QIIME (Quantitative Insights Into Microbial Ecology) analysis in an accessible R project. The command-line operations a user would otherwise run for each pipeline step were automated into a workflow. In addition, the R server allows multiple users to access the automated pipeline via separate user accounts while working with the same large set of underlying data. We demonstrate the applicability of this pipeline automation using 16S NGS data from approximately 100 stormwater runoff samples collected in a mixed-land-use watershed in northeast Georgia. OTU tables were generated for each sample, and relative taxonomic abundances were compared across different periods of storm hydrographs to determine how the microbial ecology of a stream changes with the rise and fall of stream stage. Our approach simplifies the pipeline analysis of multiple 16S NGS samples by automating multiple preprocessing, QC, analysis and post-processing command-line steps that are called by a sequence of R scripts.
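The comparison of relative taxonomic abundances across hydrograph periods could be computed from per-sample OTU counts as sketched below. The abstract's workflow is written in R; this illustration uses Python, assumes the OTU table has already been produced by QIIME, and uses hypothetical function names and period labels.

```python
def relative_abundance(otu_counts):
    """Convert one sample's taxon -> read count mapping into proportions."""
    total = sum(otu_counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in otu_counts}
    return {taxon: n / total for taxon, n in otu_counts.items()}


def mean_abundance_by_period(samples, periods):
    """Average relative abundance of each taxon within each hydrograph period.

    samples: sample_id -> {taxon: read count}
    periods: sample_id -> period label (hypothetical labels such as 'rising' or 'falling')
    Taxa absent from a sample contribute 0 to that period's mean.
    """
    grouped = {}
    for sample_id, counts in samples.items():
        grouped.setdefault(periods[sample_id], []).append(relative_abundance(counts))

    result = {}
    for period, profiles in grouped.items():
        taxa = set().union(*profiles)
        result[period] = {
            taxon: sum(p.get(taxon, 0.0) for p in profiles) / len(profiles)
            for taxon in taxa
        }
    return result
```

Normalizing each sample before averaging keeps a single deeply sequenced sample from dominating its period.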

■ 16

DOING PIPELINES BETTER

M. J. Stanton-Cook, T. J. Robinson, N. L. Ben Zakour, S. A. Beatson;

The University of Queensland, Brisbane, AUSTRALIA.

Over the last three years we have developed the Banzai Microbial Genomics Pipeline (https://github.com/mscook/Banzai-MicrobialGenomics-Pipeline). Banzai aims to simplify the analysis of microbial next-generation sequencing (NGS) datasets. It was specifically designed to distribute workload over internal and external High Performance Computing (HPC) resources. In most cases, Banzai does not provide new NGS algorithms; instead, it harnesses the power of tried and tested NGS tools. Banzai simplifies, automates and distributes computational workloads, which are the typical bottleneck in the analysis of large NGS datasets. Our local Banzai Microbial Genomics Pipeline installation was initially tightly coupled to an aging HPC resource. Here, I will discuss how we future-proofed and migrated our pipeline using DevOps approaches: in particular, how we restructured the code base, abstracted away the details of the underlying compute resources, and now treat the entire Banzai Pipeline and its underlying compute resources under an Infrastructure as Code model. This is a first-hand lesson in how we had to change our development practice significantly to account for the long-term sustainability of our pipeline. In less than three years, the Beatson Laboratory has had to scale from the simultaneous analysis of 10 to 100 to 1000 genomes. By understanding and employing DevOps-based approaches, the genomics community will be able to build and scale reproducible analysis pipelines with minimal modification to their existing processes.

■ 17

DEVELOPMENT OF A PLATFORM FOR PLANT VIRUS DIAGNOSTICS AND CHARACTERIZATION USING NEXT GENERATION SEQUENCING

P. A. Gutiérrez, L. Muñoz Baena, H. Jaramillo Mesa, D. Muñoz Escudero, M. A. Marín Montoya;

Universidad Nacional de Colombia, Medellin, COLOMBIA.

Next-generation sequencing methods are becoming an essential tool for the discovery and characterization of plant viruses, as well as for the diagnosis of infected material in seed certification and virus management programs. Unfortunately, for traditional plant pathologists, efficient use and interpretation of the massive amount of data obtained from NGS is still a daunting task that generally requires expert bioinformaticians to extract useful information. In this work, we present PVDT (Plant Virus Diagnostic Tool), an automated diagnostic platform for plant virus detection and characterization. PVDT performs raw data quality checking and read sampling, and outputs an easy-to-interpret diagnosis of the virus genera present and their relative levels in the sample. A secondary analysis allows viruses to be assembled using either a sequential de novo method or mapping to a reference genome. Finally, PVDT determines whether detected viruses could be a novel species, using ICTV criteria, or an uncharacterized variant. The program was tested successfully using simulated data sets and plant transcriptomes of Potato (Solanum tuberosum, S. phureja), Cape gooseberry (Physalis peruviana), Tamarillo (S. betaceum), Bell pepper (Capsicum annuum), Tomato (S. lycopersicum), Angel's trumpet (Brugmansia candida) and Pinto peanut (Arachis pintoi). This platform is a contribution to the development of user-friendly tools for diagnostic purposes using NGS data and will hopefully be useful to agricultural decision-making institutions. This work was funded by Universidad Nacional de Colombia (Grant VRI: 19438) and the International Foundation for Science (Sweden, Grant: C/4634-2).
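The final classification step can be sketched as a comparison of sequence identity against species-demarcation thresholds. ICTV criteria differ between virus families and genera (both the genomic region compared and the cutoff), so the thresholds here are hypothetical parameters, the alignments are assumed to be precomputed, and the labels returned are illustrative rather than PVDT's output.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def classify_detection(identities, species_threshold, variant_threshold=99.0):
    """Classify a detected virus from its identity to known species.

    identities: known species -> % identity of the new assembly to that species,
    computed over whatever region the relevant demarcation criterion specifies.
    species_threshold and variant_threshold are hypothetical, family-specific inputs.
    Returns (label, closest known species, best identity).
    """
    if not identities:
        return ("no close relative", None, 0.0)
    closest, best = max(identities.items(), key=lambda kv: kv[1])
    if best < species_threshold:
        return ("putative novel species", closest, best)
    if best < variant_threshold:
        return ("uncharacterized variant", closest, best)
    return ("known species", closest, best)
```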

■ 18

COMPARISON OF PHYLOGENETIC METHODS FOR IDENTIFYING PUTATIVE TRANSMISSIONS OF ENTEROCOCCUS FAECIUM IN A HOSPITAL

H. C. Lin¹, T. Mi¹, G. Wang², W. Huang², P. Mayigowda¹, K. Murugesan¹, A. Gupta¹, J. T. Fallon², N. Dimitrova¹;

¹Philips Research North America, Briarcliff Manor, NY; ²New York Medical College, Valhalla, NY.

Background: Enterococcus faecium is a major cause of hospital-acquired infection in the US. Recently, whole genome sequencing data have been used for molecular typing, antibiotic resistance characterization and transmission route reconstruction. For reconstructing putative transmission routes, a variety of phylogeny-building methods have been developed that can help identify potential transmission events within a hospital environment. Method: We sequenced 149 E. faecium isolates of MLST type ST736 from 106 patients over 55 months at Westchester Medical Center and reconstructed phylogenies of the samples with various distance-based methods. To construct the phylogenies, we used a variant-calling pipeline to determine likely SNPs within the samples and then applied various filters to remove inaccurate SNPs, such as removing SNPs in repetitive regions and phage regions. Afterwards, a pairwise distance matrix was built from the SNP calls, and finally we applied various phylogeny-building methods to the distance matrix. The methods we tested were neighbor-joining, maximum likelihood, and Prim’s (undirected) and Edmonds’ (directed) minimum spanning tree (MST) algorithms. The results of these methods were then compared with each other to assess how robust and accurate our results may be. Lastly, we evaluated how frequently antibiotic resistance increased along the predicted transmission events in our trees. Results: The trees we generated mostly show good concordance with each other, often with more than 80% of subtrees matching between results. We also computed how frequently antibiotic resistance increased along transmissions suggested by the MST methods. Both MST methods showed that antibiotic resistance generally increased along transmission edges in the trees, and Edmonds’ directed MST method showed a higher percentage of predicted transmission events resulting in increased daptomycin resistance, with about 3/4 of transmissions indicating an increase in resistance. As we expect resistance to increase as transmissions occur, this may indicate that Edmonds’ directed MST produced a more accurate tree. Conclusion: Our preliminary results show that Edmonds’ directed MST may produce an accurate phylogenetic tree, although further verification may be needed, as other methods yield slightly different trees. In the future, we plan to examine clinical data to help validate the transmissions predicted by our phylogenetic trees.
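The distance-matrix and undirected MST steps described above can be sketched as follows: pairwise SNP differences are counted over sites called in both isolates, and Prim's algorithm grows a tree from a chosen starting isolate. Orienting each edge from the node already in the tree to the node being added, and scoring resistance changes along those edges, are assumptions made here for illustration; the abstract does not describe how the undirected tree was oriented, and Edmonds' directed algorithm is not shown. Function names are hypothetical.

```python
import math


def snp_distance_matrix(snp_profiles):
    """Pairwise count of differing alleles over sites called in both isolates.

    snp_profiles: isolate -> {site: allele}; sites missing in either isolate are skipped.
    """
    names = sorted(snp_profiles)
    dist = {a: {} for a in names}
    for i, a in enumerate(names):
        for b in names[i:]:
            shared = snp_profiles[a].keys() & snp_profiles[b].keys()
            d = sum(snp_profiles[a][s] != snp_profiles[b][s] for s in shared)
            dist[a][b] = dist[b][a] = d
    return dist


def prim_mst(dist, root):
    """Prim's algorithm; returns (parent, child, distance) edges grown from root."""
    best = {n: (dist[root][n], root) for n in dist if n != root}
    edges = []
    while best:
        child = min(best, key=lambda n: best[n][0])  # cheapest link into the tree
        d, parent = best.pop(child)
        edges.append((parent, child, d))
        for n in best:  # relax remaining nodes through the newly added one
            if dist[child][n] < best[n][0]:
                best[n] = (dist[child][n], child)
    return edges


def fraction_resistance_increasing(edges, mic):
    """Share of parent -> child edges along which a resistance measure (e.g. MIC) rises."""
    if not edges:
        return math.nan
    return sum(mic[c] > mic[p] for p, c, _d in edges) / len(edges)
```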

n 19<br />

ANALYSIS OF AN ENTEROVIRUS D68<br />

OUTBREAK THROUGH METAGENOMIC<br />

SEQUENCING<br />

H. Lin 1 , Q. Wan 1 , W. Huang 2 , G. Wang 2 , J.<br />

Zhuge 2 , S. M. Nolan 2 , J. T. Fallon 2 , N. Dimitrova<br />

1 ;<br />

1<br />

Philips Research North America, Briarcliff<br />

Manor, NY, 2 New York Medical College, Valhalla,<br />

NY.<br />

Background: Metagenomic sequencing can be<br />

used to detect the presence of microbial organisms.<br />

Many metagenomic techniques have<br />

been developed to explore and verify microbial<br />

ecology, evolution and diversity, especially for<br />

investigating infectious viruses and bacteria.<br />

By obtaining the population distribution of<br />

classified microbial genomes in certain samples,<br />

metagenomic tools have the potential to<br />

provide insights into the causality of infectious<br />

disease outbreaks. During the Enterovirus D68<br />

outbreak in 2014, 93 nasopharyngeal swab<br />

samples were obtained from patients with<br />

symptoms of respiratory infection and were<br />

sequenced at Westchester Medical Center.<br />

Methods: We used Kraken for the purpose of<br />

classifying reads based on taxonomic orders.<br />

By using Kraken, microbial reads can be identified<br />

and subsequently enable the investigation<br />

of certain infectious disease outbreak. To<br />

obtain the best performance from Kraken, we<br />

build a pipeline to automatically fetch publicly<br />

available reference genomes from NCBI at<br />

user’s demand to keep Kraken’s classification<br />

results up-to-date for detecting pathogens present<br />

within a sample. The pipeline is not only<br />

able to extract reads by viruses, bacteria, and<br />

fungi species, but can also generate aggregate<br />

analysis indicating the most prevalent microorganisms<br />

within any taxonomic group. We also<br />

provide analysis about possible cohabitations<br />

ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic<br />

Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens<br />

51


Poster <strong>Abstracts</strong><br />

of different pathogens, in terms of a pairwise<br />

correlation ranking between any species, as<br />

well as an overall plot of co-existing species<br />

within samples. Moreover, the distribution<br />

of reads classified into any species across all<br />

patients can be plotted. Results: Our pipeline<br />

detects 1148 species among reads from the<br />

93 patient samples. Overall, the number of<br />

identified Enterovirus D68 reads ranks 4th<br />

(8.79%) among all microbial species, but EV-D68 dominates<br />

the viral sequences found in our samples,<br />

accounting for 96.40% of viral reads. The<br />

next most abundant viruses<br />

are human respiratory syncytial virus (2.37%)<br />

and Rhinovirus B (0.32%). Additionally, the<br />

pipeline reveals a possible cohabitation between<br />

Enterovirus D68 and other bacteria and<br />

viruses, such as Rhinovirus; both EV-D68 and Rhinovirus can<br />

cause symptoms of respiratory infection. Among<br />

all species, EV-D68 reads are exceeded only by Moraxella<br />

catarrhalis (41.34%), Haemophilus influenzae<br />

(20.79%) and Pseudomonas aeruginosa<br />

(16.55%), all potentially pathogenic bacteria.<br />

Conclusion: We built an automatic metagenomic<br />

analysis pipeline designed for detecting<br />

pathogens that can cause infectious disease<br />

outbreaks in hospitals. The pipeline provides<br />

tools to visualize and investigate sequence read<br />

classifications, making it convenient to discover pathogens<br />

residing within samples as well as<br />

microbial cohabitations.<br />
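The pairwise correlation ranking of co-existing species can be sketched as follows. This is a minimal illustration, assuming the Kraken output has already been reduced to per-sample species read counts; the abstract does not say which correlation measure was used, so Pearson correlation on per-sample read proportions is an assumption, and the function names are hypothetical.<br />

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_cooccurrence(counts):
    """counts: {sample: {species: reads}} -> [(species_a, species_b, r)], highest r first."""
    samples = sorted(counts)
    species = sorted({sp for per_sample in counts.values() for sp in per_sample})
    # Use per-sample proportions so that sequencing depth does not drive the correlation.
    totals = {s: sum(counts[s].values()) for s in samples}
    profiles = {
        sp: [counts[s].get(sp, 0) / totals[s] if totals[s] else 0.0 for s in samples]
        for sp in species
    }
    pairs = [(a, b, pearson(profiles[a], profiles[b])) for a, b in combinations(species, 2)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


counts = {  # toy read counts, not data from the study
    "s1": {"EV-D68": 90, "Rhinovirus B": 9, "M. catarrhalis": 1},
    "s2": {"EV-D68": 50, "Rhinovirus B": 5, "M. catarrhalis": 45},
    "s3": {"EV-D68": 10, "Rhinovirus B": 1, "M. catarrhalis": 89},
}
for a, b, r in rank_cooccurrence(counts):
    print(f"{a} / {b}: r = {r:+.2f}")
```

Positively correlated pairs rank first, which is the kind of candidate cohabitation the pipeline reports; strongly negative pairs suggest species that rarely co-occur.<br />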

n 20<br />

MUTATION RATE ANALYSIS TO AID<br />

IN IDENTIFYING TRANSMISSIONS OF<br />

ENTEROCOCCUS FAECIUM IN A HOSPITAL<br />

SETTING<br />

P. Mayigowda 1, A. Gupta 1, H. Lin 1, K. Murugesan 1, T. Mi 1, G. Wang 2, A. Dhand 2, J. Fallon 2, N. Dimitrova 1;<br />

1 Philips Research North America, Briarcliff Manor, NY, 2 New York Medical College, Valhalla, NY.<br />

Next-generation sequencing (NGS) has proved<br />

instrumental in tracing the spread and evolution<br />

of bacterial populations through time and<br />

geographical locations. With genome-wide<br />

analysis of single nucleotide polymorphisms<br />

(SNPs), we can measure genetic evolution by<br />

comparing SNP changes between genomes and estimate<br />

the mutation rate of a particular pathogen<br />

of interest over time. This estimated mutation<br />

rate can then be used to determine if there may<br />

have been a transmission between any two<br />

patients. Namely, if the number of changes<br />

between the pathogen genomes sequenced<br />

from two patients is within the range of<br />

changes expected from the estimated<br />

mutation rate over the elapsed time, then we can infer that a<br />

transmission may have occurred. If the number<br />

of changes falls outside that expected range,<br />

then we can conclude that a transmission likely<br />

did not occur. We analyzed 149 non-duplicate<br />

Enterococcus faecium isolates belonging to<br />

clone ST736 to estimate the mutation rate of<br />

E. faecium and to help identify probable<br />

paths of spread of the pathogen. These isolates<br />

were collected from 106 patients at Westchester<br />

Medical Center, New York, on various<br />

dates from 2009 to 2013. Out of these 106<br />

patients, 28 had two or more isolates, which<br />

were compared pairwise to estimate the mutation<br />

rate using linear regression on the number<br />

of SNP differences over time for 59 pairs of<br />

isogenic samples. To examine the mutation<br />

rates of multidrug-resistant isolates, separate<br />

mutation rates were calculated for 7 pairs of<br />

samples that maintained their status as daptomycin<br />

susceptible (dap S), and 32 pairs that<br />

maintained daptomycin non-susceptible (dap<br />

NS) status. We also analyzed pairs of isolates<br />

that converted from dap S to dap NS (8 pairs)<br />

or vice versa (12 pairs). The overall mutation rate was<br />

10.0 substitutions per Mb per<br />

year (SMY) after filtering outliers (beyond mean ±<br />

3 SD) with the mean shift approach. Paired<br />

isolates maintaining dap S status had a mean<br />

mutation rate of 5.85 SMY, while the paired<br />

dap NS isolates had a mutation rate of 12.3<br />

SMY. For isolates converting from dap S<br />

to dap NS, the mutation rate was 12.6 SMY,<br />

while conversion from dap NS to dap S showed a particularly high<br />

mutation rate of 53.2 SMY. These results are<br />


comparable to a recent study which reported<br />

mutation rates for different clades of a phylogenetic<br />

tree constructed from 73 enterococcal<br />

isolates. The rate was 49 ± 3 SMY for a clade<br />

belonging to clonal complex CC17, and was<br />

3.6 ± 0.6 SMY and 13 ± 2 SMY for two<br />

clades with mixed STs. We have developed<br />

a method for mutation rate inference using<br />

SNPs obtained from NGS data. Standardizing<br />

the filtering process will allow this approach<br />

to be extended to other species. With studies reporting<br />

variation in mutation rates within species,<br />

a standardized methodology becomes all the<br />

more important. This bioinformatics approach<br />

to identify disease transmission paths can help<br />

recognize, control, and diagnose bacterial<br />

clinical outbreaks.<br />
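A minimal sketch of the rate estimate and the transmission check described above. It assumes each isogenic pair is given as (years between isolates, SNP differences), fits a least-squares line through the origin, and converts the slope to SMY using a genome size supplied by the caller (the roughly 3 Mb used in the usage line is an assumption, not a value from the abstract). The outlier rule (per-pair rate outside mean ± 3 SD) and the slack of 2 SNPs in the transmission check are illustrative stand-ins; the authors' mean shift filtering is not reproduced here.<br />

```python
from statistics import mean, pstdev


def filter_outliers(pairs, k=3.0):
    """Keep (years, snps) pairs whose per-pair rate lies within mean ± k*SD (years must be > 0)."""
    rates = [snps / years for years, snps in pairs]
    mu, sd = mean(rates), pstdev(rates)
    if sd == 0:
        return list(pairs)
    return [pair for pair, rate in zip(pairs, rates) if abs(rate - mu) <= k * sd]


def rate_smy(pairs, genome_mb):
    """Least-squares slope of SNPs vs. years through the origin, in substitutions per Mb per year."""
    sxy = sum(years * snps for years, snps in pairs)
    sxx = sum(years * years for years, _ in pairs)
    return (sxy / sxx) / genome_mb


def transmission_plausible(snps, years, smy, genome_mb, slack=2):
    """True unless the observed SNP distance exceeds the expected count plus a (hypothetical) slack.

    Fewer SNPs than expected is treated as compatible with transmission.
    """
    expected = smy * genome_mb * years
    return snps <= expected + slack


# Toy pairs (years apart, SNP differences); genome_mb=3.0 is an assumed genome size.
pairs = [(0.5, 12), (1.0, 24), (2.0, 48)]
smy = rate_smy(filter_outliers(pairs), genome_mb=3.0)
print(smy, transmission_plausible(5, 0.5, smy, 3.0), transmission_plausible(40, 0.5, smy, 3.0))
```

Fitting through the origin encodes the assumption that isolates sampled at the same time from the same lineage differ by zero SNPs; a model with an intercept would instead absorb baseline within-host diversity.<br />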

n 21<br />

COMPARISON OF RNA AND DNA<br />

EXTRACTION KITS FOR VIRUS DETECTION<br />

BY NEXT GENERATION SEQUENCING (NGS)<br />

J. Klenner, C. Kohl, P. Dabrowski, A. Nitsche;<br />

Robert Koch Institute, Berlin, GERMANY.<br />

A crucial step in the molecular detection of<br />

viruses in clinical specimens is the efficient<br />

extraction of viral nucleic acids. The total yield<br />

of viral nucleic acid from a clinical specimen<br />

depends on the specimen’s volume, the<br />

initial virus concentration and the efficiency<br />

of the extraction method. Recent<br />

Next Generation Sequencing (NGS)-based<br />

diagnostic approaches provide a molecular<br />

‘open view’ into the sample, as they generate<br />

sequence reads of any nucleic acid present<br />

in a specimen in a statistically representative<br />

manner. However, since a higher virus-related<br />

read output promises better sensitivity in the<br />

subsequent bioinformatic analysis, the extraction<br />

method selected determines the reliability<br />

of diagnostic NGS. In this study four commercially<br />

available nucleic acid extraction<br />

kits (QIAGEN, Hilden, Germany: QIAamp<br />

Viral RNA Mini Kit, QIAamp DNA Blood<br />

Mini Kit, QIAamp cador Pathogen Mini Kit<br />

and QIAamp MinElute Virus Spin Kit) were<br />

evaluated by NGS. The nucleic acid yields<br />

and sequence read output were compared for<br />

four different model viruses comprising Reovirus,<br />

Orthomyxovirus, Orthopoxvirus and<br />

Paramyxovirus, each at defined but varying<br />

concentrations in the same sample. The total<br />

nucleic acid extracted was divided into two<br />

aliquots; one was subjected to RNA and the<br />

other to DNA processing for NGS. The yield<br />

of nucleic acids was determined by Qubit and<br />

virus-specific quantitative real-time PCR. NGS<br />

libraries were prepared for sequencing on<br />

the Illumina HiSeq 1500 system. Finally, the<br />

percentage of reads which could be assigned<br />

to each virus after extraction with the different<br />

kits was determined via mapping. Evaluation<br />

of the different commercial nucleic acid extraction<br />

kits, as presented here, indicates that the numbers<br />

of RNA and DNA reads deviate only slightly<br />

depending on the kit used.<br />
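The per-virus read percentages can be computed from a standard SAM alignment file, for example as below. This sketch assumes reads were mapped to a combined reference in which each model virus is its own sequence (reference names and the file name are hypothetical); it counts only primary records, so each mate of a paired-end read counts separately. The abstract does not specify the mapper or counting rule.<br />

```python
def percent_reads_per_reference(sam_lines):
    """Percentage of primary read records aligned to each reference sequence (RNAME)."""
    counts, total = {}, 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x100 or flag & 0x800:
            continue  # secondary (0x100) and supplementary (0x800) records duplicate a read
        total += 1
        if flag & 0x4 or rname == "*":
            continue  # unmapped (0x4): counted in the denominator only
        counts[rname] = counts.get(rname, 0) + 1
    return {ref: 100.0 * n / total for ref, n in counts.items()} if total else {}


# Hypothetical usage on an aligner's SAM output for one kit:
# with open("kit_A.sam") as fh:
#     print(percent_reads_per_reference(fh))
```

Running the same function on the output for each kit gives directly comparable percentages, since each is normalized to that library's own primary read count.<br />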

n 22<br />

TRANSMISSION OF HIGH RISK K.<br />

PNEUMONIAE CLONES IN HEALTH CARE<br />

NETWORKS LARGELY CHALLENGES THE<br />

CURRENT INFECTION PREVENTION AND<br />

CONTROL SYSTEM<br />

K. Zhou, M. Lokate, R. H. Deurenberg, G. C. Raangs, H. Grundmann, A. W. Friedrich, J. W. Rossen;<br />

University of Groningen, University Medical Center Groningen, Groningen, NETHERLANDS.<br />

Controlling the dissemination of multidrug-resistant<br />

pathogens remains one of the major<br />

challenges in hospitals and public health. Here<br />

we describe an inter-institutional transmission<br />

of an ESBL-producing ST15 Klebsiella pneumoniae<br />

between patients caused by patient<br />

referral. An epidemiological link between<br />

the patient isolates was supported by patient<br />

contact tracing and phylogenetic analysis of<br />

the isolates obtained from May to November<br />

2012 using next generation sequencing (NGS).<br />

By May 2013, a patient treated in two institutions<br />

in two cities was involved in expanding<br />


the cluster. A clone-specific multiplex PCR<br />

was developed for patient screening by which<br />

another patient was identified in September<br />

2013. Environmental surface contamination<br />

and lack of consistent patient screening were<br />

identified as risk factors. Our study highlights<br />

the challenge of controlling the transmission<br />

of K. pneumoniae high risk clones (HiRiCs),<br />

suggesting the necessity for active surveillance<br />

and inter-institutional collaboration for<br />

outbreak management. In addition, the use of<br />

NGS for typing and for developing an outbreak-specific<br />

multiplex PCR facilitated rapid<br />

patient screening procedures and was important<br />

for optimizing outbreak management.<br />

n 23<br />

WHOLE GENOME SEQUENCING OF FOUR<br />

ENTEROPATHOGENIC E. COLI STRAINS<br />

ASSIGNED TO A NEW SEQUENCE TYPE<br />

ST4554<br />

M. Ferdous 1, K. Zhou 1, A. M. Kooistra-Smid 2, A. W. Friedrich 1, J. W. Rossen 1;<br />

1 University of Groningen, University Medical Center Groningen, Groningen, NETHERLANDS, 2 University Medical Center Groningen and Certe Laboratory for Infectious Diseases, Groningen, NETHERLANDS.<br />

Enteropathogenic Escherichia coli (EPEC) is a<br />

leading cause of infantile diarrhoea in developing<br />

countries. EPEC comprises a large,<br />

heterogeneous group of strains and serotypes.<br />

EPEC strains assigned to a new sequence type<br />

(ST4554) were isolated from four different<br />

patients with gastrointestinal complaints in<br />

two different regions of the Netherlands during<br />

July 2013-February 2014. No epidemiological<br />

link was found between the four patients.<br />

All four isolates were non-motile, beta-glucuronidase-negative<br />

and sorbitol-fermenting, and belonged<br />

to phylogenetic group B2. All were resistant<br />

to ampicillin; three were also resistant to trimethoprim and trimethoprim/sulfamethoxazole,<br />

whereas the<br />

fourth was resistant to tetracycline. Whole<br />

genome sequencing (WGS) was performed on<br />

the four strains for detailed characterization<br />

and to compare them with their closest relative<br />

E. coli strain. The serogenotype and the<br />

presence of antibiotic resistance and virulence<br />

genes were determined using the Centre for<br />

Genomic Epidemiology (CGE) web tool. Phylogenetic<br />

analysis was done using Ridom SeqSphere+.<br />

The serogenotype of the isolates was<br />

O157:H39. Virulence profiling revealed the<br />

presence of adhesin genes eae (type kappa), tir,<br />

iha, and secretion system genes espA, espC,<br />

espI, espJ, and etpD. Virulence genes were<br />

located on a pathogenicity island, plasmid or<br />

insertion element. The antibiotic resistance<br />

genes strA, strB, sul and dfrA were located on<br />

a pCERC1-like resistance plasmid found in E.<br />

coli strain S1.2.T2R, whereas the tetA gene<br />

was located on transposon Tn121. Phylogenetic<br />

analysis using core genome MLST revealed<br />

no allelic differences among three of our isolates,<br />

which differed from the fourth in 62<br />

alleles. Subsequent SNP analysis revealed 20<br />

SNPs among the three isolates: 5 non-synonymous,<br />

4 synonymous and 11 in intergenic<br />

regions. E. coli strain E2348/69 was found to be<br />

the closest relative of our isolates in<br />

a phylogeny including 60 complete<br />

E. coli genomes available in NCBI. A<br />

genome-wide comparison of the four isolates<br />

with E. coli E2348/69 showed that they lacked<br />

most of the mobile genetic elements (MGEs)<br />

found in E. coli E2348/69 except one prophage<br />

pp1, one insertion element IE5 and two pathogenicity<br />

islands (LEE and espC pathogenicity<br />

island). They possess additional MGEs, e.g.,<br />

prophage p16 of EPEC O26:H11 strain 11368,<br />

plasmid pO55 of E. coli O55:H7 and plasmid<br />

pSS_O46 of Shigella sonnei. Therefore, dissemination<br />

of virulence genes by MGEs may<br />

have resulted in these new pathogenic strains.<br />

WGS allowed us to gain insight into the virulence<br />

and resistance determinants of the new EPEC<br />

isolates and to reveal their phylogenetic relationship<br />

with known E. coli strains.<br />
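The core genome MLST comparison reduces to counting loci with different allele calls between two profiles. A minimal sketch, assuming each profile is a dict from locus name to allele number (None for a missing call) and that loci missing in either isolate are ignored; this is one common convention, not necessarily the Ridom SeqSphere+ setting the authors used, and the locus names below are placeholders.<br />

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Number of shared loci with different allele calls; missing calls (None) are skipped."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def distance_matrix(profiles):
    """Pairwise allele distances for {isolate: {locus: allele}} profiles."""
    return {
        (a, b): allele_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


profiles = {  # toy three-locus profiles, not the study's data
    "iso1": {"locus1": 1, "locus2": 5, "locus3": 7},
    "iso2": {"locus1": 1, "locus2": 5, "locus3": 7},
    "iso4": {"locus1": 2, "locus2": 5, "locus3": None},
}
print(distance_matrix(profiles))
```

With a real cgMLST scheme the same function yields the "0 alleles among three isolates, 62 to the fourth" kind of result, provided missing calls are handled consistently across all pairs.<br />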


n 24<br />

MOLECULAR CHARACTERIZATION OF SHIGA<br />

TOXIN PRODUCING ESCHERICHIA COLI<br />

(STEC) ISOLATES IN THE NETHERLANDS<br />

M. Ferdous 1, A. M. Kooistra-Smid 2, R. F. de Boer 2, P. D. Croughs 3, I. H. Friesema 4, J. W. Rossen 1, A. W. Friedrich 1;<br />

1 University of Groningen, University Medical Center Groningen, Groningen, NETHERLANDS, 2 University Medical Center Groningen and Certe Laboratory for Infectious Diseases, Groningen, NETHERLANDS, 3 Star-MDC, Rotterdam, NETHERLANDS, 4 National Institute for Public Health and the Environment, Bilthoven, NETHERLANDS.<br />

Shiga toxin-producing Escherichia coli (STEC)<br />

is one of the major causes of human gastrointestinal<br />

disease and has been implicated in<br />

sporadic cases and outbreaks of diarrhoea,<br />

hemorrhagic colitis and haemolytic uraemic<br />

syndrome worldwide. Whole genome sequencing<br />

(WGS) was performed on 131 STEC<br />

isolates obtained from faeces of patients with<br />

gastrointestinal complaints in the Groningen<br />

and Rotterdam regions of the Netherlands<br />

from April 2013 to March 2014 as part of the<br />

STEC-ID-net study. Multilocus sequence type,<br />

serogenotype, Shiga toxin encoding gene (stx)<br />

subtype and the presence of antibiotic resistance<br />

and virulence genes were determined<br />

using the Centre for Genomic Epidemiology<br />

(CGE) web tool. Phylogenetic analysis was<br />

done using Ridom SeqSphere+. Based on clinical<br />

symptoms (available for 76 patients), patients<br />

were divided into severe (n=28), moderate<br />

(n=35) and mild (n=13) groups. Statistical<br />

analyses used Pearson’s chi-square test<br />

and Fisher’s exact test. Based on WGS data, a<br />

diversity of serotypes and sequence types (STs)<br />

was found, with serotypes O91:H14 (14.5%),<br />

O157:H7 (13%), O26:H11 (11%), O103:H2<br />

(8%), O128:H2 (5%), O63:H6 (4%) and O5:H9<br />

(3%) and sequence types ST33 (14.5%), ST11<br />

(13%), ST21 (13%), ST17 (7.6%), ST25<br />

(4.5%) and ST583 (4%) being the most predominant.<br />

Several stx subtype combinations were found among the<br />

isolates, with stx1a (41%), stx2c (8%), stx2f<br />

(8%), stx1c+stx2b (8%), stx1a+stx2c (7%),<br />

stx1c (7%), stx1a+stx2a (6%) and stx2a (6%)<br />

being predominant. The presence of stx1 and stx2 genes<br />

together was significantly (P=0.02) associated<br />

with the severe patient group (46%). Among<br />

virulence genes other than stx, the toxin-encoding<br />

genes toxB and astA and the non-LEE-encoded<br />

effector protein-encoding gene nleC were<br />

detected with high prevalence in this group<br />

(P



n 25<br />

RAPID DIAGNOSTIC TESTING OF<br />

MYCOBACTERIUM TUBERCULOSIS<br />

CULTURES USING WHOLE-GENOME<br />

SEQUENCING<br />

P. Lapierre, J. Shea, T. Halse, M. Shudt, P. Van Roey, V. Escuyer, K. Musser;<br />

NY DOH Wadsworth Center, Albany, NY.<br />

Mycobacterium tuberculosis (MTB) remains<br />

an important pathogen today, infecting more<br />

than a third of the world’s population. The cost<br />

associated with the diagnosis and treatment<br />

of MTB can be considerable, particularly in<br />

cases involving MDR or XDR strains. Weeks<br />

to months are usually required to identify<br />

mycobacterial species, determine drug susceptibilities,<br />

and generate molecular genotyping<br />

for epidemiological purposes using traditional<br />

methods due to the slow growth rate of<br />

MTB. Whole genome sequencing (WGS) is<br />

perceived as a potential alternative for MTB<br />

diagnostics that will greatly reduce the time<br />

required for the complete molecular profiling<br />

and antibiotic resistance prediction of new<br />

MTB cases. We have conducted a retrospective<br />

study on 87 clinical isolates, consisting of 51<br />

pure isolates on solid media and 36 early positive<br />

liquid cultures (MGIT), to determine the<br />

feasibility of using WGS as part of our routine<br />

testing algorithm for MTB samples. We have<br />

developed an efficient DNA extraction and<br />

library preparation method that yields consistent<br />

depth and data quality from WGS, utilizing<br />

Illumina MiSeq instrumentation. We built<br />

a bioinformatics pipeline that compares the<br />

Illumina reads against the M. tuberculosis H37Rv<br />

reference strain for identification, in silico<br />

spoligotyping, resistance profiling and phylogenetic<br />

analysis. Our results show an almost<br />

complete concordance for strain identification<br />

and spoligotype determination when compared<br />

with current molecular methods, as well as<br />

more than 90% concordance between our in<br />

silico resistance prediction and our current<br />

molecular and conventional drug susceptibility<br />

testing methods. We estimate the total time<br />

required using WGS as a complete diagnostic<br />

test for culture samples to be approximately 7<br />

days. Given the diagnostic accuracy and reproducibility<br />

of this method, and the significant<br />

reduction in time from specimen receipt to<br />

WGS-based prediction of antibiotic resistance, it<br />

is financially appealing to adopt this technology<br />

in a public health setting.<br />
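In silico resistance prediction amounts to matching variants called against H37Rv to a catalogue of resistance-associated mutations, then scoring concordance with phenotypic drug susceptibility testing. The sketch below assumes variants are already called as (gene, amino-acid change) pairs; the four-entry catalogue lists a few well-known resistance mutations for illustration only and is not the authors' catalogue.<br />

```python
# Illustrative catalogue: a few well-known M. tuberculosis resistance mutations
# (H37Rv amino-acid numbering). Not the authors' catalogue and far from complete.
CATALOGUE = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("gyrA", "D94G"): "fluoroquinolones",
    ("rpsL", "K43R"): "streptomycin",
}


def predict_resistance(variants):
    """variants: iterable of (gene, amino-acid change) -> sorted list of predicted drugs."""
    return sorted({CATALOGUE[v] for v in variants if v in CATALOGUE})


def concordance(predicted, phenotypic):
    """Fraction of shared (isolate, drug) keys where the WGS call ('R'/'S') matches DST."""
    keys = predicted.keys() & phenotypic.keys()
    if not keys:
        raise ValueError("no overlapping isolate/drug calls")
    return sum(predicted[k] == phenotypic[k] for k in keys) / len(keys)


# gyrA S95T is a common polymorphism and is deliberately absent from the catalogue.
print(predict_resistance([("rpoB", "S450L"), ("katG", "S315T"), ("gyrA", "S95T")]))
```

Computing concordance per (isolate, drug) call, rather than per isolate, is one way to arrive at a figure like the ">90% concordance" reported; the abstract does not state the unit used.<br />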

n 26<br />

CHARACTERIZATION OF AN IMI-1<br />

CARBAPENEMASE-PRODUCING COLISTIN-<br />

RESISTANT ENTEROBACTER CLOACAE BY<br />

GENOME SEQUENCING AND INSERTIONAL<br />

MUTAGENESIS<br />

Z. Zong;<br />

West China Hospital, Sichuan University,<br />

Chengdu, CHINA.<br />

Background: A carbapenem-resistant Enterobacter<br />

cloacae, WCHECl-1060, was recovered<br />

from the blood of a patient after stem cell transplantation.<br />

PCR and sequencing showed that it carried blaIMI-1, a<br />

carbapenemase gene, and it was highly<br />

resistant to colistin (MIC,<br />

64 mg/L). However, the genetic context of<br />

blaIMI-1 and the mechanisms of colistin resistance<br />

were unclear. Methods: WCHECl-1060<br />

was subjected to whole genome sequencing<br />

at ca. 100× coverage using the HiSeq 2500<br />

sequencer. Reads were assembled into contigs<br />

using SPAdes. MLST was performed using<br />

the assembled contigs. Phages and genomic<br />

islands were predicted using PHAST and IslandViewer<br />

tools. Insertional mutagenesis with the<br />

Tn5 transposon was performed, and colistin-susceptible<br />

mutants were selected using replica<br />

plating. Inverse PCR and sequencing were<br />

used to identify genes interrupted by Tn5<br />

insertion. Results: A total of 5,591,800 reads<br />

and 698,975,000 bases were obtained with<br />

a 55.8% GC content. Reads were assembled<br />

into 21 contigs of ≥500 bp (N50,<br />

714,400 bp) containing 4,808,707 bases.<br />

WCHECl-1060 belonged to a new ST, ST420.<br />

In addition to blaIMI-1, WCHECl-1060 also has<br />

the quinolone-resistance genes oqxA and oqxB<br />


and fosfomycin-resistance gene fosA. blaIMI-1<br />

was located on the chromosome, and the 16.2-kb<br />

region containing blaIMI-1 (8.1 kb upstream<br />

and 7.3 kb downstream of blaIMI-1) has a much<br />

lower GC content (37.1%), suggesting a foreign<br />

origin. Indeed, the 16.2-kb region was<br />

identified as a genomic island by IslandViewer.<br />

Two genes responsible for colistin resistance,<br />

phoP and yqjA, were identified by Tn5 insertional<br />

mutagenesis. phoP is part of the PhoP/PhoQ<br />

two-component system, and mutations<br />

in phoP have been reported to cause colistin resistance<br />

in other Enterobacteriaceae species. yqjA encodes an inner membrane protein<br />

but has not been associated with colistin<br />

resistance before. The 675-bp phoP and 660-bp<br />

yqjA of WCHECl-1060 have at least 6 and 3<br />

nucleotide differences from those of other<br />

Enterobacter strains with sequences available<br />

in GenBank, respectively. Conclusions:<br />

This is the first genome sequence of a blaIMI-1-carrying<br />

Enterobacter strain. blaIMI-1 was<br />

on a genomic island, which may explain why<br />

only a few E. cloacae strains carry this gene. Colistin<br />

resistance in WCHECl-1060 is likely due to<br />

mutations of phoP and yqjA.<br />
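The assembly metrics quoted above (contigs ≥500 bp, N50, GC content) follow standard definitions, sketched below. N50 is the length L such that contigs of length L or longer contain at least half of the assembled bases; GC content here ignores ambiguous bases such as N. Function names are illustrative, and the example inputs are toy values.<br />

```python
def n50(lengths, min_len=0):
    """N50 of contigs at least min_len long: the length L such that contigs >= L hold half the bases."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs passed the length filter


def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


print(n50([100, 200, 300, 400]), n50([100, 200, 300, 400], min_len=250), gc_content("ATGCNN"))
```

Applying `gc_content` to a sliding window along a contig is a simple way to spot a low-GC region such as the 16.2-kb island described above, although dedicated tools like IslandViewer combine GC skew with other signals.<br />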

n 27<br />

COMPUTATIONAL PIPELINE FOR NEXT<br />

GENERATION SEQUENCING-BASED<br />

PATHOGEN DETECTION IN CLINICAL<br />

SETTINGS<br />

L. Albayrak 1, M. Rojas 1, K. Khanipov 1, S. M. Chumakov 2, G. Golovko 1, M. Pimenova 1, Y. Fofanov 1;<br />

1 University of Texas Medical Branch, Galveston, TX, 2 University of Guadalajara, Guadalajara, Jalisco, MEXICO.<br />

In the recent past, next-generation sequencing<br />

(NGS) was performed exclusively in<br />

large sequencing centers; however, dramatic<br />

progress in sequencing technology has reduced<br />

costs and improved quality and<br />

throughput, allowing NGS to be used by universities<br />

and even individual labs. Moreover, the<br />

latest “benchtop” sequencers have progressed<br />

into extremely cost-effective tools that can be<br />

used for pathogen detection in clinical settings.<br />

In order to transform NGS technology from<br />

being an exclusively research tool to being<br />

routinely used for clinical diagnostics, major<br />

challenges must be resolved, including: 1. The<br />

vast amount of data and computational complexity<br />

of NGS analysis tools requiring data to<br />

be stored away from the place of origin, separating<br />

sequencing from analysis and the decision-making<br />

process (as well as raising security<br />

and privacy concerns); 2. The absence of standardized<br />

analysis and task-specific reference<br />

databases; 3. The large and often complicated<br />

reports generated by NGS data analysis pipelines,<br />

which usually require Ph.D.-level scientists<br />

to interpret. To address those challenges,<br />

we present a bioinformatics pipeline capable<br />

of evaluating NGS samples quickly and accurately,<br />

with relatively low computational infrastructure<br />

requirements. By utilizing a novel<br />

data format that reduces the size of sequenced-read<br />

files and keeps only high-quality sequences<br />

in binary format, we can easily store,<br />

manipulate, and transfer large amounts of data<br />

produced by NGS instruments. The use of a<br />

robust collection of reference gene sequences<br />

instead of complete genomes greatly reduces<br />

the size of the reference database without<br />

compromising the ability to detect and identify<br />

pathogens. Clustering genes based on<br />

nucleotide similarity and keeping a single<br />

representative sequence per cluster further reduces redundancy<br />

among gene sequences. We propose to<br />

use a reference-by-reference search against<br />

sequenced reads rather than the read-by-read<br />

search used by BWA and Bowtie,<br />

which generate read-centric files (e.g., BAM/SAM).<br />

The output of the proposed algorithm<br />

is much smaller than that of a read-centric approach,<br />

and it is tied to the reference sequences instead<br />

of sequencing reads. The proposed methods<br />

of clustering, curation, compression and<br />

reference-by-reference sequence search pave<br />

the way for bringing NGS-based pathogen identification<br />

methods to clinical and field diagnostic<br />

settings.<br />
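A reference-by-reference search can be illustrated with a k-mer index: the reads of one sample are indexed once, then each clustered reference gene is scanned against that index, so results are keyed by reference rather than by read. This is a simplified sketch under stated assumptions (exact k-mer matches, forward strand only, a hypothetical threshold of 2 shared k-mers, toy sequences), not the authors' algorithm.<br />

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def index_reads(reads, k):
    """Map each k-mer to the indices of reads containing it; built once per sample."""
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for km in kmers(read, k):
            index[km].add(i)
    return index


def search_references(references, reads, k=21, min_shared=2):
    """For each reference, list indices of reads sharing >= min_shared k-mers with it."""
    index = index_reads(reads, k)
    hits = {}
    for name, ref in references.items():
        shared = defaultdict(int)
        for km in kmers(ref, k):
            for i in index.get(km, ()):
                shared[i] += 1
        hits[name] = sorted(i for i, n in shared.items() if n >= min_shared)
    return hits


refs = {"geneA": "ACGTACGGTTCA", "geneB": "TTTTGGGGCCCC"}  # toy "representative" genes
reads = ["ACGTACGG", "GGGGCCCC", "AAAAAAAA"]
print(search_references(refs, reads, k=4))
```

The result has one entry per reference gene, however many reads a sample contains, which is why reference-centric output stays small compared with a per-read BAM/SAM file. A practical version would also index reverse complements.<br />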


n 28<br />

EVALUATION OF METAGENOMIC RNA-SEQ<br />

FOR DETECTION OF ENTEROVIRUS D68<br />

AND RESPIRATORY VIRAL PATHOGENS IN<br />

CLINICAL SAMPLES<br />

W. Huang 1, G. Wang 1, H. Lin 2, J. Zhuge 3, S. M. Nolan 1, N. Dimitrova 2, J. T. Fallon 1;<br />

1 New York Medical College, Valhalla, NY, 2 Philips Research North America, Briarcliff Manor, NY, 3 Westchester Medical Center, Valhalla, NY.<br />

Background: Next-generation sequencing<br />

(NGS) is emerging as a novel approach<br />

for identifying the causative pathogens of<br />

infectious diseases directly from clinical<br />

specimens. However, practical application of<br />

NGS is hindered by the lack of proven, sensitive<br />

and reliable preparation protocols and by the<br />

bioinformatics challenge of building accurate<br />

and complete databases for data analysis.<br />

Here, we explored the potential application of<br />

NGS for the detection of viral pathogens in clinical<br />

samples. Methods: A simple metagenomic<br />

RNA-Seq protocol, with reverse transcription<br />

and Illumina Nextera XT technology incorporated<br />

but without amplification or enrichment,<br />

was employed to analyze 93 nasopharyngeal<br />

swab specimens collected from patients in<br />

the lower Hudson Valley, New York during<br />

an outbreak of enterovirus D68 (EV-D68)-<br />

associated respiratory illness in 2014. Among<br />

these samples, 72 were positive for EV-D68<br />

using real-time reverse-transcriptase PCR<br />

(rRT-PCR) (J. Clin. Microbiol., 2015; 53:1915-1920).<br />

To detect EV-D68 and other pathogenic<br />

viruses, we employed three bioinformatics<br />

tools for analysis of the NGS data: alignment/<br />

mapping using Illumina MiSeq Reporter, the<br />

PathSEQ Virome, and the sequence-based<br />

ultra-rapid pathogen identification (SURPI)<br />

viral software. Results: Sequencing<br />

yielded an average of 103,531 clusters passing<br />

filter per sample. Compared to the results from<br />

rRT-PCR, we detected 65, 66 and 69 samples<br />

positive for EV-D68 using the three bioinformatics<br />

tools, respectively. We also found<br />

that the scores in the PathSEQ Virome were<br />

inversely correlated with the cycle thresholds<br />

in the rRT-PCR assay. Both PathSEQ Virome<br />

and SURPI viral excelled in the simultaneous<br />

detection of multiple viruses. PathSEQ stood<br />

out for its final report, which is convenient for clinical<br />

use, whereas SURPI viral was more sensitive<br />

in identification of EV-D68 and other viruses.<br />

In addition, from these NGS sequence reads<br />

we were able to detect a variety of resident<br />

bacteria by using SURPI Bacteria. Conclusion:<br />

Our results support the NGS approach as a<br />

comprehensive and efficient tool for pathogen<br />

detection. With improved sequencing protocols<br />

and bioinformatics tools, the NGS-based assay<br />

has great potential for actionable identification<br />

of pathogens in clinical and public health<br />

laboratory settings.<br />
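If every EV-D68 call fell among the 72 rRT-PCR-positive samples (the abstract does not say whether any calls were rRT-PCR-negative), the sensitivities of MiSeq Reporter, PathSEQ Virome and SURPI viral would be 65/72, 66/72 and 69/72, about 90%, 92% and 96%. The inverse relation between PathSEQ scores and Ct values can be quantified with a rank correlation; Spearman's coefficient is an assumption here, as the abstract does not name the statistic, and the score/Ct pairs below are hypothetical.<br />

```python
from math import sqrt


def sensitivity(detected_positive, reference_positive):
    """Share of reference-positive samples also called positive by the test."""
    return detected_positive / reference_positive


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


for tool, hits in [("MiSeq Reporter", 65), ("PathSEQ Virome", 66), ("SURPI viral", 69)]:
    print(tool, round(sensitivity(hits, 72), 3))

# Hypothetical PathSEQ scores and rRT-PCR Ct values for four samples.
print(spearman([950, 400, 120, 30], [18.2, 24.5, 29.1, 33.0]))
```

A rank correlation suits this comparison because read-based scores and Ct values are related monotonically rather than linearly (Ct is roughly logarithmic in viral load).<br />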

n 29<br />

A NEXT GENERATION SEQUENCING<br />

METHOD FOR THE PAN-IDENTIFICATION<br />

OF VIRAL PATHOGENS IN CSF FROM<br />

ENCEPHALITIS CASES<br />

Y. Sun 1, C. Parikh 1, Y. Ku 1, D. Lamson 2, J. Au-Young 1, K. St. George 2, A. Felton 1;<br />

1 Thermo Fisher Scientific, South San Francisco, CA, 2 New York State Department of Health, Albany, NY.<br />

Encephalitis is commonly caused by viral infections<br />

and can be associated with significant<br />

morbidity and mortality. Although the proportion varies<br />

throughout the year, overall the causative<br />

agent is not identified in more than<br />

75% of cases with current diagnostic methods.<br />

PCR- and serology-based approaches are<br />

limited by the requirement of prior knowledge<br />

and decisions on the pathogens to be tested.<br />

We demonstrate here a deep-sequencing based<br />

method with the potential for a more universal<br />

approach to testing of viral encephalitis<br />

cases. Initially, synthetic cerebrospinal fluid<br />

(sCSF) samples were spiked with several human<br />

encephalitic viruses, at a range of viral<br />


loads commonly seen in clinical CSF samples.<br />

Cultured, genomically quantitated DNA and<br />

RNA viruses were used for spiking. Testing on<br />

the new platform was performed in a blinded<br />

manner. Samples were sequenced on the Ion<br />

Proton platform at a depth of approximately<br />

70 million reads per sample. Based on these data, we<br />

developed a bioinformatics pipeline to detect<br />

viral genomes in the samples. Sequencing reads<br />

of human origin were filtered out by aligning them to<br />

the human genome (hg19). The remaining reads<br />

were mapped to a reference database comprising<br />

viral genomes from NCBI. From a panel<br />

of 13 samples, virus was successfully and<br />

accurately identified in all but one of the 11<br />

virus-positive samples, which contained very<br />

low viral titers. In addition, the read depth of<br />

viral sequences correlated inversely with the<br />

Ct values obtained from virus-specific qPCR<br />

assays of the samples. To establish the limit of<br />

detection with this approach, the 70 million<br />

sequencing reads were subsampled randomly<br />

to a range of 5-50 million for each sample. The<br />

bioinformatics analysis demonstrated that 5<br />

million reads is sufficient for detection of viral<br />

genomes in these samples. Once the workflow<br />

pipeline was established, the same approach<br />

was applied in a blinded manner, to 8 human<br />

CSF samples from patients with encephalitis.<br />

The samples had been tested previously for 14<br />

pathogens in a qPCR panel designed to detect<br />

viral encephalitic agents. Using 5 million<br />

reads, viral genomes were accurately identified<br />

in 4 samples. When read depth was increased<br />

to 40-45 million, viral genome was accurately<br />

identified in one of the remaining 4 samples.<br />

The other 3 samples, which had tested negative<br />

for 14 target agents in the qPCR panel,<br />

remained negative for viral genomes in the<br />

new assay. This technique thus proved to be<br />

effective for the detection<br />

of viral pathogens in CSF samples. Future<br />

experiments are planned to refine the technology<br />

for the detection and identification of<br />

causative pathogens in CSF from patients with<br />

encephalitis.<br />
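The read-subsampling experiment used to estimate the limit of detection can be illustrated with a short Python sketch. It is not the authors' pipeline. It assumes every read has already been labeled after host filtering against hg19 and mapping to the NCBI viral database (the labels `human`, `viral` and `unmapped` are hypothetical). The minimum viral read count `min_reads` used to call a detection is also an assumed parameter, not a value from the abstract.

```python
import random
from typing import Dict, Sequence


def viral_read_count(labels: Sequence[str], depth: int, seed: int = 0) -> int:
    """Randomly draw `depth` reads without replacement and count viral hits.

    `labels` holds one hypothetical classification per read ("human",
    "viral" or "unmapped") produced by an upstream alignment step.
    """
    if depth > len(labels):
        raise ValueError("depth exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed so subsamples are reproducible
    picked = rng.sample(range(len(labels)), depth)  # read indices, no replacement
    return sum(1 for i in picked if labels[i] == "viral")


def detection_by_depth(labels: Sequence[str], depths: Sequence[int],
                       min_reads: int = 3, seed: int = 0) -> Dict[int, bool]:
    """For each subsampling depth, report whether enough viral reads remain."""
    return {d: viral_read_count(labels, d, seed) >= min_reads for d in depths}


if __name__ == "__main__":
    # Toy sample: 9,990 host reads and 10 viral reads (illustrative only).
    reads = ["human"] * 9_990 + ["viral"] * 10
    print(detection_by_depth(reads, [500, 5_000, 10_000]))
```

At the depths used in the study (5-50 million reads per sample), a streaming subsampler working directly on the FASTQ files would replace the in-memory list.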

30

APPLYING WHOLE GENOME SEQUENCING<br />

TO DETECT MOLECULAR EVENTS LEADING<br />

TO THE ACQUISITION OF CARBAPENEM<br />

RESISTANCE IN CLINICAL SAMPLES<br />

L. Appalla 1 , P. Mc Gann 1 , E. Snesrud 1 , A.<br />

C. Ong 1 , F. Onmus-Leone 1 , M. Koren 2 , Y. I.<br />

Kwak 1 , P. E. Waterman 1 , E. P. Lesho 1 ;<br />

1 Walter Reed Army Institute of Research, Silver<br />

Spring, MD, 2 Walter Reed National Military<br />

Medical Center, Bethesda, MD.<br />

Background: Multi-drug resistant organisms<br />

(MDROs) have emerged as a global threat to<br />

public health. Tracking the dissemination of<br />

MDROs and controlling their transmission are<br />

major challenges for the healthcare community<br />

worldwide. We have applied whole genome<br />

sequencing (WGS) and analysis to a collection<br />

of serial isolates from a single patient to<br />

identify the molecular events that led to the<br />

acquisition of carbapenem resistance during<br />

treatment. Initial cultures from the patient<br />

contained extended-spectrum β-lactamase<br />

(ESBL)-producing, carbapenem-susceptible,<br />

Escherichia coli. Multiple rounds of ertapenem<br />

treatment were administered, but the infection<br />

recurred after each course of antibiotics. Subsequent<br />

cultures yielded carbapenem-resistant<br />

E. coli and Morganella morganii. Methods:<br />

All isolates were sequenced using the Illumina<br />

MiSeq desktop sequencer. Sequencing data<br />

were assembled using an in-house pipeline that<br />

incorporates FLASh, Btrim and the GS De<br />

Novo Assembler. Assemblies were then passed<br />

through quality control filters that check read<br />

coverage, contig number and contig size. Further<br />

analysis using the bioinformatics pipeline<br />

included 16S species identification, multilocus<br />

sequence typing, comprehensive identification<br />

of antibiotic resistance genes and the detection<br />

of single nucleotide changes, insertion/deletion<br />

events and large-scale genomic rearrangements<br />

that distinguish the isolates. Results: All E.<br />

coli isolates were multilocus sequence type<br />

131 and carried 9 antibiotic resistance genes<br />

(including blaCTX-M-27, which confers an<br />


ESBL phenotype) on an IncF plasmid. The<br />

isolates were identical by genome sequencing,<br />

with the exception of 150 kb of plasmid<br />

DNA present only in the carbapenem-resistant<br />

isolates. This DNA included a 60-kb<br />

IncN plasmid carrying the carbapenemase<br />

gene blaOXA-181, which was also present in M. morganii.<br />

In the M. morganii plasmid, blaOXA-181<br />

was flanked by IS3000 and ISKpn19, but in all<br />

but one of the carbapenem-resistant E. coli isolates,<br />

a second copy of ISKpn19 had inserted<br />

adjacent to IS3000. Conclusion: blaOXA-181<br />

was acquired by a member of the virulent<br />

sequence type 131 E. coli clonal group via an<br />

IncN plasmid from M. morganii. Because M.<br />

morganii tends to have high intrinsic resistance<br />

to imipenem, and because blaOXA-181 has<br />

relatively weak carbapenemase activity and a<br />

substrate profile that includes penicillins but<br />

not extended-spectrum cephalosporins, the<br />

presence of blaOXA-181 in these strains was<br />

nearly overlooked. However, WGS and advanced<br />

sequence analysis techniques revealed<br />

that this gene was responsible for the acquisition<br />

of carbapenem resistance by E. coli. WGS<br />

represents a powerful approach for the surveillance<br />

of multidrug-resistant microbes.<br />
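The assembly quality-control step, which checks read coverage, contig number and contig size, can be expressed as a simple filter. This Python sketch is illustrative only. The abstract does not give the pipeline's thresholds, so `min_coverage`, `max_contigs`, `min_contig_len` and `min_total_bp` are hypothetical defaults chosen for a genome of roughly E. coli size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assembly:
    name: str
    mean_coverage: float                                      # mean read depth over the assembly
    contig_lengths: List[int] = field(default_factory=list)  # contig sizes in bp


def qc_failures(asm: Assembly, min_coverage: float = 30.0, max_contigs: int = 500,
                min_contig_len: int = 200, min_total_bp: int = 4_000_000) -> List[str]:
    """Return the names of failed checks; an empty list means the assembly passes.

    All thresholds are illustrative defaults, not the in-house pipeline's values.
    """
    kept = [n for n in asm.contig_lengths if n >= min_contig_len]  # drop tiny contigs
    failures = []
    if asm.mean_coverage < min_coverage:
        failures.append("coverage")
    if len(kept) > max_contigs:
        failures.append("contig_number")
    if sum(kept) < min_total_bp:
        failures.append("assembly_size")
    return failures
```

Returning the list of failed checks, rather than a single boolean, makes it easy to log why an isolate was held back before downstream typing.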

31

THE UTILITY OF WHOLE GENOME<br />

SEQUENCING IN CHARACTERIZING<br />

ACINETOBACTER EPIDEMIOLOGY AND<br />

ANALYZING HOSPITAL OUTBREAKS<br />

M. A. Fitzpatrick, E. A. Ozer, A. R. Hauser;<br />

Northwestern University, Chicago, IL.<br />

Background: Acinetobacter baumannii is a<br />

frequent cause of nosocomial infections and<br />

outbreaks. Whole genome sequencing (WGS)<br />

is a promising new technique for bacterial<br />

strain typing and outbreak investigation. Here<br />

we compare the performance of conventional<br />

methods with WGS for strain typing<br />

clinical Acinetobacter isolates and analyzing<br />

a carbapenem-resistant A. baumannii (CRAB)<br />

outbreak. Methods: We performed band-based<br />

typing, multi-locus sequence typing (MLST),<br />

and WGS on 148 Acinetobacter calcoaceticus-<br />

Acinetobacter baumannii complex bloodstream<br />

isolates collected from 2005-2012. Clustering<br />

dendrograms and phylogenetic trees were<br />

constructed using the results of band-based<br />

and sequence-based typing, respectively.<br />

Discriminatory power and level of agreement<br />

of the techniques were compared. WGS was<br />

then used to analyze an ICU CRAB outbreak<br />

that occurred in our hospital during the study<br />

period. Results: Phylogenetic trees inferred<br />

from core genome SNPs confirmed three<br />

Acinetobacter species within this collection.<br />

Four major A. baumannii sequence types (STs)<br />

circulated in our hospital over the course of the<br />

study, three of which have a global distribution<br />

pattern and one of which is novel. WGS<br />

indicated that a threshold of 2500 core SNPs<br />

accurately distinguished A. baumannii isolates<br />

with the same ST from those with different<br />

STs. Conventional band-based techniques performed<br />

less well in accurately assigning isolates<br />

to ST lineages and exhibited poor agreement<br />

overall with sequence-based techniques.<br />

When WGS was applied to the CRAB outbreak,<br />

we found that a threshold of 2.5 core SNPs distinguished<br />

non-outbreak strains from outbreak<br />

strains. WGS was more discriminatory than<br />

conventional band-based techniques and was<br />

used to construct a more accurate transmission<br />

map that resolved many of the plausible<br />

transmission routes suggested by PFGE and<br />

epidemiologic links. More detailed accessory<br />

genome analysis identified a plasmid that was<br />

circulating among isolates over the course of<br />

the outbreak. Conclusion: Our study demonstrates<br />

that WGS is superior to conventional<br />

techniques for A. baumannii strain typing and<br />

outbreak analysis. These findings support incorporation<br />

of WGS into healthcare infection<br />

prevention efforts.<br />
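A SNP threshold, such as the 2.5 core SNPs used to separate outbreak from non-outbreak strains, can be applied in a simple way. Link any two isolates whose pairwise core SNP distance is at or below the cutoff, then report the connected groups. The Python sketch below does this with single-linkage grouping on a union-find structure. The abstract does not state how the threshold was applied, so both the linkage rule and the input format (a dictionary of pairwise distances) are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def snp_clusters(isolates: List[str], dist: Dict[Tuple[str, str], int],
                 threshold: float) -> List[List[str]]:
    """Group isolates whose pairwise SNP distance is <= threshold (single linkage)."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Walk to the root of x's group, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        d = dist.get((a, b), dist.get((b, a)))  # pairs may be stored in either order
        if d is not None and d <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

Single linkage chains isolates. In the test data, A and C end up in one group through B even though they differ by 3 SNPs, which is one reason outbreak calls usually also weigh epidemiologic links.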


32

KLEBSIELLA PNEUMONIAE IN ITALY:<br />

INSIGHTS FROM SHORT- AND LONG-TERM<br />

SCALE STUDIES<br />

F. Comandatore;<br />

University of Milan and University of Pavia,<br />

Milan and Pavia, ITALY.<br />

The spread of multidrug-resistant (MDR)<br />

pathogenic bacteria represents one of the most<br />

important issues for global public health.<br />

Indeed, during the last twenty years, a dramatic<br />

increase of nosocomial infections and<br />

outbreaks due to MDR pathogens has been<br />

reported world-wide. Klebsiella pneumoniae<br />

(Kp) isolates resistant to third-generation<br />

cephalosporins and carbapenems have been<br />

reported in countries from Asia<br />

through Europe to the Americas. Kp infections cause<br />

high morbidity and mortality in people<br />

with weakened immune systems, such as inpatients<br />

of hospital intensive care units (ICUs).<br />

In Italy, since the early 2000s, an increasing<br />

number of MDR Kp nosocomial infections<br />

has been reported. We built an epidemiological<br />

network connecting six hospital<br />

microbiology units, a public health veterinary<br />

laboratory, and two university bioinformatic<br />

groups. During the first year (2014-2015),<br />

we developed in-house pipelines to perform<br />

SNP and genome-wide analyses on hundreds<br />

(up to thousands) of bacterial genomes. Our first<br />

research project (AAC, 2015, vol. 53(4), 389-396)<br />

examined 89 isolates from hospital collections.<br />

We included those genomes in a worldwide database of 319<br />

Kp genomes spanning<br />

the genetic variability across the species. The<br />

analysis of this genomic database provided<br />

important insights into the origin of the pathogenic<br />

clonal group Kp CG258, and into the<br />

emergence of Kp in Italy. In particular, we described<br />

a large recombination event (~1.3 Mb)<br />

that occurred during the emergence of<br />

Kp CG258. The time-calibrated phylogenetic<br />

analysis allowed us to date this recombination<br />

to around 1985. On the basis of that phylogenetic<br />

reconstruction, we were able to describe the<br />

four major Kp CG258 clades in Italy and date<br />

their emergence to between 2009 and 2010. The second<br />

research project focused on a short time<br />

scale: we used a genomic epidemiology<br />

approach to study a Kp outbreak that occurred<br />

in an Italian hospital during 2013 (JCM, 2015,<br />

00545-15). We identified and genetically<br />

characterized the outbreak Kp clone, which<br />

belonged to Kp CG258 and was<br />

phylogenetically associated with one of the four<br />

major Italian lineages described above. Furthermore,<br />

the SNP analysis showed that this<br />

Kp clone spread across the ICU through a single<br />

carrier rather than from inpatient to inpatient. In<br />

conclusion, genomic studies provided<br />

insights into Kp epidemiology at short and<br />

long time scales, yielding useful information<br />

on outbreak control and on the genome evolution of<br />

this important nosocomial pathogen.<br />
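The abstract does not name the method behind the time-calibrated phylogeny. A common first check of temporal signal is root-to-tip regression, shown here as a much simpler, swapped-in technique rather than the authors' approach. Each tip's distance from the root is regressed on its sampling date. The slope estimates the substitution rate, and the x-intercept gives a rough date for the root. The example inputs are invented for illustration.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(dates: Sequence[float],
                           distances: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope (substitutions/site/year when the
    distances are in substitutions/site) and the x-intercept, i.e. the date at
    which the fitted line reaches zero distance.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two paired (date, distance) values")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("sampling dates must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate
```

Bayesian tools give dates with credible intervals and relaxed clocks. The regression is mainly useful for checking that the sampled tips carry enough temporal signal to justify such an analysis.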

33

COMPARATIVE GENOMIC ANALYSIS OF<br />

THE FIRST TWO VAND-TYPE VANCOMYCIN-<br />

RESISTANT ENTEROCOCCUS FAECIUM IN<br />

THE NETHERLANDS<br />

M. Rogers, J. Sinnige, J. Top, E. Brouwer, M.<br />

Bonten, R. Willems;<br />

UMC Utrecht, Utrecht, NETHERLANDS.<br />

Introduction: Enterococcus faecium has<br />

rapidly become an important nosocomial<br />

pathogen. Vancomycin-resistant E. faecium<br />

(VRE) strains are of particular importance, as<br />

these are often multi-resistant, which drastically<br />

limits treatment options. To date, nine<br />

types of vancomycin resistance gene clusters have been<br />

described (vanA-E, vanG and vanL-N), of which vanA and vanB are the most<br />

frequently found. Recently, two epidemiologically<br />

unrelated vanD VRE isolates were found<br />

in two Dutch hospitals. Here we report on the<br />

phylogenetic analyses of the first vanD VRE<br />

isolates in the Netherlands. Methods: Libraries<br />

for whole genome sequencing of these two strains<br />

were prepared using the Nextera XT DNA Library Prep<br />

Kit (Illumina) and sequenced<br />

on the MiSeq System (Illumina) with<br />


a 2x250 bp MiSeq Reagent Kit v2 (Illumina).<br />

Quality filtering of the reads was performed<br />

using Nesoni 0.109 and reads were assembled<br />

using SPAdes 2.5.1 genome assembler. Genes<br />

were predicted and annotated using PROKKA.<br />

For the phylogenetic analysis, a total of 104<br />

strains were used (73 publicly available strains,<br />

29 strains from our lab and the 2 vanD-positive<br />

strains). Core alignment was performed using<br />

Bowtie2, and SAMtools was used for SNP calling<br />

between the two vanD-type VRE strains<br />

(E7962 and E8043). OrthAgogue was used<br />

for prediction of gene orthology relations, followed<br />

by clustering of orthogroups via MCL.<br />

A phylogenetic tree was constructed based on<br />

the core genome of the 104 strains using RAxML.<br />

Results: Phylogenetic inference revealed<br />

that both vanD VRE clustered in clade A1, which contains<br />

mostly clinical strains. Whole genome<br />

analysis revealed a considerable SNP difference<br />

(2981 SNPs; 1.1 × 10<sup>-3</sup> SNPs/bp) between the<br />

two strains’ recombination-free core genomes,<br />

indicating that the two strains were not clonally<br />

related. Furthermore, both strains carried the<br />

entire vanD gene cluster (vanRD, vanSD,<br />

vanYD, vanHD, vanD, vanXD) on their largest<br />

scaffold (size: ~222 kb for E7962 and ~192 kb<br />

for E8043), and SNP analysis of these scaffolds<br />

revealed only 9 SNPs (4.6 × 10<sup>-5</sup> SNPs/bp) between<br />

them. Further analysis showed the presence<br />

of a 76 kb core region (present in all 104<br />

strains) in these vanD scaffolds, followed by<br />

an accessory region (116 kb for E8043 and<br />

146 kb for E7962) containing the vanD gene<br />

cluster and phage-related genes, present only<br />

in the two vanD-VRE. Conclusion: These<br />

results suggest that the vanD gene cluster is<br />

located on a mobile genetic element that was<br />

acquired by two clonally unrelated strains from<br />

a common third source or from each other.<br />
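SNP densities like those reported for E7962 and E8043 are SNP counts divided by the length of the compared region. The Python sketch below shows the arithmetic. The aligned lengths in the example are back-calculated assumptions, not values given in the abstract: about 2.71 Mb of core genome and about 196 kb of scaffold. Read this way, the reported densities are SNPs per base pair. The region sizes are also internally consistent: 76 + 146 = 222 kb for E7962 and 76 + 116 = 192 kb for E8043.

```python
def snp_density(snps: int, compared_bp: int) -> float:
    """SNPs per base pair over the compared (aligned) region."""
    if compared_bp <= 0:
        raise ValueError("compared length must be positive")
    return snps / compared_bp


def per_megabase(density_per_bp: float) -> float:
    """Convert a per-base-pair density to SNPs per megabase."""
    return density_per_bp * 1_000_000


if __name__ == "__main__":
    core = snp_density(2981, 2_710_000)   # assumed core-genome length
    scaffold = snp_density(9, 196_000)    # assumed aligned scaffold length
    print(f"core: {core:.1e} SNPs/bp ({per_megabase(core):.0f} SNPs/Mb)")
    print(f"scaffold: {scaffold:.1e} SNPs/bp")
```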

34

ACUITAS ® RESISTOME TEST - A HIGH<br />

THROUGHPUT TRIAGE TOOL FOR STRAIN<br />

TYPING BY WHOLE GENOME SEQUENCING<br />

R. K. Kersey, G. T. Walker, T. Rockweiler, W.<br />

Chang;<br />

OpGen, Gaithersburg, MD.<br />

Multi-drug resistant organisms (MDROs) are a<br />

global healthcare issue associated with an increase<br />

in morbidity and mortality. Acuitas ® Resistome<br />

Test, a high-throughput multiplex PCR<br />

test, detects approximately 50 antibiotic resistance<br />

genes in Gram-negative bacilli, including<br />

genes for carbapenemases, extended-spectrum<br />

β-lactamases (ESBLs) and AmpC enzymes<br />

carried by MDROs. The Resistome Test is<br />

useful for genotyping carbapenem and cephalosporin<br />

resistance genes for surveillance of<br />

transmission events in hospitals. We validated<br />

the gene specificity of the Resistome Test by<br />

evaluating 265 culture isolates with reported<br />

gene subtypes, which were adjudicated by whole<br />

genome sequencing to resolve discrepancies;<br />

these fell into three categories (missed genes,<br />

incorrect genotypes and additional genes reported).<br />

Genomic DNA was extracted from broth<br />

cultures of isolates followed by Nextera library<br />

preparation, Illumina MiSeq sequencing, genomic<br />

assembly and sequence analysis using<br />

Ridom SeqSphere+ (MLST+) software.<br />

Results showed 100% concordance between<br />

gene results from the Resistome Test and<br />

whole genome sequencing. Our second objective<br />

was to illustrate that the Resistome Test<br />

was also useful as a triage tool to select culture<br />

isolates for strain typing by whole genome<br />

sequencing (WGS). We tested 65 Klebsiella<br />

pneumoniae isolates from two studies by the<br />

Resistome Test. Each K. pneumoniae isolate<br />

was positive for two to five of the following<br />

antibiotic resistance genes in various combinations:<br />

KPC, CTX-M-1, ACT, CMY, OXA-50,<br />

OXA-2, FOX, SHV and TEM plus ESBL<br />

variants of TEM and SHV. The Resistome<br />

Test provided a level of genotypic resolution<br />


that resolved clinical strains of K. pneumoniae<br />

prior to whole genome sequencing. Clinical<br />

isolates with distinct Resistome Test results<br />

were shown to be distinct strains by whole<br />

genome sequencing while isolates with identical<br />

Resistome Test results were often identical<br />

strains by whole genome sequencing. The Resistome<br />

Test was able to resolve 40% of the 65<br />

isolates as distinct strains, thereby identifying<br />

two potential groups of strain types for higher<br />

resolution by whole genome sequencing. We<br />

conclude that the Acuitas ® Resistome Test is useful<br />

for detecting carbapenem and cephalosporin<br />

resistance in Gram-negative bacilli and as<br />

a triage tool to select culture isolates for strain<br />

typing by WGS.<br />
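The triage logic described above can be sketched by grouping isolates on their gene sets. Isolates with distinct Resistome Test profiles are treated as distinct strains, while isolates that share a profile are forwarded to WGS. This Python example is a hypothetical illustration. The isolate names and gene profiles are made up, and the code does not reproduce OpGen's software.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


def triage_by_profile(
        results: Dict[str, FrozenSet[str]]) -> Tuple[List[str], List[List[str]]]:
    """Split isolates into (resolved, groups_for_wgs) by identical gene profile."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for isolate, genes in results.items():
        groups[frozenset(genes)].append(isolate)

    # A profile seen only once already marks its isolate as a distinct strain.
    resolved = sorted(m[0] for m in groups.values() if len(m) == 1)
    # Shared profiles cannot be separated by the PCR panel, so they go on to WGS.
    for_wgs = sorted(sorted(m) for m in groups.values() if len(m) > 1)
    return resolved, for_wgs
```

Using a `frozenset` as the key makes the grouping independent of the order in which genes are reported.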

35

PATHOGEN DISCOVERY IN TRAVELERS’<br />

DIARRHEA OF UNKNOWN ETIOLOGY BY<br />

METAGENOMIC SEQUENCING<br />

Q. Zhu, M. Jones, S. Highlander;<br />

J. Craig Venter Institute, La Jolla, CA.<br />

Infectious diarrhea is responsible for about<br />

million deaths each year. We are studying<br />

travelers’ diarrhea (TD) where the known<br />

causative agents are members of Enterobacteriaceae,<br />

such as enterotoxigenic Escherichia<br />

coli, Shigella and Salmonella, viruses such<br />

as norovirus, and parasites such as Giardia.<br />

Nevertheless, in over 40% of cases, a known<br />

pathogen cannot be identified by traditional<br />

clinical tests. “Pathogen-negative” diarrhea remains<br />

enigmatic, due in part to a lack<br />

of appropriate cultivation and screening tests<br />

and to the poor sensitivity of existing tests. DNA sequencing<br />

is increasingly being applied in attempts to<br />

characterize agents of infectious disease. We<br />

hypothesize that unrecognized pathogens are<br />

responsible for a significant proportion of TD.<br />

These may be known organisms with unrecognized<br />

pathogenic potential or may be completely<br />

new species with new mechanisms of<br />

virulence. We have performed deep NextSeq<br />

paired-end sequencing of DNA from the stools of<br />

eight pathogen-negative TD patients and two healthy traveler<br />

controls in an attempt to identify pathogens.<br />

A total of 132.8 Gb (ca. 12-20 Gb raw<br />

data/sample) of sequencing data were retained<br />

after quality filtering. Reads were mapped to<br />

the NCBI RefSeq genomic database, resulting<br />

in an average mapping rate of 72.4% (min:<br />

33.1%, max: 93.8%). In the samples where<br />

mapping was low, we believe that many of<br />

the unmapped reads represent new, uncharacterized<br />

organisms. Taxonomic profiles<br />

were generated based on the mapping results,<br />

and revealed significantly uneven distribution<br />

of microbial groups among samples. The low<br />

complexity samples appear to be dominated by<br />

a single pathogen, while the high complexity<br />

samples may be the result of a mixed etiology.<br />

Three TD samples are enriched for E. coli sequences,<br />

although enterotoxins were<br />

not detected in clinical screens. Two of these<br />

carry Shiga toxin genes. Additional<br />

TD samples had, for example, high abundance<br />

of Akkermansia muciniphila, Streptococcus<br />

spp., Campylobacter jejuni, or Alistipes shahii<br />

reads, while the two healthy traveler controls<br />

had high abundance of reads that mapped to<br />

several Bifidobacterium species, which were<br />

not present in the diarrheal samples. De novo<br />

assembly was performed for each sample<br />

(average N50 statistic: 6516.4), followed by<br />

contig binning and scaffolding. Near-complete<br />

draft genomes were successfully recovered<br />

from the metagenomes. Some represent known<br />

species (such as E. coli and Campylobacter),<br />

while others could not be taxonomically placed<br />

in proximity to any known bacterial groups.<br />

Our results provide a glimpse into the microbiome<br />

diversity and composition and the potential etiological<br />

sources in “no pathogen identified” TD<br />

samples, and demonstrate the power of high-throughput<br />

DNA sequencing in the discovery<br />

of pathogens in infectious disease.<br />
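The N50 reported for the metagenomic assemblies is defined as follows. Sort contigs from longest to shortest. N50 is the length of the contig at which the running total first reaches half of the assembled bases. A minimal Python implementation of this standard definition follows. The abstract's 6516.4 is presumably an average of per-sample N50 values, obtained by applying the function to each sample's contigs and taking the mean. The contig lengths used in the test are invented.

```python
from statistics import mean
from typing import Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total (longest first)
    reaches half of all assembled bases."""
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached: the loop always hits the halfway point


def mean_n50(samples: List[List[int]]) -> float:
    """Average of per-sample N50 values, as in an 'average N50' summary."""
    return mean(n50(contigs) for contigs in samples)
```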


36

WHOLE GENOME SEQUENCING OF PORCINE<br />

EPIDEMIC DIARRHEA VIRUS BY ILLUMINA<br />

MISEQ PLATFORM<br />

L. Wang 1 , T. Stuber 2 , M. Prarat 1 , P. Camp 2 , S.<br />

Robbe-Austerman 2 , Y. Zhang 1 ;<br />

1 Animal Disease Diagnostic Laboratory, Ohio<br />

Department of Agriculture, Reynoldsburg, OH,<br />

2 National Veterinary Services Laboratories,<br />

Animal and Plant Health Inspection Service,<br />

United States Department of Agriculture,<br />

Ames, IA.<br />

Porcine epidemic diarrhea virus (PEDV) belongs<br />

to the genus Alphacoronavirus of the<br />

family Coronaviridae. PEDV was identified as<br />

an emerging pathogen in US pig populations in<br />

2013. Since then, this virus has been detected<br />

in at least 31 states in the US and has caused<br />

significant economic losses to the swine industry.<br />

Active surveillance and characterization of<br />

PEDV are essential for monitoring the virus.<br />

Obtaining comprehensive information about<br />

the PEDV genome can improve our understanding<br />

of PEDV evolution and<br />

the emergence of new strains, and can inform<br />

vaccine design. This study investigated the<br />

use of deep sequencing on the next-generation<br />

sequencing (NGS) Illumina MiSeq platform to<br />

obtain complete genome sequences<br />

from PEDV isolates. Clinical samples<br />

were first subjected to a real-time RT-PCR assay<br />

specific for PEDV. Positive samples were<br />

then amplified using 19 pairs of PEDV-specific<br />

primers (targeted amplification method). The<br />

amplified PCR products were mixed and used<br />

as input DNA to prepare a DNA library using<br />

the Nextera XT kit for NGS. Alternatively,<br />

a random-priming method was applied to<br />

prepare the input DNA for clinical samples<br />

with high viral loads (real-time PCR with Ct<br />

value



to cluster strains based on the distribution of 1,281<br />

accessory genes. We identified three major<br />

clades (A-C) characterized by widely varying<br />

r/m ratios: 22.7 (all uncommon STs from the<br />

UK), 0.9 and 3.7, respectively. Within clades<br />

B and C, with few exceptions (e.g. ST-11 and<br />

ST-230), sequence types (STs) did not form<br />

monophyletic lineages and were composed of<br />

numerous BAPS populations characterized by<br />

a clonal structure, limited genetic variation and<br />

no temporal or geographical signals. Further<br />

GWAS analysis, including both the core<br />

and accessory genomes, did not detect any overall<br />

genetic features correlated with geographical<br />

separation. In contrast, both phylogenetic<br />

analyses based on the distribution of accessory<br />

genes showed geographical clustering of the<br />

isolates within each BAPS group. They also<br />

revealed several events of consecutive gene<br />

flow between isolates from the two countries,<br />

suggesting migration within a population.<br />

Our genomic study supports the hypothesis of a<br />

homogeneous global distribution of C. jejuni<br />

genotypes, probably associated with rapid<br />

animal and/or human movement. Contrary to<br />

expectations, several lineages within ST45cc<br />

appear to be genetically monomorphic pathogens<br />

and were persistently detected over the<br />

years regardless of geographical origin.<br />

Epidemiological investigations based on core<br />

genes may have limited resolution<br />

owing to the clonal nature of certain lineages, so<br />

a pangenome approach is recommended.<br />
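Clustering strains on the presence or absence of accessory genes, as done here for 1,281 accessory genes, starts from a pairwise distance between gene-content profiles. The abstract does not say which distance was used. The Python sketch below assumes the Jaccard distance, and the strain and gene names in the test are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - (shared genes / genes in either strain); 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def accessory_distances(profiles: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Jaccard distances between strains, keyed by sorted name pairs."""
    return {(x, y): jaccard_distance(profiles[x], profiles[y])
            for x, y in combinations(sorted(profiles), 2)}
```

The resulting distances could then be passed, in condensed form, to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage`.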

38

COMPLETE GENOME SEQUENCE OF<br />

STAPHYLOCOCCUS AUREUS STRAIN FROM<br />

A PIG, A UNIQUE T324-ST541-V KOREAN<br />

METHICILLIN RESISTANT S. AUREUS<br />

CLONE<br />

S. Lim, D. Moon, G. Jang, H. Lee;<br />

Animal and Plant Quarantine Agency, Anyang,<br />

KOREA, REPUBLIC OF.<br />

Methicillin-resistant Staphylococcus aureus<br />

(MRSA) has been a major causative agent<br />

of nosocomial infection, and it has also been<br />

reported from non-human sources. Livestock-associated<br />

MRSA lineages such as sequence type (ST)<br />

398 and ST541 have been reported in pigs with<br />

a high frequency in Korea. In particular, the spa<br />

type t324, sequence type ST541 and staphylococcal<br />

cassette chromosome mec element (SCCmec)<br />

type V clone (t324-ST541-V) was one of the<br />

predominant clones in the pig production industry<br />

in Korea. To better understand the occurrence,<br />

genetic repertoire and relatedness to other<br />

MRSA types of this predominant clone, we<br />

sequenced and assembled its complete genome.<br />

A t324-ST541-V MRSA isolate designated<br />

K12PJN53 was isolated from a healthy<br />

pig in 2012. The draft genome sequence of<br />

K12PJN53 was obtained by combining data from<br />

the Illumina MiSeq and Roche<br />

454 FLX sequencing systems. Reads from each platform<br />

were assembled using CLC Genomics<br />

Workbench 5.5 and GS Assembler 2.6. A<br />

total of 458 S. aureus subsp. aureus genome<br />

sequences obtained from the EzGenome<br />

database were compared with K12PJN53 by<br />

calculating average nucleotide identity (ANI)<br />

values. The genome of K12PJN53 consists of<br />

a single circular 2,880,108 bp chromosome<br />

with 32.88% GC content and two plasmids. A<br />

total of 2,042 protein-coding regions, 57 tRNA<br />

genes and 10 rRNA genes were detected.<br />

Among the annotated contigs, 14, 17 and 20<br />

contigs carried antibiotic resistance,<br />

adherence and toxin genes, respectively.<br />

Tetracycline, macrolide, lincosamide and<br />

streptogramin B, and aminoglycoside resistance<br />

genes were found outside the SCCmec elements,<br />

associated with insertion sequences IS256 or IS431.<br />

In addition, metal resistance genes were<br />

identified both inside and outside<br />

the SCCmec elements. Several virulence genes,<br />

such as those encoding elastin-binding protein,<br />

fibrinogen-binding protein, clumping factor A, hemolysin<br />

and exfoliative toxin A, were present<br />

in K12PJN53; however, Panton-Valentine<br />

leukocidin was not detected. Based on ANI,<br />

K12PJN53 was genomically close<br />

to the ST398 strains, which have<br />

emerged in European countries. This study is<br />


the first report of the draft genome sequence of<br />

a novel livestock-associated MRSA ST541 strain<br />

isolated from a pig in Korea. This genome<br />

sequence will help in understanding the features of<br />

the ST541 lineage, including its antibiotic resistance<br />

and virulence genes.<br />
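ANI values like those used to compare K12PJN53 with the 458 EzGenome genomes are often computed in the BLAST-based ANIb style. The query genome is cut into fragments (typically 1,020 bp), and each fragment is searched against the other genome. ANI is then the mean identity of the best hits that exceed 30% identity over at least 70% of the fragment length. The abstract does not say which ANI variant was applied. This Python sketch of the final averaging step is therefore an assumption, and it takes the per-fragment hits as already computed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FragmentHit:
    identity: float    # percent identity of the fragment's best hit
    aligned_len: int   # aligned length of that hit (bp)
    fragment_len: int  # length of the query fragment (bp)


def ani_from_hits(hits: Iterable[FragmentHit], min_identity: float = 30.0,
                  min_coverage: float = 0.70) -> float:
    """Mean identity of fragment hits passing the identity and coverage filters."""
    kept = [h.identity for h in hits
            if h.identity > min_identity
            and h.aligned_len >= min_coverage * h.fragment_len]
    if not kept:
        raise ValueError("no fragment hits passed the filters")
    return sum(kept) / len(kept)
```

This form of ANI is directional (query against reference). Comparisons are therefore often reported as the mean of both directions.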

39

WHOLE GENOME SEQUENCING OF<br />

SALMONELLA NEWPORT CLONE<br />

JJPX01.0061 REVEALS PHYLOGENETIC<br />

EVIDENCE FOR ENDEMIC PERSISTENCE<br />

AND EXTENSIVE MICROEVOLUTIONARY<br />

DIVERSIFICATION AMONG EASTERN SHORE<br />

SURFACE WATERS<br />

R. L. Bell, C. Ferreira, E. Reed, C. Wang, E.<br />

Burrows, T. Muruvanda, J. Zheng, M. W. Allard,<br />

J. Pettengill, E. Strain, E. Brown;<br />

FDA, College Park, MD.<br />

Recurrent outbreaks of Salmonella enterica<br />

serovar Newport, XbaI PFGE pattern<br />

JJPX01.0061, have been linked to the consumption<br />

of tomatoes grown along the Eastern<br />

Shore of Virginia (VES) at least 6 times since<br />

2002. Environmental surveys of this region<br />

suggest that this subtype is endemic, persisting<br />

in surface waters of the VES microcosm.<br />

The genomic diversity of a large population of<br />

JJPX01.0061 isolated from disparate surface<br />

waters across the VES was investigated using<br />

whole genome sequencing (WGS) approaches.<br />

More than 70 environmental JJPX01.0061<br />

isolates spanning seven years from the VES<br />

were subjected to whole genome shotgun<br />

sequencing, assembled, and aligned using<br />

a reference-based mapping approach to a<br />

closed Newport genome (CFSAN024225). A<br />

maximum likelihood tree was then constructed<br />

based on total SNP variation and evaluated<br />

in light of geographic variation based on the<br />

location from which each isolate was collected.<br />

Genomic diversity within this PFGE<br />

subtype resolved the strains into four distinct<br />

clades, or “genomovars”, separated by<br />

as few as 5 to more than 100 SNPs. Two clades<br />

(C and D) grouped isolates uniquely by<br />

specific creek and were identical (C, 0 SNPs<br />

intraclade variation) or nearly identical (D, 1<br />

SNP). Two additional clades (A and B) were<br />

polyphyletic with respect to creek location,<br />

suggesting two independent introductions<br />

of JJPX01.0061 variants into each of these<br />

locales. Finally, clade B, composed largely<br />

of Newport isolates from 2007, was separated from<br />

more recent Newport groupings by at least 100<br />

SNPs, suggesting that substantial microevolutionary<br />

change has accrued within this lineage,<br />

a finding consistent with its establishment and<br />

prolonged environmental persistence. Genetic<br />

diversification of JJPX01.0061 supports its<br />

long-term, endemic persistence within this<br />

regional microcosm. Occasional reintroduction<br />

of distinct genomic variants into common<br />

creek environments is also evident, pointing to<br />

a potential role for geese or other waterfowl<br />

in the local mixing of isolates.<br />
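Intraclade variation, such as the 0 and 1 SNP differences in clades C and D, comes from pairwise comparison of the isolates' SNP alignment. Below is a minimal Python sketch. It assumes an alignment of SNP positions in which `N` and `-` mark missing calls, which are skipped. The isolate names and sequences are hypothetical, and the code is not the pipeline used in the study.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def snp_distance(seq_a: str, seq_b: str, missing: str = "N-") -> int:
    """Count positions where both isolates have a call and the calls differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper())
               if a != b and a not in missing and b not in missing)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNP distances keyed by alphabetically sorted name pairs."""
    names = sorted(alignment)
    return {(x, y): snp_distance(alignment[x], alignment[y])
            for x, y in combinations(names, 2)}


def max_within(group: List[str], dm: Dict[Tuple[str, str], int]) -> int:
    """Largest pairwise distance inside a clade (0 for a single isolate)."""
    return max((dm[tuple(sorted(p))] for p in combinations(group, 2)), default=0)
```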

40

MOLECULAR TYPING OF BRUCELLA<br />

MELITENSIS BY WHOLE GENOME<br />

SEQUENCING<br />

G. Garofolo 1 , J. T. Foster 2 , K. Drees 2 , I. Platone 1 ,<br />

K. Zilli 1 , M. Marcacci 1 , M. Ancora 1 , C. Cammà 1 ,<br />

I. Mangone 1 , F. De Massis 1 , P. Calistri 1 ,<br />

E. Di Giannatale 1 ;<br />

1 Istituto Zooprofilattico Sperimentale<br />

dell’Abruzzo e del Molise, Teramo, ITALY,<br />

2 Department of Molecular, Cellular, and Biomedical<br />

Sciences, University of New Hampshire,<br />

Durham, NH.<br />

Brucella melitensis is the causative agent of<br />

brucellosis in sheep and goats and poses a public<br />

health concern for people who consume<br />

contaminated milk or are in close contact<br />

with livestock. Despite concerted eradication<br />

campaigns, the disease remains endemic<br />

throughout much of the Mediterranean basin.<br />

Brucella is highly clonal, so single nucleotide<br />

polymorphisms (SNPs) in a phylogenetic<br />

framework can be used to determine its<br />


evolution. Determining which lineages are<br />

present throughout Italy is crucial to understanding<br />

brucellosis epidemiology as well as to<br />

placing Italian samples into a global evolutionary<br />

context. The aim of this study was to evaluate<br />

an NGS approach using different analyses<br />

to investigate B. melitensis diversity in Italy.<br />

Previous genetic assessment using variable<br />

number tandem repeats (VNTRs) revealed the<br />

presence of the west Mediterranean lineage<br />

structured in several clades that were sometimes<br />

geographically constrained. We selected<br />

16 B. melitensis isolates for whole genome<br />

sequencing representing maximum genetic and<br />

geographic diversity. Libraries were sequenced<br />

with both paired-end Illumina and Ion Torrent<br />

sequencing, and our analyses included 59<br />

publicly available B. melitensis genomes. We<br />

performed SNP analysis in read alignments<br />

and whole genomes using NASP pipeline, and<br />

in parallel we applied a gene-based approach<br />

using MLST+. Approximately 22,000 putative<br />

SNPs were identified among the B. melitensis<br />

samples. The MLST+ revealed 1,748 targets<br />

for the first chromosome and 876 targets for<br />

the second chromosome totaling 2,624 loci.<br />

Both approaches found that the Italian isolates<br />

formed 3 sub-clades as part of the B. melitensis<br />

strain Ether lineage, a strain that was isolated<br />

in Italy fifty years ago, suggesting that this<br />

lineage has been well established and successful<br />

for at least over half century in the Italian<br />

peninsula. Finally, we compared our results to<br />

a recently published shotgun metagenomic B.<br />

melitensis sequence from bones from a medieval<br />

grave in Sardinia (Italy) and confirm that<br />

this same lineage has been present in the region<br />

for centuries. This study is a step forward<br />

in understanding of Brucella evolution in the<br />

Mediterranean area, and demonstrates the utility<br />

of WGS SNP analysis, and the feasibility<br />

of MLST+ as a fast and reliable typing system<br />

for analyzing the epidemiology of Brucella in<br />

endemic regions.<br />
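MLST+ compares isolates by their allele calls at a fixed set of loci (2,624 in this scheme). A minimal sketch of a pairwise allelic distance, assuming each isolate is stored as a mapping from locus name to allele number with uncalled loci simply omitted, and that a locus missing in either isolate is skipped rather than counted as a difference (one common convention, not necessarily the one used in this study). Locus and isolate names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

# Allele profile: locus name -> allele number; loci without a call are absent.
Profile = Dict[str, int]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both isolates whose allele numbers differ."""
    shared = a.keys() & b.keys()
    return sum(1 for locus in shared if a[locus] != b[locus])


def distance_matrix(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], int]:
    """Allelic distance for every unordered pair of isolates."""
    return {
        (x, y): allelic_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy profiles over four loci (real schemes use thousands).
    profiles = {
        "isolate_A": {"locus1": 1, "locus2": 3, "locus3": 2, "locus4": 7},
        "isolate_B": {"locus1": 1, "locus2": 4, "locus3": 2},  # locus4 not called
        "isolate_C": {"locus1": 2, "locus2": 4, "locus3": 5, "locus4": 7},
    }
    for pair, d in distance_matrix(profiles).items():
        print(pair, d)
```

The resulting distance matrix can be fed to any standard tree or clustering method; skipping missing loci keeps incomplete assemblies from inflating distances.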

■ 41

GENOMIC EPIDEMIOLOGY AND TRANSMISSION OF SALMONELLA CHOLERAESUIS VAR. KUNZENDORF IN EUROPEAN PIGS AND WILD BOAR

P. Leekitcharoenphon, F. M. Aarestrup, R. S. Hendriksen;
Technical University of Denmark, Kgs. Lyngby, DENMARK.

Salmonella Choleraesuis is a relatively infrequent serovar adapted to pigs, but it also has a propensity to cause extraintestinal infections in humans. S. Choleraesuis var. Kunzendorf is responsible for the majority of outbreaks among pigs. Its global transmission was believed to be a result of the import of breeding pigs from Canada and the USA into Taiwan. In Europe, S. Choleraesuis is a relatively rare serovar in both slaughter pigs and breeding herds. In Denmark, only a few outbreaks have been reported among pig herds in recent decades (1999-2000 and 2012-2013), and in both cases it has been impossible to identify the route of transmission and the source of infection. To understand the transmission and epidemiology of S. Choleraesuis, we sequenced 108 S. Choleraesuis isolates from pigs and wild boar from 12 European countries and the USA. We applied SNP-based phylogenetic methods to whole genome sequences to identify the population structure. We used Bayesian phylogenetics to estimate dates of divergence and performed phylogeographic analyses of lineages using BEAST, with a Bayesian Skyline model of population size change and an uncorrelated lognormal relaxed molecular clock.

The S. Choleraesuis isolates yielded 2,428 SNPs. We estimated that the ancestral emergence of S. Choleraesuis occurred in 1946. The mean evolutionary rate was approximately 1.58 × 10⁻⁶ SNPs/site/year, corresponding to 7.5 SNPs/year. The isolates were divided into two complex clusters and resided in sub-clusters according to country and neighbouring countries of isolation. With respect to source of isolation, the wild boar isolates from Austria clustered together, whereas the wild boar isolates from Germany, Hungary and Estonia clustered with strains from pigs. The outbreak isolates from Denmark fell into two distantly related groups according to outbreak period. The recent outbreak isolates (2012-2013) contained an extra Q1 replicon, whereas some earlier outbreak isolates (1991-2000) had an extra I1 replicon. Some of the earlier outbreak strains were isolated from different organs of the same animal, and those isolates clustered by the pig from which they were isolated. Danish isolates, together with farm geographical information, were subjected to discrete phylogeographic analysis using a standard continuous-time Markov chain (CTMC). The spatial and temporal transmission of S. Choleraesuis between different farms in Denmark was epidemiologically concordant with data on contacts between farms. These results demonstrate the advantage of using WGS for elucidating the evolution and transmission of S. Choleraesuis and emphasize the usefulness of phylodynamic approaches for monitoring the emergence and spread of these particular strains over time.
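In discrete phylogeography, location (here, farm) is treated as a trait that changes along each branch according to a continuous-time Markov chain with rate matrix Q; the probability of ending in location j after a branch of length t, starting from location i, is entry (i, j) of P(t) = exp(Qt). A minimal pure-Python sketch of that calculation, assuming a small hand-written Q whose rates are hypothetical; BEAST estimates Q from the data and uses its own numerical routines, which are not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m: Matrix, terms: int = 20) -> Matrix:
    """exp(m) via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    # Scale so the matrix norm is at most 1/2, where the series converges fast.
    s = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(A) = exp(A / 2**s) ** (2**s)
        result = mat_mul(result, result)
    return result


def transition_probabilities(q: Matrix, t: float) -> Matrix:
    """P(t) = exp(Q t); entry [i][j] = P(location j at time t | location i at 0)."""
    return mat_exp([[x * t for x in row] for row in q])


if __name__ == "__main__":
    # Hypothetical movement rates (per year) among three farms; rows sum to zero.
    q = [[-0.20, 0.15, 0.05],
         [0.10, -0.10, 0.00],
         [0.02, 0.08, -0.10]]
    for row in transition_probabilities(q, 5.0):
        print([round(p, 3) for p in row])
```

Each row of P(t) sums to one, and for long branches every row approaches the stationary distribution of the chain, which is why short, well-dated branches carry most of the information about farm-to-farm movement.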

These findings may help to promote and establish future prevention and control measures for similar successful clones.
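The two evolutionary-rate figures reported for S. Choleraesuis are linked by the number of sites over which SNPs were counted: SNPs per year = rate per site per year × number of sites. A small arithmetic check under that assumption shows the reported values imply about 4.75 Mb, close to the size of a Salmonella genome (the exact alignment length is not stated in the abstract). The pairwise expectation assumes a strict clock, which is a simplification of the relaxed clock actually used.

```python
RATE_PER_SITE_PER_YEAR = 1.58e-6  # reported mean evolutionary rate
SNPS_PER_GENOME_PER_YEAR = 7.5    # reported per-genome equivalent


def implied_sites(rate_per_site: float, snps_per_year: float) -> float:
    """Sequence length (bp) at which a per-site rate yields the per-genome rate."""
    return snps_per_year / rate_per_site


def expected_pairwise_snps(snps_per_year: float, years_since_mrca: float) -> float:
    """Strict-clock expectation: both lineages accumulate changes since their MRCA."""
    return 2 * snps_per_year * years_since_mrca


if __name__ == "__main__":
    n_sites = implied_sites(RATE_PER_SITE_PER_YEAR, SNPS_PER_GENOME_PER_YEAR)
    print(f"implied length: {n_sites / 1e6:.2f} Mb")  # implied length: 4.75 Mb
    print(expected_pairwise_snps(SNPS_PER_GENOME_PER_YEAR, 10))  # 150.0
```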

■ 42

USING WHOLE GENOME SEQUENCING AND PHYLOGENETIC METHODOLOGIES TO CLUSTER SALMONELLA ENTERITIDIS ISOLATES BY SOURCE

E. L. Stevens;
Food and Drug Administration, College Park, MD.

The Center for Food Safety and Applied Nutrition (CFSAN) collaborated with other state and federal labs to develop GenomeTrakr, which houses the genomic sequences of thousands of clinical and environmental/food isolates at the National Center for Biotechnology Information (NCBI). This repository of genetic data facilitates both surveillance of and response to foodborne outbreaks. Whole genome sequencing (WGS) allows clusters of clinical isolates to be grouped and linked to the environmental/food source using molecular phylogenetics, and it has been successfully applied in resolving outbreaks. Source attribution is the ultimate goal of combining sequence and epidemiological data for outbreak resolution. One important pathogen, Salmonella enterica serotype Enteritidis (SE), which caused 19% of Salmonella illnesses in 2013, has been highlighted by epidemiologic insight as frequently coming from shell eggs. The Food and Drug Administration (FDA) published the 2010 Egg Rule to reduce Salmonella incidence rates caused by SE. However, other food and non-food sources, such as consumption of beef, turkey, and chicken, contact with live poultry, and international travel, have been implicated in SE illness. Past methods have been unable to fully resolve some outbreaks because of the genetic similarity of SE isolates, a limitation that WGS can overcome. Therefore, SE WGS data were collected from GenomeTrakr according to the following criteria: (1) the isolate came from the United States; and (2) it had associated metadata indicating source, geographic location, clinical/environmental status, isolation source (e.g. chicken), and collection date. SE isolates were then summarized according to isolation source (e.g. specific to egg products), and a cluster analysis was performed to identify the variability of SE strains among the different food commodities, to see whether further analysis can determine genetic loci that may be predictive of SE source attribution. These aims involved using phylogenetic methods based on WGS to link environmental/food isolates with each other based on their shared similarity (i.e. few single nucleotide polymorphisms (SNPs) between them). This work may ultimately allow the sequence data of a single isolate derived from a clinical case of salmonellosis to inform the epidemiologic analysis by suggesting a potential source of the illness. Furthermore, these results can help to inform assessment of the effect of the 2010 Egg Rule.
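A minimal sketch of the two steps described for SE isolates: selecting records that meet the inclusion criteria, then grouping isolates separated by few SNPs. It assumes metadata has already been exported into simple records (the field names are hypothetical, not the GenomeTrakr or NCBI schema) and that pairwise SNP distances are available. Clustering here is single linkage at a fixed SNP threshold, one simple way to link closely related isolates; the threshold is illustrative, not a value from this work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Isolate:
    # Hypothetical fields mirroring the selection criteria in the text.
    name: str
    country: Optional[str]
    location: Optional[str]
    clinical: Optional[bool]          # True = clinical, False = environmental/food
    isolation_source: Optional[str]   # e.g. "chicken", "shell eggs"
    collection_date: Optional[str]


def passes_criteria(iso: Isolate) -> bool:
    """US isolates with every required metadata field filled in."""
    required = (iso.location, iso.clinical, iso.isolation_source, iso.collection_date)
    return iso.country == "USA" and all(v is not None for v in required)


def snp_clusters(names: List[str], dist: Dict[Tuple[str, str], int],
                 threshold: int) -> List[List[str]]:
    """Single-linkage clusters: two isolates are joined if within `threshold` SNPs."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

Summarizing each resulting cluster by the `isolation_source` of its members is then a simple tally, and clusters that mix a clinical isolate with a single food commodity are the candidates for source attribution.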


■ 43

CORRELATING PROPORTIONAL ABUNDANCES OF SALMONELLA IN COMPLEX METAGENOMES WITH COVERAGE OF THE SALMONELLA GENOME

K. G. Jarvis¹, N. Daquigan¹, J. R. White², C. J. Grim¹, D. E. Hanes¹;
¹FDA/CFSAN, Laurel, MD, ²Resphera Biosciences, Baltimore, MD.

Background: Culturing Salmonella from fresh produce requires resuscitation in nonselective pre-enrichment broths, followed by parallel selective enrichments in Tetrathionate (TT) and Rappaport-Vassiliadis (RV) broths to inhibit background microflora. The cilantro microbiome consists mainly of Bacteroidetes and Proteobacteria. However, overnight nonselective pre-enrichment results in a shift to a lower proportional abundance of Proteobacteria and a dramatic increase in Firmicutes. Purpose: This study evaluates microbiome shifts over time in spiked cilantro cultures and correlates the relative abundance of Salmonella with coverage of the 4.8 Mb genome. Methods: The microbiome of cilantro spiked with S. enterica Newport at ~4 CFU/25 g was sequenced after 0, 6, and 24 hours in nonselective Tryptic Soy Broth using an Illumina MiSeq. High-coverage control microbiomes, which were transferred in parallel to selective TT and RV broths and incubated for 24 hours following nonselective pre-enrichment, were also sequenced. Most Probable Number (MPN) analysis was performed at 6 and 24 hours to estimate Salmonella CFU/ml. MetaPhlAn and Resphera Insight software were used to analyze shotgun metagenomes and 16S rRNA sequences, respectively. Metagenomic sequence reads were mapped to an S. enterica Newport genome by BLASTn to estimate Salmonella genome coverage. Results: The 0-hour cultures had 9,511 BLASTn hits, with 579,864 (12%) bases covered and an average read coverage of 3.68. At 6 hours, BLASTn hits to Salmonella increased to 59,951, with an average read coverage of 16.96, accounting for 832,017 (17%) bases. BLASTn hits to Salmonella at 24 hours reached 175,532 and accounted for 1,081,888 (22%) bases, with an average read coverage of 39.17. BLASTn hits in RV and TT cultures reached 18,110,119 and 21,096,773, representing average read coverages of 842 and 1,033, respectively, equating to 4,874,108 (99.94%) and 4,876,681 (99.99%) bases of the Newport genome represented in the RV and TT cultures. Proportional abundances of Salmonella 16S rRNA genes in the 0-, 6-, and 24-hour cultures were 0.004% (1 of 25,000 reads, or 4 CFU/ml), 0.00% (0 of 25,000 reads, or 9 CFU/ml), and 5.81% (1,453 of 25,000 reads, or 3.9 × 10⁷ CFU/ml), respectively. Salmonella proportional abundances in shotgun metagenomes were 0.02%, 0.01%, 0.04%, 92%, and 86% for the 0-, 6-, and 24-hour, RV, and TT cultures, respectively. Conclusions: Correlating the proportional abundance of Salmonella with genome coverage and CFU/ml in metagenomes is essential for understanding detection limits in complex food matrices such as cilantro. Careful consideration of analysis software to account for discrepancies in metagenomic databases is also important.
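The genome coverage figures combine two quantities: breadth (reference bases hit by at least one alignment) and average read depth. A minimal sketch computing both from alignment intervals on the Newport reference, assuming half-open (start, end) coordinates and that average depth is taken over covered bases only; the abstract does not state which denominator was used, so treat that choice as an assumption. The intervals below are invented.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference genome


def merge_intervals(hits: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching alignment intervals."""
    merged: List[Interval] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_stats(hits: List[Interval], genome_length: int) -> Tuple[int, float, float]:
    """Return (covered bases, breadth as a fraction, mean depth over covered bases)."""
    covered = sum(end - start for start, end in merge_intervals(hits))
    aligned = sum(end - start for start, end in hits)
    breadth = covered / genome_length
    depth = aligned / covered if covered else 0.0
    return covered, breadth, depth


if __name__ == "__main__":
    hits = [(0, 150), (100, 250), (1_000, 1_150)]
    print(coverage_stats(hits, 4_800_000))
```

With a 4.8 Mb reference, 579,864 covered bases corresponds to a breadth of about 0.12, matching the 12% reported for the 0-hour cultures.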

■ 44

COMPARATIVE GENOMICS OF DRUG-RESISTANT SALMONELLA ENTERICA ISOLATED FROM DAIRY CATTLE AND HUMANS

L. Carroll¹, M. Wiedmann¹, H. den Bakker², J. Siler¹, M. Davis³, W. Sischo³, T. Besser³, L. Warnick¹, R. Pereira¹;
¹Cornell University, Ithaca, NY, ²Texas Tech University, Lubbock, TX, ³Washington State University, Pullman, WA.

Salmonella enterica is a pathogen of concern for both humans and cattle. It can be spread from cattle to human populations through direct contact with animals shedding Salmonella, as well as through the food chain. Infections caused by multidrug-resistant isolates can be challenging to treat, making multidrug-resistant Salmonella a relevant human health hazard. The objective of this study was to use whole genome sequencing to study the evolutionary relationships of antibiotic-resistant S. Typhimurium, S. Newport, and S. Dublin isolated from dairy cattle and humans in Washington State and New York State from 2008 to 2012. A total of 90 drug-resistant Salmonella isolates were selected for this study, 45 from Washington State (20 from dairy cattle and 25 from humans) and 45 from New York State (21 from dairy cattle and 24 from humans). Isolates were selected based on location, source, and serotype, stratified by year. All isolates were tested for phenotypic antimicrobial resistance to 12 drugs using Kirby-Bauer disk diffusion, and all isolates were resistant to at least one drug. Genomes of all isolates were sequenced at Cornell University using the Illumina HiSeq platform and assembled de novo using SPAdes. In silico MLST and serotyping were performed using SRST2 and SeqSero, respectively. SRST2 was also used in conjunction with the ARG-ANNOT database to detect antibiotic resistance genes in the genome of each isolate. Maximum likelihood trees were constructed from the core genome of each serotype using kSNP, and the Cortex variation assembler was used to detect SNPs and indels. The most common drug classes for which resistance genes were detected were aminoglycosides, sulfonamides, and tetracyclines. Aminoglycosides and tetracyclines were the two most common drug classes for which two or more resistance genes of the same class were identified within a single isolate. Phylogenetic analyses of SNPs in the core genome of each serotype showed evidence of geospatial clustering. Further analyses will evaluate the evolutionary phylogeny within each serotype and compare genotypic and phenotypic antibiotic resistance for all isolates, to gain further insight into the spread of drug-resistant Salmonella between dairy cattle and humans in New York and Washington State.
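A minimal sketch of the per-class summary described for the resistance genes, assuming gene hits (for example from SRST2 with the ARG-ANNOT database) have already been parsed into (isolate, gene, drug class) records; the record layout and gene names are illustrative, not SRST2's output format. For each drug class it counts how many isolates carry at least one gene and how many carry two or more distinct genes of that class.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, str]  # (isolate, gene, drug class)


def class_summary(hits: Iterable[Hit]) -> Dict[str, Tuple[int, int]]:
    """Map drug class -> (isolates with >=1 gene, isolates with >=2 distinct genes)."""
    genes: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for isolate, gene, drug_class in hits:
        genes[drug_class][isolate].add(gene)
    return {
        cls: (len(per_isolate), sum(1 for g in per_isolate.values() if len(g) >= 2))
        for cls, per_isolate in genes.items()
    }


if __name__ == "__main__":
    hits = [
        ("S1", "strA", "aminoglycoside"), ("S1", "strB", "aminoglycoside"),
        ("S1", "tetA", "tetracycline"), ("S2", "sul1", "sulfonamide"),
        ("S2", "tetA", "tetracycline"), ("S2", "tetB", "tetracycline"),
        ("S3", "aadA", "aminoglycoside"),
    ]
    summary = class_summary(hits)
    # Rank classes by how many isolates carry at least one gene.
    for cls in sorted(summary, key=lambda c: summary[c][0], reverse=True):
        print(cls, summary[cls])
```

Using a set per isolate means duplicate hits to the same gene are counted once, so the "two or more genes" column reflects distinct genes rather than repeated alignments.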

■ 45

NEXT GENERATION SEQUENCING AND CITRUS DISEASES: CURRENT STATUS AND FUTURE PROSPECTS

M. Shafiq, F. Ali, M. Saleem Haider;
University of the Punjab, Lahore, PAKISTAN.

Next-generation sequencing (NGS) high-throughput technologies became available at the onset of the 21st century. They provide highly efficient, rapid, and low-cost DNA sequencing, with capabilities beyond those of the traditional DNA sequencing technologies developed in the late 1970s, and they continue to become faster, more efficient, and cheaper. They have been used in many fields of biology since 2004. Citrus is a large genus that includes several major cultivated species, including Citrus sinensis (sweet orange), Citrus reticulata (tangerine and mandarin), Citrus limon (lemon), Citrus grandis (pummelo) and Citrus paradisi (grapefruit). The draft genome of sweet orange (Citrus sinensis) has been sequenced using NGS technologies. NGS has also been used to study the genomes of several varieties of clementine mandarin and to sequence the complete citrus chloroplast genome. NGS-based RNA-seq analysis has also been used to study the regulation of citrus gene expression during disease development (citrus greening and Citrus tristeza virus (CTV) infection). NGS is expected to play significant roles in many research and non-research areas of citrus biology, and this technology will boost future citrus research in Pakistan.

■ 46

QUANTIFYING THE SWINE RESISTOME USING METAGENOMIC SEQUENCING

P. Munk, V. D. Andersen, H. Vigre, F. M. Aarestrup;
Technical University of Denmark, Kgs. Lyngby, DENMARK.

Monitoring of agricultural antimicrobial resistance has traditionally depended on the cultivation and phenotypic analysis of indicator bacteria. However, resistance genes are widely distributed and are present in zoonotic agents as well as in commensal bacteria, which can exchange them by horizontal gene transfer. By directly sequencing animal fecal DNA, the entire intestinal resistome can be characterized at once. In this study, we aimed to develop a workflow appropriate for quantifying agricultural resistance and to use it to characterize the Danish swine herd resistome. Ten Danish swine herds with diverse antimicrobial usage were enrolled in the study. In each herd, a fecal floor sample was obtained from each of 30 random pens. These samples were pooled for each herd, and DNA was extracted using a modified QIAamp stool mini kit protocol. DNA was fragmented to 300 bp and prepared with the NEXTflex PCR-Free DNA Library Prep kit. Libraries were sequenced on a HiSeq2500 (PE, 2x100 bp), producing roughly 7 billion bp per sample. MGmapper, a BWA-based pipeline for metagenomics, was used to map quality-trimmed reads to the ResFinder database (2,130 resistance genes). To ensure high specificity, we only counted read pairs in which both reads aligned with 50+ bp to the same reference gene. To minimize unspecific read mapping without excluding ambiguous alignments, read counts from gene variants were aggregated to gene and drug levels. The relationship between resistance gene abundances and herd-level antimicrobial consumption of more than seven drug classes was analyzed using correlation analysis. A total of 123 resistance genes were observed, and 64 were detected in at least half of the samples. In all samples, tet(Q), mefA and tet(W) comprised over half of the detected resistance genes. Besides tetracycline and macrolide resistance genes, beta-lactam and lincosamide resistance genes were also very prevalent. Positive correlations between drug use and gene abundances were frequently significant (p < 0.05), whereas negative correlations were not, suggesting that antimicrobial use in Danish swine herds is associated with an increase in resistance gene abundances.
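A minimal sketch of the counting and correlation steps, assuming alignments have already been parsed into one best-hit record per mate (read-pair ID, mate 1 or 2, reference gene variant, aligned length) and that a lookup from variant to gene or drug class is available. The record layout, variant names, and herd figures are hypothetical; MGmapper's own output format is not reproduced. A pair is counted only if both mates align with at least 50 bp to the same variant; counts are then summed to a coarser level, and Spearman's rank correlation (Pearson correlation of average ranks) is computed across herds. No p-value is computed here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (pair_id, mate, reference_variant, aligned_bp): one best hit per mate.
Alignment = Tuple[str, int, str, int]
MIN_ALIGNED_BP = 50


def count_pairs(alignments: Iterable[Alignment]) -> Counter:
    """Count pairs whose two mates both align >= 50 bp to the same variant."""
    best: Dict[str, Dict[int, Tuple[str, int]]] = {}
    for pair_id, mate, variant, aligned in alignments:
        best.setdefault(pair_id, {})[mate] = (variant, aligned)
    counts: Counter = Counter()
    for mates in best.values():
        if 1 in mates and 2 in mates:
            (v1, l1), (v2, l2) = mates[1], mates[2]
            if v1 == v2 and min(l1, l2) >= MIN_ALIGNED_BP:
                counts[v1] += 1
    return counts


def aggregate(counts: Counter, level: Dict[str, str]) -> Counter:
    """Sum variant-level counts to a coarser level (gene or drug class)."""
    out: Counter = Counter()
    for variant, n in counts.items():
        out[level[variant]] += n
    return out


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-herd tetracycline use and tetracycline read counts.
    use = [1.2, 0.4, 3.5, 2.0, 0.9]
    abundance = [5200, 2100, 9900, 7000, 2600]
    print(round(spearman(use, abundance), 3))
```

In practice the abundances would also be normalized for sequencing depth per herd before correlating; that step is omitted to keep the sketch short.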

In conclusion, our metagenomic approach facilitates herd-level resistance monitoring in swine. The resolution is sufficient to observe antimicrobial-induced effects on the swine resistome and to quantify more than 100 resistance genes. The data may furthermore be well suited to studying other functional sequences, such as virulence genes and transposable elements, making metagenomics an attractive option for routine environmental monitoring.

■ 47

MOLECULAR AND GENOMIC TYPING OF POULTRY ASSOCIATED SALMONELLA ENTERICA STRAINS FROM NIGERIA

N. Useh¹, H. Suzuki², N. Akange¹, M. Thomas³, A. Foley³, M. Keena³, E. Nelson³, J. Christopher-Hennings³, J. Scaria³;
¹University of Agriculture, Makurdi, NIGERIA, ²Yamaguchi University, Yamaguchi, JAPAN, ³South Dakota State University, Brookings, SD.

Non-typhoidal salmonellosis is one of the common causes of bacterial diarrhea worldwide. Globally, an estimated 94 million cases of gastroenteritis and 115,000 deaths each year are caused by non-typhoidal salmonellosis. Transmission of pathogenic Salmonella strains between countries has increased due to global travel and food imports. While North America and Europe have established active Salmonella surveillance programs, very limited epidemiologic data are available in developing countries, particularly in sub-Saharan Africa. Therefore, we conducted an epidemiologic investigation of Salmonella prevalence in poultry samples from Nigeria. Salmonella was isolated by enrichment culture in tetrathionate broth followed by growth on XLT4 agar. The identity of the strains was further confirmed by matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry. After positive identification, the virulence of each strain was estimated using a human colorectal cell (Caco-2) invasion assay. A total of 40 isolates were typed using the Caco-2 invasion assay. These isolates were further typed using next-generation sequencing (NGS). Sequencing libraries were prepared using the Illumina Nextera XT kit, and sequencing was performed using Illumina 2 x 250 base paired-end sequencing chemistry. Genome assembly of the strains was performed using SPAdes assembler v3.5.0, and the assemblies were annotated using Prokka v1.11. Comparative genomic analysis of the isolates was performed using the ITEP toolkit. We further mapped the presence/absence of 200 Salmonella virulence-associated genes in the sequenced genomes. Strain typing results revealed that the Salmonella isolates we collected belong to serotypes Anatum and Newington. There were several-fold differences in the Caco-2 invasiveness of the strains: while a few strains were not invasive, most strains were highly invasive. There was no clear correlation between the Caco-2 invasiveness of the strains and the presence or absence of virulence genes in their genomes. This might indicate the presence of unknown virulence genes in these strains or condition-specific expression of known virulence genes. We are currently performing a detailed genomic comparison of invasive and non-invasive isolates, which might give a better understanding of the virulence mechanisms in Salmonella enterica serotype Anatum and Salmonella enterica serotype Newington.
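One simple way to screen for a relationship between gene content and invasiveness is to compare mean invasion between strains that carry and lack each gene. A minimal sketch, assuming a presence/absence table per virulence gene and one numeric invasion value per strain from the Caco-2 assay; strain names, gene names, and values are hypothetical, and this is not the authors' analysis. Genes present (or absent) in every strain are skipped because they cannot explain differences between strains.

```python
from statistics import mean
from typing import Dict, List, Tuple


def gene_invasion_effects(presence: Dict[str, Dict[str, bool]],
                          invasion: Dict[str, float]) -> List[Tuple[str, float]]:
    """(gene, mean invasion with gene - mean without), largest absolute effect first."""
    effects = []
    for gene, by_strain in presence.items():
        with_gene = [invasion[s] for s, present in by_strain.items() if present]
        without = [invasion[s] for s, present in by_strain.items() if not present]
        if with_gene and without:  # uninformative if all strains agree
            effects.append((gene, mean(with_gene) - mean(without)))
    return sorted(effects, key=lambda e: abs(e[1]), reverse=True)


if __name__ == "__main__":
    presence = {
        "invA": {"S1": True, "S2": True, "S3": True},
        "sopE": {"S1": True, "S2": False, "S3": False},
        "spvC": {"S1": False, "S2": True, "S3": False},
    }
    invasion = {"S1": 8.0, "S2": 2.0, "S3": 1.0}  # e.g. % of inoculum recovered
    print(gene_invasion_effects(presence, invasion))
```

With only 40 strains and about 200 genes, any single large difference should be treated as a lead to follow up rather than evidence, since many comparisons are being made at once.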

■ 48

16S rRNA SEQUENCING OF FOOD AND ENVIRONMENTAL SAMPLES ON THE ILLUMINA MISEQ

N. Daquigan¹, J. White², C. Grim¹, D. Hanes¹, K. Jarvis¹;
¹U.S. Food and Drug Administration, Laurel, MD, ²Resphera, Baltimore, MD.

Purpose: Microbiome profiling of food and environmental samples can enhance our understanding of the complexity of associated microbial communities, with the potential for pathogen identification. Sequencing low-diversity amplicon libraries on the Illumina MiSeq requires the introduction of diversity into the primer set. Here, a 16S rRNA sequencing protocol was developed and optimized for enteric pathogen surveillance in environmental and food samples. Methods: Four pairs of 16S rRNA gene primers, carrying 0-3 additional nucleotides to increase sequence diversity, were designed to span the V1-V3 regions of the 16S rRNA gene and to include Illumina adapter and Nextera index sequences. Cucumber, naturally contaminated Masala spice mix, and cilantro samples were cultured following FDA BAM methods. Some cilantro samples were spiked with Salmonella enterica at ~4 CFU/25 g. DNA was extracted using the QIAcube for food samples and a standard CTAB protocol for hospital biofilm samples. 16S rRNA amplicon libraries were normalized manually or with SequalPrep plates, pooled at 8-11 pM, multiplexed up to 96 samples per run, spiked with 10% PhiX at 12.5 or 20 pM, and sequenced with Illumina 600-cycle v3 chemistry.
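Amplicon libraries start every read with the same conserved primer, so the first sequencing cycles see almost no base diversity, which is the motivation for the 0-3 extra nucleotides described above. A minimal sketch that counts distinct bases per cycle, assuming each read begins with a spacer followed by the primer. The primer shown is the widely used 27F sequence with its degenerate position fixed to C for illustration, and the spacer sequences are hypothetical, not the ones designed in this study.

```python
from typing import List

# Universal 27F primer with the degenerate M fixed to C (illustrative only).
PRIMER = "AGAGTTTGATCCTGGCTCAG"
# Hypothetical 0-3 nt heterogeneity spacers placed in front of the primer.
SPACERS = ["", "T", "CT", "GCT"]


def per_cycle_diversity(reads: List[str], n_cycles: int) -> List[int]:
    """Number of distinct bases observed at each of the first n_cycles positions."""
    return [len({r[c] for r in reads if c < len(r)}) for c in range(n_cycles)]


if __name__ == "__main__":
    staggered = [s + PRIMER for s in SPACERS]
    print(per_cycle_diversity([PRIMER], 8))   # [1, 1, 1, 1, 1, 1, 1, 1]
    print(per_cycle_diversity(staggered, 8))  # [4, 4, 3, 2, 3, 3, 2, 2]
```

Staggering the primer shifts the conserved sequence out of register across the pool, so early cycles contain a mix of bases; the PhiX spike-in adds further diversity on top of this.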

Sequences were quality filtered using the Quantitative Insights Into Microbial Ecology (QIIME) package and analyzed for taxonomic composition using Resphera Insight. Results: The newly designed 16S rRNA primers showed identity to S. enterica, Campylobacter jejuni, Listeria monocytogenes, Shigella dysenteriae, Shigella sonnei, and Vibrio cholerae by BLASTn analysis. In 96-sample MiSeq runs, SequalPrep plates reduced sample preparation time and provided consistent sample normalization (0.0005-2.0% of reads passing filter per sample). Increasing library concentrations to 11 pM with 20 pM PhiX resulted in average data yields of 9.0 Gb, compared with 7.7 Gb with 8 pM libraries. Quality filtering identified higher levels of non-specific amplicon contaminants in Masala (28%) than in cilantro (2%), cucumber (0.6%), and biofilm (0.7%) samples. Microbiome proportional abundances included Staphylococcus (15%), Enterobacter (15%), and Bacillus (10%) species in Masala (10,000 reads); Pseudomonas (40%), Flavobacterium (19%), and Janthinobacterium (8%) species in cilantro (25,000 reads); Acinetobacter (15%), Rhizobium (12%), and Pseudomonas (12%) species in cucumber (10,000 reads); and Elizabethkingia (25%), Pseudomonas (13%), and Serratia (8%) species in biofilm samples (15,000 reads). S. enterica averaged 11.3% (2,835 of 25,000 reads, n=6) and 0.46% (46 of 10,000 reads, n=2) in spiked cilantro and culture-positive Masala, respectively. Conclusions: Pathogens associated with food and environmental contamination, such as S. enterica, are important sources of disease outbreaks. Optimizing all aspects of our 16S rRNA sequencing protocol enables sequencing of multiple sample types important for public health safety.

■ 49

SALMONELLA ENTERICA SEROVAR KENTUCKY ISOLATES FROM DAIRY COWS AND POULTRY DEMONSTRATE DIFFERENT EVOLUTIONARY HISTORIES AND HOST-SPECIFIC POLYMORPHISMS

B. J. Haley, J. S. Van Kessel, J. S. Karns;
USDA, ARS, EMFSL, Beltsville, MD.

Salmonella enterica subsp. enterica serovar Kentucky is commonly isolated from dairy cows and poultry in the United States. Although it is not among the serovars most frequently isolated from cases of human salmonellosis, its high prevalence in livestock and poultry indicates that it is a potential public health threat, particularly in light of the global spread of multidrug-resistant S. Kentucky ST198. To investigate genomic differences between S. Kentucky isolated from dairy farms and from poultry operations, the genomes of 41 S. Kentucky ST152 isolates recovered from dairy cows and four S. Kentucky ST152 isolates recovered from poultry were sequenced using an Illumina NextSeq 500. Publicly available S. Kentucky data were retrieved from the NCBI Sequence Read Archive, and phylogenies were inferred after SNPs were detected using both ParSNP and the CFSAN SNPfinder. Phylogenetic inference demonstrated that S. Kentucky ST152 isolates from poultry evolved from those frequently recovered from dairy cows. The S. Kentucky genomes in the clades dominated by cow isolates are differentiated from those in the clade dominated by poultry isolates by nine conserved single nucleotide polymorphisms. A significant number of these SNPs were located in open reading frames identified as iron-scavenging genes, suggesting that these differences are at least partly responsible for host specificity. Interestingly, there was no evidence of mixing of cow and poultry S. Kentucky among the isolates. However, one isolate appeared intermediate and was rooted near the most recent common ancestor. Within the cow-specific clade, isolates were in general clustered by geography; however, some geographical mixing was observed. The results of this analysis indicate that geographical location and source (cow or poultry) could be inferred from the genome sequences of S. Kentucky.
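A minimal sketch of how SNPs separating two host-associated clades can be picked out of a SNP matrix, assuming each isolate's calls are an equal-length string of bases over the same variable positions, and defining a "conserved" differentiating SNP as a position where every isolate in one group shares one base and every isolate in the other group shares a different base. That definition is an assumption for illustration; isolate names and sequences are hypothetical.

```python
from typing import Dict, Iterable, List


def differentiating_positions(snps: Dict[str, str], group_a: Iterable[str],
                              group_b: Iterable[str]) -> List[int]:
    """Positions fixed for one base in group A and a different base in group B."""
    a_seqs = [snps[n] for n in group_a]
    b_seqs = [snps[n] for n in group_b]
    length = len(next(iter(snps.values())))
    hits = []
    for i in range(length):
        a_bases = {s[i] for s in a_seqs}
        b_bases = {s[i] for s in b_seqs}
        if len(a_bases) == 1 and len(b_bases) == 1 and a_bases != b_bases:
            hits.append(i)
    return hits


if __name__ == "__main__":
    snps = {
        "cow1": "ACGTA", "cow2": "ACGTT", "cow3": "ACGAA",
        "poultry1": "GCTTA", "poultry2": "GCTTT",
    }
    print(differentiating_positions(snps, ["cow1", "cow2", "cow3"],
                                    ["poultry1", "poultry2"]))
```

The returned positions can then be mapped back to reference coordinates to see which genes, such as the iron-scavenging genes mentioned above, they fall in.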

■ 50

GENOMIC EPIDEMIOLOGY OF SALMONELLA ENTERICA SEROTYPE CERRO

A. Thachil¹, H. Suzuki², A. Glaser¹, M. Thomas³, S. Das³, G. Gopinath⁴, J. Jean-Gilles Beaubrun⁴, N. Addy⁴, H. Chase⁴, A. Jayaram⁴, Y. Yoo⁴, T. Chung⁴, D. Hanes⁴, J. Scaria³;
¹Cornell University, Ithaca, NY, ²Yamaguchi University, Yamaguchi, JAPAN, ³South Dakota State University, Brookings, SD, ⁴FDA, Laurel, MD.

Salmonella enterica is classified into more than 2,500 serotypes. Salmonella enterica serotype Cerro is mainly associated with dairy cattle. In the 1990s this serotype was relatively rare in cattle; however, in the last decade S. enterica serotype Cerro has emerged as a common serotype in lactating cows in the Northeastern United States. Although primarily adapted to cattle, serotype Cerro has also caused human infections and outbreaks, including the 1985 ‘Carne Seca’-associated outbreak in New Mexico and a 2012 outbreak in Arkansas prisons. To gain better insight into the possible causes of the increased incidence of serotype Cerro, we performed a next-generation sequencing (NGS)-based comparative genomic analysis of Cerro strains isolated from farms in New York, Pennsylvania, Vermont and South Dakota. Strains were isolated and identified as Salmonella using standard protocols, and serotyping was performed at the National Veterinary Services Laboratories, Ames, Iowa. After serotyping over 200 isolates, 70 isolates were chosen for NGS. Sequencing libraries were prepared using the Illumina Nextera XT kit, and sequencing was performed using Illumina 2 × 250 base paired-end chemistry. The genomes were assembled independently using CLC Genomics Workbench and SPAdes assembler v3.5.0, and annotated using RAST and Prokka v1.11. Comparative genomic analysis of the isolates from our collection and other Cerro genomes available in NCBI was performed using the ITEP toolkit. In addition, a dataset of 2,800 Salmonella core genes was used with in-house Perl scripts to identify whole-genome core gene SNP profiles of Cerro isolates and representative strains from other serovars. Phylogenetic analysis and illustration were completed using the MEGA6 suite. To determine whether changes in virulence genes contributed to the higher incidence of this serotype, we mapped the presence or absence of nearly 200 Salmonella virulence-associated genes in the sequenced genomes. Our preliminary results indicate that S. enterica serotype Cerro isolates of bovine and swine origin have similar genome properties. We are currently refining the final results of the comparative genome analysis, and the complete data will be presented at the conference.
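The core gene SNP profiling step can be illustrated with a minimal Python sketch. It is not the authors' in-house Perl pipeline: it assumes the core genes have already been aligned and concatenated per isolate, it skips gap and ambiguous positions, and the isolate names and sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where two isolates carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b  # gaps ('-') and N are ignored
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every pair of isolates in a concatenated core-gene alignment."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment of concatenated core genes (not study data).
core = {
    "isolate_NY": "ATGACCGTTA",
    "isolate_PA": "ATGACCGTTG",
    "isolate_SD": "ATGTCCG-TG",
}
print(distance_matrix(core))
```

A matrix like this is a typical input to distance-based tree building such as neighbor-joining in MEGA; skipping gaps rather than counting them avoids inflating distances where assemblies are incomplete.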

n 51<br />

USE OF NEXT GENERATION SEQUENCING TO EXPLORE THE DIVERSITY OF TOXINOTYPE V CLOSTRIDIUM DIFFICILE STRAINS ORIGINATING FROM A CLOSED POPULATION OF HUMANS AND SWINE

K. N. Norman, H. M. Scott;
Texas A&M University, College Station, TX.

Clostridium difficile typically causes nosocomial infections; however, there have been increasing reports of community-acquired C. difficile infection (CA-CDI) in patients with no recent history of hospitalization. The finding of similar strains in humans, animals and retail meat has raised concern that C. difficile is a potential foodborne pathogen. We previously isolated C. difficile from a closed population of humans and swine to investigate the potential for C. difficile to transfer from swine to humans through occupational exposure. We found no significant difference in the prevalence of C. difficile in wastewater from humans who worked with swine versus humans who did not. Interestingly, the majority of strains isolated from both the human wastewater and the swine fecal samples were classified as toxinotype V, North American Pulsed-field type 7 (NAP7). Pulsed-field gel electrophoresis (PFGE) and ribotyping are the standard methods used to differentiate C. difficile strains; however, these typing methods may not be well suited to C. difficile, particularly toxinotype V strains. Typing of C. difficile is further complicated because PFGE is generally used in the United States while ribotyping is commonly used in Europe, making comparisons between strains and studies difficult. Next-generation sequencing may provide a more discriminatory method to differentiate strains and will also provide evolutionary information about them. We conducted whole-genome sequencing of 36 swine and 28 human epidemiologically related toxinotype V, NAP7 strains isolated from the closed population on the Illumina MiSeq platform. Library preparation was performed using Nextera XT DNA sample prep kits, and each strain was individually indexed. MiSeq Reporter and Geneious Pro software were used to assemble and align sequences, identify potential nucleotide polymorphisms, and facilitate phylogenetic analyses. We are currently analyzing the data, including the single nucleotide polymorphisms found in the 64 sequenced strains. Whole-genome sequencing data will facilitate comparison between PFGE-derived and genome sequencing-derived diversity. The diversity and evolution of toxinotype V, NAP7 strains are especially important because these strains are commonly found in both food animals and humans, and many questions remain about the potential role of food animals in CA-CDI. Understanding the true diversity within C. difficile toxinotypes and North American Pulsed-field types is also essential for discussions regarding appropriate, standardized typing methods.

n 52<br />

A PATHOGENOMICS APPROACH TO IMPROVE THE ACCURACY OF VETERINARY CLINICAL DIAGNOSTICS

S. Das¹, M. Thomas¹, A. Pillatzki¹, L. Holler¹, M. Keena¹, A. Foley¹, E. Nelson¹, J. Christopher-Hennings¹, H. Suzuki², J. Scaria¹;
¹South Dakota State University, Brookings, SD, ²Yamaguchi University, Yamaguchi, JAPAN.

Recent advances in next-generation sequencing (NGS) technology promise inexpensive, rapid whole-genome data and could revolutionize veterinary clinical diagnostics. The accuracy of clinical diagnostics can now be improved by combining pathology data with NGS-based sequencing of clinical specimens. This approach offers faster results and avoids the need for many different tests to reach a final interpretation. We applied this approach to the clinical evaluation of salmonellosis cases at the Animal Disease Research and Diagnostic Laboratory (ADRDL), South Dakota. All bovine and swine salmonellosis cases submitted to ADRDL from September 2014 to date were included in this study. Salmonella isolates were identified using standard microbiological protocols and further characterized using NGS. Sequencing libraries were prepared using the Illumina Nextera XT kit, and sequencing was performed using Illumina 2 × 250 base paired-end chemistry. Genomes were assembled using SPAdes assembler v3.5.0 and annotated using Prokka v1.11. Comparative genomic analysis of the isolates was performed using the ITEP toolkit. The antibiotic resistance and virulence gene profiles of the isolates were determined using a custom Perl script, and these results were then combined with pathological data. This combined approach revealed that in most cases Salmonella infection was accompanied by other enteric pathogens such as rotavirus, Clostridium perfringens, hemolytic Escherichia coli and porcine epidemic diarrhea virus. Salmonella enterica serotype Dublin was associated with septicemia and colitis in the absence of other pathogens. Antibiotic resistance gene profiles obtained from NGS data provided further insight into the likely antibiotic susceptibility or resistance patterns of the strains.
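The gene-profiling step, done by the authors with a custom Perl script, amounts to building a presence/absence matrix over a panel of genes. A minimal Python sketch, assuming per-isolate gene hits have already been produced by a search against a resistance or virulence gene database; the isolate names, gene panel and hits are hypothetical.

```python
def presence_absence(hits: dict[str, set[str]], panel: list[str]) -> dict[str, list[int]]:
    """Return, for each isolate, a 0/1 vector marking which panel genes were detected."""
    return {
        isolate: [1 if gene in detected else 0 for gene in panel]
        for isolate, detected in hits.items()
    }


# Hypothetical gene panel and detected genes per isolate (illustrative only).
panel = ["blaCMY-2", "tetA", "sul1", "floR"]
hits = {
    "bovine_01": {"blaCMY-2", "tetA", "floR"},
    "swine_07": {"tetA", "sul1"},
}

matrix = presence_absence(hits, panel)
print("isolate\t" + "\t".join(panel))
for isolate, row in matrix.items():
    print(isolate + "\t" + "\t".join(str(v) for v in row))
```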

n 53<br />

QUALITY ASSESSMENT OF SINGLE NUCLEOTIDE POLYMORPHISM IN THE HIGHLY CLONAL SALMONELLA ENTERITIDIS

D. Ogunremi;
Canadian Food Inspection Agency, Ottawa, ON, CANADA.

Single nucleotide polymorphisms (SNPs) have emerged as the most informative genetic variation for characterizing the highly clonal Salmonella Enteritidis (SE) in epidemiological and evolutionary investigations. Raw reads, contigs, and draft or polished genomes are used to identify SNP variants, typically by aligning an isolate's reads to a reference genome. Two relatively well-assembled genomes can also be compared directly to determine the number of SNP variants between them, and reference-free SNP detection from raw reads has also been proposed and used. Currently available software and bioinformatics pipelines have limitations for SNP detection, including the use of a substantially divergent reference genome for read alignment, sequencing errors, mis-assembly and repeats. For this reason, large-scale studies investigating SNPs typically confirm variants by Sanger sequencing of PCR amplicons spanning the targets. Because food safety regulatory agencies must act promptly to prevent human exposure to contaminated food, SNP detection must be rapid yet accurate, which precludes lengthy procedures such as Sanger sequencing before acting to protect consumers. Bioinformatics pipelines for SNP detection will benefit not only from rigorous evaluation and routine quality assurance, but may also require systematic assessment with biologically relevant isolates as part of a pre-deployment procedure. To this end, polished chromosome sequences were developed for three related SE isolates obtained from the same poultry facility on the same day and deposited in GenBank (accession numbers CP009084.2, CP009085.2 and CP011942). Using whole-genome analysis tools, bidirectional Sanger sequencing of target amplicons, and pyrosequencing, the three isolates were shown to differ from each other by 3-9 SNPs. We investigated whether true SNPs can be accurately and consistently detected by the same bioinformatics platform applied at the different processing stages of each genome, from raw reads to polished genomes. Variant calling from raw reads mapped to a reference genome detected more SNPs than analysis of finished genomes, and many of these could not be confirmed by Sanger sequencing (i.e., false SNPs). Progressive analysis of the nucleotide data refined SNP detection by reducing false positives arising from mis-assembly or repeats. A high-throughput, cost-efficient wet-chemistry SNP-PCR assay proved useful as a quality tool for confirming true SNPs and relatedness among isolates, providing confidence in test results and avoiding the need to routinely produce polished genomes or carry out further lengthy testing.
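The comparison of SNP calls from different processing stages against Sanger-confirmed variants reduces to simple set arithmetic. A minimal Python sketch; the chromosome positions are hypothetical and are not the variants reported for the three SE isolates.

```python
def evaluate_calls(called: set[int], confirmed: set[int]) -> dict[str, float]:
    """Score SNP positions against positions confirmed by an orthogonal method (e.g. Sanger)."""
    true_pos = len(called & confirmed)
    false_pos = len(called - confirmed)   # calls that could not be confirmed ("false SNPs")
    false_neg = len(confirmed - called)   # confirmed SNPs the pipeline missed
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(confirmed) if confirmed else 0.0
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical positions: confirmed SNPs and calls made at two processing stages.
confirmed = {45_871, 310_022, 512_904}
raw_read_calls = {1_200, 45_871, 98_310, 310_022, 402_117, 512_904}
polished_calls = {45_871, 310_022, 512_904}

for stage, calls in [("raw reads", raw_read_calls), ("polished genome", polished_calls)]:
    print(stage, evaluate_calls(calls, confirmed))
```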


n 54<br />


COMPARATIVE SEQUENCE ANALYSIS OF MULTI-DRUG RESISTANT INCA/C PLASMIDS FROM SALMONELLA ENTERICA

M. Hoffmann¹, J. Pettengill¹, J. Miller¹, N. Gonzalez-Escalona¹, S. L. Ayers², J. Payne¹, S. Zhao², J. Meng³, M. W. Allard¹, P. F. Dermott², E. W. Brown¹, S. R. Monday¹;
¹US Food and Drug Administration, College Park, MD, ²US Food and Drug Administration, Laurel, MD, ³University of Maryland, College Park, MD.

Multidrug-resistant (MDR) determinants are often encoded on mobile plasmids. At present there are few data on antibiotic resistance and its dissemination, especially as they relate to plasmid evolution and maintenance in zoonotic bacterial pathogens. With next-generation sequencing it is difficult to separate large plasmid contigs from chromosomal contigs, particularly when the plasmid copy number approaches one, as with the chromosome. The Pacific Biosciences (PacBio) RS II sequencer provides very long reads that greatly facilitate distinguishing plasmid sequences. We report here an efficient plasmid isolation protocol for Salmonella enterica serovars that yields purified DNA meeting the requirements for sequencing with PacBio technology. Six Salmonella enterica isolates from retail poultry meats, representing six different serovars and carrying the MDR-AmpC resistance profile, were used in this study. Salmonella plasmids were obtained using a modified miniprep and transformed into Escherichia coli DH10Br. A Qiagen Large-Construct kit was used to recover highly concentrated, purified plasmid DNA, which was sequenced using PacBio technology. The closed IncA/C plasmids ranged in size from 104 kb to 191 kb and shared a stable, conserved backbone of 98 core genes, with only six core gene differences. The six IncA/C plasmids encoded a number of antimicrobial resistance genes, including those for resistance to quaternary ammonium compounds and mercury. The number of resistance determinants varied from 8 to 17 (S. Newport (13), S. Typhimurium (13), S. Heidelberg (17), S. Infantis (14), S. Agona (8), and S. Kentucky (14)), with some plasmids carrying two copies of blaCMY-2, qacEΔ1, sugE, merE, merD, and merA. Additionally, we performed a comparative sequence analysis that included 1) 14 additional IncA/C plasmids from Salmonella enterica and 2) 38 additional IncA/C plasmids from different genera, to provide an evolutionary picture of antimicrobial resistance mediated by this common plasmid backbone. These findings shed light on the variation of IncA/C plasmids in resistant bacteria from different sources.

n 55<br />

APPLICATION OF WHOLE-GENOME SEQUENCING APPROACHES TO THE REAL-TIME CHARACTERIZATION OF BACTERIAL PATHOGENS IN FOOD-TESTING LABORATORIES

C. D. Carrillo, D. Lambert, A. Koziol, P. Manninger, M. Gauthier, B. W. Blais;
Canadian Food Inspection Agency, Ottawa, ON, CANADA.

The timely identification and characterization of foodborne bacteria for risk assessment is a key operation in a food safety investigation. Current methods require several days and/or provide low-resolution characterization. We have implemented procedures for the rapid production of whole-genome sequence (WGS) data to characterize bacterial pathogens (Salmonella spp., Listeria monocytogenes and Shiga-toxigenic Escherichia coli (STEC)) isolated in food-testing laboratories at the Canadian Food Inspection Agency (CFIA). To demonstrate the feasibility and accuracy of WGS as an alternative to traditional procedures, over 500 historical and contemporary isolates were analyzed to compare WGS approaches with the characterization achieved through current methods. Genomic DNA was isolated from single colonies or broth cultures, and sequencing libraries were constructed using the Nextera XT DNA sample preparation kit (Illumina, Inc., San Diego, CA). Sequence was generated on the Illumina MiSeq instrument, with raw data sampled from the instrument during the sequencing run (for urgent samples) and/or after completion of the run. Automated bioinformatic analysis pipelines were developed to perform read correction, de novo assembly, quality assessment, sequence vector screening and removal, and identification of pertinent genetic markers. The complete set of input parameters and software versions used to conduct these analyses was automatically captured in a traceability report. Characteristic genetic markers were accurately identified in all cases where WGS data met minimum quality standards, and a report of genomic analysis (ROGA) could be generated within 1 to 3 days of receipt of colony isolates. ROGAs were presented in a user-friendly format for risk assessors and recall specialists. Real-time WGS produces high-resolution characterization of bacterial pathogens at a cost and in a timeframe similar to methods currently in use, and has the potential to replace lengthy biochemical characterization and typing procedures in contemporary food-testing laboratories.
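The abstract does not list its minimum quality standards, but the quality-assessment step can be illustrated with two widely used assembly metrics, N50 and estimated depth of coverage. A minimal Python sketch; the thresholds in `passes_qc` and the example numbers are hypothetical placeholders, not CFIA criteria.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of the assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs")
    half = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable")  # the loop always returns for non-empty input


def estimated_depth(total_read_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total bases sequenced divided by expected genome size."""
    return total_read_bases / genome_size


def passes_qc(contig_lengths, total_read_bases, genome_size, min_n50=50_000, min_depth=30.0):
    """Hypothetical acceptance rule; both thresholds are placeholders."""
    return (n50(contig_lengths) >= min_n50
            and estimated_depth(total_read_bases, genome_size) >= min_depth)


# Hypothetical assembly of a ~5 Mb genome.
contigs = [2_000_000, 1_500_000, 1_000_000, 300_000, 200_000]
print(n50(contigs), estimated_depth(150_000_000, 5_000_000))
print(passes_qc(contigs, 150_000_000, 5_000_000))
```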

n 56<br />

GENERATING FORENSIC INSIGHT THROUGH EVIDENCE-ASSOCIATED MICROBIOME ANALYSIS

A. Materna, F. Strino, J. Johansen, P. Liboriussen, L. Schauser;
QIAGEN, Aarhus, DENMARK.

NGS has revolutionized the field of microbial ecology by providing insight into environmental as well as host-associated microbiomes. Techniques for monitoring microbiome composition are now used for a wide variety of purposes, including forensic applications. Here we present a user-friendly, NGS-platform-independent bioinformatics solution for microbiome profiling and demonstrate its utility for crime scene investigation. Within the EU-funded “MiSAFE” research project (FP7), our analysis solution was validated through a mock crime scene investigation. Partners extracted DNA from soil samples obtained from suspects’ boots and crime scenes. Libraries of 2 × 300 bp paired reads were then generated from 16S rRNA amplicons and sequenced on an Illumina MiSeq instrument. The resulting NGS data were processed with our analysis workflow, consisting of i) preprocessing and quality control, ii) clustering of data into operational taxonomic units (OTUs) and taxonomic assignment of OTUs, iii) detection of PCR artifacts (chimeras), iv) annotation of results with sample metadata, and v) rarefaction analysis, beta diversity estimation and principal coordinate analysis (PCoA). OTUs can be clustered de novo or using common 16S sequence databases as a reference. A number of additional statistical tools for microbiome analysis and microbial ecology complete the solution, including richness calculation, hierarchical clustering, and Multiple Response Permutation Procedure (MRPP) and PERMANOVA testing for significant differences between two or more groups. The results are richly visualized and can be browsed in the context of investigation-relevant metadata, which in this mock investigation yielded supporting evidence that the suspect was involved in the criminal act.
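Rarefaction, listed above as part of step v) of the workflow, subsamples each sample's reads without replacement to a common depth so that richness can be compared across samples sequenced to unequal depths. A minimal standard-library Python sketch; it is not the QIAGEN implementation, and the OTU counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Randomly subsample reads (without replacement) from one sample's OTU counts to a fixed depth."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample size {total}")
    # Expand counts into one entry per read, then draw `depth` reads.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


def rarefaction_curve(counts: dict[str, int], depths: list[int], seed: int = 0) -> list[tuple[int, int]]:
    """Observed OTU richness at each subsampling depth."""
    return [(d, len(rarefy(counts, d, seed))) for d in depths]


# Hypothetical OTU counts for one soil sample.
sample = {"OTU_1": 120, "OTU_2": 60, "OTU_3": 15, "OTU_4": 4, "OTU_5": 1}
print(rarefaction_curve(sample, [10, 50, 100, 200]))
```

A fixed seed makes the subsample reproducible; in practice curves are often averaged over several random draws.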

n 57<br />

RAPID DETECTION AND GENETIC CHARACTERIZATION OF BURKHOLDERIA PSEUDOMALLEI AND ITS NEAR-NEIGHBORS THROUGH MULTIPLEX AMPLICON SEQUENCING

J. Delisle¹, J. Schupp¹, J. Sahl¹, R. Colman¹, Y. Hueftle¹, H. Heaton¹, J. Gillece¹, A. Vazquez², C. Hall², J. Busch², M. Mayo³, B. Currie³, D. Engelthaler¹, P. Keim¹, D. M. Wagner²;
¹Translational Genomics Research Institute, Flagstaff, AZ, ²Northern Arizona University, Flagstaff, AZ, ³Menzies School of Health Research, Darwin, AUSTRALIA.

Burkholderia pseudomallei, the causative agent of melioidosis, is a public health threat and potential bioterrorism agent endemic to Southeast Asia and Northern Australia. Current methods, such as real-time PCR, allow rapid detection but only limited characterization. A rapid and robust assay for detecting and characterizing clinical and forensic materials suspected of containing B. pseudomallei would be of enormous benefit to epidemiological and forensic investigations. Next-generation sequencing of multiple informative genetic loci can provide efficient, rapid detection and differentiation from near-neighbor species, as well as fine-scale genetic characterization. We have developed a 68-locus amplicon sequencing assay that provides 1) detection of B. pseudomallei; 2) differentiation from B. mallei and other near-neighbor species; 3) potential detection of strain mixtures; 4) differentiation within B. pseudomallei; and 5) virulence gene characterization (11 gene targets) within 24-48 hours, from both cultures and complex environmental or clinical samples. The system couples highly multiplexed amplification reactions with a universal amplicon indexing system, enabling efficient multilocus amplicon sequencing of potentially hundreds of samples in a single Illumina MiSeq run. Using redundant targets identified by BLAST Score Ratio analysis for species identification, we show virtually 100% specificity on a panel of B. pseudomallei, B. mallei and close near-neighbors such as B. thailandensis, B. oklahomensis and B. humptydooensis, among others. We demonstrate detection of B. pseudomallei from as few as 10 genome copies of moderately to highly degraded DNA, as well as differentiation among B. pseudomallei strains using variation within the targeted species-specific, MLST and virulence loci.
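BLAST Score Ratio (BSR) analysis normalizes the bit score of a query locus against a target genome by the bit score of the query aligned to itself, giving a value of roughly 0 (absent) to 1 (identical). A minimal Python sketch of how BSR values could flag candidate species-specific loci; the 0.8 and 0.4 cut-offs and all bit scores are assumptions for illustration, not values from this assay.

```python
def blast_score_ratio(target_bitscore: float, self_bitscore: float) -> float:
    """BSR = bit score of query vs. target / bit score of query vs. itself."""
    if self_bitscore <= 0:
        raise ValueError("self bit score must be positive")
    return target_bitscore / self_bitscore


def is_species_specific(self_bitscore, target_scores, neighbor_scores, present=0.8, absent=0.4):
    """True if the locus is conserved in every target genome and absent or divergent in every near neighbor."""
    conserved = all(blast_score_ratio(s, self_bitscore) >= present for s in target_scores)
    excluded = all(blast_score_ratio(s, self_bitscore) < absent for s in neighbor_scores)
    return conserved and excluded


# Hypothetical bit scores for one candidate locus (self-hit = 500 bits).
target = [495.0, 480.0, 500.0]   # e.g. B. pseudomallei genomes
neighbors = [120.0, 90.0, 0.0]   # e.g. near-neighbor genomes; 0 = no hit
print(is_species_specific(500.0, target, neighbors))
```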


n 58<br />

SHORT TANDEM REPEATS FOR MOLECULAR DETECTION OF PATHOGENS

X. Guo¹, Q. Yu², P. Liu¹, J. Watt¹, Y. Li³, S. Sammons³;
¹Eagle Medical Service and Centers for Disease Control and Prevention, Atlanta, GA, ²Emory University, Atlanta, GA, ³Centers for Disease Control and Prevention, Atlanta, GA.

With recent developments in sequencing technology, the large number of available pathogen genome sequences makes it possible to analyze short tandem repeats (sTR) comparatively and to understand their epidemiologic significance as genetic markers for gene expression regulation, host-pathogen interaction, molecular identification of new pathogens and disease diagnosis. Using bioinformatics methods, including scientific programming and computational biological statistics, we sought to identify important sTR in virus genomes and apply them to the rapid identification of new pathogens and their diagnostic grouping. Among 36 NCBI-available West and Central African (groups A-D) monkeypox virus (MPV) genomes, the ten sTR showing the largest statistically significant differences were identified, along with their sequence patterns, copy number changes and genetic mutations. Clustering MPVs by their copy number changes at these 10 sTR gave almost the same grouping as clustering them by alignment of their whole-genome sequences. In addition, a 5-nt repeat rearrangement, designated TC-31k, was found to be representative of the different MPV groups. TC-31k has a specific arrangement of ccatt/ccatc (T/C) repeats in West African, group C/D and group A/B Central African MPVs. Combined with the copy number change of the tandem repeat AGATT, the group of a newly found MPV can be identified by one or two PCR sequencing reactions or directly from NGS raw data. Because TC-31k encodes a Kelch-like protein motif restricted to poxviruses, the T/C rearrangement may reflect variation in binding interactions with host proteins. In summary, systematic biological analysis of MPV genomes identified several significant group-specific sTR that support DNA sequence clustering and the molecular detection of newly found MPV pathogens.

Keywords: short tandem repeat, pathogen identification, monkeypox virus, NGS
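Counting tandem copies of a repeat unit, the basic measurement behind the copy number comparisons above, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' method: it reports the longest uninterrupted run of a given motif, and the example sequences are hypothetical (only the AGATT, ccatt and ccatc motifs come from the abstract).

```python
def max_tandem_copies(sequence: str, motif: str) -> int:
    """Longest run of back-to-back copies of `motif` anywhere in `sequence` (case-insensitive)."""
    seq, unit = sequence.upper(), motif.upper()
    k = len(unit)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(seq) - k + 1):
        copies = 0
        # Extend the run one repeat unit at a time while the next unit still matches.
        while seq[start + copies * k : start + (copies + 1) * k] == unit:
            copies += 1
        best = max(best, copies)
    return best


def copy_number_profile(genome: str, motifs: list[str]) -> list[int]:
    """Vector of maximal copy numbers, one per motif, suitable as input for clustering."""
    return [max_tandem_copies(genome, m) for m in motifs]


# Hypothetical sequences, not real monkeypox virus loci.
print(max_tandem_copies("GG" + "AGATT" * 4 + "CC", "AGATT"))
print(copy_number_profile("ccatt" + "ccatc" * 3 + "agatt" * 2, ["ccatt", "ccatc", "agatt"]))
```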

n 59<br />

A WGMLST APPROACH TO SUBSPECIES IDENTIFICATION AND CHARACTERIZATION OF BACTERIA IN A CULTURE COLLECTION

A. Luquette¹, K. Chase¹, H. Pouseele², K. De Bruyne², M. Wolcott¹;
¹USAMRIID, Frederick, MD, ²Applied Maths NV, Sint-Martens-Latem, BELGIUM.

Identification and characterization at the subspecies level is a challenge within bacterial species with high sequence similarity. There are many methods for subspecies identification, each limited in its ability to resolve close genetic neighbors. Next-generation sequencing allows the entire genome to be used for sequence typing in a cost-effective manner. While traditional multilocus sequence typing (MLST) schemes use only 6-10 housekeeping loci for identification, whole-genome multilocus sequence typing (wgMLST) schemes typically use 2,000-4,000 loci from across the genome, providing greater typing resolution. wgMLST pipelines were developed using commercial software (BioNumerics 7.5; Applied Maths NV) for Bacillus anthracis and Francisella tularensis for use by the Department of Defense Unified Culture Collection (UCC). Following whole-genome sequencing, alleles were identified with two approaches: de novo assembly followed by a BLAST search, and assembly-free identification directly from the sequence reads. All calculations were performed in the cloud, so very little local computational memory was needed. Alleles for different subspecies were generated and compared with current alleles in known sequence types. This approach also allows multiple schemes and subsets to be generated based on specific genome data, such as virulence factors. The development, application and results of the typing schemes for Bacillus anthracis and Francisella tularensis will be presented, demonstrating the usefulness of the system. Next-generation sequencing and wgMLST will extend our capabilities to identify and characterize these and many other organisms from the UCC.
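Comparing wgMLST profiles usually comes down to counting the loci at which two isolates carry different allele numbers. A minimal Python sketch, assuming allele calls have already been made (by assembly plus BLAST, or assembly-free) and that a missing call is stored as `None`; ignoring loci missing in either isolate is one common convention, and the locus names and allele numbers are hypothetical. This is not the BioNumerics implementation.

```python
from typing import Optional

Profile = dict[str, Optional[int]]  # locus name -> allele number (None = no call)


def allele_distance(a: Profile, b: Profile) -> tuple[int, int]:
    """Return (loci with different alleles, loci compared), skipping loci missing in either profile."""
    compared = [
        locus for locus in sorted(a.keys() & b.keys())
        if a[locus] is not None and b[locus] is not None
    ]
    differences = sum(1 for locus in compared if a[locus] != b[locus])
    return differences, len(compared)


# Hypothetical allele profiles over four loci.
strain_1: Profile = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2, "locus_0004": None}
strain_2: Profile = {"locus_0001": 1, "locus_0002": 5, "locus_0003": 2, "locus_0004": 3}
print(allele_distance(strain_1, strain_2))
```

Reporting the number of loci compared alongside the difference count makes it visible when missing data, rather than true similarity, keeps a distance small.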

n 60<br />

EFFECT OF SINGLETON REMOVAL ON THE MICROBIAL COMMUNITY STRUCTURE RESULTING FROM ILLUMINA PAIRED-END MISEQ 16S AMPLICON DATA

K. Wong¹, T. Shaw², M. Molina³;
¹Oak Ridge Institute for Science and Education, Oak Ridge, TN, ²Institute of Bioinformatics, University of Georgia, Athens, GA, ³USEPA, Office of Research and Development, Ecosystems Research Division, Athens, GA.

Application of next-generation sequencing (NGS) to study microbial ecology in diverse environments has become increasingly popular in the last several years. The technique, which has been applied in different research areas, has the ultimate goal of gaining a better understanding of human and ecosystem functioning, as well as environmental sustainability. Because bioinformatics analysis of NGS data is still maturing, the quality control (QC) procedures used to analyze the data vary among published studies. Since singleton removal has recently been recommended as a QC step to remove artificial reads generated during Illumina sequencing, we investigated its effects on community structure and diversity indices derived from 16S amplicon data produced by the Illumina paired-end technique. Our experimental samples were fresh and aged cow manure, analyzed using the Quantitative Insights Into Microbial Ecology (QIIME) platform. Singleton removal did not have a significant effect on relative abundance results, except for the removal of unknown sequences in most samples. However, while the alpha diversity of fresh manure was higher than that of aged manure before singleton removal, a decrease in the diversity of fresh manure samples was observed after removal. Overall, the results indicate that singleton removal has a significant effect on microbial diversity results; we therefore recommend adding a singleton-removal step to the standard QC procedure when analyzing Illumina sequencing data.
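The effect described above can be illustrated on a toy OTU table. A minimal Python sketch, assuming a singleton is an OTU observed exactly once across the whole dataset (one common definition; some pipelines apply it per sample) and using the Shannon index as the alpha diversity measure; the counts are hypothetical and this is not the QIIME implementation.

```python
import math


def remove_singletons(otu_table: dict[str, list[int]]) -> dict[str, list[int]]:
    """Drop OTUs whose total count across all samples is exactly one read."""
    return {otu: counts for otu, counts in otu_table.items() if sum(counts) > 1}


def sample_counts(otu_table: dict[str, list[int]], sample: int) -> list[int]:
    """Column of the OTU table for one sample."""
    return [counts[sample] for counts in otu_table.values()]


def shannon(counts: list[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs present in the sample."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical OTU table: columns are [fresh manure, aged manure].
table = {
    "OTU_1": [50, 60],
    "OTU_2": [30, 35],
    "OTU_3": [1, 0],
    "OTU_4": [0, 1],
    "OTU_5": [1, 0],
    "OTU_6": [10, 4],
}
filtered = remove_singletons(table)
for label, t in [("before", table), ("after", filtered)]:
    print(label, round(shannon(sample_counts(t, 0)), 3))
```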

n 61<br />

COMPARISON OF BACTERIAL DIVERSITY BETWEEN ENDOSCOPIC MUCOSAL BIOPSY AND FECAL SAMPLE AMONG FECAL MICROBIOTA TRANSPLANT RECIPIENTS

J. P. Haydek¹, W. M. Tauxe¹, E. Neish², A. Ward³, T. Dhere¹, C. S. Kraft¹;
¹Emory University School of Medicine, Atlanta, GA, ²Emory University, Atlanta, GA, ³Emory Healthcare, Atlanta, GA.

Background: Endogenous bacteria on the colonic<br />

mucosal border play a fundamental role<br />

in nutritional absorption, epithelial stability<br />

and innate immunity. However, core questions<br />

regarding the natural microbiome, including<br />

diversity among different intestinal sites, usage<br />

of fecal bacteria as a proxy for gut flora,<br />

and variation between adherent and planktonic<br />

bacteria, remain unanswered. Clostridium difficile<br />

infection (CDI) is a diarrheal disease associated<br />

with 15-25% of antibiotic associated<br />

diarrhea cases, and causes between 500,000<br />

and 3 million new cases each year, with a total<br />

systematic cost estimated between $436 million<br />

and $3 billion. Fecal microbiota transplant<br />

(FMT), a procedure consisting of infusion<br />

of donor fecal material to a recipient with<br />

refractory CDI, has been increasingly shown<br />

to be a safe and effective therapy for chronic<br />

recurrent CDI, but questions still remain about<br />

its exact mechanisms of action and the microbial<br />

shifts associated with the procedure.<br />

Bacterial diversity of the colonic mucosa and<br />

feces has long been assumed to be equivalent,<br />

although this premise has increasingly<br />

been challenged. Methods: In a prospective<br />

study, stool samples and endoscopic mucosal<br />

biopsies were taken from 4 FMT recipients<br />

at the time of procedure, 2 weeks after the<br />

procedure, and 10 weeks after the procedure.<br />

Samples were lysed using the<br />

Mo Bio PowerSoil® kit, followed by DNA extraction<br />

on the Qiagen EZ1 Advanced XL platform.<br />

High-throughput sequencing (paired-end, dual<br />

indexing, 250-bp bidirectional reads)<br />

was performed on an Illumina MiSeq. Data<br />

analysis was performed using R and the<br />

vegan R package. Results: Four<br />

samples taken from the distal colon at time of<br />

FMT were compared to stool samples prior to<br />

the FMT procedure. Bray-Curtis dissimilarity<br />

metrics were calculated, with an average value<br />

of 0.77. Shannon diversity was also calculated<br />

for each individual sample, with no significant<br />

difference between the stool and endoscopic<br />

mucosal samples (P = 0.15). Using a one-sample<br />

t-test, Bray-Curtis dissimilarities between samples<br />

were tested against the null hypothesis (no<br />

diversity difference), and the difference was<br />

significant (P = 0.003). Conclusions: High-throughput<br />

sequencing analysis comparing<br />

endoscopic mucosal biopsies and fecal samples<br />

shows significant differences in bacterial diversity.<br />

Larger sample sizes and further analysis<br />

are needed to characterize the bacterial differences<br />

between colonic mucosal and fecal sites.<br />
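The Results rely on Bray-Curtis dissimilarity, Shannon diversity and a one-sample t-test. As a hedged, dependency-free sketch of what those quantities are (this is not the authors' R/vegan code, and the function names are illustrative), assuming both count vectors list the same taxa in the same order:

```python
import math
from statistics import mean, stdev
from typing import Sequence

def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two count vectors over the same taxa.

    BC = sum|a_i - b_i| / sum(a_i + b_i): 0 = identical, 1 = no shared taxa.
    """
    if len(a) != len(b):
        raise ValueError("vectors must list the same taxa in the same order")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total

def shannon(counts: Sequence[float]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def one_sample_t(values: Sequence[float], mu0: float = 0.0) -> float:
    """One-sample t statistic for H0: mean(values) == mu0 (df = len(values) - 1)."""
    return (mean(values) - mu0) / (stdev(values) / math.sqrt(len(values)))
```

For example, `bray_curtis([6, 7, 4], [10, 0, 6])` is 13/33 (about 0.39). Turning the t statistic into a P-value needs a t distribution with n - 1 degrees of freedom, e.g. `scipy.stats.ttest_1samp` in Python or `t.test(x, mu = ...)` in R; in vegan the two diversity measures correspond to `vegdist(method = "bray")` and `diversity(index = "shannon")`.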

■ 62

REVEALING HUMAN PATHOGENS IN<br />

LIVESTOCK FAECES BY METAGENOMES<br />

BASED ON HIGH-THROUGHPUT<br />

SEQUENCING<br />

A. Zhang, Y. Mao, T. Zhang;<br />

The University of Hong Kong, Hong Kong,<br />

HONG KONG.<br />

In recent years, human pathogens have drawn<br />

increasing attention from researchers as<br />

environmental pollutants. The diversity and<br />

abundance of pathogens from livestock faeces<br />

samples can be used to evaluate the environmental<br />

pollutant risk of these sources. In this<br />

study, we investigated 12 metagenomic DNA<br />

sequencing data sets from pig and chicken faeces,<br />

swine wastewater, and treated water. The<br />

metagenomic data set for each sample (2.5 Gb)<br />

was derived from Illumina HiSeq 2000 2 ×<br />

100 bp paired-end sequencing. Metagenomic<br />

Phylogenetic Analysis (MetaPhlAn) was conducted<br />

to classify microbial communities and<br />

identify human pathogens at the species level for<br />

distribution, diversity and abundance analyses.<br />

In total, 63 bacterial pathogen<br />

species were detected, with a maximum<br />

abundance of 61% among all classified species<br />

in the database. Both principal coordinates<br />

analysis (PCoA) and network analysis showed<br />

a clear clustering pattern in which the pathogen<br />

community structures shared little similarity<br />

among different domestic animals. Pathogenic<br />

bacteria detected in the same host species at<br />

different growth periods showed different distribution<br />

and abundance patterns. A relatively strong<br />

co-occurrence correlation (P < 0.01 and<br />

Spearman’s ρ > 0.6) was found<br />

between Shigella sp. and Fusobacterium<br />

sp., which clustered in the<br />

same module. The methodology demonstrated<br />

in this study may provide a more effective and<br />

accurate approach for evaluating pathogenic pollution,<br />

and may also inform the control of<br />

potential pathogens in livestock farm management.<br />

Acknowledgement: This research is<br />

financed by the Research Grants Council of<br />

Hong Kong (GRF17209914E).<br />

■ 63

EXPLORING PATHOGEN PLASMID DNA<br />

IN WASTEWATER AND SLUDGE USING<br />

METAGENOMICS APPROACH<br />

A. Li, T. Zhang;<br />

The University of Hong Kong, Hong Kong,<br />

HONG KONG.<br />

Plasmids act as mobile genetic elements for<br />

the exchange of genetic material between different<br />

microorganisms, including human pathogens.<br />

Virulence plasmids may turn bacterial<br />


cells into pathogenic strains. Therefore, the<br />

abundance of pathogen plasmid DNA could<br />

indicate the potential threat that environmental<br />

samples pose to public health. Municipal<br />

wastewater treatment plants (WWTPs)<br />

are hotspots for horizontal plasmid transfer<br />

because of the high exchange rate of biological<br />

components and chemical materials<br />

in activated sludge and anaerobic digestion<br />

sludge; they also play significant roles<br />

in eliminating and digesting various human<br />

pathogens. In this study, we collected plasmid<br />

DNA samples from influent, activated sludge<br />

and digested sludge of two WWTPs. About<br />

3 Gb of sequence per sample was generated<br />

on the Illumina HiSeq 2000 platform using the<br />

PE101 (2 × 101 bp paired-end) strategy.<br />

All the datasets were uploaded to MG-RAST<br />

for functional annotation, and metagenomic phylogenetic<br />

analysis was conducted to identify<br />

human pathogens at the species level for taxonomy,<br />

diversity and abundance analyses. The<br />

plasmid metagenomic datasets were compared<br />

with four corresponding total DNA metagenomic<br />

datasets obtained from previous studies<br />

to reveal the differences between the plasmid<br />

and the total DNA metagenomes. Compared<br />

with the total DNA metagenomes, the plasmid<br />

metagenomes extracted from the same sectors<br />

of the WWTP had significantly higher annotation<br />

rates, indicating that the functional genes<br />

located on plasmids can be commonly shared<br />

among known microorganisms. This study also<br />

characterized the distribution pattern of pathogen<br />

plasmid DNA in the plasmid metagenomes.<br />

The abundance of pathogen DNA in the influent<br />

plasmid metagenomes was much higher<br />

than in the other metagenomes, and the<br />

digested sludge plasmid metagenomes had the<br />

lowest pathogen plasmid DNA abundance. Overall,<br />

the methodology used in this study offers<br />

a novel way to evaluate the potential public health<br />

risk of pathogen plasmid DNA.<br />

Acknowledgement: This research is financed<br />

by the Research Grants Council of Hong Kong<br />

(GRF7190/12E).<br />
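The abstract reports "significantly higher annotation rates" for plasmid metagenomes but does not name the test. One common option for comparing two such proportions is a pooled two-proportion z-test; the sketch below is an assumption about how such a comparison could be done, not the authors' method, and the read counts are supplied by the caller:

```python
import math
from typing import Tuple

def annotation_rate(annotated: int, total: int) -> float:
    """Fraction of reads (or sequences) that received a functional annotation."""
    return annotated / total

def two_proportion_z(annotated1: int, total1: int,
                     annotated2: int, total2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test of H0: rate1 == rate2.

    Returns (z, two-sided p-value). Uses the normal approximation, so counts
    should be large; a pooled rate of exactly 0 or 1 makes the test undefined.
    """
    p1 = annotation_rate(annotated1, total1)
    p2 = annotation_rate(annotated2, total2)
    pooled = (annotated1 + annotated2) / (total1 + total2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total1 + 1 / total2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

With millions of reads per metagenome, even very small differences in annotation rate become "significant", so reporting the two rates (an effect size) alongside the p-value is the more informative choice.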

■ 64

A<br />

C. A. Gulvik, J. J. Avillan, E. Alyanak, M.<br />

Sjölund-Karlsson, B. M. Limbago;<br />

Centers for Disease Control and Prevention,<br />

Atlanta, GA.<br />

Clostridium difficile isolates were collected<br />

during 2010-2011 as part of the Emerging Infections<br />

Program C. difficile infection surveillance.<br />

Isolates (n = 53) were selected to represent<br />

the diversity of strain types as determined<br />

by geographic location of isolation, pulsed-field<br />

gel electrophoresis (PFGE) type, and PCR<br />

ribotype; other molecular data included PCR<br />

detection of toxin genes tcdA, tcdB, cdtA,<br />

and cdtB, and size of tcdC deletions. Epidemiological<br />

metadata associated with isolates<br />

includes patient age, U.S. state of residence,<br />

isolation year, and epidemiologic classification<br />

(healthcare- or community-associated).<br />

Paired-end Illumina sequencing was performed<br />

on isolate genomes to assess congruence of<br />

molecular typing methods commonly used for<br />

surveillance. Systematic comparison of nine<br />

genome assemblers widely used for C. difficile<br />

and other bacterial genomes revealed that iterative<br />

or multi-k de Bruijn graph assembly with IDBA and<br />

SPAdes provided the fewest contigs, the largest<br />

N50, the most predicted genes, and the largest contig.<br />

Maximum-likelihood phylogenetic inference<br />

was performed using aligned whole genomes,<br />

which averaged 60× coverage. Phylogenetic<br />

trees enabled us to classify five isolates with<br />

unclassifiable PFGE patterns. The overall<br />

concordance of genome-extracted multi-locus<br />

sequence types (STs), PCR ribotypes (RTs), and<br />

PFGE groups with the whole genome phylogeny<br />

was very good, and the phylogeny illustrates<br />

how each group of U.S. C. difficile isolates<br />

is related to the others. This is useful because<br />

the nomenclature of various C. difficile<br />

typing methods (e.g., NAP01, ST-3, RT 027)<br />

lacks evolutionary context, whereas whole<br />

genome phylogeny provides a single illustrative<br />

comparator for contextualizing isolates<br />

regardless of the molecular typing method<br />

used. Two isolates fell within clades (bootstrap<br />

support of 65% and 100%) that otherwise<br />

contained a single PCR RT each. Repeat sequencing<br />

and PCR ribotyping of these two isolates confirmed<br />

the unusual placement of a single RT<br />

014 isolate within the RT 020 clade, and vice<br />

versa. Fluoroquinolone resistance determinants<br />

were common (82%) among the hypervirulent<br />

epidemic RT 027 genomes; only one non-027<br />

isolate (RT 017) harbored a Thr82Ile mutation<br />

in GyrA, which confers fluoroquinolone resistance<br />

in other species. These molecular and<br />

epidemiological data will be publicly available<br />

through NCBI, and isolates have been deposited<br />

for distribution with BEI Resources.<br />
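The assembler comparison above ranks tools by number of contigs, N50 and largest contig. A minimal sketch of how those contiguity metrics are computed from a list of contig lengths follows (predicted gene counts would additionally require a gene caller and are not covered; the function name is illustrative):

```python
from typing import Dict, Sequence

def assembly_stats(contig_lengths: Sequence[int]) -> Dict[str, int]:
    """Basic contiguity metrics commonly used to compare genome assemblers."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        # N50: length of the contig at which the cumulative length
        # (longest contigs first) first reaches half of the assembly size.
        if running * 2 >= total:
            n50 = length
            break
    return {
        "num_contigs": len(lengths),
        "total_length": total,
        "largest_contig": lengths[0] if lengths else 0,
        "n50": n50,
    }
```

For contigs of 100, 200, 300 and 400 bp (1,000 bp total), the cumulative length passes 500 bp at the 300-bp contig, so N50 is 300. Because N50 rewards fewer, longer contigs, it is usually read together with total length and contig count rather than alone.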

■ 65

IMPLEMENTATION OF WHOLE GENOME<br />

SEQUENCING (WGS) FOR SURVEILLANCE<br />

AND OUTBREAK DETECTION OF SHIGA<br />

TOXIN-PRODUCING ESCHERICHIA COLI<br />

(STEC) IN THE UNITED STATES<br />

R. L. Lindsey<sup>1</sup>, H. Carleton<sup>1</sup>, K. Joensen<sup>2</sup>,<br />

F. Scheutz<sup>3</sup>, L. Garcia-Toledo<sup>1</sup>, D. Stripling<sup>1</sup>,<br />

H. Martin<sup>1</sup>, N. Strockbine<sup>1</sup>, L. S. Katz<sup>1</sup>,<br />

L. Gladney<sup>1</sup>, T. Griswold<sup>1</sup>, S. Im<sup>1</sup>, E. M. Ribot<sup>1</sup>,<br />

E. Trees<sup>1</sup>, H. Pouseele<sup>4</sup>, P. Gerner-Smidt<sup>1</sup>;<br />

<sup>1</sup>Centers for Disease Control and Prevention,<br />

Atlanta, GA, <sup>2</sup>Technical University of Denmark,<br />

Lyngby, DENMARK, <sup>3</sup>Statens Serum<br />

Institut, Copenhagen, DENMARK, <sup>4</sup>Applied<br />

Maths, Sint-Martens-Latem, BELGIUM.<br />

Introduction: Shiga toxin-producing Escherichia<br />

coli (STEC) is an important foodborne<br />

pathogen capable of causing severe disease in<br />

humans. Current methods for characterization<br />

of STEC are expensive and time-consuming.<br />

Work has begun to replace traditional methods<br />

with those using whole genome sequence<br />

(WGS) data by developing an allele database<br />

of individual Escherichia genes in BioNumerics<br />

7.5 (Applied Maths, Austin, TX). This<br />

will allow characterization of Escherichia<br />

in a single workflow using a multi-locus sequence<br />

typing (MLST) approach. Materials<br />

and Methods: The Escherichia allele database<br />

was built with 314 annotated reference<br />

genomes from a geographically diverse collection<br />

of human, animal and environmental<br />

strains as well as genes encoding virulence<br />

factors, antimicrobial resistance and O and<br />

H antigens from databases at the Center for<br />

Genomic Epidemiology (DTU, Lyngby, Denmark).<br />

The reference genomes represent 50<br />

E. coli serogroups, four Shigella species and<br />

four additional Escherichia species. Multiple<br />

subschema will be built within the database<br />

to perform identification, characterization and<br />

subtyping, including classical, extended, core<br />

and whole genome MLST. To test the ability of<br />

the BioNumerics-based whole genome MLST<br />

approach to correctly identify, characterize and<br />

cluster strains, we analyzed 500 Escherichia<br />

isolates from sporadic and outbreak-related<br />

infections and compared the findings to those<br />

obtained previously with phenotypic and<br />

molecular subtyping methods. Results and<br />

Discussion: The Escherichia allele database<br />

contains 18,883 loci. For the 500 Escherichia<br />

isolates analyzed, there was 95% concordance<br />

in the results generated by the traditional and<br />

wgMLST approaches. Conclusions: The<br />

BioNumerics-based wgMLST approach provides<br />

a single, cost-effective strategy to identify<br />

and characterize isolates for surveillance<br />

and outbreak investigations. The analysis tools<br />

in BioNumerics will enable end-users in public<br />

health laboratories to analyze WGS data they<br />

generate, even with little bioinformatics expertise,<br />

making the system equally efficient for local<br />

and central investigations. The system will be<br />

refined through continued collaboration with<br />

domestic and international partners.<br />
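In an allele-based wgMLST/cgMLST approach like the one described above, isolates are compared by how many loci carry different allele numbers. A minimal sketch of that pairwise distance follows, assuming profiles stored as dicts from (hypothetical) locus names to allele numbers and the common convention of skipping loci not called in both isolates; it is not BioNumerics' implementation:

```python
from typing import Dict, Optional

# Hypothetical profile layout: locus name -> allele number (None = not called).
AlleleProfile = Dict[str, Optional[int]]

def allele_distance(a: AlleleProfile, b: AlleleProfile) -> int:
    """Count loci called in both profiles whose allele numbers differ.

    Loci absent from either profile, or called as None, are ignored.
    """
    shared = set(a) & set(b)
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )
```

A matrix of these distances over all isolates is what clustering (e.g., single linkage or a minimum spanning tree) would then operate on; how missing loci are treated can noticeably change distances between draft assemblies.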


■ 66

DELINEATING COMMUNITY OUTBREAKS<br />

OF SALMONELLA ENTERICA SEROVAR<br />

TYPHIMURIUM USING WHOLE GENOME<br />

SEQUENCING: INSIGHTS INTO GENOMIC<br />

VARIABILITY WITHIN AN OUTBREAK<br />

S. Octavia<sup>1</sup>, Q. Wang<sup>2</sup>, M. Tanaka<sup>1</sup>, S. Kaur<sup>1</sup>,<br />

V. Sintchenko<sup>2</sup>, R. Lan<sup>1</sup>;<br />

<sup>1</sup>University of New South Wales, Sydney,<br />

AUSTRALIA, <sup>2</sup>Westmead Hospital, Sydney,<br />

AUSTRALIA.<br />

Whole genome next generation sequencing<br />

(NGS) was used to retrospectively examine 57<br />

isolates from five epidemiologically confirmed<br />

community outbreaks (designated as Outbreak<br />

1 to Outbreak 5) caused by Salmonella enterica<br />

serovar Typhimurium (S. Typhimurium) phage<br />

type DT170. Most of the human and environmental<br />

isolates confirmed epidemiologically<br />

to be involved in the outbreaks were either<br />

genomically identical or differed by one to two<br />

single nucleotide polymorphisms (SNPs), with<br />

the exception of Outbreak 1. The isolates from<br />

Outbreak 1 differed by up to 12 SNPs, which<br />

suggests that the food source of the outbreak<br />

was contaminated with more than one strain,<br />

whereas the other four outbreaks were caused by<br />

a single strain. In addition, NGS analysis ruled<br />

in isolates that were initially not considered to<br />

be linked with the outbreak, which increased<br />

the total outbreak size by 107%. The mutation<br />

process was modelled using known mutation<br />

rates to derive a cut-off value for the number<br />

of SNP differences used to rule in or rule out a case<br />

as part of an outbreak. For an outbreak with<br />

less than one month of ex vivo/in vivo evolution<br />

time, the maximum number of SNP differences<br />

between isolates is two or four SNPs<br />

using the slowest or the fastest mutation rate,<br />

respectively. NGS of S. Typhimurium significantly<br />

increases the resolution of investigations<br />

of community outbreaks. It can also inform<br />

a more targeted public health response by providing<br />

important supplementary evidence to<br />

rule in and rule out cases of disease associated<br />

with foodborne outbreaks of S. Typhimurium.<br />
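The abstract derives SNP cut-offs by modelling mutation with known rates but does not give the model. A common simplification, assumed here and not necessarily the authors' approach, treats SNPs separating two isolates as Poisson-distributed with mean rate × genome length × time × 2 (two diverging lineages), and takes the cut-off as an upper quantile of that distribution. Rates and genome length are caller-supplied, and this sketch is not claimed to reproduce the two- and four-SNP figures:

```python
import math

def expected_snps(rate_per_site_per_year: float, genome_length: int,
                  days: float, lineages: int = 2) -> float:
    """Expected SNPs separating two isolates that diverged `days` ago.

    Both descendant lineages accumulate mutations, hence the factor `lineages`.
    """
    return rate_per_site_per_year * genome_length * (days / 365.0) * lineages

def poisson_cutoff(lam: float, coverage: float = 0.95) -> int:
    """Smallest k such that P(X <= k) >= coverage for X ~ Poisson(lam)."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    if not 0.0 <= lam <= 700.0:
        raise ValueError("lam out of range (exp(-lam) would underflow)")
    k = 0
    term = math.exp(-lam)        # P(X = 0)
    cumulative = term
    while cumulative < coverage:
        k += 1
        term *= lam / k          # P(X = k) = P(X = k - 1) * lam / k
        cumulative += term
    return k
```

For instance, with an expected value of 1 SNP, the 95% cut-off is 3 SNPs (cumulative probabilities 0.37, 0.74, 0.92, 0.98). The slowest and fastest published rates would simply give two different values of `lam` and hence two cut-offs.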

■ 67

DEFINING CORE GENOME OF SALMONELLA<br />

ENTERICA SEROVAR TYPHIMURIUM<br />

FOR GENOMIC SURVEILLANCE AND<br />

EPIDEMIOLOGICAL TYPING<br />

S. Fu<sup>1</sup>, S. Octavia<sup>1</sup>, M. Tanaka<sup>1</sup>, V. Sintchenko<sup>2</sup>,<br />

R. Lan<sup>1</sup>;<br />

<sup>1</sup>University of New South Wales, Sydney,<br />

AUSTRALIA, <sup>2</sup>Westmead Hospital, Sydney,<br />

AUSTRALIA.<br />

Salmonella enterica serovar Typhimurium is<br />

the most common Salmonella serovar causing<br />

foodborne infections in Australia and many<br />

other countries. Twenty-one S. Typhimurium<br />

strains from Salmonella reference collection A<br />

(SARA) were analyzed using Illumina high-throughput<br />

genome sequencing. SNPs in the 21<br />

SARA strains ranged from 46 to 11,916,<br />

with an average of 1,577 SNPs per<br />

strain. Together with 47 genomes selected from publicly<br />

available S. Typhimurium genomes, these strains were used to<br />

determine the S. Typhimurium core genes (STCG).<br />

The STCG consists of 3,846 genes,<br />

which is much larger than the set of 2,882<br />

Salmonella core genes (SCG) found previously.<br />

The STCG together with 1,576 core<br />

intergenic regions (IGRs) was defined as the S.<br />

Typhimurium core genome. Using 93 S. Typhimurium<br />

genomes from 13 epidemiologically<br />

confirmed community outbreaks, we demonstrated<br />

that typing based on S. Typhimurium<br />

core genome (STCG+ core IGRs) provides<br />

superior resolution and higher discriminatory<br />

power than that based on SCG for outbreak<br />

investigation and molecular epidemiology<br />

of S. Typhimurium. Both STCG and STCG+<br />

core IGRs typing achieved 100% separation<br />

of all outbreaks, whereas SCG typing<br />

failed to separate isolates from two<br />

outbreaks from background isolates. Defining<br />

the S. Typhimurium core genome allows<br />

standardization of genes/regions to be used for<br />

high resolution epidemiological typing of S.<br />

Typhimurium for genomic surveillance.<br />
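At its simplest, defining a core gene set, as done for the STCG above, means keeping genes present in all (or nearly all) genomes. A minimal sketch follows, assuming each genome is already reduced to a set of gene identifiers (hypothetical names); real core-genome definitions first cluster orthologous genes by sequence identity, which this sketch does not attempt:

```python
from collections import Counter
from typing import Iterable, Set

def core_genes(gene_sets: Iterable[Set[str]], min_fraction: float = 1.0) -> Set[str]:
    """Genes present in at least `min_fraction` of genomes (1.0 = strict core)."""
    sets = list(gene_sets)
    if not sets:
        return set()
    # Each genome contributes at most one count per gene because its genes form a set.
    counts = Counter(gene for genes in sets for gene in genes)
    needed = min_fraction * len(sets)
    return {gene for gene, n in counts.items() if n >= needed}
```

Relaxing `min_fraction` below 1.0 tolerates genes missed in a few draft assemblies, a trade-off between core-genome size and robustness to assembly gaps.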


■ 68

COMPARATIVE PHYLOGENOMIC ANALYSIS<br />

AND SEQUENCING OF M. TUBERCULOSIS<br />

DAYCARE OUTBREAK STRAINS IN CORK,<br />

IRELAND<br />

O. O. Ojo<sup>1</sup>, M. B. Prentice<sup>2</sup>;<br />

<sup>1</sup>Southern University at New Orleans, New<br />

Orleans, LA, <sup>2</sup>University College Cork, Cork,<br />

IRELAND.<br />

Background and Hypothesis: In 2006, the Cork-Kerry<br />

Health Region of Ireland had a tuberculosis<br />

(TB) notification rate of 16.3/100,000<br />

population, the highest nationwide. From 1999<br />

to 2006, TB rate in children


Poster <strong>Abstracts</strong><br />

fore, vast amounts of sequence data must be<br />

generated and analyzed to identify rare pathogen<br />

sequences. SPIDR-WEB is a sample-to-result<br />

process that relies on efficient laboratory<br />

and in silico steps. Clinical samples mostly<br />

comprise non-informative host RNAs or abundant<br />

housekeeping gene transcripts.<br />

SPIDR-WEB incorporates removal of non-informative<br />

RNAs (RNR), thereby enriching all other<br />

RNAs, including those from pathogens. This<br />

step enables either higher sensitivity and specificity,<br />

or less expensive and faster sequencing.<br />

Our custom EDGE bioinformatics data analysis<br />

platform provides rapid read classification<br />

at all taxonomic levels, and reliably detects<br />

all organisms present in a sample. EDGE is<br />

an efficient process, as it uses databases with<br />

pre-computed signatures, instead of aligning<br />

sequencing reads to the entire GenBank. In<br />

addition to RNR and EDGE, SPIDR-WEB<br />

includes robust, inexpensive and rapid sample<br />

lysis, RNA extraction, and library preparation<br />

steps.<br />
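The EDGE description above contrasts lookups in databases of pre-computed signatures with aligning reads to all of GenBank. The sketch below illustrates that general idea with taxon-unique k-mers and a majority vote; it is a hypothetical illustration, not EDGE's actual algorithm, and the reference sequences and `k` are caller-supplied:

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Set

def build_signature_db(references: Dict[str, str], k: int) -> Dict[str, str]:
    """Pre-compute k-mer -> taxon, keeping only k-mers unique to one taxon."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(taxon)
    return {kmer: next(iter(taxa)) for kmer, taxa in owners.items() if len(taxa) == 1}

def classify_read(read: str, db: Dict[str, str], k: int) -> Optional[str]:
    """Assign a read to the taxon hit by most of its k-mers (None if no hits)."""
    hits = Counter(
        db[read[i:i + k]]
        for i in range(len(read) - k + 1)
        if read[i:i + k] in db
    )
    if not hits:
        return None
    return hits.most_common(1)[0][0]
```

Classification then costs one dictionary lookup per k-mer instead of an alignment, which is the efficiency argument made above. This toy version ignores reverse complements, sequencing errors and ties, which real signature-based classifiers must handle.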

■ 70

EPIGENOMIC CHARACTERIZATION OF<br />

NEISSERIA GONORRHOEAE ISOGENIC<br />

MUTANTS AND CLINICAL ISOLATES TO<br />

EXAMINE THE ROLE OF DNA METHYLATION<br />

IN ANTIMICROBIAL RESISTANCE<br />

D. Trees, A. Abrams;<br />

CDC, Atlanta, GA.<br />

The emergence of multidrug-resistant Neisseria<br />

gonorrhoeae has hampered the control and<br />

prevention of gonorrhea in the United States<br />

and globally. Historically, most antimicrobial<br />

resistance in N. gonorrhoeae has resulted from<br />

the accumulation of mutations in a variety of<br />

chromosomal genes. The presence of these<br />

mutations results in levels of resistance to antibiotics<br />

that reduce the likelihood of successful<br />

therapy, and has led to the elimination<br />

of some antibiotics as therapeutic agents. With<br />

the appearance of the mosaic form of penA,<br />

ceftriaxone MIC values increased 4- to 10-fold<br />

above those previously noted. An apparent<br />

consequence of the mosaic form of penA is the<br />

occurrence of treatment failures with various<br />

cephalosporins, and strains that contain the<br />

mosaic form are able to mutate to still higher<br />

levels of resistance to cefixime, cefpodoxime,<br />

and ceftriaxone. To increase the number of<br />

penA mutants available for analysis we used<br />

an approach, replicative mutagenesis, which<br />

allowed us to isolate large numbers of mutants<br />

in the penA gene. We used this approach to<br />

isolate a set of nine mutants in gonococcal<br />

strain 3502 (ceftriaxone MIC 0.06 µg/mL)<br />

which contains a mosaic-type penA gene. The<br />

ceftriaxone MIC values for the 3502APMx<br />

mutants are >1.0 µg/mL. These 3502APMx<br />

mutants can be manipulated to increase their<br />

ceftriaxone MIC values to 6.0-8.0 µg/mL<br />

(3502APMx-x strains). The effects of mutations<br />

in the mosaic penA can be enhanced by<br />

second-site mutations to make gonococcal infections<br />

essentially untreatable. Whole genome<br />

analyses of the isogenic mutants did not identify<br />

significant novel genomic mutations that<br />

were shared among 3502APMx or<br />

3502APMx-x mutants. Therefore, genomic mutations<br />

alone did not fully explain the MIC patterns<br />

observed. To further elucidate the source<br />

of these increased MICs we looked at the<br />

methylation patterns of 3502 and the isogenic<br />

mutants. Initial results from the PacBio base<br />

modification detection analyses demonstrated<br />

that the 3502 reference, the nine AMP mutants,<br />

and a clinical control sample contained several<br />

shared m6A motifs and one m4C motif. It was<br />

also observed that as the ceftriaxone MICs increased,<br />

the number of modified motifs detected<br />

expanded, including some novel motifs that<br />

were classified as “unknown” by the software<br />

program. Methylated sites found in mutant<br />

isolates were associated with genes involved in<br />

transcription, translation, putative restriction-modification<br />

systems, membranes, piliation,<br />

and phase variation. These results suggest that<br />

methylation might play a role in gonococcal<br />

antimicrobial resistance. Future studies will<br />

aim to characterize the methylation patterns<br />

of additional clinical and laboratory reference<br />


isolates with varying degrees of antimicrobial<br />

resistance. These results will help to shed light<br />

on the role of epigenetics in antimicrobial<br />

resistance.<br />
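Interpreting base-modification results like those above often involves locating every occurrence of a (possibly degenerate) motif in a genome. A minimal sketch of that step follows, using IUPAC nucleotide codes; the motifs in the example (GATC, GANTC) are purely illustrative and are not claimed to be the gonococcal motifs detected, and the PacBio motif-finding software itself is not reproduced here:

```python
import re
from typing import List

# IUPAC nucleotide codes -> regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_motif_sites(sequence: str, motif: str) -> List[int]:
    """0-based start positions of motif matches on the given strand.

    The zero-width lookahead lets overlapping matches be reported.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]
```

For example, `find_motif_sites("TTGACTCAA", "GANTC")` returns `[2]`. A full genome scan would also search the reverse complement, and the fraction of such sites reported as modified is what allows motifs to be compared between isolates.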

■ 71

DEVELOPMENT OF A BIONUMERICS<br />

DATABASE AND EVALUATION OF AVERAGE<br />

NUCLEOTIDE IDENTITY USING MUMMER<br />

(ANI-M), RIBOSOMAL MULTI-LOCUS<br />

SEQUENCE TYPING (RMLST), AND RPOB<br />

GENE PHYLOGENY FOR IDENTIFICATION OF<br />

ENTERIC BACTERIA BY WHOLE GENOME<br />

SEQUENCE ANALYSIS<br />

G. Williams<sup>1</sup>, J. Pruckler<sup>1</sup>, R. L. Lindsey<sup>1</sup>,<br />

L. Gladney<sup>1</sup>, A. Huang<sup>1</sup>, L. S. Katz<sup>1</sup>, L. Garcia-Toledo<sup>1</sup>,<br />

S. Im<sup>1</sup>, K. Roache<sup>1</sup>, M. Turnsek<sup>1</sup>,<br />

Z. Kucerova<sup>1</sup>, D. Stripling<sup>1</sup>, H. Martin<sup>1</sup>,<br />

B. Dinsmore<sup>1</sup>, S. van Duyne<sup>1</sup>, H. Carleton<sup>1</sup>,<br />

H. Pouseele<sup>2</sup>, N. Strockbine<sup>1</sup>, C. Tarr<sup>1</sup>, P. Fields<sup>1</sup>,<br />

P. Gerner-Smidt<sup>1</sup>, C. Fitzgerald<sup>1</sup>;<br />

<sup>1</sup>Centers for Disease Control and Prevention,<br />

Atlanta, GA, <sup>2</sup>Applied Maths, Sint-Martens-Latem,<br />

BELGIUM.<br />

Background: Conventional phenotypic and<br />

genotypic methods employed for identification<br />

of enteric bacteria, including Campylobacter,<br />

Escherichia, Shigella, Salmonella and Listeria,<br />

are labor-intensive, expensive, and require<br />

multiple workflows. We have begun development<br />

of an Enteric Identification Whole<br />

Genome Sequence (WGS) database. The<br />

PulseNet infrastructure (BioNumerics v 7.5) is<br />

being used to build the database, with the goal<br />

of identifying these enteric bacteria in a single<br />

workflow using WGS. Materials and Methods:<br />

Three different methods are being evaluated<br />

for inclusion in the BioNumerics Enteric<br />

Identification database: 1) Average Nucleotide<br />

Identity using MUMmer (ANI-m), which<br />

describes a pairwise distance between two<br />

genomes, 2) Ribosomal Multi-Locus Sequence<br />

Typing (rMLST), which is a presence/absence<br />

binary phylogeny for ribosomal genes, and 3)<br />

rpoB gene phylogeny, which places the rpoB<br />

sequence of an isolate within a larger<br />

phylogeny. A set of genome assemblies (157<br />

Campylobacter, 126 Escherichia, 23 Shigella,<br />

and 73 Listeria genomes, representing 23, 5,<br />

4, and 15 species for each genus, respectively),<br />

generated at CDC, provided by external partners,<br />

or publicly available through NCBI, was<br />

selected to evaluate the methods. Results:<br />

ANI-m showed identities of ≥95% for members<br />

within a species for the six most common<br />

clinically-relevant Campylobacter species and<br />

≤92% for inter-species comparisons; ≥95%<br />

for members within a species for Escherichia<br />

and Shigella, and ≤90% for inter-species<br />

comparisons; ≥95% for members within a species<br />

for Listeria and ≤91% for inter-species<br />

comparisons. Due to the diversity of its four<br />

lineages, L. monocytogenes had slightly lower<br />

intra-species identity values (≥92%) and inter-lineage<br />

identity values (≥92% and ≤95%).<br />

Intra-lineage identity values for L. monocytogenes<br />

were consistent with ANI values of other<br />

Listeria species (≥95%). Total allele assignments<br />

for the 53 rMLST loci ranged from 14<br />

to 53 across the validation set, with fewer loci<br />

called for species rarely received at CDC. An<br />

rMLST phylogeny appropriately clustered all<br />

genomes in this evaluation to the species level<br />

when two or more genomes were represented<br />

for a species. Where the full-length rpoB gene<br />

was annotated, phylogenies appropriately clustered<br />

each species, with intra-species similarity<br />

≥91% and ≥95% for subspecies. Conclusions:<br />

This Enteric WGS Identification BioNumerics<br />

database will provide a single, unified,<br />

cost-effective approach for accurate species<br />

identification. Through continued collaboration<br />

with domestic and international partners,<br />

we will continue to test and refine the database<br />

and CLIA-validate the reference identification<br />

methods within the next year.<br />
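ANI-m summarizes a pairwise genome comparison as the average identity over the regions MUMmer can align, and the results above apply roughly 95% as a species boundary. A minimal sketch of the averaging and thresholding steps follows, assuming alignment blocks have already been parsed into (aligned length, percent identity) pairs; parsing MUMmer output is not shown, and this is not the BioNumerics implementation:

```python
from typing import Iterable, Tuple

def ani_from_alignments(blocks: Iterable[Tuple[int, float]]) -> float:
    """Length-weighted mean percent identity over aligned blocks.

    Each block is (aligned_length_bp, percent_identity).
    """
    blocks = list(blocks)
    total_length = sum(length for length, _ in blocks)
    if total_length == 0:
        raise ValueError("no aligned sequence")
    return sum(length * identity for length, identity in blocks) / total_length

def same_species(ani: float, threshold: float = 95.0) -> bool:
    """Apply an ANI species boundary (95% by default)."""
    return ani >= threshold
```

Keeping `threshold` as a parameter matters here: the results above show L. monocytogenes lineages at 92-95% identity to each other, so a single fixed cut-off would split that species.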


■ 72

TRANSFORMING PUBLIC HEALTH<br />

MICROBIOLOGY FOR CAMPYLOBACTER<br />

WITH WHOLE GENOME SEQUENCING:<br />

PULSENET AND BEYOND<br />

J. Pruckler<sup>1</sup>, D. Wagner<sup>1</sup>, G. Williams<sup>1</sup>,<br />

H. Carleton<sup>1</sup>, C. Bennett<sup>1</sup>, L. Joseph<sup>1</sup>, E. Trees<sup>1</sup>,<br />

A. Huang<sup>1</sup>, L. S. Katz<sup>1</sup>, L. Gladney<sup>1</sup>,<br />

M. C. Maiden<sup>2</sup>, W. Miller<sup>3</sup>, Y. Chen<sup>4</sup>, S. Zhao<sup>4</sup>,<br />

P. McDermott<sup>4</sup>, J. Whichard<sup>1</sup>, H. Pouseele<sup>5</sup>,<br />

E. M. Ribot<sup>1</sup>, C. Fitzgerald<sup>1</sup>, P. Gerner-Smidt<sup>1</sup>;<br />

<sup>1</sup>Centers for Disease Control and Prevention,<br />

Atlanta, GA, <sup>2</sup>University of Oxford, Oxford,<br />

UNITED KINGDOM, <sup>3</sup>United States Department<br />

of Agriculture, Albany, CA, <sup>4</sup>United States<br />

Food and Drug Administration, Laurel, MD,<br />

<sup>5</sup>Applied Maths, Inc., Sint-Martens-Latem,<br />

BELGIUM.<br />

Background: Conventional phenotypic and<br />

genotypic methods employed for identification<br />

and subtyping of Campylobacter are labor-intensive,<br />

expensive, and imprecise. We have<br />

begun development of Enteric Reference Identification<br />

and Campylobacter subtype characterization<br />

whole genome sequence (WGS)<br />

databases. The PulseNet infrastructure (BioNumerics)<br />

will be used in conjunction with the<br />

existing BIGSdb platform to build the databases,<br />

with the goal of characterizing Campylobacter<br />

in a single workflow using WGS.<br />

Methods: Reference genomes (n=103) provided<br />

by FDA and USDA and strains (n=100)<br />

sequenced at CDC were used to develop the<br />

database. Assemblies and annotations were<br />

performed using the Computational Genomics<br />

Pipeline v0.4. These genomes cover the known<br />

members of the species and genera within<br />

Campylobacteraceae and the known genetic<br />

diversity of C. jejuni. The data will be used<br />

to set criteria for the current<br />

PubMLST.org/campylobacter locus definitions. Multiple subschema<br />

are being set up within the databases to<br />

perform identification and scalable, hierarchical<br />

subtyping that will include seven-locus,<br />

ribosomal, core genome and whole genome<br />

(MLST, rMLST, cgMLST and wgMLST).<br />

Results: To date, 203 reference genomes<br />

have been sequenced, annotated and used for<br />

development of the BioNumerics databases,<br />

and an additional 600 isolates are<br />

being used to validate the prototype databases.<br />

Conclusions: These WGS BioNumerics<br />

databases will provide a single, unified, cost-effective<br />

approach for accurate species identification<br />

and subtyping to aid the surveillance of<br />

sporadic and outbreak-related Campylobacter<br />

infections. Through continued collaboration<br />

with domestic and international partners, we<br />

will test and refine the nomenclature, databases<br />

and CLIA-validate the reference identification<br />

subschema within the next year.<br />
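The hierarchical scheme above starts from classical seven-locus MLST, where an isolate's allele numbers at seven housekeeping loci are looked up in a table of known profiles to obtain a sequence type (ST). A minimal sketch of that lookup follows, using the seven loci of the C. jejuni/C. coli PubMLST scheme; the allele numbers and ST in the usage example are made up, and real profile tables would come from PubMLST:

```python
from typing import Dict, Optional, Tuple

# The seven housekeeping loci of the C. jejuni/C. coli MLST scheme.
LOCI = ("aspA", "glnA", "gltA", "glyA", "pgm", "tkt", "uncA")

def sequence_type(alleles: Dict[str, int],
                  st_table: Dict[Tuple[int, ...], int]) -> Optional[int]:
    """Return the ST for a complete seven-locus allele profile.

    Returns None if any locus is missing or the profile is not in the table
    (a possible novel ST).
    """
    if any(locus not in alleles for locus in LOCI):
        return None
    profile = tuple(alleles[locus] for locus in LOCI)
    return st_table.get(profile)

if __name__ == "__main__":
    toy_table = {(1, 2, 3, 4, 5, 6, 7): 999}  # made-up profile and ST
    print(sequence_type(dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 7))), toy_table))  # 999
```

The rMLST, cgMLST and wgMLST levels follow the same allele-profile idea with many more loci, but are usually compared by allele distance rather than by exact profile lookup.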

■ 73

METAGENOMIC PATHOGEN DETECTION AND<br />

GUT MICROBIOME RESPONSE TO ACUTE<br />

SALMONELLA INFECTION<br />

A. D. Huang<sup>1</sup>, M. R. Weigand<sup>1</sup>, A. Pena-Gonzalez<sup>2</sup>,<br />

K. T. Konstantinidis<sup>2</sup>, C. L. Tarr<sup>1</sup>;<br />

<sup>1</sup>Centers for Disease Control and Prevention,<br />

Atlanta, GA, <sup>2</sup>Georgia Institute of Technology,<br />

Atlanta, GA.<br />

Background: Current diagnostic testing<br />

for bacterial foodborne pathogens relies on<br />

culture-based techniques even though many<br />

microorganisms, including known pathogens,<br />

cannot be cultured. Powerful sequence-based<br />

approaches such as metagenomics have the potential<br />

to derive epidemiologically relevant<br />

information directly from complex samples,<br />

bypassing the need to isolate individual organisms.<br />

However, such methods have not been<br />

systematically applied to foodborne pathogen<br />

detection because standardized bioinformatics<br />

techniques for analysis have not been established.<br />

Methods: We applied shotgun metagenomics<br />

to anonymized residual stool samples<br />

collected from foodborne outbreaks attributed<br />


to Salmonella to evaluate metagenomics as<br />

a diagnostic and disease surveillance tool, as<br />

well as to gain insight into the gut microbial<br />

community responses to foodborne bacterial<br />

infection. These outbreaks were geographically<br />

isolated and the etiologic agents were<br />

identified by culture methods as distinct strains<br />

of Salmonella enterica serovar Heidelberg.<br />

We performed shotgun sequencing on these<br />

samples using the Illumina MiSeq platform.<br />

Community and taxonomic analyses were<br />

performed using Parallel-META, Metaphlan,<br />

and GOTTCHA. Subspecies analysis was<br />

performed using BLAST recruitment analysis.<br />

Further phylogenetic analysis was performed<br />

on metagenomic assemblies of samples and<br />

resulting contigs matching S. enterica. Results:<br />

Sample consistency and human DNA<br />

sequence abundance varied greatly, often<br />

reducing the sequencing depth of the targeted<br />

microbial communities, yet reference-based<br />

detection of Salmonella serovar Heidelberg<br />

was possible by metagenomic read recruitment<br />

as well as metagenomic assembly, even in<br />

samples with high human DNA content (90-96%).<br />

Taxonomic profiling revealed similar<br />

microbial community structures between individual<br />

patients from each localized outbreak;<br />

samples from different outbreaks clustered<br />

separately and were distinct from a subset of<br />

‘healthy’ references selected from the Human<br />

Microbiome Project. Microbial gut communities<br />

consistently showed reduced species<br />

diversity in each foodborne outbreak compared<br />

to ‘healthy’ references. Conclusions: These<br />

results highlight the potential utility of metagenomic-based<br />

diagnostic tools for foodborne<br />

pathogen identification and epidemiologically<br />

relevant clustering, even in samples with high<br />

human DNA abundance. Furthermore, shotgun<br />

metagenomic approaches offer additional insight<br />

into gut microbial community responses<br />

to foodborne illness that may hold clues to<br />

pathogen ecology.<br />
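
The abstract reports reduced species diversity in outbreak samples but does not name the diversity metric. The Python sketch below assumes the Shannon index and uses invented read counts (taxon_A and the other names are placeholders, not study data); per-taxon counts from a taxonomic profiler would be the real input.

```python
import math
from typing import Mapping


def shannon_diversity(counts: Mapping[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero read counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any taxon")
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Hypothetical per-taxon read counts (placeholders, not study data).
healthy_reference = {"taxon_A": 300, "taxon_B": 250, "taxon_C": 200, "taxon_D": 150, "taxon_E": 100}
outbreak_sample = {"Salmonella": 700, "taxon_A": 200, "taxon_B": 100}

for label, counts in [("healthy", healthy_reference), ("outbreak", outbreak_sample)]:
    print(f"{label}: H' = {shannon_diversity(counts):.3f}")
```

A community dominated by one pathogen gives a lower H' than an evenly spread one, which is the direction of the difference the abstract describes.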

■ 74<br />

INTEGRATING WHOLE GENOME<br />

SEQUENCING OF SALMONELLA ENTERICA<br />

SEROVAR ENTERITIDIS INTO THE PUBLIC<br />

HEALTH LABORATORY FOR SURVEILLANCE<br />

AND OUTBREAK INVESTIGATIONS<br />

K. J. Levinson¹, M. Dickinson², S. Wirth², M. Anand³,<br />

D. J. Baker², D. Bopp², L. Thompson², K. A. Musser²,<br />

P. Lapierre², W. J. Wolfgang²;<br />

¹School of Public Health, SUNY Albany,<br />

Albany, NY, ²Wadsworth Center/NYSDOH,<br />

Albany, NY, ³Bureau of Communicable Disease<br />

Control/NYSDOH, Albany, NY.<br />

Salmonella enterica serovar Enteritidis is<br />

a leading cause of foodborne illness in the<br />

United States. Pulsed-field gel electrophoresis<br />

(PFGE) is the gold standard for outbreak detection<br />

of enteric pathogens. However, the low<br />

genetic diversity of S. Enteritidis and frequent<br />

exchanges of mobile genetic elements limit<br />

how well PFGE can discriminate between<br />

isolates and identify clusters that may be<br />

epidemiologically linked. Two-thirds of all S.<br />

Enteritidis isolates received at the Wadsworth<br />

Center have PFGE patterns that are considered<br />

“endemic” and over half of these are pattern<br />

JEGX01.0004. Consequently, these cases<br />

are not routinely investigated by epidemiologists.<br />

To improve discrimination between<br />

sporadic and outbreak-associated isolates, the<br />

Wadsworth Center began performing whole<br />

genome sequencing (WGS) single nucleotide<br />

polymorphism (SNP)-based phylogenetic typing<br />

on all S. Enteritidis isolates in addition to<br />

PFGE typing. The goal of this project was to<br />

explore the utility of incorporating WGS-based<br />

typing into routine public health laboratory<br />

surveillance and to establish a standard for reporting<br />

WGS data in a manner that was useful<br />

for both laboratorians and epidemiologists. Using<br />

a pipeline developed in-house, we created<br />

cumulative SNP-based phylogenetic trees from<br />

514 S. Enteritidis isolates in real time over a<br />

period of 20 months. Based on retrospective<br />

studies, we used a SNP diversity of 0-5 to define<br />

a genomic cluster. Within the 80 genomic<br />

clusters identified, we found that 20% contained<br />

multiple PFGE patterns and, conversely,<br />

we found 35 genomic clusters within PFGE<br />

pattern JEGX01.0004. We then analyzed these<br />

clusters in a time-dependent manner and present<br />

an intuitive, non-tree-based plot for rapid<br />

identification of “clusters of interest” (defined<br />

as clusters that contained at least 4 isolates<br />

collected within 60 days of each other). Importantly,<br />

these parameters can be modified in<br />

real time based on epidemiological feedback.<br />

By sequencing all S. Enteritidis isolates concurrently<br />

with PFGE typing, we show it is<br />

possible to cluster isolates both genomically<br />

and epidemiologically in a manner that is more<br />

discriminatory than PFGE typing. We also<br />

demonstrate the utility and feasibility of this<br />

method in the public health laboratory setting.<br />

As we anticipate that the increased number of clusters<br />

detected will pose a challenge in conducting<br />

epidemiological follow-up, we are now focused<br />

on modifying the methods to better prioritize<br />

these clusters to create a more valuable<br />

tool for epidemiologic investigations.<br />
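
The cluster definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Wadsworth Center's in-house pipeline: genomic clusters are formed by single linkage on pairwise SNP distances (at most 5 SNPs), and a "cluster of interest" is read as at least 4 isolates whose collection dates fit within one 60-day window. Isolate names, distances and dates are hypothetical.

```python
from datetime import date, timedelta
from itertools import combinations


def snp_clusters(isolates, dist, max_snps=5):
    """Single-linkage genomic clusters: two isolates are linked when their
    pairwise SNP distance is <= max_snps (linkage method is an assumption)."""
    parent = {i: i for i in isolates}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if dist[frozenset((a, b))] <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for i in isolates:
        groups.setdefault(find(i), []).append(i)
    return [sorted(g) for g in groups.values()]


def is_cluster_of_interest(collection_dates, min_isolates=4, window_days=60):
    """True if at least min_isolates collection dates fall within one window_days window."""
    ds = sorted(collection_dates)
    for start in range(len(ds) - min_isolates + 1):
        if ds[start + min_isolates - 1] - ds[start] <= timedelta(days=window_days):
            return True
    return False


# Hypothetical pairwise SNP distances; pairs not listed are treated as distant.
isolates = ["A", "B", "C", "D", "E"]
close_pairs = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 3, ("D", "E"): 1}
dist = {frozenset(p): close_pairs.get(p, 40) for p in combinations(isolates, 2)}

print(snp_clusters(isolates, dist))
dates = [date(2015, 1, 1), date(2015, 1, 20), date(2015, 2, 10), date(2015, 2, 25)]
print(is_cluster_of_interest(dates))
```

Because the SNP and time thresholds are plain parameters, they can be changed in response to epidemiological feedback, as the abstract describes. Single linkage can chain isolates together, so a real implementation may prefer a different grouping rule.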

■ 75<br />

DETECTION OF ANTIMICROBIAL<br />

RESISTANCE MARKERS IN NEISSERIA<br />

GONORRHOEAE AND MIXED GONOCOCCAL<br />

INFECTION DIRECTLY FROM CLINICAL<br />

SAMPLES USING NEXT GENERATION<br />

SEQUENCING<br />

R. Graham, A. Jennison;<br />

Queensland Department of Health, Coopers<br />

Plains, AUSTRALIA.<br />

Introduction: The rise in Neisseria gonorrhoeae<br />

strains with reduced susceptibility to<br />

antibiotics is a major public health concern,<br />

and the ongoing surveillance of antimicrobial<br />

resistance (AMR) in community strains of<br />

N. gonorrhoeae is essential. In many regions,<br />

however, the majority of N. gonorrhoeae<br />

cases are now diagnosed by molecular assays<br />

only, meaning that no isolate is obtained for<br />

antimicrobial susceptibility testing. A number<br />

of molecular markers have been identified<br />

for reduced susceptibility to antimicrobials;<br />

however, screening for these markers requires<br />

multiple tests, and these may miss novel mutations.<br />

By sequencing the entire genome of N.<br />

gonorrhoeae directly from a clinical sample<br />

using next generation sequencing (NGS), both<br />

known and novel mutations associated with<br />

AMR can be identified. In addition, typing information<br />

such as NG-MAST and MLST types<br />

can be collected, and potential mixed gonococcal<br />

infections can be detected. Methods: DNA<br />

was extracted from eleven urine Cobas PCR<br />

media specimens positive for N. gonorrhoeae<br />

by the Roche Cobas 4800 CT/NG test. DNA<br />

was enriched for microbial DNA using the<br />

NEBNext microbiome kit and sequenced using<br />

the Ion Torrent PGM workflow. Sequences<br />

that aligned to the human genome were filtered<br />

out and the remaining sequences were de novo<br />

assembled into contigs and searched for the<br />

regions of interest using Ridom SeqSphere.<br />

MLST and NG-MAST alleles were assigned<br />

according to the schemes at PubMLST.net<br />

and NG-MAST.net respectively. Results: All<br />

eleven of the clinical samples tested generated<br />

a sufficient number of N. gonorrhoeae<br />

sequence reads to provide full coverage of the<br />

genome at a depth of 6-130x. Complete MLST<br />

and NG-MAST profiles could be generated<br />

for each of the samples. None of the samples<br />

contained more than one sequence type, which would have been<br />

indicative of a same-site mixed gonococcal<br />

infection. However, when samples of two different<br />

sequence types were artificially created<br />

in vitro, the two distinct allele sequences could<br />

be detected, suggesting that naturally occurring<br />

mixed infections would be identified. The presence<br />

of ten different AMR markers was investigated,<br />

and mutations associated with reduced<br />

susceptibility to cephalosporins, quinolones<br />

and tetracycline were identified. Conclusions:<br />

We found that multiple levels of N. gonorrhoeae<br />

typing information could be generated<br />

directly from clinical samples using NGS. This<br />

study also demonstrated proof of principle<br />

for utilising NGS to identify same-site mixed<br />

gonococcal infections in a culture-independent<br />

manner. Ten AMR markers were examined<br />

here, and with the complete genome information<br />

provided by NGS there is the potential for<br />

many more to be investigated and for novel<br />

markers to be identified, highlighting the potential<br />

of this technology to enable continued<br />

AMR surveillance in areas where culture is no<br />

longer performed.<br />
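
A toy illustration of how a mixed infection might show up in typing data: if more than one allele at a locus is supported by a substantial share of reads, no single sequence type is assigned and the sample is flagged. This is not how Ridom SeqSphere works internally; the 10% read-fraction cutoff, the allele numbers and the ST label are hypothetical (porB and tbpB are the NG-MAST loci).

```python
def call_alleles(read_counts, min_fraction=0.1):
    """Alleles at one locus supported by at least min_fraction of that locus's reads."""
    total = sum(read_counts.values())
    if total == 0:
        return []
    return sorted(a for a, n in read_counts.items() if n / total >= min_fraction)


def type_sample(locus_counts, st_table, min_fraction=0.1):
    """Return (sequence_type or None, mixed_flag) for one sample.

    locus_counts: {locus: {allele_number: supporting_read_count}}
    st_table: {allele profile tuple (loci in sorted order): sequence type}
    """
    calls = {locus: call_alleles(c, min_fraction) for locus, c in locus_counts.items()}
    mixed = any(len(alleles) > 1 for alleles in calls.values())
    if mixed or any(len(alleles) == 0 for alleles in calls.values()):
        return None, mixed
    profile = tuple(calls[locus][0] for locus in sorted(calls))
    return st_table.get(profile), mixed


# Hypothetical allele numbers, read counts and ST label (not real NG-MAST data).
st_table = {(101, 7): "ST-hypothetical-1"}
single_strain = {"porB": {101: 950, 102: 10}, "tbpB": {7: 800}}
artificial_mix = {"porB": {101: 500, 102: 450}, "tbpB": {7: 600, 9: 300}}

print(type_sample(single_strain, st_table))   # ('ST-hypothetical-1', False)
print(type_sample(artificial_mix, st_table))  # (None, True)
```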

■ 76<br />

SUBTYPING OF E. COLI STEC BY<br />

TRADITIONAL LABORATORY METHODS AND<br />

WGS, A COMPARISON!<br />

E. Litrup, K. Kiil;<br />

Statens Serum Institut, Copenhagen, DENMARK.<br />

The traditional subtyping of E. coli STEC<br />

strains consists of many different, somewhat<br />

laborious and slow methods. In Denmark,<br />

we began whole genome sequencing of all E.<br />

coli STEC strains isolated from humans in<br />

January 2015. From January through June,<br />

we received and typed more than 70 STEC<br />

strains by both the traditional methods and<br />

WGS. The methods performed in our laboratory<br />

are serotyping, PCR assays and dot blot<br />

hybridization among others. Whole Genome<br />

Sequencing was performed in-house on an<br />

Illumina MiSeq with the Nextera Library<br />

Preparation kit and 250 bp paired reads. For the<br />

comparison of WGS and traditional subtyping,<br />

we focused on the serotype, the stx1 and stx2<br />

subtypes and the presence of several genes,<br />

e.g., the eae and ehxA genes. For detection of<br />

the serotype and virulence genes, we used the<br />

CGE finders (Serotype Finder and Virulence<br />

Finder https://cge.cbs.dtu.dk/services/), but<br />

we also used reference-based mapping to the<br />

databases behind the finders with srst2<br />

(https://github.com/katholt/srst2). As expected,<br />

serotyping was the subtyping method with the<br />

most divergent results. The Serotype Finder was<br />

able to detect all but one serovar detected by<br />

the traditional serotyping. Additionally, almost<br />

all the strains typed as rough or smooth<br />

in the laboratory were assigned an O-type<br />

by the Serotype Finder tool. Further, all the<br />

non-motile strains with no phenotypic H-type<br />

were assigned an H-type by the Serotype<br />

Finder tool. Finally, where different sera cross-reacted<br />

in the laboratory, the Serotype Finder was able<br />

to indicate which H type was most similar<br />

to the sequenced one. Regarding<br />

the detection of the stx1 and stx2 variants, eae<br />

and ehxA genes, we found an almost 100%<br />

correlation to the laboratory results achieved<br />

by PCR and dot blot hybridization. Only in a<br />

few cases did the laboratory achieve different<br />

results. We used the Virulence Finder on<br />

contigs assembled de novo by CLC, and we also<br />

used reference-based mapping to the databases<br />

behind the finder using SRST2, and we saw<br />

no differences in the performance of the two<br />

approaches. Reference-based mapping is generally<br />

considered important for gene detection in E. coli,<br />

as it can be challenging to assemble the<br />

reads into acceptable contigs, but for<br />

these four genes and their variants this was not<br />

a problem. This might be due to the location of<br />

these genes; they are probably located in parts<br />

of the genome that are easy to assemble.<br />
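
The method comparison above amounts to a per-marker concordance calculation. A minimal Python sketch, with hypothetical isolates and results, that reports the agreement fraction and lists discordant isolates for follow-up:

```python
def concordance(lab, wgs):
    """Per-marker agreement between laboratory and WGS results.

    lab, wgs: {isolate_id: {marker: result}}; only isolates typed by both are compared.
    Returns {marker: (fraction_agreeing, [discordant isolate ids])}.
    """
    shared = sorted(set(lab) & set(wgs))
    if not shared:
        raise ValueError("no isolates were typed by both methods")
    markers = sorted({m for i in shared for m in lab[i]})
    summary = {}
    for m in markers:
        discordant = [i for i in shared if lab[i].get(m) != wgs[i].get(m)]
        summary[m] = (1 - len(discordant) / len(shared), discordant)
    return summary


# Hypothetical results (not the study's isolates).
lab = {
    "iso1": {"stx2": "stx2a", "eae": True},
    "iso2": {"stx2": "stx2c", "eae": False},
    "iso3": {"stx2": "stx2a", "eae": True},
}
wgs = {
    "iso1": {"stx2": "stx2a", "eae": True},
    "iso2": {"stx2": "stx2c", "eae": False},
    "iso3": {"stx2": "stx2d", "eae": True},
}

result = concordance(lab, wgs)
for marker, (fraction, discordant) in result.items():
    print(f"{marker}: {fraction:.0%} concordant, discordant: {discordant}")
```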

■ 77<br />

DR. JEKYLL AND MR. HYDE: THE CASE<br />

FOR MIXED NOCARDIA POPULATIONS<br />

IN CLINICAL ISOLATES AS REVEALED BY<br />

WHOLE GENOME SEQUENCE ANALYSIS<br />

A. C. Lauer, B. Lasker, J. R. McQuiston;<br />

Centers for Disease Control and Prevention,<br />

Atlanta, GA.<br />

Nocardia species are aerobic, GC-rich,<br />

partially acid-fast, opportunistic, pathogenic<br />

actinomycetes. Every year, approximately 200<br />

nocardiae isolates are received by the Special<br />

Bacteriology Reference Laboratory (SBRL) at<br />

CDC; however, the true incidence of nocardiosis<br />

in the U.S. remains unknown. Despite reports<br />

of resistance to co-trimoxazole, the treatment<br />

of choice for nocardiosis, the molecular<br />

mechanisms of resistance in nocardiae are not<br />


well understood. To investigate the molecular<br />

mechanisms responsible for resistance in nocardiae,<br />

a total of 144 isogenic isolates from<br />

7 N. farcinica and 5 N. nova clinical isolates,<br />

with known genetic relationships and resistance<br />

profiles, were produced in our laboratory.<br />

The genomes of these isogenic susceptible and<br />

resistant isolates were compared to detect genetic<br />

changes that directly correlated with the<br />

emergence of resistance. Surprisingly, variant<br />

detection yielded large numbers of high-quality<br />

SNPs (~150,000) between isogenic isolates.<br />

Further obstacles, which necessitated closer<br />

investigation into the nature of the nocardial<br />

genome, arose when only 49% to 84%<br />

of the sequence reads mapped to the available<br />

complete genomes. Therefore, genomes were<br />

also assembled de novo using SPAdes, Velvet,<br />

Mira, and CLC Genomics Workbench and all<br />

assemblies were evaluated using QUAST. No<br />

assembler, however, produced a single contig.<br />

Optical mapping data indicated that less than<br />

10% of the contigs from the best assemblies<br />

mapped to the optical map and revealed a 2 Mb<br />

difference between our isolates and the<br />

published reference strain for N. nova. Mapping<br />

of reads to housekeeping genes (16S,<br />

gyrB, rpoB) revealed the presence of a high<br />

level of minority alleles at specific positions.<br />

Alignment of the consensus sequences for<br />

isogenic isolates at specific genes showed that<br />

the minority alleles of one isolate were the<br />

dominant alleles in other isolates such that<br />

nucleotides appeared to alternate at specific<br />

bases. After contamination by other bacteria<br />

was ruled out, k-mer-based trees show that<br />

despite the many differences between isolates,<br />

those which are isogenic group together and<br />

are phylogenetically closer to each other than<br />

to other members of the same species but from<br />

different parental lineages. The data suggest<br />

that clinical Nocardia isolates may occur naturally<br />

as mixed populations, and may explain<br />

why published nocardiae genomes often show<br />

multiple copies of housekeeping genes, such as<br />

rpoB and gyrB, that are normally single-copy. Further investigations<br />

are needed to verify this hypothesis. A better<br />

understanding of the ecology of the organisms<br />

sequenced by SBRL is needed so that DNA<br />

extraction, genome sequencing, and genome<br />

analysis can be informative and representative<br />

of the true biology of the organisms before<br />

genomics can be solely relied upon for surveillance.<br />
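
The minority-allele observation can be illustrated with a small pileup-style check: flag positions where the second most common base reaches a set fraction of reads, then look for positions where one isolate's minority base is another isolate's majority base. The depth and fraction cutoffs and the read data are hypothetical; a real analysis would start from a pileup of the mapped reads.

```python
from collections import Counter


def minority_alleles(pileup, min_depth=20, min_minor_fraction=0.2):
    """Flag positions whose second most common base reaches min_minor_fraction.

    pileup: {position: string of read bases observed at that position}
    Returns [(position, major_base, minor_base, minor_fraction)].
    """
    flagged = []
    for pos, bases in sorted(pileup.items()):
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth or len(counts) < 2:
            continue
        (major, _), (minor, n_minor) = counts.most_common(2)
        if n_minor / depth >= min_minor_fraction:
            flagged.append((pos, major, minor, round(n_minor / depth, 3)))
    return flagged


def swapped_positions(flags_a, flags_b):
    """Positions where isolate A's minority base is isolate B's majority base and vice versa."""
    a = {pos: (major, minor) for pos, major, minor, _ in flags_a}
    b = {pos: (major, minor) for pos, major, minor, _ in flags_b}
    return sorted(p for p in a.keys() & b.keys() if a[p] == b[p][::-1])


# Hypothetical read bases at positions of a housekeeping gene (not study data).
isolate_a = {100: "A" * 30, 250: "G" * 18 + "T" * 12, 400: "C" * 25 + "A" * 2}
isolate_b = {250: "T" * 20 + "G" * 10}

flags_a = minority_alleles(isolate_a)
flags_b = minority_alleles(isolate_b)
print(flags_a)                              # [(250, 'G', 'T', 0.4)]
print(swapped_positions(flags_a, flags_b))  # [250]
```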

■ 78<br />

EVALUATING THE POTENTIAL OF USING<br />

NEXT-GENERATION SEQUENCING FOR<br />

DIRECT CLINICAL DIAGNOSTICS OF FECAL<br />

SAMPLES FROM DIARRHEA PATIENTS<br />

O. Lukjancenko¹, K. G. Joensen², F. M. Aarestrup¹;<br />

¹Technical University of Denmark, Kongens Lyngby,<br />

DENMARK, ²Statens Serum Institut,<br />

Copenhagen, DENMARK.<br />

Diarrhea is a major global disease burden and<br />

rapid, precise identification of the causative<br />

pathogens is important to initiate treatment as<br />

well as prevent further spread and potential<br />

outbreaks. The current routine diagnostics<br />

involve a number of different procedures,<br />

and often the causative agent is not identified<br />

in time to guide clinical care. In addition,<br />

in many clinical cases the causative agent<br />

is never identified. As next-generation sequencing<br />

(NGS) becomes cheaper, it has great<br />

potential in routine diagnostic settings. The<br />

aim of this study was to conduct a pilot project<br />

to evaluate the potential of performing NGS-based<br />

diagnostics through direct sequencing of<br />

fecal samples. A total of 61 clinical diarrheal<br />

fecal samples, including 48 pathogen-positive<br />

and 13 pathogen-negative, were obtained<br />

from diarrhea patients as part of the routine<br />

diagnostics at the Department of Clinical Microbiology<br />

at Hvidovre University Hospital in<br />

Denmark. Ten control samples from healthy<br />

individuals were additionally included. Total<br />

DNA was extracted from the fecal<br />

samples, and sequencing was performed on the<br />

Illumina MiSeq system. Sequence data were<br />

analyzed with the MGmapper software<br />

(http://cge.cbs.dtu.dk/services/MGmapper/) to<br />

assess the species distribution. Sequence-based<br />

diagnostic prediction was performed for pathogenic<br />

bacteria and Giardia by evaluating the<br />

relative abundance of pathogens in the samples<br />

as well as the presence of pathogen-specific<br />

virulence genes. The results obtained from the<br />

sequence-based diagnostic were compared to<br />

the conventional findings for 51 (of the total<br />

61) diarrheal samples, 40 of which were found<br />

positive for bacterial pathogens by conventional<br />

diagnostic methods and 11 of which were<br />

found negative. The NGS-based diagnostic approach<br />

detected the same<br />

bacterial pathogens as the classical approach<br />

in 37 of the 40 clinically positive samples and<br />

predicted the responsible pathogens in eight of<br />

the eleven samples from clinically ill patients<br />

that had been found negative by conventional<br />

methods. Overall, the NGS-based diagnostic<br />

approach enabled pathogen detection similar<br />

to that of current routine diagnostics, and this<br />

analytical approach could potentially be<br />

extended to the detection of<br />

other pathogens. With further expansion of<br />

the analysis and more sequence<br />

data per sample, the method holds promise<br />

for better diagnostic detection. At present,<br />

however, this type of metagenomic analysis is<br />

too expensive per sample, and the turnaround<br />

time too long for it to become part of routine<br />

diagnostics.<br />
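
The sequence-based prediction described above combines relative abundance with detection of pathogen-specific virulence genes. The Python sketch below is one way such a rule could look; it does not reproduce MGmapper output, and the marker genes, abundance threshold and sample values are hypothetical. The last two lines restate the reported agreement figures as proportions.

```python
def predict_pathogens(rel_abundance, detected_genes, marker_genes, min_abundance=0.001):
    """Call a pathogen when its relative abundance reaches min_abundance AND at least
    one of its marker virulence genes was detected in the sample."""
    calls = []
    for pathogen, markers in marker_genes.items():
        abundant = rel_abundance.get(pathogen, 0.0) >= min_abundance
        if abundant and markers & detected_genes:
            calls.append(pathogen)
    return sorted(calls)


# Hypothetical marker genes, abundances and threshold (not the study's criteria or data).
marker_genes = {
    "Campylobacter jejuni": {"cdtB"},
    "Salmonella enterica": {"invA"},
    "Escherichia coli (STEC)": {"stx1", "stx2"},
}
sample_abundance = {"Campylobacter jejuni": 0.02, "Salmonella enterica": 0.0005}
sample_genes = {"cdtB", "invA"}

print(predict_pathogens(sample_abundance, sample_genes, marker_genes))  # ['Campylobacter jejuni']

# Agreement figures reported in the abstract, expressed as proportions.
print(f"positives matched: {37 / 40:.1%}")                # positives matched: 92.5%
print(f"negatives with an NGS call: {8 / 11:.1%}")        # negatives with an NGS call: 72.7%
```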

■ 79<br />

EXPLORATION OF WHOLE GENOME<br />

SEQUENCING OF SHIGA TOXIN PRODUCING<br />

E. COLI IN THE NEW YORK STATE PUBLIC<br />

HEALTH LABORATORY TO IDENTIFY<br />

SEROGROUP, VIRULENCE FACTORS, AND<br />

GENOMIC CLUSTERS<br />

S. E. Wirth, T. Quinlan, D. J. Baker, T. Halse,<br />

P. Lapierre, K. A. Musser, W. J. Wolfgang;<br />

Wadsworth Center, NYSDOH, Albany, NY.<br />

In the United States, it is estimated that Shiga<br />

toxin-producing E. coli (STEC) is responsible<br />

for at least 265,000 infections each year. STEC<br />

serogroup O157 causes about one quarter<br />

of these infections while the rest are due to<br />

non-O157 serogroups. Accurate and timely<br />

identification of STEC is crucial to patient care<br />

and epidemiological tracebacks that are carried<br />

out to identify sources. At the Wadsworth<br />

Center, some virulence factors and the more<br />

common serogroups are currently characterized<br />

for each isolate by using up to seven real-time<br />

PCR assays. Concurrently, pulsed-field<br />

gel electrophoresis (PFGE) is performed to<br />

identify the PFGE subtype to aid in epidemiological<br />

investigations. The use of multiple<br />

molecular tests for identification and typing is<br />

time-consuming and expensive. Studies have<br />

shown that Whole Genome Sequencing (WGS)<br />

can yield information regarding virulence factors,<br />

serogroup, and genomic subtyping from<br />

a single dataset potentially leading to reduced<br />

costs and improved turnaround time. Furthermore,<br />

genomic subtyping can improve cluster<br />

resolution compared to the gold standard of<br />

PFGE. To begin to evaluate WGS as a “one-stop shop”<br />

for STEC identification and typing,<br />

we have sequenced and analyzed sporadic and<br />

outbreak-associated STEC isolates. WGS was<br />

performed on an Illumina MiSeq™ using 2 × 250<br />

paired-end chemistry. After passing quality<br />

control metrics, sequence reads were analyzed<br />

using an in-house developed bioinformatic<br />

pipeline and the Virulence Finder database.<br />

Serogroup and virulence profiles were assigned<br />

from raw reads and assembled genomes. In<br />

addition, the pipeline mapped raw reads to<br />

a single reference genome, and a SNP-based<br />

phylogenetic tree was produced. Our analysis<br />

of STEC WGS data shows a high degree of<br />

concordance between SNP-based phylogenetic<br />

clusters, PFGE clusters, and epidemiologically<br />

defined outbreaks. Additionally, we can reliably<br />

ascertain certain isolate serogroups and<br />

virulence factors. Our next steps are to refine<br />

the pipeline, analyze isolates prospectively in<br />

real-time, integrate the process into our clinical<br />

laboratory information system, and seek New<br />

York State clinical laboratory certification to<br />

report serogroup and virulence data.<br />
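
The abstract does not describe how its SNP-based tree was built. As one hedged illustration of the step between mapping to a single reference and tree building, pairwise SNP distances can be counted from reference-aligned consensus sequences, ignoring positions without a confident call (N); a distance matrix of this kind could then feed a tree-building method. The sequences below are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """SNPs between two reference-aligned consensus sequences, skipping positions
    where either sequence lacks an unambiguous base (A/C/G/T)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("consensus sequences must share the reference's coordinates")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT" and a != b
    )


def distance_matrix(consensus):
    """Symmetric {(name_a, name_b): SNP distance} for all isolates."""
    names = sorted(consensus)
    dist = {(n, n): 0 for n in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = snp_distance(consensus[a], consensus[b])
    return names, dist


# Hypothetical 10-bp consensus sequences after mapping to one reference (N = no call).
consensus = {"iso1": "ACGTACGTAC", "iso2": "ACGTACGTAA", "iso3": "ACNTTCGTAA"}
names, dist = distance_matrix(consensus)
for a in names:
    print(a, [dist[(a, b)] for b in names])
```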

■ 80<br />

NANOPORE + ION TORRENT SEQUENCING,<br />

ASSEMBLY, AND ANNOTATION OF CULTURE-FREE<br />

STREAMBED PLASMIDS REVEALS<br />

HITCHHIKING GENES FOR RESISTANCE TO<br />

MULTIPLE HUMAN CLINICAL ANTIBIOTICS<br />

S. D. Turner¹, E. Gehr², K. Libuit², C. Kapsak², J. Herrick²;<br />

¹University of Virginia, Charlottesville, VA,<br />

²James Madison University, Harrisonburg, VA.<br />

Background: Transmissible plasmids affect<br />

environmental ecosystems by facilitating<br />

exchange and recombination of antibiotic resistance<br />

genes. This exchange occurs between<br />

native bacterial populations and introduced<br />

fecal pathogens selected for resistance in farm<br />

animals. As such, native bacteria in aquatic<br />

and soil habitats may act as incubators and<br />

sites for recombination of genes that are subsequently<br />

transferred to human pathogens.<br />

Methods: Transmissible plasmids were<br />

captured exogenously from stream sediment<br />

samples by releasing cells from sediment and<br />

conjugating with a rifampicin-resistant strain<br />

of E. coli. Transconjugants were selected on<br />

tetracycline- and rifampicin-amended medium.<br />

Plasmids were purified and electroporated<br />

into an electrocompetent E. coli strain and<br />

tested for decreased antibiotic susceptibility<br />

relative to the un-electroporated strain using<br />

a modified Stokes disk diffusion method. Antibiotics<br />

tested were tetracycline, gentamicin,<br />

ciprofloxacin, sulfamethoxazole/trimethoprim,<br />

imipenem, tobramycin, kanamycin, aztreonam,<br />

ticarcillin, piperacillin, tazobactam, and<br />

cefepime. Two plasmids were sequenced using<br />

both the Oxford Nanopore MinION and<br />

Ion Torrent PGM. Long-read nanopore data<br />

was assembled with the Celera assembler, the<br />

assembly was error-corrected using the more<br />

accurate short read data, and the resulting<br />

polished assemblies were annotated against<br />

UniProt, RefSeq, Pfam, and TIGRFAMs.<br />

Unassembled reads were further screened<br />

for genes encoding antibiotic resistance. Results:<br />

23 of 30 captured plasmids conferred<br />

decreased susceptibility to multiple antibiotics<br />

in addition to tetracycline. The most common<br />

phenotypes were tetR/kanR/ticR/pipR and<br />

tetR/kanR/ticR/pipR/fepR. One plasmid conferred<br />

decreased susceptibility to seven of the<br />

tested antibiotics (tet, tob, kan, tic, tzp, pip,<br />

and fep). Nanopore sequencing and assembly<br />

of this plasmid resulted in two contigs, each<br />

approximately 15 kb. Contigs were screened<br />

for antibiotic resistance genes against ResFinder,<br />

a database containing 2,203 genes across 13 drug classes.<br />

The following genes were identified with<br />

>93% identity: tetC, tetG, aadA9, aadA2,<br />

sul1 (2X; suggesting the presence of one or<br />

more class 1 integrons), floR, aph(3’)-Ic, strB,<br />

and blaCARB-2. A second plasmid exhibiting<br />

decreased susceptibility to tet, cip, kan, tic, and<br />

pip was also sequenced, resulting in a single<br />

~90 kb contig. Annotation revealed a similar resistance<br />

profile. Conclusions: The presence of<br />

genes encoding resistance to multiple human<br />

clinical antibiotics on transmissible plasmids<br />

selected using tetracycline suggests that there<br />

may be a significant reservoir of antibiotic<br />

resistance genes in stream sediments and that<br />

these may be capable of transmission to pathogenic<br />

Enterobacteriaceae.<br />
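
Screening assembled contigs against a resistance-gene database is often done with BLAST and a filter on percent identity and gene coverage. The sketch assumes BLAST+ tabular output requested with -outfmt "6 qseqid sseqid pident length slen"; the >93% identity cutoff comes from the abstract, while the 60% coverage cutoff and the example hits are hypothetical. Counting passing hits per gene is a simplification for spotting duplicated genes such as the two sul1 copies.

```python
def resistance_genes(blast_tab, min_identity=93.0, min_coverage=0.6):
    """Count passing hits per resistance gene from BLAST tabular output
    (contigs as queries, database genes as subjects).

    Simplification: overlapping hits on the same contig would each be counted.
    """
    counts = {}
    for line in blast_tab.strip().splitlines():
        contig, gene, pident, aln_len, gene_len = line.split("\t")
        coverage = int(aln_len) / int(gene_len)
        if float(pident) > min_identity and coverage >= min_coverage:
            counts[gene] = counts.get(gene, 0) + 1
    return counts


# Hypothetical hits: columns are qseqid, sseqid, pident, length, slen.
example = "\n".join([
    "contig_1\tsul1\t100.00\t840\t840",
    "contig_2\tsul1\t99.52\t840\t840",
    "contig_1\ttetC\t98.70\t1191\t1191",
    "contig_2\thypothetical_gene_A\t88.10\t900\t900",   # fails identity filter
    "contig_2\thypothetical_gene_B\t99.00\t300\t1200",  # fails coverage filter
])
print(resistance_genes(example))  # {'sul1': 2, 'tetC': 1}
```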

■ 81<br />

AVERAGE NUCLEOTIDE IDENTITY<br />

(ANI) ANALYSIS OF A DIVERSE SET OF<br />

ESCHERICHIA<br />

S. B. Im¹, L. Rishishwar², A. D. Huang¹, L. S. Katz¹,<br />

H. A. Carleton-Romer¹, E. Trees¹, N. Strockbine¹, R. L. Lindsey¹;<br />

¹Centers for Disease Control and Prevention,<br />

Atlanta, GA, ²Georgia Institute of Technology,<br />

Atlanta, GA.<br />

The advent of high throughput sequencing<br />

technologies has provided the opportunity to<br />

explore new methods to characterize organisms,<br />

not only more efficiently but also with greater<br />

discrimination than traditional phenotypic<br />

methods. Average nucleotide identity (ANI)<br />

is a value that can be used to compare the<br />

relatedness of two genomic sequences. Findings<br />

from ANI analysis have been shown to<br />

correlate strongly with those using DNA-DNA<br />

hybridization (DDH) methods, with ANI<br />

values of ≥ 94% corresponding to the ≥ 70%<br />

DDH threshold for bacteria belonging to the same species.<br />

In previous studies with a limited number<br />

of strains, ANI was reported to successfully<br />

distinguish members of the genus Escherichia.<br />

To evaluate the limitations of ANI analysis in<br />

distinguishing members of Escherichia/Shigella,<br />

we expanded our analysis to include more<br />

strains between species, as well as within and<br />

between common serogroups that are relevant<br />

to public health. One hundred seventy strains<br />

from nine different species and 20 different<br />

serogroups were tested; 13 diverse Salmonella<br />

spp. were included as an outgroup. To detect<br />

the ANI variability within a serogroup, at least<br />

seven strains were analyzed from each of the<br />

common Shiga-toxin producing E. coli serogroups:<br />

O26, O45, O103, O111, O121, O145<br />

and O157. ANI between two genomes was<br />

computed using the BLAST algorithm, where<br />

each genome in the dataset was compared in<br />

an all-against-all pairwise fashion. Each comparison<br />

yielded a two-way ANI value describing<br />

the genetic relatedness between organisms.<br />

We found ANI values of ≥ 98% between and<br />

among E. coli isolates belonging to serogroups<br />

O26, O69, O118, O146, O103, O121, O145,<br />

O45, O91 and O128; ≥ 97% between and<br />

among isolates belonging to serogroups O157,<br />

O127, and the four Shigella species; ≤ 89%<br />

between E. coli/Shigella strains and those of<br />

other Escherichia species; and ≤ 80% between<br />

isolates of the genera Escherichia and Salmonella.<br />

Pairwise ANI values showed higher<br />

percent similarity between closely related serogroups<br />

and a clear distinction between<br />

Escherichia species and Salmonella.<br />
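
ANI is typically computed by cutting one genome into fragments, finding each fragment's best BLAST hit in the other genome and averaging the identities of hits that pass minimum identity and alignment-length filters. The sketch below starts from such fragment-level hits (the fragmenting and BLAST steps are not shown). The 30% identity and 70% aligned-fraction filters are assumed ANIb-style values, not taken from the abstract, and averaging the two directions is a simplification rather than a reciprocal-best-hit calculation. The 94% species cutoff is the one cited above; all hit values are hypothetical.

```python
from statistics import mean


def one_way_ani(fragment_hits, min_identity=30.0, min_aligned_fraction=0.7):
    """Mean percent identity of query-genome fragments whose best hit in the
    other genome passes both filters.

    fragment_hits: [(percent_identity, aligned_length, fragment_length), ...]
    """
    kept = [
        pid
        for pid, aligned, frag_len in fragment_hits
        if pid >= min_identity and aligned / frag_len >= min_aligned_fraction
    ]
    if not kept:
        raise ValueError("no fragments passed the identity/alignment filters")
    return mean(kept)


def two_way_ani(hits_a_vs_b, hits_b_vs_a):
    # Simplification: average of the two one-way values.
    return (one_way_ani(hits_a_vs_b) + one_way_ani(hits_b_vs_a)) / 2


def same_species(ani, threshold=94.0):
    """Species-level call using the >= 94% ANI cutoff cited in the abstract."""
    return ani >= threshold


# Hypothetical fragment-level best hits (not study data).
a_vs_b = [(98.0, 1020, 1020), (97.0, 1000, 1020), (25.0, 1020, 1020), (99.0, 500, 1020)]
b_vs_a = [(97.0, 1020, 1020), (98.0, 1020, 1020)]
ani = two_way_ani(a_vs_b, b_vs_a)
print(f"ANI = {ani:.2f}%, same species: {same_species(ani)}")  # ANI = 97.50%, same species: True
```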

■ 82<br />

SEQUENCE AND STRUCTURAL VARIANCE<br />

IN RECENT BORDETELLA PERTUSSIS<br />

GENOMES<br />

M. M. Williams, M. R. Weigand, Y. Peng,<br />

K. E. Bowden, M. L. Tondella;<br />

Centers for Disease Control and Prevention,<br />

Atlanta, GA.<br />

Pertussis disease, caused by the bacterium Bordetella<br />

pertussis, has resurged in the U.S. in<br />

recent decades, despite high vaccine coverage.<br />

Genome analysis of 424 isolates was undertaken<br />

to determine the causes of the pertussis increase.<br />

Of these isolates, 411 were obtained from 34<br />

US states for the period 2000-2014. Three vaccine<br />

strains were also included, and the remaining<br />

10 isolates were from other countries<br />

and historic collections, isolated between 1939<br />

and 1967. Sequence variants, including insertions,<br />

deletions, small replacements, SNPs and<br />

MNPs (single and multiple nucleotide polymorphisms,<br />

respectively) were determined by<br />

mapping Illumina sequencing reads against the<br />

Tohama I vaccine reference. A subset of 171<br />

complete genomes were assembled from long<br />

read sequence data obtained using the Pacific<br />

Biosciences RS II platform. Assemblies were<br />

validated by comparison to genome optical<br />

maps and sequence accuracy was confirmed by<br />

independent analysis using DNA cluster sequencing<br />

(Illumina MiSeq or HiSeq platform).<br />

Genome structure variation was determined by<br />

whole genome alignment with progressiveMauve.<br />

A total of 4,655 variants were detected<br />

in 424 genomes, the majority of which were<br />

SNPs (89%). Twenty-nine percent of variants<br />

occurred in non-coding regions, 26% were<br />

synonymous, and 45% were non-synonymous<br />

variants. The locus displaying the most variation<br />

was prn, the gene encoding pertactin,<br />

a component of acellular pertussis vaccines;<br />

prn has mutated rapidly in the last five years,<br />

resulting in pertactin non-expression in the<br />

majority of current US isolates. Variants were<br />

found in 951 other open reading frames, involved<br />

in a wide variety of cellular functions.<br />

Phylogenetic analysis of SNPs clustered isolates<br />

by time. By contrast, 35 unique genomic<br />

structural patterns were observed among the<br />

171 assembled genomes of two vaccine strains<br />

and 169 isolates obtained from 27 states in<br />

2000-2014. Pattern variation was due to inversion<br />

of genome sections, many with IS481, an<br />

insertion sequence element found in 240-256<br />

copies per genome, located at the predicted<br />

boundary sites. Although B. pertussis shows<br />

little variation at the nucleotide<br />

level, several genomic rearrangements are<br />

circulating in current strains, even within the<br />

same epidemic. Implications for improving<br />

molecular epidemiology and understanding<br />

pertussis pathogenicity will be explored.<br />
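The variant breakdown above (non-coding, synonymous, non-synonymous) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes SNPs only, hypothetical 0-based half-open CDS intervals on the forward strand, and the standard genetic code (bacterial translation table 11 assigns the same amino acids and differs only in start codons). Insertions, deletions, MNPs and reverse-strand genes are left out.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Standard genetic code (NCBI table 1; bacterial table 11 assigns the same amino
# acids and differs only in permitted start codons). '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(cds: str, offset: int, alt: str) -> str:
    """Classify a SNP at 0-based `offset` inside a forward-strand coding sequence."""
    cds, alt = cds.upper(), alt.upper()
    codon_start = offset - offset % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("offset falls in an incomplete codon")
    i = offset - codon_start
    alt_codon = ref_codon[:i] + alt + ref_codon[i + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


def classify_variant(genome: str, cds_intervals: List[Tuple[int, int]], pos: int, alt: str) -> str:
    """Return 'non-coding', 'synonymous' or 'non-synonymous' for one genome SNP.

    `cds_intervals` are hypothetical 0-based, half-open, forward-strand CDS coordinates.
    """
    for start, end in cds_intervals:
        if start <= pos < end:
            return classify_snp(genome[start:end], pos - start, alt)
    return "non-coding"


def summarize(classes: List[str]) -> Dict[str, float]:
    """Percentage of variants falling in each class."""
    counts = Counter(classes)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}
```

Calling `classify_variant` once per SNP and passing the results to `summarize` yields class percentages of the kind reported above.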

■ 83<br />

NON-TRAVEL ASSOCIATED SALMONELLA<br />

TYPHIMURIUM ST313 IN THE UK<br />

P. Ashton, C. Lane, S. Nair, T. Peters, E. de<br />

Pinna, K. Grant, T. Dallman;<br />

Public Health England, London, UNITED<br />

KINGDOM.<br />

Salmonella enterica serovar Typhimurium<br />

typically causes self-limiting gastroenteritis.<br />

However, invasive non-typhoidal Salmonella<br />

disease has been recently documented in many<br />

sub-Saharan African countries, causing significant<br />

morbidity and mortality. The majority<br />

of these invasive isolates belong to one of two<br />

monomorphic lineages within a single multilocus<br />

sequence type, ST313. There is also<br />

evidence that ST313 is more invasive than<br />

other closely related S. Typhimurium in both<br />

humans (clinical reports) and chickens (in vivo<br />

experiments). Here, we present data on 16 isolates<br />

of ST313 received by the Public Health<br />

England Salmonella Reference Service. Three<br />

of sixteen isolates were associated with travel<br />

to Africa. All three caused invasive<br />

disease and cluster within the previously<br />

described ST313 phylogenetic lineages. The<br />

majority of isolates of ST313 in England<br />

(13/16) are non-travel associated. Only 1 of<br />

these 13 isolates was associated with invasive<br />

disease, and the 13 isolates cluster into 4 new phylogenetic<br />

lineages, significantly increasing the<br />

diversity observed within ST313. Accessory<br />

genome analysis reveals genomic elements,<br />

including prophages and antibiotic resistance<br />

determinants, that are exclusively associated with<br />

the sub-Saharan African lineages. These<br />

results suggest that ST313 in England is epidemiologically,<br />

clinically and genomically<br />

heterogeneous, with a geographic relationship<br />

underlying this heterogeneity.<br />
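The accessory-genome comparison can be sketched as a set operation on a presence/absence table. This is an illustrative sketch, not the authors' method. It assumes element calls are already available for each isolate and uses a strict, hypothetical definition of "exclusively associated": present in every isolate of the lineage and absent from all other isolates.

```python
from typing import Dict, Set


def exclusive_elements(presence: Dict[str, Set[str]], group: Set[str]) -> Set[str]:
    """Accessory elements present in every isolate of `group` and absent from all others.

    `presence` maps isolate name -> set of accessory elements detected in that isolate.
    """
    members = [presence[name] for name in group]
    if not members:
        return set()
    shared = set.intersection(*members)  # found in every lineage member
    elsewhere = set().union(*(elements for name, elements in presence.items() if name not in group))
    return shared - elsewhere
```

A relaxed rule (for example, presence in most rather than all lineage members) would only change how `shared` is computed.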

■ 84<br />

EFFECT OF SEQUENCING READ ERROR<br />

CORRECTION METHOD ON HQSNPS<br />

DISCOVERY AND PHYLOGENETIC TREE<br />

TOPOLOGY IN OUTBREAKS<br />

D. D. Wagner, H. Carleton, E. Trees;<br />

Centers for Disease Control and Prevention,<br />

Atlanta, GA.<br />

Benchtop next-generation sequencing (NGS)<br />

instruments such as those offered by Illumina<br />

allow public health laboratories to rapidly<br />

sequence multiple bacterial strains in multistate<br />

foodborne outbreaks. High-quality single<br />

nucleotide polymorphisms (hqSNPs) provide a<br />

means for inferring phylogenetic associations<br />

among closely related outbreak strains. Yet,<br />

NGS technologies often introduce miscalled<br />

bases or other types of errors that diminish<br />

the number of phylogenetically informative<br />

hqSNPs. The current study shows how bioinformatics<br />

methods based upon free software<br />

packages mitigate sequencing errors<br />

and increase counts of phylogenetically informative<br />

hqSNPs. NGS datasets from five<br />

foodborne pathogen outbreaks were cleaned<br />

with Quake, BayesHammer, and FastX. In<br />

three Salmonella enterica outbreak clusters<br />

representing serovars Enteritidis, Newport,<br />

and Baildon, reads corrected or trimmed with<br />

FastX, Quake, and BayesHammer all increased<br />

the counts of informative hqSNP positions by<br />

9 to 195 positions<br />


when compared with hqSNPs inferred from<br />

uncorrected reads. In a cluster of Salmonella<br />

Paratyphi B var. (L+) tartrate+, the BayesHammer-cleaned<br />

reads and FastX-trimmed<br />

reads yielded 314 and 311 informative hqSNP<br />

positions, respectively. In the same set, the<br />

uncorrected reads yielded 292 informative<br />

hqSNP positions, unexpectedly outperforming<br />

the Quake-cleaned reads with 241 hqSNP<br />

positions. In the fifth data set, a cluster of E.<br />

coli O157:H7, FastX-trimmed reads yielded<br />

66 hqSNP positions and Quake-cleaned reads<br />

yielded 65 hqSNP positions. BayesHammer-cleaned<br />

reads and uncorrected reads both performed<br />

worse for the O157:H7 set, with 61 and<br />

59 hqSNP positions, respectively. In the Enteritidis<br />

and Baildon sets, BayesHammer-corrected<br />

reads yielded hqSNP phylogenies with<br />

median bootstrap support values of 61% and<br />

24%, respectively, across all internal branches<br />

of each tree. By comparison, FastX-trimmed or<br />

untrimmed reads had a median bootstrap support<br />

of at least 49% for the Baildon cluster and<br />

0% for the Enteritidis cluster.<br />

For the Paratyphi B hqSNP phylogeny, median<br />

bootstrap values were 100% across all four<br />

read-trimming methods, but ranged from 1%<br />

support in the untrimmed tree up to 20% support<br />

in the FastX tree. Quake-corrected reads<br />

yielded hqSNP trees with the highest median<br />

bootstrap values for the Salmonella Newport<br />

cluster (median support = 51%) and the E. coli<br />

O157 cluster (median support = 27%). These<br />

results indicate that BayesHammer enables<br />

discovery of the largest numbers of hqSNPs,<br />

and that BayesHammer and Quake both perform<br />

well for inferring hqSNP phylogenetic<br />

trees. Yet, as Quake may decrease the counts<br />

of phylogenetically informative SNP positions,<br />

BayesHammer and FastX are likely the best<br />

first-pass cleaning/correction tools for hqSNP<br />

pipelines.<br />
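The comparison above counts phylogenetically informative hqSNP positions. The abstract does not spell out the definition, so the sketch below assumes the usual parsimony-informative criterion: a column counts when at least two different bases each appear in at least two isolates. Running it on SNP alignments built from uncorrected and from corrected reads would give counts that can be compared in the way described above.

```python
from collections import Counter
from typing import Dict, List


def informative_positions(alignment: Dict[str, str]) -> List[int]:
    """0-based columns of a SNP alignment that are parsimony-informative.

    A column qualifies when at least two different bases each occur in at least
    two sequences. Ambiguous calls (e.g. 'N') and gaps ('-') are ignored.
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    informative = []
    for col in range(length):
        counts = Counter(s[col] for s in seqs if s[col] in "ACGT")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(col)
    return informative
```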

■ 85<br />

EVALUATION OF WHOLE GENOME<br />

SEQUENCING TO CONFIRM OR REFUTE<br />

CLONALITY OF CONVENTIONAL GENOTYPE-<br />

DEFINED CLUSTERS OF MYCOBACTERIUM<br />

TUBERCULOSIS<br />

L. Cowan, J. Posey;<br />

Centers for Disease Control and Prevention,<br />

Atlanta, GA.<br />

Since 2004, the Division of Tuberculosis Elimination<br />

has conducted genotyping surveillance<br />

of Mycobacterium tuberculosis. Genotyping<br />

data are integrated with patient demographic<br />

and clinical data and routinely analyzed to<br />

identify suspected outbreaks. However, the<br />

discriminatory power of the current genotyping<br />

methods is sometimes insufficient, and some<br />

suspected outbreaks identified by genotyping<br />

include a mix of outbreak and sporadic cases<br />

or do not represent an outbreak. Genomic surveillance<br />

has the power to increase the accuracy<br />

of outbreak detection systems by analyzing<br />

the entire genome, compared with less than 1%<br />

for conventional methods. We performed whole<br />

genome sequencing (WGS) on isolates from 20<br />

suspected large outbreaks (10 to 100 isolates per<br />

cluster) identified by routine genotyping.<br />

Reference-guided assemblies of Illumina sequence<br />

read sets were created using LaserGene<br />

SeqMan NGen (DNAStar). Polymorphisms<br />

between the clustered isolates were identified<br />

by comparing assembled genomes, using<br />

coverage statistics (read depth and distribution<br />

of base calls) to identify reliable differences. The final<br />

set of polymorphisms was confirmed in each<br />

assembly by visually checking each position in<br />

the mapped reads of the assembly. The number<br />

of single nucleotide polymorphisms (SNPs)<br />

identified for each cluster ranged from 8 to<br />

101. WGS exhibited a higher level of resolution<br />

for each cluster than conventional<br />

genotyping methods and identified cases<br />

that were not involved in recent transmission<br />

among the samples analyzed. The preliminary<br />

results indicate that WGS data could result in<br />

more focused targeting of limited public health<br />


resources leading to more effective epidemiologic<br />

field investigations and an improved<br />

ability to identify where public health intervention<br />

will have the greatest impact. This curated<br />

data set can be used to evaluate bioinformatic<br />

pipelines for variant detection using whole<br />

genome SNP or multi-locus sequence typing<br />

(wgMLST) platforms.<br />
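The filtering step above (keeping only differences supported by adequate read depth and a consistent base-call distribution) and the resulting pairwise SNP counts can be sketched as follows. The thresholds (10 reads, 90% agreement) and the per-position pileup input are assumptions for illustration; the authors used SeqMan NGen and visual confirmation rather than code like this.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Pileup = Dict[str, int]  # base -> number of reads supporting it at one position


def confident_call(pileup: Pileup, min_depth: int = 10, min_fraction: float = 0.9) -> Optional[str]:
    """Return the majority base if depth and base-call agreement pass, else None."""
    depth = sum(pileup.values())
    if depth == 0 or depth < min_depth:
        return None
    base, count = max(pileup.items(), key=lambda item: item[1])
    return base if count / depth >= min_fraction else None


def snp_distances(isolates: Dict[str, List[Pileup]], min_depth: int = 10,
                  min_fraction: float = 0.9) -> Dict[Tuple[str, str], int]:
    """Pairwise counts of positions where both isolates have confident, different calls."""
    calls = {
        name: [confident_call(p, min_depth, min_fraction) for p in pileups]
        for name, pileups in isolates.items()
    }
    distances = {}
    for a, b in combinations(sorted(calls), 2):
        distances[(a, b)] = sum(
            1 for x, y in zip(calls[a], calls[b])
            if x is not None and y is not None and x != y
        )
    return distances
```

Positions failing either threshold are treated as uncalled and never contribute a difference, which errs toward fewer, more reliable SNPs.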

■ 86<br />

RAPID WHOLE GENOME SEQUENCING<br />

AND DE NOVO ASSEMBLY PIPELINE FOR<br />

BORDETELLA PERTUSSIS USING MULTIPLE<br />

PLATFORMS<br />

Y. Peng, M. M. Williams, M. R. Weigand, K.<br />

Bowden, M. L. Tondella;<br />

Centers for Disease Control and Prevention,<br />

Atlanta, GA.<br />

In the U.S. and many other developed countries,<br />

pertussis is currently the least well controlled<br />

vaccine‐preventable bacterial disease<br />

despite excellent vaccination coverage. Whole<br />

genome sequences of Bordetella pertussis, the<br />

causative agent of pertussis disease, will help<br />

us better understand the epidemiologic and<br />

clinical relevance of current circulating strains,<br />

develop novel diagnostic assays, and elucidate<br />

the possible reasons for the current increase<br />

in pertussis in the U.S. and around the world.<br />

However, with hundreds of insertion sequences,<br />

repeat regions, and large rearrangements<br />

in the B. pertussis genome, whole-genome<br />

sequencing and assembly are challenging. More than<br />

400 incomplete B. pertussis genome assemblies<br />

are currently publicly available, most of them<br />

composed of hundreds of contigs assembled<br />

from short-read sequencing data.<br />

Our group has developed a rapid whole<br />

genome sequencing and assembly pipeline for<br />

B. pertussis by taking advantage of the latest<br />

long read PacBio RSII sequencing platform;<br />

highly accurate, high coverage and low cost<br />

Illumina sequencing platforms; and the OpGen<br />

genome optical mapping system. High quality<br />

genomic DNA was isolated with the Qiagen<br />

Puregene Yeast/Bact. Kit. Using P6 PacBio<br />

chemistry and 240-minute movie recording,<br />

one SMRT cell produced over 50X coverage of<br />

subreads with an N50 as high as 13 kb, which<br />

could be assembled into one complete genome using<br />

HGAP 3.0. After removing the end overlap<br />

to close the circular genome, the structure of<br />

de novo assemblies was further tested and confirmed<br />

by optical mapping and finally polished<br />

by mapping with high quality short Illumina<br />

reads. So far, nearly 200 genomes have been<br />

completed using this multi-platform pipeline<br />

and sequencing of more than 200 additional<br />

isolates is currently underway. Deep genome<br />

variation analysis (SNPs, rearrangements and<br />

IS elements, etc.) will direct future work with<br />

other ‘omics’ approaches, including RNA-Seq<br />

and peptide profiling, to determine the causes of<br />

the increase in pertussis.<br />
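Two calculations in this pipeline are easy to make concrete: the subread N50 quoted above, and trimming the duplicated end left when a circular chromosome is assembled as a linear contig. The sketch assumes the overlapping ends match exactly and locates them with a seed search. The parameter values (`seed_len`, `max_overlap`) are hypothetical, and the abstract does not say which tool the authors used for this step; real assemblies may need error-tolerant overlap detection.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the shortest sequence among the longest ones that together hold >= 50% of bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def trim_circular_overlap(contig: str, seed_len: int = 50, max_overlap: int = 50_000) -> str:
    """Remove a duplicated end when a circular chromosome was assembled as a linear contig.

    Searches the last `max_overlap` bases for the contig's first `seed_len` bases; the
    first hit whose entire suffix equals the contig's prefix (the longest exact overlap
    in that window) is trimmed off.
    """
    seed = contig[:seed_len]
    pos = contig.find(seed, max(1, len(contig) - max_overlap))
    while pos != -1:
        if contig.startswith(contig[pos:]):
            return contig[:pos]
        pos = contig.find(seed, pos + 1)
    return contig
```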

■ 87<br />

DEVELOPMENT OF A SUBTYPING ASSAY<br />

FOR DIRECT DETECTION AND TARGETED<br />

SEQUENCING OF SHIGA TOXIN-PRODUCING<br />

ESCHERICHIA COLI (STEC) FROM CLINICAL<br />

SAMPLES<br />

L. M. Gladney¹, D. Fasulo², R. L. Lindsey¹,<br />

A. Huang¹, E. Trees¹, N. Strockbine¹, E. R. Ribot¹,<br />

H. A. Carleton¹, J. Besser¹;<br />

¹Centers for Disease Control, Atlanta, GA,<br />

²Pattern Genomics, LLC, Madison, CT.<br />

Shiga toxin-producing Escherichia coli<br />

(STEC) is an important foodborne pathogen<br />

that causes approximately 265,000 infections<br />

and nearly 30 outbreaks per year in the US.<br />

STEC has been traditionally diagnosed by<br />

culture and subtyped using a variety of methods<br />

to help detect outbreaks. Recently, culture-independent<br />

diagnostic tests (CIDTs) have<br />

been introduced in many clinical labs in place<br />

of culture-based methods. These methods are<br />

attractive because they are cost-effective, can<br />

be performed at point-of-care, and do not require<br />

culture of the pathogen. As a result, the<br />

current laboratory-based surveillance system<br />

in the US, PulseNet, is threatened because it<br />

relies on pure cultures to perform subtyping by<br />


pulsed-field gel electrophoresis (PFGE) and<br />

whole genome sequencing (WGS). To address<br />

this issue, we are developing a PCR-based<br />

subtyping assay that may be used directly with<br />

complex samples such as stool. First, we identified<br />

targets that are conserved in, and unique<br />

to, STEC by analyzing 342 genome sequences (104<br />

STEC, 201 other E. coli, and 37 other Enterobacteriaceae).<br />

Using Daydreamer™ software<br />

(Pattern Genomics), we screened for DNA<br />

signatures that are unique to STEC and not<br />

present in any other E. coli. We then filtered<br />

out any signatures that were not specific, i.e., that had<br />

a BLAST hit to anything other than STEC in<br />

GenBank. We performed a similar analysis<br />

to find unique signatures for two additional<br />

Escherichia species that may be phenotypically<br />

similar to E. coli and present in stool. We<br />

identified five targets and designed primers<br />

for STEC (primarily the Stx1 and Stx2 genes),<br />

which capture all of the top 7 serogroups (100%) and 16<br />

of the top 20 (80%), as well as ten possible<br />

targets for the closely related species E. fergusonii<br />

and E. albertii. We performed in silico<br />

PCR on our 342 genomes to confirm the presence<br />

and absence of the targets in our training<br />

dataset and to assess the overall success<br />

of the design. Among STEC serotypes included<br />

in our training set, the coverage of our<br />

primers was 98% and 77% for the Stx1 and<br />

Stx2 targets, respectively, while the accuracy<br />

was 100%. The low coverage of the Stx2 target<br />

may be due to fragmented assemblies with<br />

shorter Stx2 sequences in the reference genomes.<br />

Three additional targets capture one or<br />

more of the genomes not accounted for by the<br />

Stx1 or Stx2 target and are 100% accurate, but<br />

do not cover all STEC. The coverage and accuracy<br />

were 100% for the species targets. Next,<br />

we plan to develop targets to subtype lineages<br />

of STEC that we identify in our dataset and<br />

evaluate heterogeneity in regions adjacent to<br />

the STEC targets, which may be used to differentiate<br />

strains in a targeted amplicon sequencing<br />

approach. This PCR assay should help<br />

public health labs bridge the gap between isolate<br />

characterization and shotgun metagenomic<br />

sequencing to control foodborne disease.<br />
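The in silico PCR check described above can be approximated by searching each genome for exact primer matches on both strands. The abstract reports coverage and accuracy without defining them; this sketch assumes coverage is the fraction of target genomes that yield a product and reports specificity (non-target genomes with no product) as a stand-in for accuracy. Exact matching, the nearest-site pairing rule and the 2,000 bp size limit are simplifying assumptions; practical primer evaluation usually tolerates some mismatches.

```python
from typing import Dict, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str, max_size: int = 2000) -> List[int]:
    """Product lengths for exact primer matches on either strand of `template`.

    Each forward-primer site is paired with the nearest downstream site of the
    reverse primer's reverse complement; products longer than `max_size` are dropped.
    """
    fwd, rev_site = fwd.upper(), revcomp(rev)
    products = []
    for strand in (template.upper(), revcomp(template)):
        start = strand.find(fwd)
        while start != -1:
            end = strand.find(rev_site, start + len(fwd))
            if end != -1:
                size = end + len(rev_site) - start
                if size <= max_size:
                    products.append(size)
            start = strand.find(fwd, start + 1)
    return products


def evaluate_primers(genomes: Dict[str, str], targets: Set[str],
                     fwd: str, rev: str) -> Tuple[float, float]:
    """(coverage, specificity): targets amplified, and non-targets not amplified, as fractions."""
    amplified = {name for name, seq in genomes.items() if in_silico_pcr(seq, fwd, rev)}
    non_targets = set(genomes) - targets
    coverage = len(amplified & targets) / len(targets) if targets else 0.0
    specificity = len(non_targets - amplified) / len(non_targets) if non_targets else 1.0
    return coverage, specificity
```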

■ 88<br />

THE APPLICATION OF WHOLE<br />

GENOME MULTI-LOCUS SEQUENCE<br />

TYPING TO CHARACTERIZE LISTERIA<br />

MONOCYTOGENES<br />

H. A. Carleton¹, L. S. Katz¹, S. Stroika¹,<br />

A. Sabol¹, K. Roache¹, Z. Kucerova¹, E. M. Ribot¹,<br />

P. Evans², U. Dessai³, K. Kubota⁴,<br />

H. Pouseele⁵, J. Besser¹, C. Tarr¹, E. Trees¹,<br />

P. Gerner-Smidt¹;<br />

¹Centers for Disease Control, Atlanta, GA,<br />

²Food and Drug Administration, Center for<br />

Food Safety and Applied Nutrition, College<br />

Park, MD, ³United States Department of Agriculture,<br />

Food Safety and Inspection Service,<br />

Washington, DC, ⁴Association of Public Health<br />

Laboratories, Washington, DC, ⁵Applied<br />

Maths, Sint-Martens-Latem, BELGIUM.<br />

Background: The introduction of low cost<br />

rapid next generation sequencers (NGS) is<br />

revolutionizing public health microbiology<br />

because traditional phenotypic and genotypic<br />

characterization methods now can be replaced<br />

by whole genome sequencing (WGS). As part<br />

of the Advanced Molecular Detection (AMD)<br />

initiative at CDC, we focused on Listeria<br />

monocytogenes surveillance and transformation<br />

of PulseNet’s current pulsed-field gel<br />

electrophoresis (PFGE)-based surveillance<br />

into a WGS-based infrastructure for public<br />

health laboratories. In collaboration with FDA<br />

and USDA-FSIS, we developed a L. monocytogenes<br />

whole genome multi-locus sequence<br />

typing database (wgMLST) in BioNumerics<br />

7.5 and tested the utility of this approach in<br />

surveillance. Methods: Since September 2013,<br />

WGS has been performed on over 2000 clinical,<br />

food and environmental L. monocytogenes<br />

isolates. Nextera XT DNA libraries were sequenced<br />

on the Illumina MiSeq platform. After<br />

applying raw read quality controls, sequences<br />

with >20x coverage were uploaded in real-time<br />

to the Sequence Read Archive at NCBI and<br />

further analyzed using wgMLST. The sequences<br />

were cleaned and assembled using SPAdes,<br />

then alleles were identified from raw reads and<br />


assembled genomes. To compare the performance<br />

of wgMLST and high-quality single nucleotide<br />

polymorphism (hqSNP) methods, a subset<br />

of the isolates was further characterized using<br />

an hqSNP approach (LYVE-SET: github.com/lskatz/lyve-SET).<br />

Results: The wgMLST<br />

database was built using 200 well-characterized,<br />

annotated reference genomes representing the<br />

plasmid and chromosomal diversity of L.<br />

monocytogenes. Initially, over 5800 unique<br />

loci were identified; these were then validated by<br />

removing redundant loci and improving locus<br />

definitions, so that there are presently 4814 loci in<br />

the wgMLST scheme. These loci represent<br />

on average 88.1% of the coding sequences in<br />

the reference data set and have 0.155% redundancy<br />

(2 loci defined at the same position on the genome).<br />

The number of loci identified per genome<br />

ranged from 2600 to 3100. For isolates from<br />

epidemiologically confirmed clusters/outbreaks,<br />

clustering by wgMLST correlated highly with<br />

clustering by hqSNP analysis. Conclusion:<br />

This preliminary analysis suggests that the<br />

current Listeria wgMLST database adequately<br />

captures the diversity of Listeria monocytogenes<br />

isolates characterized as part of real-time<br />

surveillance. Additionally, wgMLST can be<br />

used to identify clusters of epidemiologically<br />

related isolates, and the wgMLST results closely match<br />

those of hqSNP analyses.<br />
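Cluster detection with wgMLST generally comes down to comparing allele profiles. A minimal sketch, assuming each isolate's profile maps locus names to allele numbers (with `None` for loci that were not called), counts allele differences over loci called in both isolates and groups isolates by single linkage at a distance threshold. The threshold is a hypothetical parameter, and this is not the BioNumerics implementation.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

Profile = Dict[str, Optional[int]]  # locus name -> allele number (None = not called)


def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci called in both profiles whose allele numbers differ."""
    return sum(
        1
        for locus in a.keys() & b.keys()
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def single_linkage_clusters(profiles: Dict[str, Profile], max_distance: int) -> List[Set[str]]:
    """Group isolates linked by chains of pairwise distances <= max_distance."""
    parent = {name: name for name in profiles}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving keeps trees shallow
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)
```

Ignoring uncalled loci keeps missing data from inflating distances, which matters because the number of loci called per genome varies (2600 to 3100 above).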

■ 89<br />

USE OF WHOLE GENOME SEQUENCE<br />

K-MER-BASED SNP ANALYSIS FOR<br />

CLOSTRIDIUM BOTULINUM SURVEILLANCE<br />

IN THE UNITED STATES<br />

J. L. Halpin, J. K. Dykes, B. H. Raphael, S. M.<br />

Maslanka, C. Lúquez;<br />

Centers for Disease Control and Prevention,<br />

Atlanta, GA.<br />

Clostridium botulinum produces botulinum<br />

neurotoxin (BoNT), an extremely potent toxin that<br />

causes a rare but serious paralytic disease. Efficient<br />

subtyping methods aid in the characterization<br />

of C. botulinum isolated during outbreaks<br />

by helping to confirm or exclude a suspected<br />

source, link patients, and distinguish<br />

case patients from sporadic illnesses. There are<br />

seven serologically distinct BoNTs (designated<br />

serotypes A through G) defined by neutralization<br />

of toxicity by specific polyclonal antibodies.<br />

In the United States, 67% of foodborne<br />

botulism cases reported from 2001 to 2012<br />

and 99% of infant botulism cases were due to<br />

serotypes A and B. We completed whole genome<br />

sequencing on serotype A and B isolates<br />

that were sent to CDC in 2014 and 2015 for<br />

characterization or were isolated at CDC from<br />

specimens (e.g. stool, foods) sent by the state<br />

public health laboratories. After strains were<br />

sequenced with the Ion Torrent PGM platform,<br />

reads were filtered and assembled de novo<br />

(Torrent Suite v4.4.3, SPAdes plugin v4.4.0.1).<br />

Assemblies with >20x average coverage (n = 57)<br />

as well as 13 reference strains from GenBank<br />

were compared using the program kSNP<br />

v3.0 (k=23, core SNPs) (http://sourceforge.net/projects/ksnp/),<br />

and the resulting SNP matrix<br />

file was imported into MEGA v6.0 to create a<br />

maximum parsimony tree with bootstrap support<br />

(500 replicates). Isolates from the same specimen<br />

as well as isolates from different specimens in<br />

a single botulism outbreak (n = 26) grouped together<br />

in the same clade and were phylogenetically<br />

distinct from epidemiologically unrelated<br />

isolates. kSNP can be a useful tool to quickly<br />

analyze small groups of isolates for relatedness<br />

in an outbreak situation, but analysis becomes<br />

cumbersome when large numbers of isolates<br />

are involved due to the required computing<br />

power.<br />
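kSNP's central idea is that a SNP locus appears as a set of odd-length k-mers that are identical except at the central base. The toy sketch below illustrates that idea only and is not a reimplementation of kSNP. It scans one strand, keeps every k-mer of every genome in memory, and applies only simple filters (one allele per genome, optional core-only loci). Its memory use grows with the number and size of genomes, which gives a rough sense of why analysis becomes costly for large isolate sets.

```python
from collections import defaultdict
from typing import Dict, Set


def kmer_snp_loci(genomes: Dict[str, str], k: int = 23,
                  core_only: bool = True) -> Dict[str, Dict[str, Set[str]]]:
    """SNP loci as k-mers identical except at the central base (kSNP-style idea).

    Returns {context: {central base: genomes carrying it}}, where `context` is the
    k-mer with its central base replaced by '.'. Only the forward strand is scanned.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has one central base")
    mid = k // 2
    loci: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            loci[kmer[:mid] + "." + kmer[mid + 1:]][kmer[mid]].add(name)

    snps = {}
    for context, alleles in loci.items():
        if len(alleles) < 2:
            continue  # no variation at the central base
        carriers = [name for names in alleles.values() for name in names]
        if len(carriers) != len(set(carriers)):
            continue  # a genome carries two alleles (repeat or paralog): ambiguous, skip
        if core_only and set(carriers) != set(genomes):
            continue  # locus not present in every genome
        snps[context] = {base: set(names) for base, names in alleles.items()}
    return snps
```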

■ 90<br />

INTERNATIONAL GENOMICS<br />

COLLABORATION FOR GLOBAL HEALTH<br />

SECURITY<br />

H. Cui, T. Erkkila, P. Chain;<br />

Los Alamos National Laboratory, Los Alamos,<br />

NM.<br />

Genomic science and next generation sequencing<br />

technologies have become a highly desirable<br />

area for international collaboration in<br />

support of strengthening global health security<br />

and many other application areas. Los Alamos<br />

National Laboratory is leveraging a long history<br />

of genomics research and technology development<br />

to assist multiple partner nations in<br />

advancing their genomics and bioinformatics<br />

capabilities, focusing on pathogen detection,<br />

characterization, and biosurveillance applications.<br />

Our current partner countries include Republic<br />

of Georgia, Kingdom of Jordan, Kenya,<br />

Yemen, Gabon, Uganda, Peru, Iraq, Egypt,<br />

Liberia, Republic of Korea, and Thailand. Collaborations<br />

with other countries and regions<br />

are also being initiated. Such partnerships allow<br />

us to collaborate with host nations and<br />

provide assistance in capability development<br />

to enhance public health surveillance capability<br />

and to develop infectious disease detection<br />

and characterization techniques that can be<br />

maintained and further developed by the host<br />

countries. We continue to develop and provide<br />

the partner countries with next generation<br />

sequencing protocols and bioinformatics pipelines<br />

that enable efficient sample preparation,<br />

instrument operation, and next generation<br />

sequencing data processing and analysis. Such<br />

collaboration efforts will not only benefit the<br />

host countries and regions with state-of-the-art<br />

genomics science methods and technologies,<br />

but also build a trusted international network in<br />

addressing global emerging infectious disease<br />

challenges. Here we detail our initial scientific<br />

collaboration efforts and highlight the research<br />

outcomes.<br />

■ 91<br />

MLST+ STRAIN TYPING OF MULTIDRUG-RESISTANT<br />

ORGANISMS (MDROS) USING<br />

ACUITAS® WHOLE GENOME SEQUENCE<br />

ANALYSIS<br />

W. Chang, R. Kersey, A. Saeed, V. Sapiro, T.<br />

Walker;<br />

OpGen, Inc., Gaithersburg, MD.<br />

Quick and efficient strain typing of MDRO<br />

clinical isolates is crucial for the prevention<br />

of outbreaks. In this study we developed<br />

MLST+ schemas to strain type closely related<br />

clinical isolates of eight Gram-negative species<br />

with the highest priority in healthcare<br />

facilities, by using Acuitas® Whole Genome<br />

Sequence Analysis (WGS) with next generation<br />

sequencing technology. We selected eight<br />

species of microbes based on their prominence<br />

and clinical relevance: Escherichia coli, Pseudomonas<br />

aeruginosa, Klebsiella pneumoniae,<br />

Acinetobacter baumannii, Enterobacter cloacae,<br />

Citrobacter freundii, Klebsiella oxytoca, and<br />

Serratia marcescens. To create a stable MLST+<br />

schema, a reference genome sequence and<br />

enough query genome sequences of that species<br />

are required. Among our eight selected<br />

species, the first four already had reference genomes<br />

suggested by Ridom. For the remaining<br />

four species, we selected reference genome<br />

sequences by following the criteria provided<br />

by Ridom: the candidate reference genome<br />

must be finished, annotated, and accessible;<br />

the reference should ideally be constructed<br />

using Sanger sequencing; the reference isolate<br />

should be available from culture collections<br />

and DNA for sequencing must be available;<br />

and the reference isolate should preferably be<br />

the type strain or another well characterized<br />

strain of the species. Since the schema we developed<br />

will be used for strain typing MDROs,<br />

we always selected the strains from among<br />

human pathogens. The query genome sequences<br />

were drawn from finished genome and<br />

scaffold sequences of each species from the<br />

National Center for Biotechnology Information<br />

(NCBI). However, there were not enough<br />

genome sequences for Enterobacter cloacae,<br />

Citrobacter freundii, Klebsiella oxytoca, and<br />

Serratia marcescens in NCBI to create a stable<br />

MLST+ schema; we supplemented those query<br />

sequences with our own assembled WGS to<br />

complete the query genome set. To test these<br />

schemas, Illumina MiSeq data were generated<br />

on a total of 74 clinical isolates of these eight<br />

species, and the whole genome sequences<br />

were then assembled and analyzed. The results<br />

demonstrated that Acuitas Whole Genome<br />


Sequence Analysis MLST+ can strain type<br />

these closely related clinical isolates of each<br />

species, evaluate evolutionary relationships<br />

among the isolates, and reveal the possibility<br />

of an outbreak occurrence. In conclusion, we<br />

successfully created MLST+ schemas for eight<br />

species: Escherichia coli, Pseudomonas aeruginosa,<br />

Klebsiella pneumoniae, Acinetobacter<br />

baumannii, Enterobacter cloacae, Citrobacter<br />

freundii, Klebsiella oxytoca, and Serratia<br />

marcescens. These schemas successfully strain<br />

typed closely related clinical isolates, demonstrating<br />

the utility of Acuitas Whole Genome<br />

Sequence Analysis for transmission investigations<br />

and outbreak prevention.<br />
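The abstract describes how reference genomes were chosen but not how MLST+ target loci are defined or how alleles are numbered. A common approach in core-genome schemes of this kind, assumed here, keeps reference loci that are found in all query genomes above an identity threshold, then numbers each distinct allele sequence of a locus in order of first appearance. The 90% identity default and the `query_hits` input (for example, best-hit identities parsed from BLAST output) are hypothetical.

```python
from typing import Dict, List


def define_core_loci(reference_loci: List[str], query_hits: Dict[str, Dict[str, float]],
                     min_identity: float = 90.0, min_fraction: float = 1.0) -> List[str]:
    """Keep reference loci found at >= min_identity percent identity in enough query genomes.

    `query_hits` maps query genome -> {locus: percent identity of its best hit};
    loci without a hit are simply absent from the inner dict.
    """
    if not query_hits:
        raise ValueError("at least one query genome is required")
    core = []
    for locus in reference_loci:
        found = sum(1 for hits in query_hits.values() if hits.get(locus, 0.0) >= min_identity)
        if found / len(query_hits) >= min_fraction:
            core.append(locus)
    return core


class AlleleRegistry:
    """Assigns sequential allele numbers to each distinct sequence of a locus."""

    def __init__(self) -> None:
        self._alleles: Dict[str, Dict[str, int]] = {}

    def call(self, locus: str, sequence: str) -> int:
        alleles = self._alleles.setdefault(locus, {})
        sequence = sequence.upper()
        if sequence not in alleles:
            alleles[sequence] = len(alleles) + 1  # new allele gets the next number
        return alleles[sequence]
```

Isolates typed this way can then be compared by counting loci with different allele numbers.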

■ 92<br />

COMPREHENSIVE ANALYSIS OF ANTIBIOTIC<br />

RESISTANCE IN MULTIDRUG-RESISTANT<br />

ORGANISMS (MDROS) BY WHOLE GENOME<br />

SEQUENCING USING ACUITAS® WHOLE<br />

GENOME SEQUENCE ANALYSIS<br />

W. Chang, R. Kersey, A. Saeed, V. Sapiro, T.<br />

Walker;<br />

OpGen, Inc., Gaithersburg, MD.<br />

The timely and efficient determination of the<br />

antibiotic resistance genes in clinical isolates<br />

is crucial for the prevention of outbreaks and<br />

the treatment of patients. In this study, we developed<br />

pipelines to comprehensively analyze<br />

antibiotic resistance genes in carbapenem-resistant<br />

Enterobacteriaceae (CREs) and<br />

extended spectrum beta-lactamase (ESBL)<br />

producers using Acuitas® Whole Genome<br />

Sequence Analysis (WGS) with next generation<br />

sequencing (NGS) technology. To<br />

comprehensively determine the resistance<br />

genes in clinical isolates of MDROs, we built<br />

a database consisting of all beta-lactamase<br />

variants with NCBI accession numbers from the<br />

Lahey Clinic (http://www.lahey.org/Studies/).<br />

All beta-lactamase genes were manually<br />

curated to coding sequences spanning the start<br />

and stop codons, where these exist. The database<br />

was tested using WGS data assembled<br />

from Illumina MiSeq data generated on eight<br />

species of clinical isolates: Escherichia coli,<br />

Pseudomonas aeruginosa, Klebsiella pneumoniae,<br />

Acinetobacter baumannii, Enterobacter<br />

cloacae, Citrobacter freundii, Klebsiella<br />

oxytoca, and Serratia marcescens; all such<br />

isolates were reported to harbor CRE and<br />

ESBL antibiotic resistance genes based on<br />

results of Sanger sequencing technology and<br />

the Acuitas® Resistome Test. Using Acuitas<br />

Whole Genome Sequence Analysis, we resolved<br />

closely related gene variants across the<br />

antibiotic resistance gene families KPC, NDM,<br />

OXA, CTX-M, CMY, TEM, SHV, ACT, IMP,<br />

VIM, DHA, PER, and VEB in these clinical<br />

isolates. For example, WGS resolved single<br />

nucleotide differences between gene variants<br />

KPC-2 and KPC-3, and between<br />

NDM-1 and NDM-4. Similarly,<br />

WGS resolved the closely related CTX-M variants<br />

CTX-M-2, -3, -14, -15, and -79. The depth of our<br />

database facilitated our discovery of antibiotic<br />

resistance genes that had not previously been reported<br />

for these clinical isolates. Furthermore,<br />

our variant determination is not limited to the<br />

contents of the database we developed. If a<br />

homologous resistance gene sequence is<br />

identified that is not identical to any variant<br />

in the database, its coding sequence is<br />

retrieved and searched against the NCBI database<br />

for an identical match. If no identical variant<br />

can be found, the gene is reported as<br />

a new variant of that beta-lactamase family<br />

and then dynamically added to our database.<br />

In conclusion, we have created a database<br />

consisting of all beta-lactamase genes from the<br />

Lahey Clinic website. Using the database with<br />

the Acuitas Whole Genome Sequence Analysis<br />

pipeline, we can comprehensively determine<br />

antibiotic resistance in multidrug-resistant<br />

organisms (MDROs), providing tools to help<br />

prevent outbreaks and guide the treatment<br />

of patients.<br />
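The variant-assignment logic above (exact match to a database entry, otherwise a homologous but new variant that is added to the database) can be sketched in a few lines. This is an illustration under stated assumptions, not the Acuitas pipeline. Python's `difflib` similarity ratio stands in for a proper alignment-based percent identity, the 0.9 threshold is hypothetical, the intermediate NCBI search step is omitted, and the `-novelN` naming scheme is a placeholder.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def assign_variant(cds: str, database: Dict[str, str],
                   min_similarity: float = 0.9) -> Tuple[str, Optional[str]]:
    """Classify a resistance-gene CDS against a variant database.

    Returns ("known", name) for an identical entry, ("novel", closest name) for a
    homologous but non-identical sequence, or ("unrelated", None).
    """
    cds = cds.upper()
    for name, seq in database.items():
        if seq.upper() == cds:
            return "known", name
    best_name, best_score = None, 0.0
    for name, seq in database.items():
        score = SequenceMatcher(None, cds, seq.upper(), autojunk=False).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= min_similarity:
        return "novel", best_name
    return "unrelated", None


def register_novel(cds: str, closest: str, database: Dict[str, str]) -> str:
    """Add a new variant under a placeholder name derived from its family prefix."""
    family = closest.split("-")[0]
    index = sum(1 for name in database if name.startswith(family + "-novel")) + 1
    name = f"{family}-novel{index}"
    database[name] = cds.upper()
    return name
```

Once a novel sequence is registered, later isolates carrying it are reported as a known variant, mirroring the dynamic database update described above.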


■ 93<br />

NEXT-GENERATION SEQUENCING (NGS)<br />

TECHNOLOGY TO STUDY VIRAL-HOST<br />

INTERACTIONS<br />

M. S. Montasser;<br />

University of Kuwait, Kuwait, KUWAIT.<br />

Cucumber mosaic virus (CMV) is a pathogen<br />

causing diseases in a wide variety of economically<br />

important vegetable crops worldwide.<br />

One strain of the virus, designated CMV-KU1<br />

(patent No. US 8,138,390 B2), was isolated<br />

from tomato plants grown in Kuwait. The viral<br />

genome of this new isolate was found to be<br />

associated with a benign viral satellite RNA<br />

(mini-gene), which can be used as a biological<br />

control agent against plant viruses. The whole<br />

gene was sequenced using Next-Generation<br />

Sequencing (NGS) technology. The mini-gene<br />

sequence analysis revealed that the viral satellite<br />

consisted of a single-stranded RNA, ranging<br />

in size from about 350 to 400 nucleotides.<br />

Based on NGS analysis and bioassay experiments,<br />

the nucleotide sequence was found to<br />

be unrelated to the viral genomic RNAs, and<br />

they do not share any significant sequence<br />

homology, and it is not required for viral replication<br />

and spread. However, this viral satellite<br />

RNA is totally dependent on the helper<br />

viral genomic RNAs for its own replication<br />

and spread. This mini-gene was found to be<br />

highly effective against severe necrotic CMV<br />

strains. The severity of this disease was greatly<br />

reduced by the presence of this benign mini<br />

gene, and this is correlated with a reduction<br />

in virion production. This reduction caused<br />

ameliorating disease symptoms in infected<br />

tomato plants infected with severe viral strains.<br />

This will provide a clear picture on a better<br />

understanding of severe viral infections affecting<br />

vegetable crops. In addition, detecting and<br />

identification of viral satellites, mini-genes,<br />

and whole viral genome is rapid and costeffective<br />

at the whole-genome level using new<br />

sequencing technologies.<br />

94

VIRUS SURVEILLANCE IN FOOD AND WATER USING NEXT GENERATION SEQUENCING

T. AW, S. Wengert, Y. Kim, J. Rose; Michigan State University, East Lansing, MI.

Background: Contaminated food and water are important transmission routes for many different viruses, which are associated with diseases ranging from mild gastroenteritis to severe neurological symptoms. In addition to unreported outbreaks, there is increasing evidence of unrecognized waterborne and foodborne illnesses. There is currently no established method to fully describe the broad diversity of human viral pathogens in environmental samples, despite the increasing importance of environmental viral infections in global public health. The recent advent of next-generation sequencing technology coupled with metagenomics offers an opportunity to identify human viral pathogens in various environmental samples without a priori knowledge of which viruses may be present. The main objective of this study was to evaluate Illumina sequencing for simultaneously detecting and characterizing viral pathogens in untreated and treated wastewater intended for reuse, as well as in food (fresh produce) itself. Methods: Various wastewater samples were collected from full-scale wastewater treatment facilities. Viruses were concentrated from 20 to 100 liters of treated effluent by ultrafiltration. Fresh produce items were collected from a produce distribution center, and viruses were recovered using Tris-glycine buffer followed by polyethylene glycol precipitation. Viral nucleic acid was extracted and pooled for Illumina sequencing, and bioinformatic approaches were used to analyze the viral metagenomic fingerprints. Results: DNA sequences from the wastewater samples revealed viral communities dominated by bacteriophages, with a subset of eukaryotic viruses. Most of the viral sequences (>70%) were uncharacterized, indicating a large viral diversity yet to be discovered. Fifteen different human pathogens were identified, including emerging viruses such as bocavirus, polyomaviruses, novel astroviruses (MLB), picobirnavirus, salivirus A, and Aichi virus. Rotaviruses (A, D, F, and G) were identified in 8 batches of the lettuce samples, with high protein sequence similarity to GenBank reference genomes. This indicates that contamination occurred at some point between the field and the distribution center, before the produce moved to stores, and suggests a potential risk of foodborne rotavirus transmission from lettuce. Conclusions: Applying metagenomics coupled with next-generation sequencing to describe the diversity of viromes in food and wastewater systems provided a broader view of the viral composition of these communities. This approach has the potential to (i) provide an appropriate tool to discriminate viruses associated with different hosts (e.g., human, animal, plant, and bacteria) for microbial source tracking and (ii) identify etiologic agents associated with waterborne or foodborne disease outbreaks.
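As a rough illustration of how a figure such as ">70% uncharacterized" and a host breakdown for source tracking could be tallied, the sketch below assumes each read has already been assigned a host category by a similarity search (for example BLAST), with `None` for reads that had no significant hit. The function name `host_composition` and the category labels are hypothetical; the abstract does not describe its classification rules.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def host_composition(assignments: Iterable[Optional[str]]) -> Dict[str, float]:
    """Return the percentage of reads falling into each host category.

    Each element is the host category of a read's best hit (e.g. "bacteria"
    for a phage hit, "human" for a human virus) or None when the read had no
    significant hit; None is reported as "uncharacterized".
    """
    counts = Counter("uncharacterized" if a is None else a for a in assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}
```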

95

CRIMEAN-CONGO HEMORRHAGIC FEVER, 2013 AND 2014 SUDAN

C. Kohl¹, M. Eldegail², I. Mahmoud², P. Dabrowski¹, L. Schuenadel¹, A. Radonic¹, A. Nitsche¹, A. Osman²; ¹Robert Koch Institute, Berlin, GERMANY, ²National Public Health Laboratory, Khartoum, SUDAN.

The German Partnership Program for Excellence in Biological and Health Security was launched in 2013 and is funded by the German Federal Foreign Office. The program currently funds projects in 18 countries in the fields of infectious disease surveillance, detection and diagnostics, biosafety and biosecurity, capacity building, and networking. In Sudan, one focus of the partnership is the detection of highly pathogenic viruses and the identification of known and as yet unknown etiological agents in outbreak situations. In 2014, an outbreak of hemorrhagic fever in humans was reported from several states of Sudan (South Darfur, West Kordofan, South Kordofan). The National Public Health Laboratory (NPHL) investigated the cases and forwarded 29 serum samples from patients suffering from hemorrhagic fever to the Robert Koch Institute (RKI). The sample set included a panel of 10 sera collected during earlier hemorrhagic fever outbreaks in the same region in 2013. All sera were tested with qPCR assays for Marburg virus, Ebola virus, and Crimean-Congo hemorrhagic fever virus (CCHFV). Additionally, all samples were subjected to metagenomic deep sequencing on an Illumina MiSeq sequencer. CCHFV was identified by two independent qPCR assays in one sample from November 2013 and one from November 2014, and deep sequencing confirmed these results. Based on the available sequences, the novel CCHFV strain ‘Sudan 2014’ shares 96% nucleotide identity with its closest relative, CCHFV SPU 187/90 from South Africa. CCHFV is reported to be transmitted by ticks in Europe, Asia, and Africa and is known as the etiological agent of severe hemorrhagic fever in humans and livestock. Apart from insect repellents, no preventive measures are available. The pathogenicity and characteristics of this novel strain have yet to be determined by cell-culture isolation and serology. Further molecular analysis will help clarify the divergence of the CCHFV strains detected in 2013 and 2014. First results will be presented.
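Percent identity values such as the 96% reported here depend on how alignment gaps are treated. A minimal sketch, assuming two sequences that are already aligned and a convention that excludes gap columns from the comparison (other tools count gaps as mismatches and will report different values); `percent_identity` is a hypothetical helper, not the authors' code.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent nucleotide identity between two pre-aligned, gapped sequences.

    Columns in which either sequence has a gap ('-') are skipped, so the
    denominator is the number of ungapped aligned positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: excluded under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared
```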

96

TRACKING PLANT VIRUS POPULATION STRUCTURE CHANGES WITH DEEP SEQUENCING OF VIRUS DERIVED SMALL RNAS

D. Kutnjak¹, M. Rupar¹, I. Gutierrez-Aguirre¹, T. Curk², J. F. Kreuze³, M. Ravnikar¹; ¹National Institute of Biology, Ljubljana, SLOVENIA, ²University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, SLOVENIA, ³International Potato Center (CIP), Lima, PERU.

RNA viruses exist within a host as a cloud of mutant sequences, often referred to as quasispecies. The composition of the mutant cloud within a host is an important characteristic of the virus, since it represents a reservoir of genetic variants that can be subjected to different evolutionary processes. The high mutation rate of RNA viruses and their quasispecies nature drive their rapid adaptability and evolution. Thus, new viruses can emerge as a result of different processes, such as host shifts (1). Next-generation sequencing technologies, with their unprecedented depth, enable thorough investigation of the viral population structure within a host. This allows us to detect relevant mutations in the viral population even before new pathogenic viruses or viral strains emerge (2). Before such a tool can be used to follow the emergence of new viral variants in a viral mutant cloud, efficient sample preparation and bioinformatic pipelines are required. Deep sequencing of virus-derived small interfering RNAs (vsiRNAs; products of the plant RNA silencing mechanism) has been used efficiently to reconstruct consensus viral genome sequences from insects and plants (3). Our research focuses on Potato virus Y, an important potato pathogen. Using this virus as a model, we first tested whether the variation observed in vsiRNAs reflects the full diversity of viral populations in plants. Using ultra-deep Illumina sequencing, we investigated the diversity of two coexisting Potato virus Y sequence pools within a plant: RNA isolated from viral particles, and vsiRNAs. Our analysis pipeline included state-of-the-art bioinformatic software for low-frequency variant detection (LoFreq) and recombination pattern detection (ViReMa), as well as other commonly used NGS analysis tools. We showed that both sequence pools reflect a highly similar mutational spectrum (4). We are currently employing deep sequencing of small RNAs to track the dynamics of virus adaptation to several potato cultivars that differ in their response to the virus. In particular, we are interested in whether the structure of the virus population changes in different potato cultivars and whether convergent patterns of virus evolution emerge in the same cultivar.

References:
1. Domingo et al. 2012. Microbiol. Mol. Biol. Rev. 76, 159-216.
2. Kreuze et al. 2009. Virology 388, 1-7.
3. Kutnjak et al. 2015. J. Virol. 89, 4761-4769.
4. Stapleford et al. 2014. Cell Host Microbe 15, 706-716.
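LoFreq calls low-frequency variants with a statistical model that takes per-base quality values into account. The much simpler sketch below only illustrates the underlying quantity, the per-position frequency of non-reference alleles, using a fixed frequency cut-off and a minimum read count. The thresholds, the `base_counts` input structure, and the function name are assumptions for illustration; they are not LoFreq's method or defaults.

```python
from typing import Dict, List, Tuple


def call_minor_variants(
    reference: str,
    base_counts: Dict[int, Dict[str, int]],
    min_freq: float = 0.01,
    min_alt_reads: int = 10,
) -> List[Tuple[int, str, str, float]]:
    """Report (position, ref_base, alt_base, frequency) for minority alleles.

    base_counts maps a 0-based reference position to the counts of each base
    observed in reads covering that position (e.g. parsed from a pileup).
    A fixed frequency threshold is used purely for illustration.
    """
    calls = []
    for pos in sorted(base_counts):
        counts = base_counts[pos]
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to call
        ref_base = reference[pos].upper()
        for base, n in sorted(counts.items()):
            if base == ref_base:
                continue
            freq = n / depth
            if freq >= min_freq and n >= min_alt_reads:
                calls.append((pos, ref_base, base, freq))
    return calls
```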

97

A SURVEY OF RNA VIRUSES INFECTING SOLANACEOUS CROPS IN THE PROVINCE OF ANTIOQUIA (COLOMBIA) USING NGS

H. Jaramillo Mesa¹, L. Muñoz¹, J. F. Alzate², M. A. Marín Montoya¹, P. A. Gutiérrez¹; ¹Universidad Nacional de Colombia, Medellin, COLOMBIA, ²Universidad de Antioquia, Medellin, COLOMBIA.

Solanaceae is a family of flowering plants of great economic importance in the food and pharmaceutical industries. Important members of this family include potato (Solanum tuberosum and S. phureja), eggplant (S. melongena), tomato (S. lycopersicum), bell pepper (Capsicum annuum), tobacco (Nicotiana tabacum), and many garden ornamentals such as Angel's trumpet (Brugmansia candida). The Andean region of South America is considered the center of diversity of several of these species and, as such, is expected to harbor a large diversity of viruses. In this work, we used NGS to investigate viruses infecting potato, cape gooseberry (Physalis peruviana), tamarillo (Solanum betaceum), bell pepper, tomato, and Angel's trumpet in Antioquia (Colombia). Analysis of 14 transcriptomes provided evidence for six genera of RNA viruses infecting these plants: Potexvirus (Alphaflexiviridae), Carlavirus (Betaflexiviridae), Tospovirus (Bunyaviridae), Crinivirus (Closteroviridae), Potyvirus (Potyviridae), and Polerovirus (Luteoviridae). Potato virus X (PVX, Potexvirus) was detected at very high levels in P. peruviana (8.9% of reads) and at lower levels in S. lycopersicum (0.74%). Potyviruses were detected in S. betaceum (Tamarillo leaf malformation virus, TaLMV; 0.57%), S. lycopersicum (Potato virus Y, PVY; 0.031%), S. phureja (Potato virus V, PVV; 1.15%), and S. tuberosum (PVY, 1.33%). Potato yellow vein virus (PYVV) was detected in S. lycopersicum (0.033%), S. phureja (1.90%), and S. tuberosum (0.26%). Potato leafroll virus (PLRV) was found infecting B. candida (0.01%), C. annuum (0.001%), P. peruviana (0.023%), and S. tuberosum (0.029%). A tospovirus closely related to Alstroemeria necrotic streak virus (ANSV) was found only in C. annuum (0.043%) and seems to be the most important virus affecting this crop in Antioquia. Finally, the virus with the lowest relative abundance was Potato virus S (PVS), found in the transcriptomes of S. lycopersicum (0.003%) and S. phureja (0.018%). All these viruses were detected and assembled using in-house Perl scripts, and their phylogenetic relationships with related species are discussed. The information derived from this study was used to design specific RT-PCR and RT-qPCR diagnostic tests to support seed certification and virus management programs in Colombia. This work was funded by Universidad Nacional de Colombia (Grants VRI: 28616, 26737) and the International Foundation for Science (Sweden, Grant: C/4634-2).
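The abundance values above are expressed as a percentage of all reads in each transcriptome, which makes libraries of different sizes comparable. A minimal sketch of that normalization, assuming the number of reads assigned to each virus has already been obtained (for example by mapping reads to viral references); the function name and the example counts are hypothetical.

```python
from typing import Dict


def relative_abundance(virus_reads: Dict[str, int], total_reads: int) -> Dict[str, float]:
    """Percentage of a transcriptome's reads assigned to each virus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {virus: 100.0 * n / total_reads for virus, n in virus_reads.items()}
```

For instance, 89 virus reads in a library of 1,000 reads corresponds to 8.9% of reads, the same order of magnitude as the PVX level reported for P. peruviana.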

98

DECIPHERING VIRAL RNA GENOMES IN METAGENOMIC DATA FROM A WIDE RANGE OF CLINICAL SAMPLES

T. Ng, A. Montmayeur, L. Magana, E. Ramos, J. Vinje, P. Rota, S. Oberste; Centers for Disease Control and Prevention, Atlanta, GA.

A significant component of laboratory-based surveillance for viral diseases is the genetic characterization of pathogens by sequencing. Many RNA viruses are genetically highly diverse, even within a single serotype, and therefore present a challenge in transforming raw data into useful viral genomes. Furthermore, coinfections with multiple viral pathogens often occur in clinical samples, necessitating both laboratory procedures and a bioinformatic pipeline capable of detecting different pathogen genomes simultaneously. Here we demonstrate the laboratory procedures and bioinformatic analysis needed to routinely monitor viral pathogens from a wide range of samples; our laboratories handle over 500 samples and generate 500 million raw NGS reads annually. We initially hypothesized that NGS reads could be mapped to a list of reference genomes using a simple reference assembly algorithm. However, our results indicated that reference assembly against a list of preselected references does not consistently generate full genomes, especially for highly divergent regions. To analyze these complex clinical samples, the NGS data were therefore analyzed with a metagenomic approach: our pipeline combines de novo assembly with BLAST analysis against the entire GenBank viral genome database. To select an appropriate assembler, we compared the output of different de novo algorithms, including de Bruijn graph assemblers, overlap-layout-consensus assemblers, and combinations of different assemblers. The combination approach showed significant improvement over single assemblers such as SOAPdenovo and ABySS, while the SPAdes assembler produced similar results in less time. We found, however, that adapters and primers must first be trimmed precisely for any assembler to work well: under-trimming or over-trimming produces gaps or misalignments that significantly affect downstream de novo assembly. We will discuss issues and solutions for analyzing datasets with multiple pathogens, low sequence coverage, sequence misalignment, and cross-sample or cross-run contamination. Manual inspection of pipeline output and communication between bioinformaticians and laboratorians are key to obtaining accurate output across different sample types and genome types. A graphical interface for viewing NGS output has allowed more scientists without programming skills to analyze the data. Our strategies for monitoring viral pathogens such as poliovirus, measles virus, and norovirus are applicable to the epidemiologic investigation of other emerging pathogens. We will also discuss how to use NGS to investigate emerging pathogens, exemplified by our recent findings of a novel astrovirus, picornavirus, and rotavirus.
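The abstract stresses that imprecise adapter and primer trimming harms de novo assembly. The sketch below illustrates the trade-off with a simple exact-match 3' adapter trimmer: a short minimum overlap removes more residual adapter but also clips genuine bases that merely resemble the start of the adapter (over-trimming), while a long one leaves partial adapters behind (under-trimming). The adapter used in the example is the common Illumina TruSeq adapter prefix, chosen as an assumption since the abstract does not name the library kit; dedicated trimmers also tolerate sequencing errors, which this sketch does not.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    Cuts the read at the leftmost position where either the full adapter
    starts, or the remaining tail of the read (at least `min_overlap` bases)
    is a prefix of the adapter, i.e. a partially read-through adapter.
    """
    for i in range(len(read)):
        tail = read[i:]
        if len(tail) >= len(adapter):
            if tail.startswith(adapter):
                return read[:i]  # full adapter found inside the read
        elif len(tail) >= min_overlap and adapter.startswith(tail):
            return read[:i]  # read ends in a partial adapter
    return read  # no adapter detected


ADAPTER = "AGATCGGAAGAGC"  # assumed adapter for illustration

# "AGA" at the end may be real sequence: min_overlap=3 clips it, 5 keeps it.
print(trim_3prime_adapter("ACGTTTGCAAGA", ADAPTER, min_overlap=3))
print(trim_3prime_adapter("ACGTTTGCAAGA", ADAPTER, min_overlap=5))
```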

99

METHODS, APPLICATIONS AND EXPERIENCES WITH NEXT GENERATION WHOLE GENOME SEQUENCING IN PUBLIC HEALTH VIROLOGY

D. M. Lamson¹, J. Laplante¹, J. McGinnis¹, M. Shudt¹, A. Kajon², K. St. George¹; ¹Wadsworth Center, Albany, NY, ²Lovelace Respiratory Research Institute, Albuquerque, NM.

Potential uses of Next-Generation/Whole Genome Sequencing (NG/WGS) in public health virology include improved characterization for molecular epidemiology, viral evolution studies, and genome-wide surveillance. The Wadsworth Center Virology Laboratory has initiated NG/WGS for several applications, initially focusing on adenoviruses (HAdV), influenza A (InfA), and enterovirus D68 (EV-D68). Challenges include separating viral nucleic acid (NA) from high-titer host NA in both primary samples and cultured isolates. For HAdV isolates cultured in A549 cells, high-salt precipitation was used to preferentially remove host cell DNA prior to viral NA extraction. For InfA and EV-D68, primary samples or isolates were extracted and then RT-PCR amplified for either all 8 genomic segments (InfA) or the near-complete genome (EV-D68). NA concentrations in extracted samples were determined using the standard Nextera protocol. Libraries were quantified, average fragment size was determined, and paired-end sequencing was performed on an Illumina MiSeq with a MiSeq 500-cycle v2 kit. Sequences were de novo assembled with SPAdes and remapped using Geneious Pro. Over 100 HAdV samples from each of the 6 species have been processed, as well as 12 adeno-associated virus samples that co-purified with HAdV. Applications have included the investigation of multiple epidemic keratoconjunctivitis outbreaks, analysis of 47 HAdV3 samples distributed geographically and temporally over 20 years, genetic characterization of intertypic recombinants, and investigation of HAdV4 and HAdV14 genomic variants in college infections. Analysis of 21 EV-D68 strains from the 2014 outbreak revealed 1 to 104 amino acid changes over the entire coding region, with the greatest variability observed in the capsid gene. Fifteen influenza A/H1pdm09 specimens and 44 A/H3N2 specimens from 2009 to 2015 have been analyzed. Mutations were detected in all 8 segments of both A/H1 and A/H3 viruses, with the highest rates in the hemagglutinin and neuraminidase genes in both subtypes. Phylogenetic analysis showed that all A/H1 specimens clustered with the current vaccine strain, whereas recent A/H3 viruses were more similar to A/Switzerland/9715293/2013 than to the current vaccine strain. As technologies improve and processing time is reduced, NGS applications will become more routine. The potential for clarifying nomenclature, investigating viral evolution, and performing more comprehensive surveillance is vast. Developing techniques for application to primary samples and implementing more streamlined data analysis will yield better diagnostics and resolve more outbreak investigations. Additionally, more extensive viral surveillance will improve the chances of advance warning of the emergence of drug-resistant and novel strains with pandemic potential.
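Counts of amino acid changes, such as the 1 to 104 differences found across EV-D68 coding regions, come from translating nucleotide sequences and comparing the resulting residues. A minimal sketch using the standard genetic code, assuming the two coding sequences are already aligned in frame without insertions or deletions; `translate` and `count_aa_changes` are hypothetical helpers, not the laboratory's actual workflow.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in
# TCAG order: TTT, TTC, TTA, TTG, TCT, ... GGG.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; codons with ambiguous bases become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def count_aa_changes(ref_cds: str, query_cds: str) -> int:
    """Count amino acid differences between two in-frame, codon-aligned CDSs.

    Positions involving an ambiguous residue ('X') are not counted.
    """
    ref_aa, query_aa = translate(ref_cds), translate(query_cds)
    if len(ref_aa) != len(query_aa):
        raise ValueError("coding sequences must be codon-aligned and of equal length")
    return sum(1 for a, b in zip(ref_aa, query_aa) if a != b and "X" not in (a, b))
```

Synonymous substitutions (for example GCC to GCT, both alanine) change the nucleotide sequence but are not counted as amino acid changes.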

100

ASSEMBLING WHOLE GENOMES FROM MIXED MICROBIAL COMMUNITIES USING HI-C

I. Liachko¹, J. N. Burton¹, L. Sycuro², A. H. Wiser², D. N. Fredricks², M. J. Dunham¹, J. Shendure¹; ¹University of Washington, Seattle, WA, ²Fred Hutchinson Cancer Research Center, Seattle, WA.

Assembly of whole genomes from next-generation sequencing is hindered by the lack of contiguity information in short-read sequencing. This limitation also impedes metagenome assembly, since one cannot tell which sequences originate from the same species within a population. We have overcome these bottlenecks by adapting a chromosome conformation capture technique (Hi-C) for the deconvolution of metagenomes and the scaffolding of de novo assemblies of individual genomes. In modeling the 3D structure of a genome, chromosome conformation capture techniques such as Hi-C are used to measure long-range interactions of DNA molecules in physical space. These tools employ crosslinking of chromatin in intact cells followed by intramolecular ligation, joining DNA fragments that were physically nearby at the time of crosslinking. Subsequent deep sequencing of these DNA junctions generates a genome-wide contact probability map that allows 3D modeling of genomic conformation within a cell. The strong enrichment of Hi-C signal between genetically neighboring loci allows entire chromosomes to be scaffolded from fragmented draft assemblies. Hi-C signal also preserves the cellular origin of each DNA fragment and its interacting partner, allowing the deconvolution and assembly of multi-chromosome genomes from a mixed population of organisms. We have used Hi-C to scaffold whole genomes of animals, plants, and fungi, as well as bacteria and archaea. We have also been able to use these data to annotate functional features of microbial genomes, such as centromeres in many fungal species. Additionally, we have applied our technology to diverse metagenomic populations, such as craft beer, bacterial vaginosis infections, soil, and tree endophyte samples, to discover and assemble the genomes of novel strains of known species as well as novel prokaryotes and eukaryotes. The high quality of Hi-C-based assemblies allows the simultaneous closing of numerous unculturable genomes, placement of plasmids within host genomes, and microbial strain deconvolution in a way not possible with other methods.

Reference: Burton JN*, Liachko I*, Dunham MJ, Shendure J. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3. 2014 May 22;4(7):1339-46.
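The deconvolution idea, that DNA fragments from the same cell produce many Hi-C links to each other, can be illustrated by grouping contigs that share many Hi-C read pairs. This is a deliberately simplified sketch: it assumes read pairs have already been mapped to contigs and merges contigs with union-find whenever their raw link count reaches an arbitrary threshold. The published method (Burton et al., G3 2014) uses a more elaborate clustering procedure, and the function name and threshold here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def cluster_contigs(
    read_pairs: Iterable[Tuple[str, str]],
    contigs: Iterable[str],
    min_links: int = 5,
) -> List[Set[str]]:
    """Group contigs into candidate genome bins using Hi-C read-pair links.

    read_pairs holds (contig_of_read1, contig_of_read2) for each mapped pair.
    Contigs joined by at least `min_links` pairs end up in the same bin.
    """
    parent: Dict[str, str] = {c: c for c in contigs}

    def find(x: str) -> str:
        # Follow parent pointers to the set representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Count inter-contig links; intra-contig pairs carry no grouping signal.
    links = Counter(tuple(sorted(pair)) for pair in read_pairs if pair[0] != pair[1])
    for (a, b), n in links.items():
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if n >= min_links:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two bins

    bins: Dict[str, Set[str]] = {}
    for contig in list(parent):
        bins.setdefault(find(contig), set()).add(contig)
    return list(bins.values())
```

Raising `min_links` splits weakly connected contigs into separate bins; lowering it merges them, which is why real implementations normalize link counts rather than relying on a single raw cut-off.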

101

AN AUTOMATED WGS PIPELINE FOR ANALYSIS OF BACTERIAL GENOMES

M. Thomsen¹, H. Hasman²,³, A. Petersen³, R. Skov³, O. Lund¹, A. R. Larsen³ and F. M. Aarestrup²; ¹Danish Technical University – Systems Biology, CBS, Lyngby, Denmark, ²Danish Technical University – National Food Institute, Lyngby, Denmark, ³Statens Serum Institut, Copenhagen, Denmark.

Background: New approaches to diagnostics and surveillance for species identification, clonal clustering, and identification of resistance genes are based on whole genome sequencing (WGS). The Centre for Genomic Epidemiology (CGE) has been developing stand-alone web tools for handling WGS information for outbreak investigation, epidemiological surveillance, and diagnostics. Material and methods: Based on previously published web-based CGE tools, we developed a pipeline for automatic analysis of WGS data from bacterial isolate samples (https://cge.cbs.dtu.dk/services/CGEpipeline-2.0/). The bacterium analysis pipeline (BAP) automatically identifies the bacterial species and determines the multilocus sequence type (MLST), spa type (S. aureus only), serotype (E. coli only), and antimicrobial resistance genes. To test the BAP, a set of 101 S. aureus strains originating from bacterial infections at Danish hospitals, which had been submitted to Statens Serum Institut (SSI) for genotyping by traditional Sanger sequencing, were subjected to WGS on an Illumina HiSeq to a minimum coverage of 30x and assembled with Velvet before being uploaded to the BAP. Draft genomes, together with a mandatory metadata spreadsheet containing sequencing information as well as epidemiological data, were submitted to the BAP through the Batch Upload webpage (https://cge.cbs.dtu.dk/services/CGEpipeline-2.0/) and automatically analyzed by kmerFinder to verify the bacterial species and by ResFinder to identify antimicrobial resistance genes. When the bacterial species was correctly identified as S. aureus, MLST and spa typing were performed automatically and the results were stored on the Isolate Overview page, from which they can be exported as an Excel sheet for further analysis. Results and discussion: The bacterial species was correctly identified as S. aureus for all isolates. MLST and spa typing were therefore performed automatically and could be compared with the previous data for the strains. An MLST type was assigned to 99% of the genomes, and good concordance was found between the clonal complexes inferred from spa types and from WGS analysis through the pipeline. For spa typing, 92 of the genomes showed the same spa type as previously found by Sanger sequencing, 7 genomes showed a different spa type, and 2 genomes failed to yield a spa type in the pipeline. This was most likely due to the short reads (2 × 100 bp) obtained from the HiSeq sequencer, which are not optimal for assembling highly repetitive DNA sequences such as the variable region of the spa gene. ResFinder successfully identified the mecA gene in all MRSA genomes and in none of the MSSA genomes. Conclusion: A combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance to investigators using whole genome sequencing for clinical diagnostics and surveillance.
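The conditional flow of the BAP (verify the species first, then run species-specific typing) can be sketched as below. Species identification here is a toy score, the number of distinct k-mers a query shares with each reference sequence, which only gestures at the idea behind kmerFinder; its actual scoring, databases, and default k are not reproduced. The function names, the value of k, and the step labels are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence (forward strand only, for brevity)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identify_species(query: str, references: Dict[str, str], k: int = 16) -> str:
    """Return the reference species sharing the most k-mers with the query."""
    query_kmers = kmers(query, k)
    scores = {sp: len(query_kmers & kmers(ref, k)) for sp, ref in references.items()}
    return max(scores, key=lambda sp: scores[sp])


def plan_analyses(species: str) -> List[str]:
    """Species-dependent analysis steps, following the abstract's description."""
    steps = ["MLST", "resistance genes"]
    if species == "Staphylococcus aureus":
        steps.append("spa typing")  # spa typing applies to S. aureus only
    elif species == "Escherichia coli":
        steps.append("serotyping")  # serotyping applies to E. coli only
    return steps
```

Routing on the verified species, rather than on user-supplied metadata, is what lets a pipeline skip spa typing for a mislabeled isolate.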

102

ON THE EDGE: ROBUST GENERALIZED BIOINFORMATICS FOR NEXT-GEN SEQUENCING NOVICES

Chien-Chi Lo¹, Po-E Li¹, Joseph Anderson²,⁴, Karen W. Davenport¹, Kimberly A. Bishop-Lilly³,⁴, Yan Xu¹, Sanaa Ahmed¹, Shihai Feng¹, Tracey Allen K. Freitas¹, Vishwesh P. Mokashi⁴, and Patrick Chain¹; ¹Los Alamos National Laboratory, ²Defense Threat Reduction Agency, ³Henry M. Jackson Foundation, ⁴Naval Medical Research Center - Frederick

With the continuing evolution of sequencing platforms and technologies, the so-called democratization of sequencing is already in full swing. Despite the inherent challenges involved, moving sequencing technology into the field, closer to the source of diverse and interesting biological samples, is an attractive idea to many agencies. This even extends to OCONUS (outside the continental United States) laboratories that are being equipped with next-generation sequencing (NGS) platforms to complement more traditional molecular, cell, and microbiology methods for infectious disease research. However, many groups new to NGS and/or laboratories in remote or austere locations may be ill-equipped to handle the bioinformatic requirements associated with the rapid production of massive, complex datasets from sequencing clinical or environmental samples. A collaborative effort named EDGE bioinformatics has begun to research, prototype, and program bioinformatic pipelines that can be deployed to new NGS laboratories, enabling successful adoption of sequencing technologies by allowing robust processing of NGS data. These pipelines were initially developed with specific use cases and sample types in mind, although their modular design provides utility beyond these initial use cases. The pipelines are being designed to use the most common file formats and to run in a Linux environment on relatively inexpensive hardware, so that barriers to adoption are minimal. A pilot program to install and use EDGE at an OCONUS DoD facility has already taken place, with successful processing of locally generated data from a recently installed MiSeq, and has demonstrated reachback capability using CONUS support.

103

SUPERPHY: PREDICTIVE GENOMICS FOR THE PATHOGEN ESCHERICHIA COLI

M. D. Whiteside¹, C. R. Laing¹, A. Manji¹, J. Masih¹, P. Kruczkiewicz¹, E. N. Taboada¹, V. P. J. Gannon¹; ¹Laboratory for Foodborne Zoonoses, Public Health Agency of Canada, CANADA.

Introduction: Predictive genomics is the translation of raw genome sequence data into an assessment of the phenotypes exhibited by an organism. For bacterial pathogens, these phenotypes range from environmental survivability to the severity of the human disease associated with them. Significant progress has been made in developing generic genomic analysis tools that are broadly applicable to all microorganisms; however, a fundamental missing component is the ability to analyze genomic data in the context of organism-specific phenotypic knowledge, which has accumulated over decades of research and can provide a meaningful interpretation of genome sequence data.

Implementation: In this study, we present SuperPhy, an online predictive genomics platform (http://lfz.corefacility.ca/superphy/) for Escherichia coli. The platform integrates analysis tools and genome sequence data for all publicly available E. coli genomes and allows users to upload new genome sequences under public or private settings. SuperPhy provides real-time analyses of thousands of genome sequences, with results that are understandable and useful to a wide community, including those in the fields of clinical medicine, epidemiology, ecology, and evolution. Its features include: 1) identification of virulence and antimicrobial resistance determinants; 2) statistical associations between genotypes, biomarkers, geospatial distribution, host, source, and phylogenetic clade; 3) identification of biomarkers for groups of genomes based on the presence/absence of specific genomic regions and single-nucleotide polymorphisms; and 4) in silico Shiga toxin subtyping.

Conclusions: SuperPhy is a predictive genomics platform that attempts to provide an essential link between the vast amounts of genome information currently being generated and phenotypic knowledge in an organism-specific context.
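SuperPhy's biomarker step looks for genomic regions or SNPs whose presence or absence is associated with a user-defined group of genomes. The abstract does not name the statistical test used; a common choice for this kind of presence/absence comparison is Fisher's exact test on a 2×2 table (group vs. all other genomes, marker present vs. absent). The sketch below assumes that test. The function names and the toy marker calls (labelled `stx2` and `eae`) are hypothetical illustrations, not SuperPhy's implementation or data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: genomes in the group of interest / all other genomes.
    Columns: marker present / marker absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def pmf(x: int) -> float:
        # Hypergeometric probability of x "present" calls in the group row,
        # holding the row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for x in range(lo, hi + 1) if (p := pmf(x)) <= observed * (1 + 1e-7))


def marker_pvalues(presence: dict, in_group: list) -> dict:
    """Fisher p-value per marker.

    presence: hypothetical mapping of marker name -> list of 0/1 calls per genome.
    in_group: list of booleans, True if that genome belongs to the group of interest.
    """
    results = {}
    for marker, calls in presence.items():
        pairs = list(zip(calls, in_group))
        a = sum(1 for p, g in pairs if p and g)          # present, in group
        b = sum(1 for p, g in pairs if not p and g)      # absent, in group
        c = sum(1 for p, g in pairs if p and not g)      # present, outside group
        d = sum(1 for p, g in pairs if not p and not g)  # absent, outside group
        results[marker] = fisher_exact_two_sided(a, b, c, d)
    return results


if __name__ == "__main__":
    # Toy data: ten genomes, the first five form the group of interest.
    group = [True] * 5 + [False] * 5
    calls = {
        "stx2": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # perfectly tracks the group
        "eae": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],   # unrelated to the group
    }
    print(marker_pvalues(calls, group))
```

With thousands of regions and SNPs tested at once, the raw p-values would also need a multiple-testing correction (for example Benjamini-Hochberg) before any marker is reported as a biomarker.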

104

MOLECULAR SURVEILLANCE OF A. BAUMANNII IN A REGIONAL WASTE STABILISATION POND IN AUSTRALIA

Maxim Sheludchenko¹,², Mohammad Katouli² and Helen Stratton¹

¹Smart Water Research Centre, Griffith University, Southport, Queensland and ²Genecology Research Centre, University of the Sunshine Coast, Sippy Downs, Queensland, Australia

A. baumannii is an increasingly important pathogen responsible for many nosocomial infections. The survival of these bacteria in the environment, especially in waste stabilization ponds (WSPs), has not been investigated before, although some reports describe the isolation of A. baumannii from filaments of activated sludge in wastewater and from paleosol contaminated by waste leakage. Identification of A. baumannii in environmental samples is normally done by growing samples on selective culture media, but confirmation requires molecular techniques. Furthermore, the ability of these bacteria to grow on some selective media may cause misinterpretation of the data. In this study we report such an event and show that molecular techniques, especially 16S rRNA sequencing, aided the identification and surveillance of these bacteria in a WSP. Between October 2013 and September 2014 we surveyed the die-off of a number of pathogens in the maturation pond of a WSP in a regional area of Australia. Samples cultivated on mCCDA agar containing selective and recommended supplements for the growth of Campylobacter yielded a high number of colonies with characteristics of Campylobacter. However, subsequent PCR tests using species-specific primers showed that these bacteria were not Campylobacter. 16S rRNA sequencing, followed by additional confirmatory tests targeting VS1, ompA, blaOXA-51-like, blaOXA-23-like and the class 1 integrase gene, confirmed that these colonies were in fact A. baumannii. Using this methodology, intensive sampling was carried out at the inlet and outlet of this maturation pond, and the samples were tested for A. baumannii. In most sampling rounds, the numbers of these bacteria did not differ significantly between the inlet and outlet of the maturation pond, suggesting that these bacteria survive in such a waste treatment system. The implications of genomic techniques for the identification of bacteria in relation to public health studies will be discussed.
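The abstract reports that A. baumannii numbers did not differ significantly between the inlet and outlet of the maturation pond, but it does not state which statistical test was used. One common way to frame such a comparison, assumed here, is to compute a per-round log10 reduction (inlet log10 count minus outlet log10 count) and test whether the mean reduction differs from zero with a paired t statistic. The function names and the counts in the example are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, stdev


def log10_reduction(inlet_count: float, outlet_count: float) -> float:
    """log10 reduction from inlet to outlet for one sampling round.

    Both counts are assumed to be positive concentrations in the same units
    (e.g. CFU per 100 mL); zero counts would need a detection-limit substitute.
    """
    if inlet_count <= 0 or outlet_count <= 0:
        raise ValueError("counts must be positive to take log10")
    return math.log10(inlet_count) - math.log10(outlet_count)


def paired_t_statistic(inlet: list, outlet: list) -> float:
    """Paired t statistic for the mean per-round log10 reduction (H0: mean = 0).

    Assumption: a paired t-test on log10 counts; the study's actual test is not stated.
    """
    if len(inlet) != len(outlet) or len(inlet) < 2:
        raise ValueError("need at least two paired sampling rounds")
    diffs = [log10_reduction(i, o) for i, o in zip(inlet, outlet)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("identical reductions in every round; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical counts for three sampling rounds.
    inlet_counts = [1000, 2000, 1500]
    outlet_counts = [900, 2100, 1400]
    print(paired_t_statistic(inlet_counts, outlet_counts))
```

The statistic is compared with a t distribution with (rounds - 1) degrees of freedom; with three rounds (2 degrees of freedom), |t| would need to exceed about 4.30 to be significant at the 5% level (two-sided). A log10 reduction near zero in most rounds is consistent with the survival the abstract describes.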

Index<br />

Bold number indicates presenter.<br />

Aanensen, D. M. S7:5<br />

Aarestrup, F. M. 41<br />

Aarestrup, F. M. 46<br />

Aarestrup, F. M. 78<br />

Aarestrup, F. M. S10:6<br />

Abrams, A. 70<br />

Achtman, M. S7:8<br />

Adam, J. 7, 8, S7:10<br />

Adams, M. D. S9:4<br />

Addy, N. 50<br />

Agarwala, R. S3:3<br />

Akange, N. 47<br />

Albayrak, L. 27<br />

Ali, F. 45<br />

Alikhan, N.-F. S7:8<br />

Allard, M. W. 39<br />

Allard, M. W. 54, S3:5<br />

Al-Shahib, A. S7:9<br />

Alyanak, E. 64<br />

Alzate, J. F. 97<br />

Anand, M. 74<br />

Ancora, M. 40<br />

Andersen, V. D. 46<br />

Anderson, M. S9:3<br />

Andrusch, A. 9<br />

Appalla, L. 13, 14, 30, S9:5<br />

Argimon, S. S7:5<br />

Ashton, P. 83, S2:5, S7:9<br />

Au-Young, J. 29<br />

Avillan, J. J. 64<br />

AW, T. 94<br />

Ayers, S. L. 54<br />

Baert, L. S2:2<br />

Baker, D. J. 74<br />

Baker, D. J. 79<br />

Barrette, R. W. S2:3<br />

Barretto, C. S2:2<br />

Baugher, J. S4:6<br />

Beatson, S. A. 16<br />

Beiko, R. G. 8, S7:10<br />

Bell, R. L. 39<br />

Bennett, C. 72<br />

Ben Zakour, N. L. 16<br />

Bergmark, L. S10:6<br />

Berry, C. S7:10<br />

Besser, J. 87, 88<br />

Besser, T. 44<br />

Blais, B. W. 55<br />

Blosser, S. J. 12<br />

Boisvert, S. S4:5<br />

Bonomo, R. A. S9:4<br />

Bonten, M. 33<br />

Bopp, D. 74<br />

Bowden, K. E. 82<br />

Bowden, K. 86<br />

Bradshaw, J. 15<br />

Brettin, T. S4:5<br />

Brinkman, F. 7, S7:10<br />

Brinkman, F. S. 8<br />

Bristow, F. 7, 8, S7:10<br />

Bronstein, P. S3:3<br />

Brouwer, E. 33<br />

Brown, E. 39<br />

Brown, E. W. 54, S3:5<br />

Bulach, D. M. S7:7<br />

Burrows, E. 39<br />

Burton, J. N. 100, S4:7<br />

Busch, J. 57<br />

Cabral, J. S7:10<br />

Calistri, P. 40<br />

Cammà, C. 40<br />

Camp, P. 36<br />

Campbell, E. M. 12<br />

Carleton, H. 71, 72<br />

Carleton, H. 65, 84, S3:3, S3:6<br />

Carleton, H. A. 87, 88<br />

Carleton-Romer, H. A. 81<br />

Carrico, J. A. 7, 8<br />

Carriço, J. A. S7:10<br />

Carrillo, C. D. 55<br />

Carroll, L. 44<br />

Catalyurek, U. V. S3:5<br />

Catanzaro, A. S9:6<br />

Catanzaro, D. S9:6<br />

Chain, P. 90<br />

Chang, W. 34, 91, 92<br />

Chapman, E. L. 12<br />

Chase, H. 50<br />

Chase, K. 59<br />

Chen, Y. 72<br />

Chilton, C. S2:2<br />

Christopher-Hennings, J. 47, 52<br />

Chumakov, S. M. 27<br />

Chung, H. S9:3<br />

Chung, T. 50<br />

Clifford, R. 13, 14, S9:5<br />

Cohen, T. S9:6<br />

Colman, R. 57<br />

Colman, R. E. S9:6<br />

Comandatore, F. 32<br />

Conrad, C. 12<br />

Conrad, N. R. S4:5<br />

Cooper, A. 1<br />

Corander, J. 37<br />

Courtot, M. 7, 8<br />

Courtot, M. S7:10<br />

Cowan, L. 85<br />

Croughs, P. D. 24<br />

Crudu, V. S9:6<br />

Cui, H. 90<br />

Curk, T. 96<br />

Currie, B. 57<br />

Dabrowski, P. S10:4<br />

Dabrowski, P. 10, 21, 9, 95<br />

Dallman, T. 83, S2:5, S7:9<br />

Daquigan, N. 43, 48<br />

Das, S. 50, 52<br />

David, S. S3:3<br />

Davis, J. J. S4:5<br />

Davis, M. 44<br />

Davis, S. 2, S4:6<br />

Deatherage Kaiser, B. S4:4<br />

de Boer, R. F. 24<br />

De Bruyne, K. 59, S7:3<br />

Defibaugh-Chavez, S. S3:3<br />

Delisle, J. 57<br />

De Massis, F. 40<br />

den Bakker, H. 44<br />

Deng, X. S4:4<br />

de Pinna, E. 83, S2:5<br />

Dermott, P. F. 54<br />

Dessai, U. 88<br />

Deurenberg, R. H. 22<br />

Dhand, A. 20<br />

Dhere, T. 61<br />

Dhillon, B. 7, 8<br />

Dickinson, M. 74<br />

Di Giannatale, E. 40<br />

Dimitrova, N. 18, 19, 20, 28<br />

Dinsmore, B. 71<br />

Dinsmore, B. S4:4<br />

Disz, T. S4:5<br />

Dooley, D. 7, 8, S7:10<br />

Drees, K. 40<br />

Dunham, M. J. 100, S4:7<br />

Dunn, S. 1<br />

Duwve, J. W. 12<br />

Dykes, J. K. 89<br />

Edirisinghe, J. S4:5<br />

Edwards, R. A. S4:5<br />

Einer-Jensen, K. 6, S7:4<br />

Eldegail, M. 95<br />

Elson, R. S2:5<br />

Engelthaler, D. 57<br />

Engelthaler, D. M. S9:6<br />

Eppinger, M. 11<br />

Erkkila, T. 90<br />

Escuyer, V. 25<br />

Evans, P. 88<br />

Evans, P. S. S3:5<br />

Ezeoke, I. S3:4<br />

Fallon, J. 20<br />

Fallon, J. T. 18, 19, 28<br />

Fasulo, D. 87<br />

Federhen, S. S6:4<br />

Fedosejev, A. S7:5<br />

Felton, A. 29<br />

Ferdous, M. 23, 24<br />

Ferreira, C. 39<br />

Fields, P. 71<br />

Fields, P. S4:4<br />

Fitzgerald, C. 71, 72<br />

Fitzgerald, C. S4:4<br />

Fitzpatrick, M. A. 31<br />

Flett, K. B. S9:3<br />

Fofanov, Y. 27<br />

Foley, A. 47, 52<br />

Foster, J. T. 40<br />

Fournier, C. S2:2<br />

Fredricks, D. N. 100, S4:7<br />

Friedrich, A. W. 22<br />

Friedrich, A. W. 23, 24<br />

Friesema, I. H. 24<br />

Fu, S. 67<br />

Galac, M. R. S3:4<br />

Galang, R. R. 12<br />

Ganova-Raeva, L. 12<br />

Garcia, D. S7:5<br />

Garcia-Toledo, L. 71<br />

Garcia-Toledo, L. 65<br />

Garofolo, G. 40<br />

Gauthier, M. 55<br />

Gehr, E. 80<br />

Gentry, J. 12<br />

Gerner-Smidt, P. 71, 72<br />

Gerner-Smidt, P. 65, 88, S3:6<br />

Gibbons, H. S. S3:4<br />

Gillece, J. 57<br />

Gimonet, J. S2:2<br />

Gladney, L. 71, 72<br />

Gladney, L. 65<br />

Gladney, L. M. 87<br />

Glaser, A. 50<br />

Glasner, C. S7:5<br />

Goater, R. S7:5<br />

Golovko, G. 27<br />

Gonzalez-Escalona, N. 54<br />

Gopinath, G. 50<br />

Graham, M. 7, 8, S7:10<br />

Graham, R. 75<br />

Grant, K. 83, S2:5<br />

Grau, F. R. S2:3<br />

Green, K. Y. S10:3<br />

Griffiths, E. 8, S7:10<br />

Griffiths, E. J. 7<br />

Grim, C. 48<br />

Grim, C. J. 43<br />

Griswold, T. 65<br />

Grundmann, H. 22<br />

Gulvik, C. A. 64<br />

Guo, X. 58<br />

Gupta, A. 18, 20<br />

Gutiérrez, P. A. 17, 97<br />

Gutierrez-Aguirre, I. 96<br />

Gymoese, P. S1:3<br />

Habibi, N. S6:3<br />

Haley, B. J. 49<br />

Hall, C. 57<br />

Halpin, J. L. 89<br />

Halse, T. 25, 79<br />

Hanes, D. 48, 50<br />

Hanes, D. E. 43<br />

Hänninen, M.-L. 37<br />

Hauser, A. R. 31<br />

Haydek, J. P. 61<br />

Heaton, H. 57<br />

Hendriksen, R. S. 41, S10:6<br />

Heneine, W. 12<br />

Henry, C. S4:5<br />

Herrick, J. 80<br />

Highlander, S. 35<br />

Hillman, D. 12<br />

Hinkle, M. 13, 14<br />

Hinkle, M. K. S9:5<br />

Hjelmsø, M. H. S10:6<br />

Hoffmann, M. 54, S3:5<br />

Holler, L. 52<br />

Howden, B. P. S7:7<br />

Hsiao, W. 7, 8, S7:10<br />

Huang, A. 71, 72<br />

Huang, A. 87<br />

Huang, A. D. 73, 81<br />

Huang, W. 18, 19, 28<br />

Hueftle, Y. 57<br />

Im, S. 71<br />

Im, S. 65<br />

Im, S. B. 81<br />

Isaac-Renton, J. 8<br />

Ismail, A. 5<br />

Iwamoto, T. 3<br />

Jang, G. 38<br />

Janies, D. S3:5<br />

Janies, D. A. S3:4<br />

Janssens, K. S7:3<br />

Jaramillo Mesa, H. 17, 97<br />

Jarvis, K. 48<br />

Jarvis, K. G. 43<br />

Jayaram, A. 50<br />

Jean-Gilles Beaubrun, J. 50<br />

Jennison, A. 75<br />

Jia, H. 12<br />

Jironkin, A. S7:9<br />

Joensen, K. 65<br />

Joensen, K. G. 78<br />

Johansen, J. 56, 6, S7:4<br />

Jones, M. 35, S4:4<br />

Joseph, L. 72<br />

Julius, M. 14<br />

Kajon, A. 99<br />

Kapsak, C. 80<br />

Karangwa, C. K. S10:3<br />

Karns, J. S. 49<br />

Kasam, M. S2:2<br />

Kato, S. 3<br />

Katz, L. S. 71, 72<br />

Katz, L. S. 65<br />

Katz, L. S. 81, 88, S3:3, S7:10<br />

Kaur, S. 66<br />

Kearse, M. 1, S7:2<br />

Keddy, A. 8, S7:10<br />

Keddy, K. H. 5<br />

Keena, M. 47, 52<br />

Keim, P. 57, S9:6<br />

Kenyon, R. W. S4:5<br />

Kersey, R. 91, 92<br />

Kersey, R. K. 34<br />

Khan, M. W. S6:3<br />

Khanipov, K. 27<br />

Khudyakov, Y. 12<br />

Kiil, K. 76<br />

Kim, Y. 94<br />

Kishony, R. S9:3<br />

Kjeldsen, M. S1:3<br />

Klenner, J. 21<br />

Klimke, W. S3:3<br />

Knox, N. S7:10<br />

Koenig, S. S. 11<br />

Kohl, C. 21, 95<br />

Konstantinidis, K. T. 73<br />

Kooistra-Smid, A. M. 23, 24<br />

Koren, M. 30<br />

Kornblum, J. S3:4<br />

Koziol, A. 55<br />

Kraft, C. S. 61<br />

Krepps, M. D. S3:4<br />

Kreuze, J. F. 96<br />

Kruczkiewicz, P. 8, S2:4, S7:10<br />

Ku, Y.-C. 29<br />

Kubota, K. 88<br />

Kucerova, Z. 71<br />

Kucerova, Z. 88<br />

Kuhn, J. 1<br />

Kuroda, M. 3<br />

Kurth, A. S10:4<br />

Kutnjak, D. 96<br />

Kwak, Y. 14<br />

Kwak, Y. I. 30<br />

Kwong, J. S7:7<br />

Laird, M. 7, 8<br />

Laird, M. R. S7:10<br />

Lambert, D. 55<br />

Lamson, D. 29<br />

Lamson, D. M. 99<br />

Lan, R. 66, 67<br />

Lane, C. 83<br />

Lapierre, P. 25, 74, 79<br />

Laplante, J. 99<br />

Lasker, B. 77<br />

Lauer, A. C. 77<br />

Lee, H. 38<br />

Leekitcharoenphon, P. 41<br />

Lesho, E. 13, 14<br />

Lesho, E. P. 30, S9:5<br />

Levinson, K. J. 74<br />

Li, A.-D. 63<br />

Li, Y. 58<br />

Liachko, I. 100, S4:7<br />

Liboriussen, P. 56, 6, S7:4<br />

Libuit, K. 80<br />

Lim, S. 38<br />

Limbago, B. M. 64<br />

Lin, H. 19, 20, 28<br />

Lin, H. C. 18<br />

Lin, Y. S3:4<br />

Lindsey, R. L. 71<br />

Lindsey, R. L. 65<br />

Lindsey, R. L. 81, 87<br />

Litrup, E. 76<br />

Liu, P. 58<br />

Lokate, M. 22<br />

Lovchik, J. 12<br />

Lui, L. 12<br />

Lukjancenko, O. 78, S10:6<br />

Luo, Y. 2, S3:5, S4:6<br />

Luquette, A. 59<br />

Lúquez, C. 89<br />

Mabon, P. S7:10<br />

Machi, D. S4:5<br />

Magana, L. 98<br />

Mahmoud, I. 95<br />

Maiden, M. C. 72<br />

Mangone, I. 40<br />

Manninger, P. 55<br />

Mao, C. S4:5<br />

Mao, Y. 62<br />

Marcacci, M. 40<br />

Marín Montoya, M. A. 17, 97<br />

Markowitz, S. 1<br />

Martin, H. 71<br />

Martin, H. 65<br />

Maslanka, S. M. 89<br />

Materna, A. 56, S7:4<br />

Materna, A. C. 6, S7:4<br />

Matthews, T. 7, 8, S7:10<br />

Maybank, R. 13, S9:5<br />

Mayigowda, P. 18, 20<br />

Mayo, M. 57<br />

McDermott, P. 72<br />

Mc Gann, P. 30, S9:5<br />

McGann, P. 13<br />

McGinnis, J. 99<br />

McIntosh, M. T. S2:3<br />

McQuiston, J. R. 77<br />

Meng, J. 54, S3:5<br />

Mi, T. 18, 20<br />

Michot, L. S2:2<br />

Millard, A. S7:8<br />

Miller, J. 54<br />

Miller, W. 72<br />

Mitarai, S. 3<br />

Moine, D. S2:2<br />

Moir, R. 1, S7:2<br />

Molina, M. 15, 60<br />

Monday, S. R. 54, S3:5<br />

Montasser, M. S. 93<br />

Montmayeur, A. 98<br />

Moon, D. 38<br />

Munk, P. 46<br />

Muñoz, L. 97<br />

Muñoz Baena, L. 17<br />

Muñoz Escudero, D. 17<br />

Murase, Y. 3<br />

Murugesan, K. 18, 20<br />

Muruvanda, T. 39, S3:5<br />

Musser, K. 25<br />

Musser, K. A. 74<br />

Musser, K. A. 79<br />

Mustafa, A. S. S6:3<br />

Myers, R. S3:5<br />

Nair, S. 83, S2:5<br />

Nash, J. H. S2:4<br />

Neish, E. 61<br />

Nelson, E. 47, 52<br />

Ng, T. 98<br />

Ngeno, E. S10:6<br />

Ngom-Bru, C. S2:2<br />

Nitsche, A. 10, 21, 9, 95, S10:4<br />

Nolan, S. M. 19, 28<br />

Norman, K. N. 51<br />

NT, J. S7:5<br />

Oberste, S. 98<br />

Octavia, S. 66, 67<br />

Ogunremi, D. 53<br />

Ojo, O. O. 68<br />

Olsen, C. 1, S7:2<br />

Olsen, G. J. S4:5<br />

Olson, R. S4:5<br />

Ong, A. 13<br />

Ong, A. C. 30, S9:5<br />

Onmus-Leone, F. 13, 14, 30, S9:5<br />

Osman, A. 95<br />

Overbeek, R. S4:5<br />

Ozer, E. A. 31<br />

Padilla, J. 14<br />

Pallen, M. J. S7:8<br />

Parikh, C. 29<br />

Parra, G. I. S10:3<br />

Parrello, B. S4:5<br />

Partridge, S. 4<br />

Payne, J. 54<br />

Pena-Gonzalez, A. 73<br />

Peng, Y. 82, 86<br />

Pereira, R. 44<br />

Perez, A. 12<br />

Peters, P. J. 12<br />

Peters, T. 83, S2:5<br />

Petkau, A. 7, 8, S7:10<br />

Pettengill, J. 39, 54, S4:6<br />

Pettengill, J. B. 2<br />

Peyrani, P. 12<br />

Pillatzki, A. 52<br />

Pimenova, M. 27<br />

Platone, I. 40<br />

Pontones, P. 12<br />

Posey, J. 85<br />

Pot, B. S7:3<br />

Pouseele, H. 71, 72<br />

Pouseele, H. 59, 65, 88, S7:3<br />

Prarat, M. 36<br />

Prentice, M. B. 68<br />

Priebe, G. P. S9:3<br />

Pruckler, J. 71, 72<br />

Purucker, T. 15<br />

Pusch, G. D. S4:5<br />

Qaadri, K. 1, S7:2<br />

Quinlan, T. 79<br />

Raangs, G. C. 22<br />

Radonic, A. 95, S10:4<br />

Ramachandran, S. 12<br />

Ramos, E. 98<br />

Rand, H. 2, S3:3, S4:6<br />

Raphael, B. H. 89<br />

Ravnikar, M. 96<br />

Reed, E. 39<br />

Reimer, A. S7:10<br />

Renard, B. 10<br />

Ribot, E. M. 72<br />

Ribot, E. M. 65<br />

Ribot, E. R. 87<br />

Ribot, E. M. 88<br />

Rishishwar, L. 81<br />

Roache, K. 71<br />

Roache, K. 88<br />

Robbe-Austerman, S. 36<br />

Robinson, T. J. 16<br />

Rockweiler, T. 34<br />

Rodriguez, A. L. 11<br />

Rodwell, T. C. S9:6<br />

Rogers, M. 33<br />

Rojas, M. 27<br />

Rose, J. 94<br />

Roseberry, J. C. 12<br />

Rossen, J. W. 22<br />

Rossen, J. W. 23, 24<br />

Rossi, M. 37<br />

Rota, P. 98<br />

Rothgänger, J. S7:6<br />

Rupar, M. 96<br />

Rusconi, B. 11<br />

Sabol, A. 88<br />

Saeed, A. 91, 92<br />

Sahl, J. 57<br />

Saleem Haider, M. 45<br />

Sammons, S. 58<br />

Sandoval, M. 12<br />

Sapiro, V. 91, 92<br />

Scaria, J. 47, 50, 52<br />

Schauser, L. 56, 6, S7:4<br />

Scheutz, F. 65<br />

Schriml, L. 7, 8<br />

Schuenadel, L. 95, S10:4<br />

Schupp, J. 57<br />

Scott, H. M. 51<br />

Seemann, T. S7:7<br />

Sekizuka, T. 3<br />

Senturk, I. S3:5<br />

Sergeant, M. S7:8<br />

Shafiq, M. 45<br />

Shaheed, F. S6:3<br />

Shankar, A. 12<br />

Shaw, T. 60<br />

Shay, J. 7, 8<br />

Shea, J. 25<br />

Shearman, H. 1, S7:2<br />

Shendure, J. 100, S4:7<br />

Shudt, M. 25<br />

Shudt, M. 99<br />

Shukla, M. P. S4:5<br />

Shumway, M. S3:3<br />

Sieffert, C. S7:10<br />

Siler, J. 44<br />

Simmons, M. S3:3<br />

Sinnige, J. 33<br />

Sintchenko, V. 66, 67<br />

Sischo, W. 44<br />

Sjölund-Karlsson, M. 64<br />

Smith, A. M. 5<br />

Snesrud, E. 13, 14, 30, S9:5<br />

Sobral, B. W. S4:5<br />

Sosnovtsev, S. V. S10:3<br />

Sparks, M. 14<br />

Stanton-Cook, M. J. 16<br />

Stevens, E. L. 42<br />

Stevens, R. L. S4:5<br />

St. George, K. 29, 99<br />

Storey, D. B. S7:11<br />

Strain, E. 2, 39<br />

Strino, F. 56<br />

Stripling, D. 71<br />

Stripling, D. 65<br />

Strockbine, N. 71<br />

Strockbine, N. 65, 81, 87<br />

Stroika, S. 88<br />

Stuber, T. 36<br />

Sun, Y. 29<br />

Sutton, G. G. S9:4<br />

Suzuki, H. 47, 50, 52<br />

Switzer, W. H. 12<br />

Sycuro, L. 100, S4:7<br />

Taboada, E. 7, 8, S7:10<br />

Taboada, E. N. S2:4<br />

Tan, W. S10:5<br />

Tanaka, M. 66, 67<br />

Tang, P. 8<br />

Tarr, C. 71<br />

Tarr, C. 88<br />

Tarr, C. L. 73<br />

Tausch, S. 10<br />

Tauxe, W. M. 61<br />

Thachil, A. 50<br />

Thai, H. 12<br />

Thiessen, J. 8<br />

Thobela, M. 5<br />

Thomas, M. 47, 50, 52<br />

Thompson, L. 74<br />

Tillman, G. S3:3<br />

Timme, R. E. S3:3<br />

Tondella, M. L. 82<br />

Tondella, M. L. 86<br />

Top, J. 33<br />

Torpdahl, M. S1:3<br />

Trees, D. 70<br />

Trees, E. 72<br />

Trees, E. 65, 81, 84, 87, 88, S3:3<br />

Trees, E. K. S3:6<br />

Tsafnat, G. 4<br />

Turner, S. D. 80<br />

Turnsek, M. 71<br />

Underwood, A. S7:9<br />

Useh, N. 47<br />

Välimäki, N. 37<br />

Van Domselaar, G. 7, 8, S7:10<br />

van Duyne, S. 71<br />

Van Kessel, J. S. 49<br />

Van Roey, P. 25<br />

Vazquez, A. 57<br />

Vehkala, M. 37<br />

Vigre, H. 46<br />

Vinje, J. 98<br />

Vonstein, V. S4:5<br />

Vuyisich, M. 69<br />

Wagner, D. 72<br />

Wagner, D. D. 84<br />

Wagner, D. M. 57<br />

Waldram, A. S2:5<br />

Walker, G. T. 34<br />

Walker, T. 91, 92<br />

Wan, Q. 19<br />

Wang, C. 39<br />

Wang, G. 18, 19, 20, 28<br />

Wang, L. 36<br />

Wang, Q. 66<br />

Wang, Y. S10:5<br />

Ward, A. 61<br />

Ward, G. 14<br />

Warnick, L. 44<br />

Warren, A. S4:5<br />

Waterman, P. 14<br />

Waterman, P. E. 30, S9:5<br />

Watt, J. 58<br />

Wattam, A. R. S4:5<br />

Weigand, M. R. 73, 82<br />

Weigand, M. R. 86<br />

Weimer, B. C. S7:11<br />

Weiss, D. S3:4<br />

Wengert, S. 94<br />

Whichard, J. 72<br />

White, J. 48<br />

White, J. R. 43<br />

Wiedmann, M. 44<br />

Will, R. S4:5<br />

Willems, R. 33<br />

Williams, G. 72<br />

Williams, G. 71<br />

Williams, M. M. 82, 86<br />

Winsor, G. 7, 8, S7:10<br />

Wirth, S. 74<br />

Wirth, S. E. 79<br />

Wiser, A. H. 100, S4:7<br />

Wolcott, M. 59<br />

Wolfgang, W. J. 74<br />

Wolfgang, W. J. 79<br />

Wolfgang, W. J. S3:5<br />

Wong, K. 15, 60<br />

Wright, M. S. S9:4<br />

Xia, F. S4:5<br />

Xia, G. 12<br />

Yamashita, A. 3<br />

Yeats, C. A. S7:5<br />

Yin, Y. S4:4<br />

Yoo, H. S4:5<br />

Yoo, Y. 50<br />

Yoshida, C. S2:4<br />

Yu, Q. 58<br />

Zhang, A. 62<br />

Zhang, J. 37<br />

Zhang, S. S4:4<br />

Zhang, T. 62, 63<br />

Zhang, Y. 36<br />

Zhang, Z. S4:4<br />

Zhao, S. 72<br />

Zhao, S. 54<br />

Zheng, J. 39<br />

Zhou, K. 22, 23<br />

Zhou, Z. S7:8<br />

Zhu, Q. 35<br />

Zhuge, J. 19, 28<br />

Zilli, K. 40<br />

Zong, Z. 26<br />

American Society for Microbiology<br />

1752 N Street, N.W.<br />

Washington, DC 20036-2904