11th Annual Sequencing, Finishing, and Analysis in the Future Meeting

AVERAGE NUCLEOTIDE IDENTITY: A FAST WHOLE GENOME SEQUENCE-BASED METHOD FOR SPECIES IDENTIFICATION OF ESCHERICHIA COLI, E. ALBERTII AND E. FERGUSONII

Wednesday, 1st June 20:00 La Fonda NM Room (1st floor) Poster (PS-1b.19)

Sung Im, Heather Carleton, Lee Katz, Andrew Huang, Rebecca Lindsey

Centers for Disease Control and Prevention

In national reference laboratory strain testing and foodborne pathogen surveillance, it is important to quickly and accurately identify the genus and species of unknown bacterial whole genome sequences (WGS). Average nucleotide identity (ANI) is an in-silico method that calculates the average similarity between two WGS by aligning and comparing the nucleotide bases of the two genome assemblies. Two common algorithms are used to compute the ANI between two genomes: NCBI’s BLASTn and MUMmer’s DNAdiff. While the BLASTn method is computationally more expensive, it provides higher specificity because each WGS comparison is also run in the reverse direction to give a two-way reciprocal best-hit similarity value. The MUMmer-based method calculates the ANI more rapidly, using preselected high-quality reference genomes for comparison.
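
As a rough illustration of the ANI idea described above, the sketch below averages percent identity over aligned blocks, weighted by block length, and then averages the two directions of a comparison. The weighting scheme, the `AlignedBlock` type and both function names are assumptions made for illustration; the abstract does not specify them, and the filtering steps that real ANIb or ANIm pipelines apply to alignments are omitted here.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one aligned region between two assemblies:
# (aligned length in bp, percent identity of that region).
AlignedBlock = Tuple[int, float]


def ani_one_way(blocks: Iterable[AlignedBlock]) -> float:
    """Length-weighted mean percent identity over aligned blocks.

    Assumption: blocks have already been produced and filtered by an
    aligner; this function only does the averaging.
    """
    total_length = 0
    weighted_identity = 0.0
    for length, identity in blocks:
        total_length += length
        weighted_identity += length * identity
    if total_length == 0:
        raise ValueError("no aligned blocks to average")
    return weighted_identity / total_length


def ani_two_way(a_vs_b: Iterable[AlignedBlock],
                b_vs_a: Iterable[AlignedBlock]) -> float:
    """Mean of the forward and reverse one-way values (a simple
    stand-in for a two-way reciprocal comparison)."""
    return (ani_one_way(a_vs_b) + ani_one_way(b_vs_a)) / 2.0
```

For example, `ani_one_way([(1000, 98.0), (3000, 96.0)])` weights the longer block three times as heavily as the shorter one.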

In a previous ANI analysis, 170 strains from 5 Escherichia species (E. coli/Shigella, E. albertii, E. fergusonii, E. hermannii and E. vulneris) were compared in an all-against-all pairwise fashion using the BLASTn algorithm (ANIb). This experiment determined that an ANI value of >95% is required to correctly assign an unknown Escherichia genome to its species. The results of this analysis were used to select three complete Escherichia reference strains for the MUMmer-based ANI (ANIm) method: Escherichia albertii KF1 (NZ_CP007025.1), Escherichia fergusonii ATCC_35469 (NC_011740.1) and Escherichia coli 08-4006. The accuracy of ANIm was tested with WGS from 100 Escherichia and 100 additional non-Escherichia enteric bacteria previously characterized by traditional methods. Additionally, a down-sampling experiment was conducted to determine the coverage depth (from 40x to 1x) at which species identification using ANIm became unreliable. For each Escherichia strain tested, reads were systematically removed from the original fastq sequence files. The down-sampled read pairs were assembled and re-analyzed against the reference strains.
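
The two steps described above, assigning a species from ANI values against the reference strains and thinning read pairs to a lower coverage depth, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the dictionary layout are hypothetical, and random subsampling with a fixed seed is only one possible reading of "systematically removed", since the abstract does not say how reads were chosen for removal.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# The abstract reports that an ANI of >95% is required for species assignment.
SPECIES_THRESHOLD = 95.0


def assign_species(ani_by_reference: Dict[str, float],
                   threshold: float = SPECIES_THRESHOLD) -> Optional[str]:
    """Return the reference with the highest ANI if it exceeds the
    threshold, otherwise None (no species call)."""
    if not ani_by_reference:
        return None
    best_ref, best_ani = max(ani_by_reference.items(), key=lambda kv: kv[1])
    return best_ref if best_ani > threshold else None


def downsample_pairs(pairs: Sequence[Tuple[str, str]],
                     original_depth: float,
                     target_depth: float,
                     seed: int = 0) -> List[Tuple[str, str]]:
    """Keep a fraction target_depth / original_depth of the read pairs.

    Assumption: pairs are chosen at random and mates are kept together,
    so the output can still be assembled as paired-end data.
    """
    if not 0 < target_depth <= original_depth:
        raise ValueError("target depth must be in (0, original depth]")
    keep = round(len(pairs) * target_depth / original_depth)
    rng = random.Random(seed)
    # Sort the sampled indices so the original read order is preserved.
    kept_indices = sorted(rng.sample(range(len(pairs)), keep))
    return [pairs[i] for i in kept_indices]
```

In use, each down-sampled set of pairs would be assembled and its ANI computed against each reference before calling `assign_species`; the assembly and alignment steps are outside the scope of this sketch.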

The initial ANIb analysis, averaging a run time of 4 minutes per comparison, yielded a defined genomic space from which Escherichia strains representing their respective genera were selected. Using the selected reference strains, ANIm analysis of 100 Escherichia WGS correctly identified 81 E. coli, 12 E. albertii and 7 E. fergusonii. The 100 non-Escherichia WGS all tested negative for the Escherichia genus while testing positive for their respective organisms. The down-sampling experiment revealed that ANIm was accurate to
