18.09.2015 Views

Abstracts

ngsfinalprogram


Oral Presentation Abstracts

all of them belonged to B. melitensis biovar 2 str. 63/9. A neighbor-joining tree analysis identified one of the isolates as an outlier. Furthermore, variations (SNPs and indels) were spread across the genome, but 138 SNPs were common among the 14 isolates, supporting a common ancestral origin. In addition, SNPs unique to each isolate (2 to 478 per isolate) were also identified, which divided B. melitensis biovar 2 into two major variant groups. In conclusion, this study suggests that biovar 2 is the most prevalent biovar of B. melitensis in Kuwait. Furthermore, at least two major variant groups exist within biovar 2. Supported by Kuwait University Research Sector grant SRUL02/13.

S6:4

MICROBIAL GENOMIC TAXONOMY AT GENBANK

S. Federhen;

NCBI, Bethesda, MD.

Incorrectly identified genomes at GenBank are a problem for users of the data. Some genomes are submitted with incorrect species identifications. Others were correctly identified when they were submitted but should now be updated based on a subsequent taxonomic publication, for example the description of a new species. GenBank has traditionally relied on the submitters to provide the correct taxonomic identifications for their sequence submissions. Two developments have combined to change this situation in the domain of microbial genomes. First, the curation of type material in the NCBI taxonomy database allows us to flag sequences from type in the nucleotide and genome domains of Entrez. Second, current sequencing technology makes it fast and easy to generate microbial genomes.

It has been clear for some time that the current paradigm of species delimitation by 16S rRNA sequence and DNA-DNA hybridization (DDH) would eventually be replaced with a model based on whole genome analysis. We present a proposal to find and correct misidentified genomes based on average nucleotide identity (ANI) from type and proxytype. Sequences from type are reliably identified (by definition) once we have verified that they are free from contamination and are actually from the strain with which they are annotated. All other identifications are a matter of opinion, and will be subject to verification. We have genomes from type (both finished and WGS) for 4000 species, including 3500 bacteria. This represents 25% of bacterial species with validly published names. The other 75% of bacterial species will generally have an assortment of short sequences from type in GenBank: at least a 16S sequence, but often more. These sequences are used to probe our existing genomes and predict where the genome from type will appear once we do get one. In many cases we can designate a proxy for the missing type from among the genomes that we do have; we call these 'proxytype' genomes. Taken together, these genomes from type and proxytype represent a scaffold of reliably identified sequences that we can use in conjunction with some simple genome-wide comparison measures to validate the identifications in our other genomes.

Once we have identified genomes that need taxonomic updates, we plan to correct the entries, add a structured comment detailing the evidence for the update, and notify the submitters of the change. This represents a significant change in policy for GenBank: a new genomic paradigm for validating taxonomic identifications, some new types of analysis, as well as a shift in the boundary for database-driven source feature updates. We convened a workshop to present the proposal, with representation from a broad spectrum of the bacterial taxonomic community (GenBank genomic taxonomy workshop, 12-13 May 2015). This group unanimously endorsed our genomic approach to validating taxonomic identifications in genomes at GenBank.
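The validation step described in this abstract, comparing each submitted genome against genomes from type and proxytype by ANI, might be sketched roughly as follows. This is only an illustration and not NCBI's procedure: the abstract gives no ANI cutoff, so the 96% value is an assumption, the ANI values are assumed to come precomputed from some external tool, and the names `Reference`, `check_identification`, and the status labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed species-level ANI cutoff in percent. The abstract does not state one.
ANI_SPECIES_CUTOFF = 96.0


@dataclass(frozen=True)
class Reference:
    """A reliably identified reference genome (hypothetical structure)."""
    species: str
    kind: str  # "type" or "proxytype"


def check_identification(
    declared_species: str,
    ani_to_refs: Dict[Reference, float],
    cutoff: float = ANI_SPECIES_CUTOFF,
) -> Tuple[str, Optional[str], Optional[float]]:
    """Compare a genome's declared species with its best ANI hit.

    ani_to_refs maps each reference genome to the precomputed ANI (percent)
    between the query genome and that reference.
    Returns (status, best_matching_species, best_ani).
    """
    if not ani_to_refs:
        # No type or proxytype genome to compare against.
        return ("unverified", None, None)

    best_ref, best_ani = max(ani_to_refs.items(), key=lambda item: item[1])

    if best_ani < cutoff:
        # No reference is close enough to support a species-level call.
        return ("inconclusive", None, best_ani)
    if best_ref.species == declared_species:
        return ("confirmed", best_ref.species, best_ani)
    # Best hit is a different species: flag for a possible taxonomic update.
    return ("candidate_update", best_ref.species, best_ani)


# Illustrative, made-up values with placeholder species names.
refs = {
    Reference("Genus alpha", "type"): 98.7,
    Reference("Genus beta", "proxytype"): 91.2,
}
result = check_identification("Genus beta", refs)
```

With these made-up inputs the query genome is declared as "Genus beta" but its closest reference above the assumed cutoff is "Genus alpha", so the sketch flags it as a candidate for update. The evidence (best hit and ANI value) is returned alongside the status because the abstract describes recording the evidence for each update.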

ASM Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens

