11th Annual Sequencing, Finishing, and Analysis in the Future Meeting
ASAP: A CUSTOMIZABLE AMPLICON SEQUENCING
ANALYSIS PIPELINE FOR HIGH-THROUGHPUT
CHARACTERIZATION OF COMPLEX SAMPLES
Friday, 3rd June 16:00 La Fonda Ballroom Talk (OS‐10.01)
Darrin Lemmer¹, Jolene Bowers¹, Erin Kelley¹, Rebecca Colman¹, Matt Enright¹, Elizabeth
Driebe¹, James Schupp¹, David Engelthaler¹, Paul Keim²
¹ TGen North, ² TGen/Northern Arizona University
A novel technique, Universal Tail amplicon sequencing, allows for multiplexing numerous target
amplicons for multiple bacterial samples together on the same sequencing run. Targeted, multiplexed
amplicon sequencing is useful for many applications, such as resistance gene detection, metagenomic
sample characterization, biosurveillance, and forensics. For example, this technique is ideal for
analyzing clinical samples, as tens to hundreds of different DNA‐based assays can be run directly
on each sample without having to culture bacterial isolates. Human DNA contamination is limited,
so the pathogen signal is not masked as it would be for full metagenomic sequencing. Using this
technique, we have sequenced more than 200 targets for 100 samples at up to 10,000x coverage on
a single MiSeq run, resulting in massive amounts of data to analyze and interpret.
The Amplicon Sequencing Analysis Pipeline (ASAP) is a highly customizable, automated way to
examine amplicon sequencing data. The important details of the amplicon targets are described in a
text‐based input file written in JavaScript Object Notation (JSON). These data include the target
name, genetic sequence (or sequences in the case of gene variant assays), any known SNPs or regions
of interest (ROIs) within the target, and what the presence of this target or SNP signifies clinically.
This file can be hand‐generated or created from an Excel spreadsheet using a provided template and
Python script. The sequenced reads are processed by adapter trimming and, optionally, quality
trimming, and are then aligned to the reference amplicon sequences extracted from the JSON file using
one of several aligners. The resulting BAM files are analyzed with a custom Python script that
combines the alignment data in the BAM file with the assay data in the JSON file and interprets
the results. The output is an XML file with complete details for each assay against each sample.
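
As a rough illustration of the JSON assay description and the step that extracts reference amplicon sequences for alignment, the following minimal sketch may help. The field names (`assays`, `name`, `type`, `sequence`, `snps`, `significance`), the target names, and the sequences are all hypothetical and are not taken from ASAP's actual schema; the helper functions `load_assays` and `to_fasta` are likewise illustrative only.

```python
import json

# Hypothetical assay description. Field names, target names, and sequences
# are invented for illustration and do not reflect ASAP's real schema.
ASSAY_JSON = """
{
  "assays": [
    {
      "name": "target_A",
      "type": "SNP",
      "sequence": "ATGGCTGACCTGAAACGT",
      "snps": [
        {"position": 9, "reference": "C", "variant": "T",
         "significance": "Example marker present"}
      ]
    },
    {
      "name": "target_B",
      "type": "presence",
      "sequence": "TTGACCGGTAAC",
      "significance": "Example organism detected"
    }
  ]
}
"""


def load_assays(text):
    """Parse the assay description and return the list of assay records."""
    return json.loads(text)["assays"]


def to_fasta(assays):
    """Render each assay's reference sequence as a FASTA record."""
    lines = []
    for assay in assays:
        lines.append(f">{assay['name']}")
        lines.append(assay["sequence"])
    return "\n".join(lines) + "\n"


assays = load_assays(ASSAY_JSON)
reference_fasta = to_fasta(assays)
print(reference_fasta)
```

A FASTA file produced this way could then be handed to whichever aligner is in use; the alignment step itself is outside the scope of this sketch.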
The details in the XML output include the number of reads aligning to each target, any SNPs found above a user‐defined
threshold, and the nucleotide distribution at each of these SNP positions. For ROI assays, the output
includes the sequence distribution at each of the regions of interest, both as DNA sequences and
as translated amino acid sequences. Also, each assay target is assigned a significance if it meets
the requirements laid out in the JSON file (i.e., a particular SNP or amino acid change is present).
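
The idea of a nucleotide distribution at a SNP position, filtered by a user-defined threshold, can be sketched as follows. This is a simplified stand-in, not ASAP's code: reads are represented as hypothetical `(start, sequence)` pairs that are assumed to align without gaps, whereas a real implementation would read alignments from a BAM file. The function names and the example threshold of 0.10 are assumptions.

```python
from collections import Counter


def nucleotide_distribution(reads, position):
    """Count the bases observed at a 1-based reference position.

    Each read is a (start, sequence) pair, where start is the 1-based
    reference position of the read's first base. Reads are assumed to be
    gap-free (no insertions or deletions) to keep the sketch short.
    """
    counts = Counter()
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


def call_snp(counts, reference_base, threshold):
    """Return (base, fraction) for non-reference bases at or above threshold."""
    depth = sum(counts.values())
    if depth == 0:
        return []
    calls = []
    for base, n in counts.items():
        fraction = n / depth
        if base != reference_base and fraction >= threshold:
            calls.append((base, fraction))
    return sorted(calls)


# Four made-up reads covering position 9 of an invented reference;
# two carry the reference base C and two carry T.
reads = [
    (1, "ATGGCTGACCTG"),
    (3, "GGCTGATCTGAA"),
    (5, "CTGATCTGAAAC"),
    (7, "GACCTGAAACGT"),
]
counts = nucleotide_distribution(reads, position=9)
calls = call_snp(counts, reference_base="C", threshold=0.10)
print(dict(counts), calls)
```

Reporting the full distribution alongside the thresholded calls keeps low-frequency variants visible to a researcher even when they fall below the reporting cutoff.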
To make this output easier for the user to interpret, a number of XSLT stylesheets are provided for
transforming the XML output into other, more readable formats, including Excel spreadsheets, web
pages, and PDF documents. Additionally, the use of XSLT stylesheets allows for multiple different
views of the same data, from clinical summaries showing only the most important or relevant results
to full researcher summaries containing all of the data. While designed for analyzing amplicons,
ASAP works just as well for finding any gene targets, specific SNPs, or other biomarkers in whole
genome sequencing data.
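
To illustrate how one XML result can feed several views, here is a minimal sketch. ASAP itself uses XSLT stylesheets for this; the example below swaps in Python's standard-library `xml.etree.ElementTree` purely to stay dependency-free, so it is not an XSLT example. The XML layout (`analysis`, `sample`, `assay`, `snp`, `significance`) and all values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical output layout; element names, attribute names, and values are
# invented for illustration and are not ASAP's actual XML format.
RESULT_XML = """
<analysis>
  <sample name="sample_01">
    <assay name="target_A" reads="8421">
      <snp position="9" reference="C" call="T" percent="50.0"/>
      <significance>Example marker present</significance>
    </assay>
    <assay name="target_B" reads="0"/>
  </sample>
</analysis>
"""


def clinical_summary(root):
    """Keep only assays that were assigned a significance."""
    rows = []
    for sample in root.iter("sample"):
        for assay in sample.iter("assay"):
            significance = assay.findtext("significance")
            if significance:
                rows.append((sample.get("name"), assay.get("name"), significance))
    return rows


def researcher_summary(root):
    """List every assay with its read count and any SNP calls."""
    rows = []
    for sample in root.iter("sample"):
        for assay in sample.iter("assay"):
            snps = [
                f"{s.get('position')}{s.get('reference')}>{s.get('call')}"
                for s in assay.iter("snp")
            ]
            rows.append(
                (sample.get("name"), assay.get("name"), int(assay.get("reads")), snps)
            )
    return rows


root = ET.fromstring(RESULT_XML.strip())
print(clinical_summary(root))
print(researcher_summary(root))
```

Both functions read the same parsed tree and differ only in what they select, which mirrors the abstract's point that the stored results stay complete while each view decides how much to show.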