Sequencing

11th Annual Sequencing, Finishing, and Analysis in the Future Meeting

AN EXTENDED CORE GENE MLST TARGET IDENTIFICATION AND SUBSET SELECTION PIPELINE FOR CULTURE‐INDEPENDENT PATHOGEN SUBTYPING

Wednesday, 1st June 20:00, La Fonda NM Room (1st floor), Poster (PS‐1b.117)

Jo Williams‐Newkirk 1, Eija Trees 2, John Besser 2, Heather Carleton 2
1 IHRC, 2 Centers for Disease Control and Prevention

While isolate‐based whole genome sequencing is being rapidly integrated into the US public health surveillance system, isolate availability for surveillance continues to decline as clinical laboratories adopt culture‐independent diagnostic tests. Affordable methods are not yet available for obtaining the same genome resolution directly from shotgun metagenomic sequencing of clinical samples, particularly microbially complex samples such as stool, so alternative methods are needed to reliably capture genetic information relevant to pathogen subtyping. Targeted amplification and sequencing of informative genomic regions (i.e. multilocus sequence typing, MLST) is a well‐understood and robust typing method whose resolution is limited only by the number of sites used. Unfortunately, identifying large numbers of informative regions with conserved primer sites is labor intensive, particularly if hundreds or thousands of reference genomes are used for site selection.

To facilitate the rapid development of extended MLST schemes for targeted pathogen groups, we developed a custom pipeline that leverages widely used open source programs to identify potential MLST targets with conserved primer sites and to find subsets of those targets that recapitulate a reference phylogeny (user provided or generated by the pipeline from concatenated core genes). Our pipeline accepts whole genome annotation files from the targeted pathogen group in GenBank (.gbk) format.
Core genes are identified by protein BLAST of all annotated open reading frames (ORFs) from a single genome against the ORFs from all GenBank files submitted to the pipeline. Hits are filtered to retain only single copy ORFs that occur in all submitted genomes and are at least 50% similar across at least 50% of the query length. Hits found in multiple putative single copy ORF groups are also discarded. The nucleotide sequences for these core ORFs are aligned with MUSCLE and trimmed to remove end gaps. Up to ten conserved primer pairs producing amplicons of ~250 bp are designed for each alignment in Primer3. The primer pairs and amplicons are filtered to retain only those that do not overlap and that capture polymorphisms among the input genomes.

Users may either retain all passing amplicons or use one of two methods to select an optimized subset for typing. The concordance of the subtyping provided by the selected amplicons with the reference phylogeny can be assessed using a variety of metrics, including those that compare the resulting trees (e.g. Kendall‐Colijn metric) and those that compare cluster membership (e.g. adjusted Wallace coefficient). Scripts for the pipeline were written in Python 2.7 and R, and workflow management is provided by bpipe, with support for both standard multicore machines and cluster environments. We demonstrate the utility of this pipeline using a collection of 266 Salmonella bongori and S. enterica genomes representing 68 serotypes.
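The core ORF filtering described in this abstract could be sketched roughly as follows. This is an illustrative sketch only, not the pipeline's code: the function name `core_orf_groups`, the tuple layout of a hit, and the reading of "50% similar" as BLAST percent identity are all assumptions, as is the choice to drop an entire ORF group when one of its hits also appears in another group.

```python
from collections import defaultdict


def core_orf_groups(hits, genomes, query_lengths,
                    min_identity=50.0, min_coverage=0.5):
    """Return {query_orf: {genome: subject_orf}} for putative core ORFs.

    hits: iterable of (query_orf, genome, subject_orf, pct_identity, aln_length)
          tuples (hypothetical layout, e.g. parsed from tabular BLAST output).
    genomes: names of all submitted genomes.
    query_lengths: {query_orf: length in residues}.
    """
    # Step 1: keep hits that meet the identity and query-coverage thresholds.
    by_query = defaultdict(lambda: defaultdict(set))
    for query, genome, subject, identity, aln_len in hits:
        coverage = aln_len / query_lengths[query]
        if identity >= min_identity and coverage >= min_coverage:
            by_query[query][genome].add(subject)

    # Step 2: keep queries with exactly one passing hit in every genome.
    all_genomes = set(genomes)
    groups = {}
    for query, per_genome in by_query.items():
        in_all = set(per_genome) == all_genomes
        single_copy = all(len(s) == 1 for s in per_genome.values())
        if in_all and single_copy:
            groups[query] = {g: next(iter(s)) for g, s in per_genome.items()}

    # Step 3: find subject ORFs claimed by more than one group and
    # (assumption) drop every group that contains such an ORF.
    claimed_by = defaultdict(set)
    for query, members in groups.items():
        for genome, subject in members.items():
            claimed_by[(genome, subject)].add(query)
    ambiguous = {q for qs in claimed_by.values() if len(qs) > 1 for q in qs}

    return {q: m for q, m in groups.items() if q not in ambiguous}
```

In this sketch a query ORF survives only if every submitted genome contributes exactly one passing hit, which mirrors the "single copy, present in all genomes" requirement stated above.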
ANALYSIS OF MICROBIAL FUNCTIONS USING CLC MICROBIAL GENOMICS MODULE

Wednesday, 1st June 20:00, La Fonda NM Room (1st floor), Poster (PS‐1b.18)

Marta Matvienko, Andreas Pedersen
Qiagen Bioinformatics

CLC Genomics Workbench is a user‐friendly and powerful application for the analysis of NGS data. The package can be extended with additional plugins that specifically support the analysis of microbial and metagenomics data. The Microbial Genome Finishing Module (MGFM) automates steps such as scaffolding, contig joining, and the ordering of contigs relative to each other or to a closely related reference genome. It also allows easy and rapid error correction and assembly of PacBio data. The Microbial Genomics Module (MGM) supports three major applications: analysis of microbial compositions using OTU clustering; typing and clustering of bacterial isolates with NGS‐MLST; and whole metagenome assembly and functional analysis. The MetaGeneMark plugin provides gene and CDS annotations of de novo assembled contigs.

In this presentation, we cover the whole metagenome‐based analysis of functional profiles in drinking water using the CLC Microbial Genomics Module and the MetaGeneMark plugin. For the data samples, we downloaded publicly available NGS reads (Chao et al, 2013) generated by whole metagenome sequencing on the Illumina platform. The microorganisms were collected from river water and treated drinking water. We de novo assembled each of the five sequencing samples using the Module’s metagenome assembler. The drinking water assemblies contained far fewer contigs, with much longer N50 values (about 8 kb), than the river water assemblies (N50 of ~870 nt). This may suggest that the drinking water metagenome is less complex than that of raw river water. Gene and CDS annotation was performed using the MetaGeneMark plugin.
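For readers unfamiliar with the N50 statistic quoted above, the standard definition can be sketched in a few lines. This is a generic illustration, not the CLC Workbench implementation; the function name `n50` is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")
```

A higher N50 for a similar total assembly size means the assembled bases sit in fewer, longer contigs, which is why the abstract reads the longer drinking water N50 as a sign of a less complex community.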
For the functional annotations, the GO database and Pfam2GO mappings were downloaded directly into Genomics Workbench from the Gene Ontology Consortium. Pfam domains were identified for ~30% of CDS in the river water samples and for ~57% of CDS in the treated water samples. GO terms were assigned to ~20% of CDS in the river water samples and 42% of CDS in the drinking water samples. To estimate the abundance of functional categories, we remapped the reads to the annotated assemblies and then built a GO functional profile for each sample. A similar analysis was done using Pfam domain counts.

For the comparative analysis, the abundance tables for GO categories and Pfam domain counts were converted to experimental tables. After applying the statistical analysis tools available in Genomics Workbench, we were able to identify the functions that were eliminated or enriched in the drinking water compared with the river water. The analyzed data can be viewed simultaneously as a table, heat map, scatter plot, and volcano plot, and the domains and functions can be extracted from any of these views. The GO biological process term “pathogenesis” was reduced to zero counts in drinking water. Many other GO functions such as “chemotaxis”, “conjugation”, and “signal transduction” were significantly more abundant in drinking water. A similar comparative analysis was performed for Pfam domain abundance. The described tools provide a gateway to sophisticated functional analysis and empower users at any level of bioinformatics experience.