
BeNeLux Bioinformatics Conference – Antwerp, December 7-8, 2015


10th Benelux Bioinformatics Conference (BBC 2015)

P27. DETECTING MIXED MYCOBACTERIUM TUBERCULOSIS INFECTION AND DIFFERENCES IN DRUG SUSCEPTIBILITY WITH WGS DATA

Arlin Keo (1) & Thomas Abeel (1, 2, *)

(1) Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands; (2) Broad Institute of MIT and Harvard, Cambridge, MA, USA. (*) t.abeel@tudelft.nl

Mycobacterium tuberculosis is a bacterial pathogen that causes tuberculosis (TB) and infects millions of people. When a person is infected with more than one distinct TB strain type, referred to as a mixed infection, diagnosis and treatment are complicated. Because mixed infections are difficult to diagnose, their prevalence among TB patients remains uncertain. Whole genome sequencing (WGS) yields a large number of single nucleotide polymorphisms (SNPs) and offers increased resolution to distinguish distinct strains. Here, we present a tool that maps sample reads against 21 bp cluster-specific SNP markers to detect putative mixed infections and estimate the frequencies of the subpopulations present.

INTRODUCTION

Mycobacterium tuberculosis is a clonal bacterial pathogen that causes the pulmonary disease tuberculosis (TB), and it infects and kills millions of people worldwide [1]. The study of genetic diversity within the M. tuberculosis complex (MTBC) is complicated by mixed TB infections, which occur when a person is infected with more than one distinct MTBC strain type. Mixed infections often lead to poor diagnosis and treatment, because the bacterial subpopulations may differ in drug susceptibility in ways that go undetected [2]. A strain-typing method should be able to distinguish closely related strains, so that mixed infections can also be detected at finer resolution [3]. This study aims to detect possible mixed TB infections at different levels of the MTBC and to determine the frequencies of the strains present, based on established paths in the MTBC phylogenetic tree.

METHODS

A comprehensive global dataset of 5992 MTBC strains was analysed, and 226570 SNPs extracted from this set were used to construct a SNP-based phylogenetic tree with RAxML. In this bifurcating tree, each branch represents a cluster of strains and splits into two new monophyletic subclusters of more closely related strains. These splits were used to define clusters and subclusters containing more than 10 strains. A global SNP association analysis was then performed for each cluster to identify cluster-specific SNPs: those for which the true positive rate, true negative rate, positive predictive value, and negative predictive value all exceeded 0.95. Markers were generated from these SNPs by extending each one with 10 bp of sequence on either side, taken from the H37Rv reference genome, which gives 21 bp markers. Each hierarchical cluster thus has its own set of specific SNP markers.
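
The marker definition above can be illustrated with a short Python sketch that scores one SNP against one cluster and builds a 21 bp marker from it. It is not the authors' code: it assumes that "association" means treating the presence of a SNP in a strain as a prediction of that strain's membership in the cluster. The function names, the set-based inputs and the 0-based `pos` convention are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch, not the authors' implementation.
THRESHOLD = 0.95  # all four metrics must exceed this value (from the abstract)
FLANK = 10        # bp of reference sequence kept on each side of the SNP


@dataclass
class Metrics:
    tpr: float  # true positive rate
    tnr: float  # true negative rate
    ppv: float  # positive predictive value
    npv: float  # negative predictive value


def association_metrics(carriers: set[str], cluster: set[str], strains: set[str]) -> Metrics:
    """Score 'strain carries the SNP' as a predictor of 'strain belongs to the cluster'."""
    tp = len(carriers & cluster)            # in cluster, carries the SNP
    fp = len(carriers - cluster)            # outside cluster, carries the SNP
    fn = len(cluster - carriers)            # in cluster, lacks the SNP
    tn = len(strains - carriers - cluster)  # outside cluster, lacks the SNP

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return Metrics(
        tpr=ratio(tp, tp + fn),
        tnr=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )


def is_cluster_specific(m: Metrics, threshold: float = THRESHOLD) -> bool:
    """A SNP is kept only if every metric is above the threshold."""
    return min(m.tpr, m.tnr, m.ppv, m.npv) > threshold


def make_marker(reference: str, pos: int, alt: str, flank: int = FLANK) -> str:
    """Return left flank + SNP allele + right flank (21 bp for flank=10); pos is 0-based."""
    if pos - flank < 0 or pos + flank >= len(reference):
        raise ValueError("SNP too close to the end of the reference for full flanks")
    return reference[pos - flank:pos] + alt + reference[pos + 1:pos + flank + 1]


if __name__ == "__main__":
    strains = {f"s{i}" for i in range(1, 7)}
    cluster = {"s1", "s2", "s3"}
    metrics = association_metrics({"s1", "s2", "s3"}, cluster, strains)
    print(is_cluster_specific(metrics))       # the SNP separates the cluster perfectly
    print(make_marker("ACGT" * 10, 15, "A"))  # toy reference, not H37Rv
```

Requiring all four rates to pass, rather than only one, keeps SNPs that are both present in nearly every member of the cluster and nearly absent elsewhere.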

By mapping sample reads against these 21 bp cluster-specific SNP markers, the tool determines which paths in the phylogenetic tree, starting at the MTBC root node, are present in a sample. A path that splits indicates the presence of multiple strains and thus a mixed infection.
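
A minimal sketch of the path detection, assuming reads have already been mapped and summarised per cluster as (markers covered by reads, markers in total). The presence rule (at least half of a cluster's markers covered), the cluster names and all function names are hypothetical choices for illustration, not the tool's actual criteria.

```python
# Hypothetical sketch, not the tool's actual criteria or code.
Tree = dict[str, list[str]]         # cluster -> its subclusters (at most two)
Hits = dict[str, tuple[int, int]]   # cluster -> (markers covered by reads, total markers)


def is_present(covered: int, total: int, min_fraction: float = 0.5) -> bool:
    """Assumed rule: a cluster is present if enough of its markers are covered by reads."""
    return total > 0 and covered / total >= min_fraction


def detect_paths(tree: Tree, root: str, hits: Hits) -> list[list[str]]:
    """Follow every present subcluster from the root and return all detected paths."""
    paths: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        present = [c for c in tree.get(node, []) if is_present(*hits.get(c, (0, 0)))]
        if not present:
            paths.append(path)  # no present subcluster: this path ends here
            return
        for child in present:  # two present subclusters means the path splits
            walk(child, path + [child])

    walk(root, [root])
    return paths


def is_mixed_infection(paths: list[list[str]]) -> bool:
    return len(paths) > 1


if __name__ == "__main__":
    tree = {"MTBC": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
    hits = {"A": (50, 60), "B": (0, 55), "A1": (40, 45), "A2": (30, 40)}
    paths = detect_paths(tree, "MTBC", hits)
    print(paths)
    print(is_mixed_infection(paths))
```

In this toy example both subclusters of A are supported, so the path splits below A and the sample is flagged as a putative mixed infection.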

The read depth at the root node represents a frequency of 1 for the MTBC species present. Where the path splits further down the tree, the total read depth is divided over the two subpaths, which determines the frequencies of the corresponding subclusters (Figure 1).

FIGURE 1. Detection of mixed TB infection with hierarchical clusters.
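
The frequency calculation can be sketched as a recursive split. The root gets frequency 1, and at each split the parent's frequency is shared between the present subclusters in proportion to their read depth. Summarising a cluster's depth as the median over its markers, the cluster names, and the function names are assumptions made for this example only.

```python
from statistics import median

# Hypothetical sketch of the frequency split; not the authors' code.
Tree = dict[str, list[str]]  # cluster -> its subclusters


def cluster_depth(marker_depths: list[int]) -> float:
    """Assumed summary: a cluster's read depth is the median depth over its markers."""
    return float(median(marker_depths)) if marker_depths else 0.0


def frequencies(tree: Tree, root: str, depth: dict[str, float],
                present: set[str]) -> dict[str, float]:
    """Root gets frequency 1; at each split the parent's frequency is divided
    over the present subclusters in proportion to their read depth."""
    freq = {root: 1.0}

    def walk(node: str) -> None:
        children = [c for c in tree.get(node, []) if c in present]
        total = sum(depth[c] for c in children)
        for c in children:
            freq[c] = freq[node] * depth[c] / total if total > 0 else 0.0
            walk(c)

    walk(root)
    return freq


if __name__ == "__main__":
    tree = {"MTBC": ["A", "B"], "A": ["A1", "A2"]}
    depth = {
        "A": cluster_depth([98, 100, 103]),
        "A1": cluster_depth([68, 70, 75]),
        "A2": cluster_depth([29, 30, 31]),
    }
    print(frequencies(tree, "MTBC", depth, present={"A", "A1", "A2"}))
```

Here A1 and A2 receive depths of 70 and 30, so they are estimated at frequencies of about 0.7 and 0.3 of the sample. A single present subcluster simply inherits its parent's frequency.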

The detected strains are combined with detected drug susceptibility profiles. Sample reads are mapped with BWA against a minimized reference genome consisting of drug resistance genes and their 1000 bp flanking regions, and variants are called with Pilon. Ambiguous variant calls may indicate that the strains present in a mixed-infection sample also differ in drug susceptibility.
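
To make "ambiguous variant call" concrete, the sketch below classifies a position in the minimized reference by the fraction of reads supporting the alternative allele. This is a simplified stand-in with assumed thresholds (10% and 90%, minimum depth 10), not Pilon's own calling logic. The `Site` record and the gene labels are hypothetical.

```python
from typing import NamedTuple

# Hypothetical sketch: a simple allele-fraction rule, not Pilon's calling logic.


class Site(NamedTuple):
    gene: str        # hypothetical label for a drug resistance gene
    position: int    # position in the minimized reference
    ref_count: int   # reads supporting the reference base
    alt_count: int   # reads supporting the alternative base


def classify(site: Site, low: float = 0.1, high: float = 0.9, min_depth: int = 10) -> str:
    """Label a site 'ref', 'alt', 'ambiguous' or 'low_coverage' (thresholds are assumptions)."""
    depth = site.ref_count + site.alt_count
    if depth < min_depth:
        return "low_coverage"
    alt_fraction = site.alt_count / depth
    if alt_fraction >= high:
        return "alt"
    if alt_fraction <= low:
        return "ref"
    return "ambiguous"  # both alleles well supported: possibly strains that differ here


def ambiguous_sites(sites: list[Site], **thresholds) -> list[Site]:
    """Return the sites whose call is ambiguous under the given thresholds."""
    return [s for s in sites if classify(s, **thresholds) == "ambiguous"]


if __name__ == "__main__":
    sites = [
        Site("geneA", 100, 60, 40),
        Site("geneA", 200, 2, 98),
        Site("geneB", 50, 99, 1),
        Site("geneB", 75, 3, 2),
    ]
    for s in sites:
        print(s.gene, s.position, classify(s))
```

An ambiguous site inside a resistance gene, in a sample already flagged as mixed, is the kind of signal the paragraph above describes.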

RESULTS & DISCUSSION

In the phylogenetic tree, 308 clusters (excluding the MTBC root) were defined, with 14823 SNP markers in total that are specific to a cluster and unique within that cluster. The known MTBC lineages 1 to 6 each have between 355 and 614 markers.

Of the 7661 TB samples tested, the strain(s) present and their frequencies could be predicted for 7495 samples, of which 914 (~12%) are mixed infections (Table 1).

Number of subpopulations    1       2      3     >3
Number of samples           6581    798    95    21

TABLE 1. 914 out of 7495 samples are mixed infections.
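
The counts in Table 1 can be checked directly: every sample with more than one subpopulation counts as a mixed infection, and the remaining tested samples are those without a prediction.

```python
# Table 1: number of subpopulations -> number of samples (values from the abstract)
table1 = {"1": 6581, "2": 798, "3": 95, ">3": 21}

predicted = sum(table1.values())   # samples for which strains could be predicted
mixed = predicted - table1["1"]    # samples with more than one subpopulation
not_predicted = 7661 - predicted   # tested samples without a prediction

print(predicted, mixed, not_predicted, f"{mixed / predicted:.1%}")  # 7495 914 166 12.2%
```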

REFERENCES

1. World Health Organization. Global Tuberculosis Report. World Health Organization, Geneva, Switzerland, 2014.

2. Zetola et al. Mixed Mycobacterium tuberculosis complex infections and false-negative results for rifampicin resistance by GeneXpert MTB/RIF are associated with poor clinical outcomes. Journal of Clinical Microbiology, 52:2422–2429, 2014.

3. G. Plazzotta, T. Cohen, and C. Colijn. Magnitude and sources of bias in the detection of mixed strain M. tuberculosis infection. Journal of Theoretical Biology, 368:67–73, 2015.
