
Interdisciplinary Journal of Contemporary Research in ... - Webs


ijcrb.webs.com

INTERDISCIPLINARY JOURNAL OF CONTEMPORARY RESEARCH IN BUSINESS

6. Markov Chain Monte Carlo (MCMC) Sampling

Ultimately, we are interested in the marginal posterior probability of the state sequences, P(S|D), which requires a marginalization over the model parameters according to eq. 8.
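
As a hedged illustration of what such a marginalization generally looks like (the paper's eq. 8 is not reproduced on this page, so the form below is the generic one and not necessarily the authors' exact expression), write \theta for the model parameters and M for the number of MCMC samples; both symbols are assumptions introduced here:

```latex
P(S \mid D) \;=\; \int P(S \mid \theta, D)\, P(\theta \mid D)\, d\theta
\;\approx\; \frac{1}{M} \sum_{m=1}^{M} P\bigl(S \mid \theta^{(m)}, D\bigr),
\qquad \theta^{(m)} \sim P(\theta \mid D).
```

The integral is replaced by an average over parameter samples drawn from the posterior, which is the step an MCMC scheme supplies.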

7. Conclusion<br />

We applied both HMM-Bayes and RECPARS to the synthetic DNA sequence alignments. The objective of this simulation study was to test the performance of both methods on different (a priori known) mosaic structures and for varying levels of difficulty of the detection problem (which is related to the tree height).

When the tree height is sufficiently large (0.3, 0.2), both RECPARS and HMM-Bayes predict the true mosaic structure, but with two important differences. First, RECPARS gives an accurate prediction only if the recombination and substitution costs have been set "appropriately." Note that these parameters cannot be inferred from the data but have to be chosen in advance. Wiuf, Christensen, and Hein suggested that a particular ratio of the recombination and substitution costs works well quite generally. However, this was not confirmed in our simulations: for the largest tree height of 0.3, the predictions with this ratio were wrong, leading to a mosaic structure that is over-tessellated. Because HMM-Bayes infers all the parameters from the data, it does not suffer from this shortcoming. Second, even when RECPARS predicts the nature of the mosaic structure correctly, it is less accurate than HMM-Bayes in locating the breakpoints: the breakpoints predicted with RECPARS are typically misplaced by 20–30 nucleotides. This is a consequence of the fact that RECPARS uses only the topology-defining sites and thus discards a considerable proportion of sites in the DNA sequence alignment.
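
To make the last point concrete, the following minimal sketch shows how few columns of a four-taxon alignment can carry topology information. It assumes that "topology-defining" means parsimony-informative for four taxa, that is, columns in which exactly two nucleotides each occur twice. The function names and the toy alignment are hypothetical and are not taken from RECPARS.

```python
from typing import List, Optional, Tuple


def supported_topology(column: str) -> Optional[int]:
    """Return the topology (0, 1 or 2) supported by a 4-taxon column, else None.

    Assumed labelling, by the partner of taxon 0:
      0 -> ((0,1),(2,3)),  1 -> ((0,2),(1,3)),  2 -> ((0,3),(1,2)).
    """
    if len(column) != 4 or any(ch not in "ACGT" for ch in column):
        return None  # gaps and ambiguity codes are ignored in this sketch
    if len(set(column)) != 2:
        return None  # constant sites and sites with 3 or 4 states
    a, b, c, d = column
    if a == b and c == d:
        return 0
    if a == c and b == d:
        return 1
    if a == d and b == c:
        return 2
    return None  # 3:1 splits such as "AAAC" favour no topology


def topology_defining_sites(alignment: List[str]) -> List[Tuple[int, int]]:
    """List (position, topology) for every informative column."""
    sites = []
    for pos, col in enumerate(zip(*alignment)):
        topo = supported_topology("".join(col))
        if topo is not None:
            sites.append((pos, topo))
    return sites


# Toy alignment (hypothetical data): one string per taxon.
toy = ["AAGCT", "ACGTT", "CAGTA", "CCGCA"]
informative = topology_defining_sites(toy)
```

Every column for which `supported_topology` returns `None` is discarded by a method that relies on such sites alone, which is why the location of a breakpoint can only be resolved to the nearest informative column.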

When the tree height is decreased to 0.1, neither RECPARS nor HMM-Bayes predicts the mosaic structure of the alignment correctly. RECPARS finds only one recombinant region, which for the first alignment is even badly misplaced (fig. 3, bottom right). HMM-Bayes detects both recombinant regions and even locates them rather accurately, but it misclassifies the topology change for one of these regions (fig. 4, bottom right; fig. 1, bottom right). This is most likely a consequence of the fact that for small tree heights, the number of mutations and, consequently, the number of polymorphic sites is small. Thus, there is less information in the data, and any inference is inevitably less accurate.

For a more quantitative comparison between RECPARS and HMM-Bayes, recall that the detection of recombination is basically a classification problem: each site in the sequence alignment is assigned to one of the three possible tree topologies. With RECPARS, this is done directly. For HMM-Bayes, it is done by assigning each site to the mode of the posterior probability. We use two criteria to rate the performance of the methods: the sensitivity, which is the percentage of correctly classified recombinant sites, and the specificity, which measures the percentage of correctly classified non-recombinant sites. Comparing the performance of RECPARS and HMM-Bayes across all simulations, shown in figure 2, we found that HMM-Bayes gives a consistent and significant improvement over RECPARS in the accuracy of locating and classifying the recombinant regions, as indicated by a systematically increased sensitivity score.
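
The two criteria can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' evaluation code: topologies are labelled 0, 1, 2, the hypothetical `reference` argument marks the non-recombinant topology, and the posterior values are made-up numbers.

```python
from typing import List, Sequence, Tuple


def classify_by_posterior_mode(posterior: Sequence[Sequence[float]]) -> List[int]:
    """Assign each site to the topology with the highest posterior probability."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in posterior]


def sensitivity_specificity(
    true_topo: Sequence[int],
    pred_topo: Sequence[int],
    reference: int = 0,  # assumed label of the non-recombinant topology
) -> Tuple[float, float]:
    """Percent of correctly classified recombinant / non-recombinant sites."""
    if len(true_topo) != len(pred_topo):
        raise ValueError("true and predicted labels must have equal length")
    pairs = list(zip(true_topo, pred_topo))
    recomb = [(t, p) for t, p in pairs if t != reference]
    non_recomb = [(t, p) for t, p in pairs if t == reference]
    if not recomb or not non_recomb:
        raise ValueError("need at least one site of each kind")
    sens = 100.0 * sum(t == p for t, p in recomb) / len(recomb)
    spec = 100.0 * sum(t == p for t, p in non_recomb) / len(non_recomb)
    return sens, spec


# Made-up posterior over the three topologies for six sites.
posterior = [
    [0.80, 0.10, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.30, 0.30, 0.40],
    [0.50, 0.40, 0.10],
    [0.90, 0.05, 0.05],
]
true_topo = [0, 0, 1, 1, 1, 0]
pred_topo = classify_by_posterior_mode(posterior)
sens, spec = sensitivity_specificity(true_topo, pred_topo)
```

In this toy case one of the three recombinant sites is recovered while all non-recombinant sites are classified correctly, which shows how a method can score high specificity and still have low sensitivity.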

COPYRIGHT © 2011 Institute of Interdisciplinary Business Research

JANUARY 2011<br />

VOL 2, NO 9<br />

537
