
Interdisciplinary Journal of Contemporary Research in ... - Webs


ijcrb.webs.com

INTERDISCIPLINARY JOURNAL OF CONTEMPORARY RESEARCH IN BUSINESS

4. Parameter Estimation with Maximum Likelihood: HMM-ML

A solution to this problem, proposed by Husmeier and Wright (2001), is proper maximum likelihood estimation of the parameters, that is, maximizing the likelihood

$$P(D \mid w, \theta, v) = \sum_{S} P(D, S \mid w, \theta, v)$$

with respect to the vector of branch lengths $w$, the parameters of the nucleotide substitution model $\theta$, and the recombination parameter $v$. This requires a summation over all state sequences $S = (S_1, \ldots, S_N)$, that is, over $K^N$ terms, and seems to be intractable for all but very short sequence lengths $N$. However, Husmeier and Wright (2001) showed that by applying the expectation maximization (EM) algorithm (Dempster, Laird, and Rubin 1977), the sparseness of the connectivity in the HMM could be exploited to reduce the computational complexity to the order of $K$ separate tree optimizations. While this scheme outperformed the heuristic approach of McGuire, Wright, and Prentice (2000), it suffers from the shortcoming that the predicted state sequence depends not only on the data, $\arg\max_S P(S \mid D)$, but also on the parameters, $\arg\max_S P(S \mid D, w, \theta, v)$. Because these parameters are themselves estimated from the data by maximum likelihood, the approach is susceptible to over-fitting. This calls for an independent hypothesis test with parametric bootstrapping, which, however, incurs prohibitively high computational costs, as demonstrated by Larget and Simon (1999).
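The two computational points in the paragraph above can be made concrete with a minimal Python sketch on a toy HMM with $K = 2$ hidden topologies. This is not Husmeier and Wright's EM implementation, and several ingredients are assumptions: the emission table `emis` holds hypothetical numbers standing in for the per-site likelihoods $P(y_t \mid S_t, w, \theta)$ that the real model would compute from the trees and the substitution model; the initial state distribution is taken as uniform; and the transition model (keep the current topology with probability $v$, otherwise switch uniformly to one of the other $K - 1$ topologies) is an assumed parameterization of the recombination parameter. The sketch checks that the explicit sum over all $K^N$ state sequences equals the forward recursion, which costs only $O(N K^2)$, and shows that the Viterbi sequence $\arg\max_S P(S \mid D, v)$ changes with $v$.

```python
import itertools
import math


def transition(K, v):
    """Assumed model: keep the topology with probability v, else switch uniformly."""
    return [[v if i == j else (1.0 - v) / (K - 1) for j in range(K)] for i in range(K)]


def brute_force_likelihood(emis, v):
    """P(D | v) as an explicit sum over all K**N state sequences (exponential cost)."""
    N, K = len(emis), len(emis[0])
    A = transition(K, v)
    total = 0.0
    for S in itertools.product(range(K), repeat=N):
        p = emis[0][S[0]] / K  # uniform initial distribution (assumed)
        for t in range(1, N):
            p *= A[S[t - 1]][S[t]] * emis[t][S[t]]
        total += p
    return total


def forward_likelihood(emis, v):
    """The same sum via the forward recursion, in O(N * K**2) time."""
    N, K = len(emis), len(emis[0])
    A = transition(K, v)
    alpha = [emis[0][k] / K for k in range(K)]
    for t in range(1, N):
        alpha = [sum(alpha[i] * A[i][j] for i in range(K)) * emis[t][j] for j in range(K)]
    return sum(alpha)


def viterbi(emis, v):
    """Most probable state sequence argmax_S P(S | D, v), computed in log space."""
    N, K = len(emis), len(emis[0])
    logA = [[math.log(a) for a in row] for row in transition(K, v)]
    delta = [math.log(emis[0][k] / K) for k in range(K)]
    backpointers = []
    for t in range(1, N):
        ptr, new_delta = [], []
        for j in range(K):
            # Best predecessor state for state j at site t.
            best = max(range(K), key=lambda i: delta[i] + logA[i][j])
            ptr.append(best)
            new_delta.append(delta[best] + logA[best][j] + math.log(emis[t][j]))
        backpointers.append(ptr)
        delta = new_delta
    # Trace the best path backwards from the most probable final state.
    state = max(range(K), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical per-site likelihoods P(y_t | S_t = k) for K = 2 topologies, N = 5 sites.
emis = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]]

print(brute_force_likelihood(emis, 0.9), forward_likelihood(emis, 0.9))  # equal up to rounding
print(viterbi(emis, 0.99))  # [0, 0, 0, 0, 0]
print(viterbi(emis, 0.5))   # [0, 0, 1, 0, 0]
```

With $v = 0.99$ the weakly supported switch at the third site is smoothed away, whereas with $v = 0.5$ it is kept: in this toy setting, the value chosen for the recombination parameter directly determines where breakpoints are predicted.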

To place this problem in a broader context, note that hidden Markov models and phylogenetic trees have many similarities with neural networks; in fact, all three models are instances of the more general class of graphical models (Heckermann 1999). Studies on neural networks and graphical models have shown that, for sparse data, maximum likelihood is susceptible to over-fitting, and that the generalization performance is significantly improved by adopting a Bayesian approach. A detailed investigation of this approach can be found in Neal (1996).

In a nutshell, maximum likelihood gives only a point estimate of the parameters, which ignores the more detailed information contained in the curvature and (possibly) multimodality of the likelihood landscape. By sampling rather than optimizing parameters, the Bayesian approach captures more information about this landscape and consequently gives improved and more reliable predictions.
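The contrast between optimizing and sampling described above can be illustrated with a deliberately simple stand-in for the phylogenetic model: a single Bernoulli success probability $p$ estimated from three trials, all of which are successes. This is a hypothetical toy example, not the HMM of the text; the uniform prior, the random-walk Metropolis sampler, its step size, the burn-in length, and the seed are all illustrative assumptions. The maximum likelihood estimate sits at the boundary, $\hat{p} = 1$, whereas averaging over posterior samples approximates the posterior mean $(k + 1)/(n + 2) = 0.8$ that follows from a uniform prior.

```python
import math
import random


def ml_estimate(k, n):
    """Maximum likelihood (point) estimate of a Bernoulli success probability."""
    return k / n


def log_posterior(p, k, n):
    """Unnormalised log posterior of p under a uniform prior on (0, 1)."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(k, n, steps=50_000, step_size=0.2, seed=1):
    """Random-walk Metropolis sampler for p; returns the whole chain."""
    rng = random.Random(seed)
    p = 0.5
    lp = log_posterior(p, k, n)
    chain = []
    for _ in range(steps):
        q = p + rng.uniform(-step_size, step_size)  # symmetric proposal
        lq = log_posterior(q, k, n)
        # Accept with probability min(1, posterior(q) / posterior(p)).
        if lq >= lp or rng.random() < math.exp(lq - lp):
            p, lp = q, lq
        chain.append(p)
    return chain


k, n = 3, 3  # sparse data: three trials, three successes
samples = metropolis(k, n)[1_000:]  # discard burn-in
bayes_mean = sum(samples) / len(samples)

print(ml_estimate(k, n))  # 1.0
print(bayes_mean)  # close to the exact posterior mean (k + 1) / (n + 2) = 0.8
```

With only three observations the likelihood landscape is broad, and the point estimate ignores that breadth; the sample average reflects it and is correspondingly less over-confident.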

A Bayesian approach to phylogenetics without recombination was proposed and tested by Yang and Rannala (1997), Mau, Newton, and Larget (1999), and Larget and Simon (1999). Generalizing this scheme to the presence of recombination requires replacing the single topology-indicating variable by the state sequence $S$, as discussed in the previous section. The prediction of this state sequence should be based on the posterior probability $P(S \mid D)$, which requires integrating out the remaining parameters:

COPYRIGHT © 2011 Institute of Interdisciplinary Business Research

JANUARY 2011<br />

VOL 2, NO 9<br />

535
