02.05.2013 Views

Evolution__3rd_Edition


448 PART 4 / Evolution and Diversity

Box 15.2

Phylogenetic Inference by Maximum Likelihood

Real sequence data consist of nucleotides at a long series of sites. In a maximum-likelihood calculation, each nucleotide site is treated in much the same way, so we can look at any one site to see how the calculation works. Suppose we have one site and four species (called 1, 2, 3, and 4), and their nucleotides are:

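Species 1: A
Species 2: C
Species 3: G
Species 4: G

(In the tree, species 1 and 2 join at one internal node and species 3 and 4 at the other.)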

We now need a model of evolutionary change. The simplest is the model shown in Box 15.1, in which the chance of changing from one nucleotide to any particular other nucleotide is $p$. We can write out a matrix giving the chance of changing from one state to another per time unit:

$$
\begin{array}{c|cccc}
\text{Initial}\,\backslash\,\text{Final} & A & C & G & T \\
\hline
A & 1-3p & p & p & p \\
C & p & 1-3p & p & p \\
G & p & p & 1-3p & p \\
T & p & p & p & 1-3p
\end{array}
$$
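As a sketch of how the matrix above might be built in code, the Python below fills in the diagonal with $1 - 3p$ and every off-diagonal entry with $p$. The function name `substitution_matrix`, the A, C, G, T ordering, and the value $p = 0.1$ are assumptions for illustration only. Each row should sum to 1, since a nucleotide either stays the same or changes into one of the other three.

```python
# Sketch of the one-parameter substitution matrix from the box.
# Assumptions (hypothetical): states are ordered A, C, G, T, and p is the
# chance per time unit of changing to any one particular other nucleotide.

BASES = "ACGT"


def substitution_matrix(p: float) -> list[list[float]]:
    """Return P, where P[i][j] is the chance of going from BASES[i] to
    BASES[j] in one time unit."""
    # 1 - 3p is only a valid probability if 0 <= p <= 1/3.
    if not 0.0 <= p <= 1.0 / 3.0:
        raise ValueError("p must lie between 0 and 1/3")
    return [[1 - 3 * p if i == j else p for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    P = substitution_matrix(0.1)  # p = 0.1 is an arbitrary illustrative value
    for base, row in zip(BASES, P):
        print(base, [round(x, 3) for x in row])
    # Each row sums to 1: a nucleotide either stays the same or changes.
    assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
```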

If the nucleotide is A, for instance, it has a chance $1 - 3p$ of staying A and a chance $p$ each of changing into C, G, or T. Suppose that each branch is one time unit long. We now calculate the probability of observing the data for every possible combination of states at the internal nodes. We could start with:

Species 1: A
Species 2: C
Species 3: G
Species 4: G
Both internal nodes: G

That is, we assume both internal nodes have G. The total chance of this is $p^2(1-3p)^3$: on two of the five branches there has been a change (chance $p$ each), and on the other three there has been no change (chance $1 - 3p$ each). We calculate the same sort of probability for all 16 possible combinations of nucleotides at the two internal nodes and add them together. That gives us the total probability of observing the data at this one site, given the model of evolution. Probabilities of this sort tend to be very small, and they are usually converted to natural logarithms to make the numbers more manageable (so $p^2(1-3p)^3$ can be written as $2\ln p + 3\ln(1-3p)$).
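The 16-term sum for this site can be written as

$$L_{\text{site}} = \sum_{x}\sum_{y} P_{xA}\,P_{xC}\,P_{xy}\,P_{yG}\,P_{yG},$$

where $x$ and $y$ each run over A, C, G, T and $P_{ij}$ is the matrix entry for a change from $i$ to $j$ along one branch. This assumes, reading the figure, that $x$ is the internal node joining species 1 and 2 and $y$ the node joining species 3 and 4. The Python sketch below evaluates the sum; the function names and $p = 0.1$ are illustrative assumptions. Like the box, it leaves out the factor for the base frequency at the starting node that many programs include; under this model that factor is 1/4 and multiplies every tree's likelihood by the same amount.

```python
import math
from itertools import product

BASES = "ACGT"


def branch_prob(p: float, start: str, end: str) -> float:
    """Chance of going from `start` to `end` along one branch of one time unit."""
    return 1 - 3 * p if start == end else p


def state_probability(p: float, tips: str, x: str, y: str) -> float:
    """Probability of the tip data for one choice of internal states.

    tips holds the nucleotides of species 1-4. Assumed topology (hypothetical
    reading of the figure): x joins species 1 and 2, y joins species 3 and 4.
    """
    t1, t2, t3, t4 = tips
    return (branch_prob(p, x, t1) * branch_prob(p, x, t2)      # tip branches at x
            * branch_prob(p, x, y)                             # internal branch
            * branch_prob(p, y, t3) * branch_prob(p, y, t4))   # tip branches at y


def site_likelihood(p: float, tips: str) -> float:
    """Add the probabilities for all 16 combinations of internal states."""
    return sum(state_probability(p, tips, x, y) for x, y in product(BASES, repeat=2))


if __name__ == "__main__":
    p = 0.1
    one_term = state_probability(p, "ACGG", "G", "G")
    # The single term from the text: p^2 (1 - 3p)^3, i.e. 2 ln p + 3 ln(1 - 3p).
    assert math.isclose(one_term, p**2 * (1 - 3 * p) ** 3)
    assert math.isclose(math.log(one_term), 2 * math.log(p) + 3 * math.log(1 - 3 * p))
    total = site_likelihood(p, "ACGG")
    print(f"site likelihood = {total:.5f}, log-likelihood = {math.log(total):.4f}")
```

With $p = 0.1$ the G, G term is $0.01 \times 0.343 = 0.00343$, and the full 16-term sum for this site comes to 0.01216.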

In practice, we may have nucleotide data for 100 sites. The same sort of calculation is performed for every site, and, assuming the sites evolve independently, the log-likelihoods of all the sites are added together (equivalent to multiplying their probabilities) to give the total likelihood for the tree. We then need to do the same calculation for all the other possible unrooted trees (for four species there are three). The best estimate of the true tree is taken to be the one under which the observed data have the highest probability (the maximum likelihood). With data such as we used for parsimony in Figure 15.14, maximum likelihood would usually give the same result. Trees that require more evolutionary events will also be less probable, provided the value of $p$ in the model of evolutionary change is low.
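Extending the same calculation to several sites and to all three unrooted trees for four species gives the Python sketch below. It assumes sites evolve independently, so their log-likelihoods add, and it keeps the box's simplifications (a fixed $p$, every branch one time unit long). The topology labels and function names are hypothetical, and of the short alignment only the first site, ACGG, comes from the box; the other two sites are made up for illustration.

```python
import math
from itertools import product

BASES = "ACGT"

# The three unrooted trees for species 1-4 (indices 0-3). Each is written as
# the pair of species meeting at one internal node and the pair at the other.
TOPOLOGIES = {
    "((1,2),(3,4))": ((0, 1), (2, 3)),
    "((1,3),(2,4))": ((0, 2), (1, 3)),
    "((1,4),(2,3))": ((0, 3), (1, 2)),
}


def branch_prob(p, start, end):
    """One-parameter model: 1 - 3p to stay the same, p for each particular change."""
    return 1 - 3 * p if start == end else p


def site_likelihood(p, tips, topology):
    """Sum over the 16 combinations of states (x, y) at the two internal nodes."""
    (a, b), (c, d) = topology
    total = 0.0
    for x, y in product(BASES, repeat=2):
        total += (branch_prob(p, x, tips[a]) * branch_prob(p, x, tips[b])
                  * branch_prob(p, x, y)
                  * branch_prob(p, y, tips[c]) * branch_prob(p, y, tips[d]))
    return total


def tree_log_likelihood(p, alignment, topology):
    """Add per-site log-likelihoods (sites assumed to evolve independently)."""
    return sum(math.log(site_likelihood(p, site, topology)) for site in alignment)


def best_tree(p, alignment):
    """Return the maximum-likelihood topology and the score of every topology."""
    scores = {name: tree_log_likelihood(p, alignment, topo)
              for name, topo in TOPOLOGIES.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Each string gives the nucleotides of species 1-4 at one site.
    alignment = ["ACGG", "AAGG", "CCTT"]  # hypothetical except for "ACGG"
    winner, scores = best_tree(0.1, alignment)  # p = 0.1 is an assumed value
    for name, lnl in scores.items():
        print(f"{name}: ln L = {lnl:.3f}")
    print("maximum-likelihood tree:", winner)
```

A fuller analysis would normally also estimate $p$ and the branch lengths rather than fixing them; the sketch keeps them fixed to stay close to the box.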

Further reading: Swofford et al. (1996), Page & Holmes (1998), Graur & Li (2000).

advantages; for instance, it gives an exact probability for each unrooted tree, which makes quantitative comparisons between trees straightforward. We can say that one tree is so many percent more probable than another. Quantitative comparisons of this kind are not so easy with parsimony.
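Reading "so many percent more probable" as the ratio of two trees' likelihoods, a minimal Python sketch can turn log-likelihoods back into such a percentage. The function name is hypothetical. The example values are single-site likelihoods for the box's data, computed under the assumptions $p = 0.1$ and that the drawn tree groups species 1 with 2 and 3 with 4: 0.01216 for that tree and 0.0064 for each of the other two.

```python
import math


def percent_more_likely(lnl_a: float, lnl_b: float) -> float:
    """How many percent higher tree A's likelihood is than tree B's.

    Takes natural-log likelihoods, so the difference stays accurate even when
    the likelihoods themselves are tiny.
    """
    ratio = math.exp(lnl_a - lnl_b)  # likelihood ratio L_A / L_B
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Single-site likelihoods for the box's data at an assumed p = 0.1.
    lnl_grouping_12 = math.log(0.01216)  # tree ((1,2),(3,4))
    lnl_grouping_13 = math.log(0.0064)   # tree ((1,3),(2,4))
    print(f"{percent_more_likely(lnl_grouping_12, lnl_grouping_13):.0f}% more likely")
```

Here the ratio is $0.01216 / 0.0064 = 1.9$, so under these assumptions the first tree is 90% more likely than either alternative.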

