02.05.2013 Views

Evolution__3rd_Edition


444 PART 4 / <strong>Evolution</strong> and Diversity<br />

Box 15.1<br />

Models of Sequence <strong>Evolution</strong><br />

A DNA sequence is made up of four kinds of nucleotide. <strong>Evolution</strong><br />

consists of changes among the four nucleotide states. In the<br />

simplest model of evolution, we assume that the chance of any<br />

change, from one nucleotide to another, is the same, and has<br />

probability p. (p could be defined as the chance that a nucleotide<br />

at a site will change from one kind of nucleotide to another kind,<br />

per million years, in a population. In practice p is usually an<br />

instantaneous rate, rather than a rate per million years, but that<br />

does not matter here.) Figure B15.1 shows the evolutionary<br />

possibilities.<br />

An A, for example, can change to a C, G, or T. In all, there are<br />

12 kinds of change. The simplest model assumes that the chance<br />

of all 12 is the same, p. This model is a “one-parameter” model,<br />

called the Jukes–Cantor model after its originators. If two species<br />

have the same nucleotide at a site, it could be that the nucleotide<br />

has not changed (chance 1 − 3p). Or it could have changed and<br />

then changed back (A → C → A, for instance), which has chance p<sup>2</sup>.<br />

(The probabilities would need to be multiplied by an amount of time<br />

if they have been evolving apart for something other than 1 million<br />

years.) If the two species have different nucleotides (such as A in<br />

one species and C in the other) at a site, there could have been one<br />

change (chance p) or two (for instance A → G → C), with chance p<sup>2</sup>.<br />

We can think through all the possibilities, and calculate the total<br />

probabilities that a site will be identical, or different, in the two<br />

species, when we sum over all the ways that a site can end up<br />

identical, or different.<br />
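The summing over paths described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own procedure: it assumes a single lineage evolving in discrete time steps, and the names `jukes_cantor_step`, `mat_mul`, and `after_steps` and the value `p = 0.01` are hypothetical choices made for the example.

```python
NUCLEOTIDES = "ACGT"

def jukes_cantor_step(p):
    """One time step: each of the 12 changes has chance p, no change has chance 1 - 3p."""
    return [[(1 - 3 * p) if i == j else p for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def after_steps(p, steps):
    """Chance of going from each nucleotide to each other one in `steps` steps.

    Multiplying the one-step matrix by itself sums over every intermediate
    path, including changes that are later reversed.
    """
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    step = jukes_cantor_step(p)
    for _ in range(steps):
        result = mat_mul(result, step)
    return result

p = 0.01  # assumed value, for illustration only
two = after_steps(p, 2)
a, c = NUCLEOTIDES.index("A"), NUCLEOTIDES.index("C")

# Still A after two steps: no change twice, or a change and a change back
# through any of the three other nucleotides: (1 - 3p)^2 + 3p^2.
prob_same = two[a][a]

# A to C after two steps: one change in either step, or two changes
# through G or T: 2p(1 - 3p) + 2p^2.
prob_a_to_c = two[a][c]

print(prob_same, prob_a_to_c)
```

Raising the step count in `after_steps` extends the same bookkeeping to longer periods, which is where reversed and repeated changes start to matter.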

The one-parameter Jukes–Cantor model is the simplest. In<br />

practice, the chance of transitions differs from the chance of<br />

transversions. This leads to the “two-parameter” model, first<br />

discussed by Kimura. We assume that the four transitions in<br />

Figure B15.1 have one chance, p<sub>1</sub>, and the eight transversions<br />

have some other chance, p<sub>2</sub>. More complex models allow for<br />

the possibility that some transitions are more likely than other<br />

transitions. Figure B15.1 has 12 arrows, and a complex model<br />

could have 12 parameters, one for each kind of nucleotide<br />

change. Models for maximum likelihood (see Box 15.2) usually<br />

also take account of differences in the rate of evolution between<br />

different sites.<br />
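As a rough sketch of the two-parameter idea, the snippet below builds a one-step table of change probabilities in which transitions and transversions get separate chances. It assumes the standard definition of a transition (A ↔ G or C ↔ T); the function names and the values of `p1` and `p2` are hypothetical and chosen only for illustration.

```python
NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(x, y):
    """True for A<->G or C<->T, that is, a change within purines or within pyrimidines."""
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)

def two_parameter_step(p1, p2):
    """One-step change probabilities: p1 per transition, p2 per transversion.

    Each nucleotide has one possible transition and two possible
    transversions, so the chance of no change is 1 - p1 - 2 * p2.
    """
    table = {}
    for x in NUCLEOTIDES:
        row = {}
        for y in NUCLEOTIDES:
            if x == y:
                row[y] = 1 - p1 - 2 * p2
            elif is_transition(x, y):
                row[y] = p1
            else:
                row[y] = p2
        table[x] = row
    return table

# Assumed values, for illustration only.
step = two_parameter_step(p1=0.02, p2=0.005)

# Count the directed changes of each kind among the 12 arrows.
n_transitions = sum(is_transition(x, y) for x in NUCLEOTIDES for y in NUCLEOTIDES)
n_transversions = sum(
    x != y and not is_transition(x, y) for x in NUCLEOTIDES for y in NUCLEOTIDES
)
print(n_transitions, n_transversions)
```

Setting `p1` equal to `p2` recovers the one-parameter model, and a fully general model would replace the two values with one entry per arrow.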

For any given model of sequence evolution, we can use the<br />

sequence data to estimate the value of p (or of p<sub>1</sub> and p<sub>2</sub>). Several<br />

statistical procedures are used, which can be found in an advanced<br />

text. The estimated value of p can then be used for various<br />

purposes, such as correcting for multiple hits, or the calculation<br />

of maximum likelihood.<br />

Figure B15.1<br />

Possible kinds of evolutionary change between the four<br />

kinds of nucleotide. Transitions: A ↔ G and C ↔ T. Transversions:<br />

all other changes (A ↔ C, A ↔ T, G ↔ C, G ↔ T).<br />
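To make "correcting for multiple hits" concrete, here is a minimal sketch using the standard Jukes–Cantor distance formula, d = −(3/4) ln(1 − (4/3) × observed proportion of differing sites). The formula is not given in the box itself, and the two short sequences and the function names are made up for the example; the sketch also assumes the sequences are already aligned and contain no gaps.

```python
import math

def proportion_different(seq1, seq2):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    diffs = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return diffs / len(seq1)

def jukes_cantor_distance(observed):
    """Estimated changes per site, allowing for multiple hits at the same site.

    Uses d = -(3/4) * ln(1 - (4/3) * observed). The correction is undefined
    once the observed difference reaches 3/4, the level expected for
    unrelated random sequences under this model.
    """
    if observed >= 0.75:
        raise ValueError("observed difference too large for this correction")
    return -0.75 * math.log(1 - (4.0 / 3.0) * observed)

# Hypothetical aligned sequences, for illustration only.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACTTACGT"

observed = proportion_different(seq_a, seq_b)
corrected = jukes_cantor_distance(observed)
print(observed, corrected)
```

The corrected distance is always at least as large as the observed proportion, because some sites will have changed more than once or changed back without leaving a visible difference.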

Inferences that employ models of sequence evolution are more<br />

or less accurate, depending on how good the model is and how<br />

well the parameters are estimated. For instance, if transition and<br />

transversion frequencies differ, then the use of the one-parameter<br />

Jukes–Cantor model would give misleading results, and might lead<br />

to a faulty phylogenetic inference. Also, the parameters (such as p)<br />

are estimated from sequence data, using a statistical model such<br />

as the Poisson or gamma distribution. The quality of the estimate<br />

depends on how good the data are (whether the stretch of<br />

sequence is long enough, for instance) and on whether the<br />

correct statistical model has been picked. Controversies in<br />

molecular phylogenetics can turn on details of these statistical<br />

models. In general, a trade-off exists between the quantity of data<br />

needed to estimate the parameters, and the accuracy of the model<br />

that can be used. A model with two parameters should be better<br />

than a model with one parameter, but requires more sequence data<br />

to estimate the parameters.<br />
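One way to see why sequence length matters is to look at the sampling error of the observed proportion of differing sites. The sketch below uses the ordinary binomial standard error and assumes that sites evolve independently; it is a simplification for illustration, not one of the estimation procedures referred to in the text, and the function name and the value 0.1 are hypothetical.

```python
import math

def proportion_std_error(p_obs, n_sites):
    """Binomial standard error of an observed proportion over n_sites independent sites."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return math.sqrt(p_obs * (1 - p_obs) / n_sites)

# Assumed observed difference of 10%, compared across sequence lengths.
for n_sites in (100, 1000, 10000):
    print(n_sites, proportion_std_error(0.1, n_sites))
```

The error shrinks only with the square root of the number of sites, and a model with more parameters has to divide the same observed differences among more categories, so each individual estimate rests on fewer data.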

Further reading: Swofford et al. (1996), Page & Holmes (1998),<br />

Graur & Li (2000).<br />
