15.12.2012 Views

Bioinformatics, Volume I Data, Sequence Analysis and Evolution


270 Liò and Bishop

the complementary set of small residues, or residues with a different charge. There is a rate parameter, $\lambda$, and an equilibrium frequency parameter, $\pi_A$ (with $\pi_A + \pi_a = 1$), such that the instantaneous rate at which state $j$ replaces a different state $i$ is equal to $\lambda \pi_j$. The matrix of transition probabilities at time $t$ is then:

$$
P_{ij}(t) =
\begin{cases}
\exp(-\lambda t) + \pi_j \left(1 - \exp(-\lambda t)\right), & i = j \\
\pi_j \left(1 - \exp(-\lambda t)\right), & i \neq j
\end{cases}
$$

This substitution process is reversible, i.e., $\pi_i P_{ij}(t) = \pi_j P_{ji}(t)$.
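The two-state model can be checked numerically. The following is a minimal sketch, assuming the transition probabilities take the standard form implied by the rate $\lambda \pi_j$ (an exponential decay term on the diagonal plus the destination frequency times one minus that decay). The function name `two_state_transition_matrix` and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def two_state_transition_matrix(rate, pi_A, t):
    """Return P(t) for a two-state site with states ordered (A, a).

    rate : the rate parameter (lambda in the text)
    pi_A : equilibrium frequency of state A; state a has 1 - pi_A
    t    : elapsed time
    """
    pi = [pi_A, 1.0 - pi_A]
    decay = math.exp(-rate * t)
    # Diagonal entries get the extra exp(-rate * t) term; every entry gets
    # the destination frequency pi[j] weighted by (1 - exp(-rate * t)).
    return [
        [(decay if i == j else 0.0) + pi[j] * (1.0 - decay) for j in range(2)]
        for i in range(2)
    ]


# Illustrative values only (not taken from the text).
pi_A = 0.3
pi = [pi_A, 1.0 - pi_A]
P = two_state_transition_matrix(rate=1.5, pi_A=pi_A, t=0.4)

# Each row is a probability distribution over destination states.
assert all(math.isclose(sum(row), 1.0) for row in P)

# Reversibility (detailed balance): pi_i * P_ij(t) == pi_j * P_ji(t).
assert math.isclose(pi[0] * P[0][1], pi[1] * P[1][0])
```

At $t = 0$ the function returns the identity matrix, and for large $t$ every row approaches the equilibrium frequencies, which is a quick sanity check on any implementation of this model.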

A further extension is to model correlated change in pairs of sites. This was first introduced by Pagel (49) for comparative analysis of discrete characters. Consider a second site with two states, B and b, with equilibrium frequencies $\pi_B$ and $\pi_b$ (where $\pi_B + \pi_b = 1$). Then the matrix of instantaneous transition rates is:

$$
Q =
\begin{bmatrix}
-\Sigma_{AB} & \lambda_B \pi_{Ab}/\pi_A & \lambda_A \pi_{aB}/\pi_B & 0 \\
\lambda_B \pi_{AB}/\pi_A & -\Sigma_{Ab} & 0 & \lambda_A \pi_{ab}/\pi_b \\
\lambda_A \pi_{AB}/\pi_B & 0 & -\Sigma_{aB} & \lambda_B \pi_{ab}/\pi_a \\
0 & \lambda_A \pi_{Ab}/\pi_b & \lambda_B \pi_{aB}/\pi_a & -\Sigma_{ab}
\end{bmatrix}
$$

where $\Sigma_{ij}$ is the sum of the off-diagonal elements of row $ij$, and $\lambda_A$ and $\lambda_B$ are the two rate parameters governing substitution at the two loci, A and B. Rows and columns are ordered as AB, Ab, aB, ab. The $\pi_{ij}$ are the joint equilibrium frequencies of the paired states, and the single-site frequencies are their marginals (e.g., $\pi_A = \pi_{AB} + \pi_{Ab}$). The number of free parameters is five: two rate parameters and, because the $\pi_{ij}$ sum to one, three independent values of $\pi_{ij}$. Compared with two independently evolving sites (two rates plus $\pi_A$ and $\pi_B$), there is one extra degree of freedom, which can be represented by the quantity $RD = \pi_{AB}\pi_{ab} - \pi_{Ab}\pi_{aB}$; this quantity is analogous to the linkage disequilibrium. If RD is different from zero, there is some degree of dependence between the two sites. RD can be negative or positive, and this corresponds to either compensation or anti-compensation of the residues. Again, the substitution probabilities for the co-evolving model can be calculated using $P(t) = \exp[Qt]$. Rather than using this model to construct a phylogenetic tree (which would be possible in principle), one can take a given phylogenetic tree and use it to test the evolutionary model based on likelihood calculations.
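The construction of the rate matrix and the calculation of $P(t) = \exp[Qt]$ can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each off-diagonal rate is taken to be the rate parameter of the changing site times the joint frequency of the destination pair, divided by the marginal frequency of the state that stays fixed at the other site. All function names and the frequency values are hypothetical, and the matrix exponential is a simple scaling-and-squaring Taylor series written in pure Python so that the block has no dependencies.

```python
import math

STATES = ["AB", "Ab", "aB", "ab"]  # row/column order used in the text


def residue_disequilibrium(pi):
    """RD = pi_AB * pi_ab - pi_Ab * pi_aB for joint frequencies pi (a dict)."""
    return pi["AB"] * pi["ab"] - pi["Ab"] * pi["aB"]


def coevolution_rate_matrix(rate_A, rate_B, pi):
    """Build the 4x4 rate matrix Q; only one site may change at a time."""
    marginal = {
        "A": pi["AB"] + pi["Ab"], "a": pi["aB"] + pi["ab"],
        "B": pi["AB"] + pi["aB"], "b": pi["Ab"] + pi["ab"],
    }
    Q = [[0.0] * 4 for _ in range(4)]
    for i, src in enumerate(STATES):
        for j, dst in enumerate(STATES):
            if src[0] != dst[0] and src[1] == dst[1]:
                # First site changes; condition on the unchanged second site.
                Q[i][j] = rate_A * pi[dst] / marginal[src[1]]
            elif src[0] == dst[0] and src[1] != dst[1]:
                # Second site changes; condition on the unchanged first site.
                Q[i][j] = rate_B * pi[dst] / marginal[src[0]]
        Q[i][i] = -sum(Q[i])  # diagonal is minus the off-diagonal row sum
    return Q


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(M, terms=20, squarings=10):
    """exp(M) via a Taylor series on M / 2**squarings, then repeated squaring."""
    n = len(M)
    scaled = [[x / 2.0 ** squarings for x in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_probabilities(Q, t):
    """P(t) = exp(Q t)."""
    return mat_exp([[q * t for q in row] for row in Q])


# Illustrative joint frequencies with positive RD (not taken from the text).
pi = {"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}
Q = coevolution_rate_matrix(rate_A=1.0, rate_B=2.0, pi=pi)
P = transition_probabilities(Q, t=0.5)

# Rows of P(t) sum to one, and the process satisfies detailed balance.
freqs = [pi[s] for s in STATES]
assert all(math.isclose(sum(row), 1.0, abs_tol=1e-9) for row in P)
assert all(math.isclose(freqs[i] * P[i][j], freqs[j] * P[j][i], abs_tol=1e-9)
           for i in range(4) for j in range(4))
```

When RD is zero, each joint frequency is the product of its marginals and the rates reduce to those of two independent single-site models (for example, the AB to Ab rate becomes $\lambda_B \pi_b$). In practice a library routine such as `scipy.linalg.expm` would normally replace the hand-written `mat_exp`.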

As a final target, the understanding of protein evolution may allow one to distinguish between analogous and homologous proteins, i.e., to detect similarities in those proteins that have very low sequence similarity and have probably diverged from a common ancestor into the so-called twilight zone.
